Citrix XenServer

Can’t console to frozen XenServer host but VMs are still running

Let’s say a host in your pool won’t restart a VM and freezes halfway (that wonderful yellow icon). If you hit the VM’s console tab, it might be blank. If you hit the console tab of the host, it might also be blank. If you SSH in, it may connect, but xe commands never return. The prompt just sits there. If you attempt to migrate or stop a VM, it hangs. The host is essentially frozen, but the VMs are still running on it just fine.

This is all a pretty good sign the XAPI service on the host is hung. XAPI is the XenServer management toolstack, which controls pretty much everything at the host layer. If the toolstack is hosed, XenCenter can’t talk to the host and you probably won’t be able to run any xe commands. Quick way to troubleshoot this:

1. SSH into the host with the issue.

2. Type:

df -h

which shows the disk space usage for each file system. The “-h” switch prints sizes in human-readable units (K, M, G) instead of raw blocks, which is much easier to read. We need to check the root partition and see if it is full. It is typically 4 GB and can be filled up by logs, which may cause the XAPI service to stop. If the XenServer root disk is full, you will probably see the host drop out of XenCenter because XAPI is stopped. You won’t be able to restart the XAPI service until you free up some space. A full root shows 100% in the Use% column on the line mounted on /.

Extra tip: once you are logged in to one XenServer host, you can check other hosts remotely without opening a separate SSH session to each one in a different terminal. Just type:

ssh <hostname> df -h
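If you would rather have a script tell you when the root is nearly full than read the table by eye, here is a minimal sketch. The function name root_use_pct and the 90% threshold are my own choices for illustration, not anything XenServer ships with. It relies on the POSIX output format of df -P, where the fifth column is the usage percentage and the sixth is the mount point.

```shell
# Read `df -P` output on stdin and print the usage percentage (digits only)
# of the file system mounted on /.
root_use_pct() {
    awk 'NR > 1 && $6 == "/" { sub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold for illustration; pick whatever suits your pool.
THRESHOLD=90

pct=$(df -P / | root_use_pct)
if [ "${pct:-0}" -ge "$THRESHOLD" ]; then
    echo "root is ${pct}% full, go look at /var/log"
fi
```

The same parser works on a remote host by piping in the remote output, for example ssh <hostname> df -P / | root_use_pct.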

3. If the root is full like above, type:

cd /var/log

then

ls

to list the logs. Type:

du -sh *

to list the logs with the sizes. If you find one that is too big, delete it:

rm <filename>.log

From here you can skip ahead to step 6 and try restarting XAPI.
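When /var/log holds a lot of files, sorting by size makes the culprit easier to spot. This is only a sketch: largest_entries is a helper name I made up, and it simply combines du, sort and head. Sizes are printed in kilobytes so that sort can compare them numerically.

```shell
# Print the N largest entries directly under DIR, biggest first.
# Sizes are in KB (du -k) so that sort -n orders them correctly.
largest_entries() {
    dir=$1
    n=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Show the ten biggest logs on this host.
largest_entries /var/log 10
```

The first line of output is the biggest file or directory, which is usually the one to rotate or delete.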

Also, you might want to consider moving your logs off to a different volume. If you fill your dom0 root, you’re basically hosing the XenServer. Citrix has a good article on how to move the /var/log directory to a different volume here:

http://support.citrix.com/article/CTX130245

or retain fewer logs by editing logrotate.conf here:

http://support.citrix.com/article/CTX131619

4. If your root is not full, the next thing you probably want to do is disable HA. You can do this in the XenCenter console or you can just type:

xe pool-ha-disable

or if you want to disable HA on a host (you’ll have to run this on each host though):

xe host-emergency-ha-disable force=true

5. After disabling HA, restart the toolstack:

xe-toolstack-restart

This will disconnect all the hosts in the pool from XenCenter, but don’t panic. Give it 10-20 seconds. Once the toolstack has restarted, the hosts will all reconnect to XenCenter. Pending actions like reboots and migrations will stop when the toolstack restarts, so you have a clean slate.
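Instead of counting to twenty, you can poll until the toolstack answers again. The helper below is a rough sketch and wait_until is a name I picked. It reruns any command once per second until it succeeds or a timeout passes. Bear in mind that if the command itself hangs (as xe can on a wedged host), the loop hangs with it.

```shell
# Run a command once per second until it succeeds or TIMEOUT seconds pass.
# Returns 0 on success, 1 if the timeout is reached first.
wait_until() {
    timeout_secs=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout_secs" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# On the XenServer host you might call it like this (left commented out
# here so the snippet also runs on a machine without xe):
# wait_until 60 xe host-list && echo "toolstack is answering again"
```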

6. You should be able to console into your host with the issues now. Type:

service xapi status

and see if it is running. If you want to see how taxed XAPI is, type:

top

to see all the running processes. If XAPI is taking up 40% CPU or more, that is a good indication something is hung up on it.
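If you want that check as a one-liner rather than watching top, something like the sketch below works. busy_xapi is a hypothetical helper name, and the default limit of 40 just mirrors the rule of thumb above. One caveat: ps reports CPU averaged over the life of the process, so it is a coarser number than the live figure in top.

```shell
# Read `ps -eo pid,pcpu,comm` output on stdin and print the PID and %CPU of
# every process named xapi at or above LIMIT percent (default 40).
busy_xapi() {
    awk -v limit="${1:-40}" \
        'NR > 1 && $3 == "xapi" && $2 + 0 >= limit { print $1, $2 }'
}

# Prints nothing when xapi is idle or not running.
ps -eo pid,pcpu,comm | busy_xapi 40
```

The PID it prints is the one you would use if the restart hangs and the process has to be killed.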

If XAPI is not running or is very taxed, type:

service xapi restart

If it hangs at “Stopping xapi” or “Starting xapi”, you may need to kill the process.

Type:

kill <PID>

using the process ID from when you ran “service xapi status” or “top”. Then run service xapi status again to verify all xapi processes have stopped. Then you can type:

service xapi restart

again if it didn’t automatically try and start already. Eventually it will say:

Starting xapi: ....start-of-day complete. [ OK ]

and you should see the host pop back in your XenCenter console. If you go back and run top, xapi should be taking up around 1% or less CPU.

You can type:

xe task-list

to see all the running tasks which shouldn’t be much at this point. Don’t forget to re-enable HA after you’re done. Hope this helps someone.

About Jason Samuel

Jason Samuel is an Infrastructure Architect in Houston, TX with a primary focus on mobility, virtualization, and cloud technologies from Citrix, Microsoft, & VMware. He also has an extensive background in web architecture and information security. He is certified in several technologies and is 1 of 50 people globally that is a recipient of the prestigious Citrix Technology Professional (CTP) award. He is 1 of 28 people in the world that is an Atlantis Community Expert (ACE). He is a featured author on DABCC which provides the latest IT Community News on Cloud, Data Center, Desktop, Mobility, Security, Storage, & Virtualization. In his spare time Jason enjoys writing how-to articles and evangelizing the technologies he works with.

13 Comments

  1. kb

    January 19, 2012 at 7:41 PM

    Awesomeness! Thanks for this post, it was exactly what I needed.

  2. Brett

    January 27, 2012 at 12:22 AM

    Great article. thanks!

  3. Robb

    February 24, 2012 at 2:45 PM

    Thanks for posting this. I have, what I am sure, is a disk full issue, but I’m unable to ssh or console into the server so I can’t remove any logs to free up disk space.

    Any thoughts how I can move forward? Will logrotate free up any space when it runs again at 4am, or will it fail as it doesn’t have anywhere to compress the files to?

    And what are the reboot options? Good idea, bad idea? Is there a safe mode in XenServer 6 that might help?

    Thanks in advance for any help/suggestions/thoughts…

    -r

  4. Cliff Hogan

    March 3, 2012 at 7:37 PM

    Excellent article. Although while encountering the issue I diagnosed it correctly based on http://support.citrix.com/article/CTX128316, this blog post provides what is missing from the Citrix KB article, i.e. how to start XAPI, which is essential if the host, the pool master in this case cannot be restarted. The kill command was the one that saved the day

  5. Enzo

    September 17, 2012 at 2:23 PM

    As you said, you won’t be able to issue xe commands, so you won’t be able to stop HA in step 4. I have found that you can sometimes kill the stunnel processes, sometimes those are defunct and then you can restart xapissl and then do a tool stack restart:

    killall -9 xapi
    service xapissl restart
    xe-toolstack-restart

    Back in business.

  6. alyami

    December 26, 2012 at 7:19 AM

    Thanks a million, great help

  7. Jonathan James

    March 20, 2013 at 3:04 AM

    Excellent post! Many many thanks for rescuing me!

  8. Frank

    June 20, 2013 at 4:17 AM

    It solves my problem.
    Thanks a lot.

  9. Anu Skariah

    May 20, 2014 at 12:03 PM

    Thanks Man. Awesome article.. Solved my problem.

  10. Bill

    August 15, 2014 at 10:15 AM

    Works great, thanks

  11. Danie vi

    September 13, 2014 at 1:53 AM

    How can we extend the root disk ?
    Have any solutions ?

  12. kasch

    October 6, 2014 at 3:33 AM

    Great! Many many thanks for rescuing me!

  13. Kelly M

    May 1, 2015 at 10:36 AM

    You, Sir, are pure awesomeness! I was quite perplexed as to how my server showed to be powered off but yet my VM’s were still running. The logs were not full, but the xapi service had stopped/failed and I couldn’t figure out how to get to the server to do any maintenance. The ssh access, then toolstack reset and restarting xapi solved the issue.
