Hi,
We have had the following errors in one of our production clusters:
Insufficient resources to satisfy HA failover level on cluster XXXX in XXXX
Unable to contact a primary HA agent in cluster XXXX
Only one host is in maintenance mode, so there is sufficient capacity. Further investigation revealed that the HAPrimaryVMHost is somehow the host in maintenance mode!
Get-Cluster XXXX | Get-HAPrimaryVMHost | format-wide
XXXXVMHost.xxxx (this is the host in maintenance mode, go figure!)
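For anyone wanting to cross-check this on their own cluster, here is a rough sketch that lists every host's connection state next to whether it is reported as an HA primary. It assumes an existing Connect-VIServer session, a PowerCLI release that still includes Get-HAPrimaryVMHost, and that the cmdlet returns host objects with a Name property. 'XXXX' is the placeholder cluster name used above and the IsHAPrimary column name is only illustrative.

```powershell
# Sketch only: assumes an active Connect-VIServer session and a PowerCLI
# version that still provides Get-HAPrimaryVMHost.
$cluster = Get-Cluster -Name 'XXXX'   # placeholder cluster name

# Names of the hosts reported as HA primaries
# (assumes the cmdlet returns host objects with a Name property).
$primaryNames = @($cluster | Get-HAPrimaryVMHost | Select-Object -ExpandProperty Name)

# List each host with its connection state and whether it is an HA primary.
# A primary whose ConnectionState is 'Maintenance' matches the problem described here.
$cluster | Get-VMHost |
    Select-Object Name, ConnectionState,
        @{Name = 'IsHAPrimary'; Expression = { $primaryNames -contains $_.Name }} |
    Sort-Object Name |
    Format-Table -AutoSize
```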
I have tried to reconfigure each of the remaining hosts in the cluster for HA, but this fails on each host with the error 'Unable to contact a primary HA agent in cluster XXXX'. Disabling HA on the cluster and then re-enabling it works for a minute, then it fails again, and the cluster and all of its hosts show the same problem.
I tried accessing the VMware AAM console on these ESXi 4.0 systems, as I was going to try promoting a new master, but to no avail. Perhaps this is only available on ESX 4.1 or later.
[Err:2] fopen failed
[Err:15015] Cannot open file
/opt/vmware/aam/bin
It seems 'ftcli' is what gets called, going by /var/log/vmware/aam/aam_config_util_listnodes.log, but you don't seem to be able to run this command outside a script. In any case, according to this log even 'listnodes' fails, with a "Trying to talk to self 'xxxmyhostnamexxx'" message, yet there is no sites file.
I'm kind of at a loss as to how to proceed with this now.