A user asked for advice on VMs becoming unreachable due to Horizon Agent issues: applications freeze, and once a machine reaches this state it can no longer be accessed, either remotely or through the console. Possible causes such as port exhaustion and DNS issues were discussed, along with suggestions to review the system and application log files and to generate an OS crash dump. It was also suggested to reassess the configuration and consult other Horizon tech communities. ControlUp historical data and the support article https://support.controlup.com/docs/controlup-agent-temporary-mode-and-how-to-change-it?highlight=IsTemporary#what-is-the-fix-for-this-problem were also mentioned.
Read the entire ‘Investigating Unreachable ControlUp Agent VMs’ thread below:
Hello, this is my first post here. My organization has recently become a ControlUp customer. We purchased ControlUp to gain more insight into the performance of VMs in our environment. One of the biggest issues we have been encountering is VMs becoming agent unreachable. Once they have become unreachable, we are not able to remotely manage or access any information from that machine, but they remain pingable. Can anyone suggest a way to view historical data in Solve\ControlUp Console to help us determine the cause of the VM crash?
By “agent unreachable”, are you referring to the ControlUp Agent?
Does the whole machine crash when it’s unreachable for ControlUp?
Welcome to the ControlUp family!
A common cause for this is port exhaustion… I would check the number of ports in use with the netstat -anob command at regular intervals to see whether a piece of software or a tool is taking too many ports.
There is a nice MS article about it https://learn.microsoft.com/en-us/troubleshoot/windows-client/networking/tcp-ip-port-exhaustion-troubleshooting
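As a rough illustration of the netstat suggestion above, here is a minimal Python sketch that counts endpoints per PID from netstat output. It is only a sketch under stated assumptions: the function name `count_endpoints_by_pid` and the sample text (addresses, PIDs, and `example.exe`) are made up for illustration, and it assumes the usual English column layout where the PID is the last field of each TCP/UDP row. Note that the -b switch requires an elevated prompt; the parser simply skips the bracketed owner lines it adds.

```python
from collections import Counter


def count_endpoints_by_pid(netstat_output: str) -> Counter:
    """Count TCP/UDP rows per owning PID in `netstat -ano` / `-anob` output."""
    counts = Counter()
    for raw in netstat_output.splitlines():
        parts = raw.split()
        # Data rows start with the protocol and end with the numeric PID.
        # Header rows and "[process.exe]" owner lines do not match and are skipped.
        if len(parts) >= 4 and parts[0] in ("TCP", "UDP") and parts[-1].isdigit():
            counts[int(parts[-1])] += 1
    return counts


# Hypothetical sample output, for illustration only.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       1040
 [svchost.exe]
  TCP    10.0.0.5:49701         10.0.0.9:443           ESTABLISHED     4312
 [example.exe]
  TCP    10.0.0.5:49702         10.0.0.9:443           ESTABLISHED     4312
 [example.exe]
  UDP    0.0.0.0:123            *:*                                    1040
 [svchost.exe]
"""

if __name__ == "__main__":
    # Print the PIDs holding the most endpoints first.
    for pid, total in count_endpoints_by_pid(SAMPLE).most_common():
        print(f"PID {pid}: {total} endpoints")
```

On a live machine the text could come from something like `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`, collected on a schedule so that a steadily growing count for one PID stands out.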
This can also be the gremlin sometimes. https://support.controlup.com/docs/controlup-agent-temporary-mode-and-how-to-change-it?highlight=IsTemporary#what-is-the-fix-for-this-problem
What kind of desktops are these, persistent or non-persistent, and how do you deploy our agent?
On the other hand, can you RDP into the machines, or are they really unreachable? Both Luke and I are thinking that it might be an agent issue and want to rule that out.
@member It is the Horizon Agent that becomes unreachable, sorry for not clarifying properly!
One moment and I will provide a synopsis of our environment.
@member Check whether it is a DNS issue; that can also make the agent become unavailable.
A simple nslookup will tell you if it is an issue.
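The same DNS check can be scripted if it needs to run across many machines. The following is a minimal sketch, not a definitive tool: the function name `check_dns`, the host name `vm01`, and the IP address are hypothetical, and the expected IP is assumed to be the one shown for the VM in vCenter. The resolver is passed in as a parameter so the logic can be exercised without touching a real DNS server.

```python
import socket


def check_dns(hostname: str, expected_ip: str, resolver=socket.gethostbyname_ex) -> bool:
    """Return True if `hostname` resolves to `expected_ip`, False otherwise."""
    try:
        # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list).
        _, _, addresses = resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False
    return expected_ip in addresses


if __name__ == "__main__":
    # Hypothetical VM name and address; replace with real values.
    ok = check_dns("vm01", "10.0.0.5")
    print("DNS matches" if ok else "DNS mismatch or lookup failure")
```

A mismatch here would point to a stale or duplicate DNS record, which is worth ruling out even when the machine answers ping by name.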
We use VMware Horizon Instant Clones. Our clones are provisioned with 6 vCPU, 12 GB RAM, a 100 GB disk, and NVIDIA GRID vGPU. We provide personalization via a 20 GB writable volume and supplement applications via app stacks. We also use some UEM configs to persist some personalization if the writable volume needs to be recreated. This issue was happening before we installed ControlUp (it was a major reason for our purchase; also, cuAgent is running in service mode). When a machine’s Horizon Agent becomes unreachable, we are unable to RDP; it acts as if the OS has crashed, and we are unable to console to that machine because NVIDIA is attached. Unfortunately, if you power off the machine to detach NVIDIA, the machine is deleted from vCenter. We have tried placing the machines in maintenance mode before trying this process, and it is hit or miss whether the machine remains after powering off. In instances where we were able to successfully remove NVIDIA, we were not able to sign in via the console; it would just spin at the Welcome screen. Once the machine enters this state, all agents become unreachable, even the ControlUp agent, and you are not able to initiate a guest OS shutdown. The only way we can delete the machine is to power it off or remove it from vCenter. We have updated our Horizon Agent from 7.13.2 to 2206 in the hope of finding some relief, but the issue remains. I’ve run netstat -anob and consulted with one of our network admins to confirm that port exhaustion is not occurring. We don’t believe it is a DNS issue, because the IP matches vCenter when you ping the machine by name, and it responds. Before the machine dies, users report applications freezing and then the system becoming completely unresponsive. We are hoping to find a way to determine what might be happening when this occurs, using historical data on the machines if that is a possibility. Thank you all for your help and the warm welcome!
If the OS has crashed, the ControlUp agent will not be running, so it will not provide you with any insight into this issue. Can a D: drive be added to the VDIs and the Windows event logs be moved to it? That way you can mount the drive later and see whether the logs give you any insight.
The Windows crash dump could possibly be written to the D: drive as well, and it can then be sent to Microsoft to debug for you.
@member says the machine is still pingable when this issue occurs, so I don’t think Windows is crashing, but it seems that most subsystems are hung when the OS reaches this state. There are many possible root causes for such an issue; I would look at ControlUp historical data to see if any process is misbehaving prior to this issue.
In addition, you can share the system and application log files from an affected machine here, noting the “hang” start time.
Finally, in such cases, it may be necessary to generate a full OS crash dump and ask MS escalation support to analyze it and see what is causing the hang.
@member, I guess you can also check in other Horizon tech communities whether others are seeing a similar Horizon hang issue with this config?
I also suggest reviewing this KB: https://kb.vmware.com/s/article/90271