Script possibly running out of resources

I have a script that rebalances the hosts in a vCenter environment. It finds the host with the highest memory usage, picks the smallest (by memory) VM on it, searches for the host with the least memory usage, and vMotions the VM to that host.

The script runs really well in most cases; however, it tends to fail after a while with this error:

```
ERROR: 1/05/2025 4:42:17 PM     Move-VM         Operation is not valid due to the current state of the object.
ERROR: At C:\Users\jmilano\Documents\PowerShell_Scripts\Hosts\Rebalance-Cluster.ps1:280 char:7
+             Move-VM -VM (Get-VM -Name $targetvm) -Destination (Get-VM …
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

After this error comes up, the script keeps erroring for every subsequent VM in the list. If I close the PowerShell 7 window and open a new one, it runs again, then after a while it fails again.

I can manually vMotion the VMs that the script errors on. Even when the script fails, I can see the vMotion request in the vCenter tasks, and the VM is actually migrated without issues!?

Is there a way I can check for memory usage in my PowerShell window after each script run?

vCenter 7.0u3 Build 24322018
```
PS> $PSVersionTable.PSVersion

Major  Minor  Patch  PreReleaseLabel BuildLabel
7      5      0
```
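To answer the memory question directly, here is a minimal sketch that uses only built-in PowerShell/.NET calls (`Get-Process -Id $PID` and `[System.GC]::GetTotalMemory`) to report how much memory the current PowerShell process is using. The function name `Get-SessionMemoryReport` is made up for illustration. Calling it once per loop iteration would show whether the session's memory keeps growing until the error appears. A flat line would suggest the problem is some state inside the session rather than exhausted memory.

```powershell
# Hypothetical helper: report memory used by the current PowerShell process.
function Get-SessionMemoryReport {
    [CmdletBinding()]
    param(
        # When set, force a full garbage collection before measuring the managed heap.
        [switch]$Collect
    )
    $process = Get-Process -Id $PID
    [pscustomobject]@{
        Timestamp     = Get-Date
        # Physical memory currently mapped to the process.
        WorkingSetMB  = [math]::Round($process.WorkingSet64 / 1MB, 1)
        # Memory held by .NET objects (PowerShell/PowerCLI objects live here).
        ManagedHeapMB = [math]::Round([System.GC]::GetTotalMemory($Collect.IsPresent) / 1MB, 1)
    }
}

Get-SessionMemoryReport | Format-Table -AutoSize
```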

Define “… a while …”!

Have you tried to handle the error with a try-catch-block?

Error handling is done with a Try/Catch. A “while” is roughly 5 minutes of running. The Try/Catch receives the error after the Move-VM command runs, even though the Move-VM command was successful.

:thinking: … wait … what? … so you’re handling the error? … how? … if it’s still breaking the script?

OK … I expected something way bigger …

The cause may be something in vCenter - not in PowerShell.

What is the current state of the object in that moment? Have you tried to insert a waiting time?

It’s not a vCenter issue: the VM vMotions when the Move-VM command is issued by the script, I can manually move the VMs in vCenter, and there are no errors in vCenter.
Once the error occurs in the script, no more VMs will vMotion via the script; it keeps giving the same error for all VMs. I have to close the current PS window and open a new one, and then the script runs OK for the next few minutes.

This is the section of my main code:

```powershell
Try
    {
    CheckvCenterConnection
    Write-Host "Retrieving cluster"
    # Get the cluster object.
    $oCluster = Get-Cluster -Name $Cluster -ErrorAction Stop
    Write-Host "Will try to remediate the imbalance on hosts in the cluster"

    # Find VMs that can fill the delta from the host with the most usage.
    Write-Host "Gathering VM data from the cluster."
    # Get all VMs from the cluster. The script will check this list and remove any vMotioned VMs until it runs out.
    $AllClusterVMs = Get-VM -Location $oCluster
    Write-Host "VM data has been collected."
    Write-Host "I will try to balance the cluster until the hosts are within $($HostBalanceP)% of each other or"
    Write-Host "until I have tried to move all $($AllClusterVMs.Count) VMs in this cluster."
    # Run through VMs and migrate the smallest ones
    $i = 1
    # This variable is used to exit the loop when set to False.
    $nomigration = $true
    Write-Host "Now to balance the load in the cluster." -ForegroundColor Blue
    while ($nomigration)
        {
        Write-Host
        Write-Host "+-----------------------------------------------------------------------------------+" -ForegroundColor Yellow
        Write-Host "| Try Number: $i" -ForegroundColor Yellow
        Write-Host "+-----------------------------------------------------------------------------------+" -ForegroundColor Yellow
        # Get all the required stats.
        $HostMinMem, $HostMaxMem, $RAM_Max_Val, $RAM_Min_Val, $ClusterIsBalanced = Get-HostMemUsage -Cluster $oCluster

        Write-Host "---------------------------------------------------------------" -ForegroundColor Cyan
        Write-Host "Host with max mem: $HostMaxMem. Memory value = $RAM_Max_Val" -ForegroundColor Cyan
        Write-Host "Host with min mem: $HostMinMem. Memory value = $RAM_Min_Val" -ForegroundColor Cyan
        Write-Host "---------------------------------------------------------------"

        If ($ClusterIsBalanced -or $AllClusterVMs.Count -lt 2)
            {
            # Cluster looks OK.
            Write-Host "Cluster seems to be balanced. Exiting." -ForegroundColor Green
            $nomigration = $false
            }
        Else
            {
            # Cluster needs to be rebalanced.
            #
            # Get all VMs on the "host with MAX mem usage".
            Write-Host "Retrieving all VMs on host [$HostMaxMem]." -ForegroundColor Cyan
            $VMsOnHostWithMaxRAM = Get-VMsFromHost -Hostname $HostMaxMem
            Write-Host "Retrieved $($VMsOnHostWithMaxRAM.count) VMs from the host." -ForegroundColor Cyan

            # Get the VM on the list with the least memory usage.
            #
            # If there was a problem with the last VM migrating to a new host, the following loop will
            # pick up that same VM; however, since it was removed from $AllClusterVMs, the loop will choose
            # the next VM in the list, effectively bypassing the problematic VM(s).
            $iLoop = 0
            Do
                {
                # Get the next VM on the host with the least memory usage.
                $otargetvm = ($VMsOnHostWithMaxRAM.GetEnumerator() | Sort-Object Value)[$iLoop]
                $iLoop += 1
                }
            # Repeat the above until we find a VM which is in the list ($AllClusterVMs).
            While ([String]::IsNullOrEmpty(($AllClusterVMs | Where-Object Name -eq $otargetvm.Name)))
            $targetvm = $otargetvm.Name
            Write-Host "[$targetvm] was chosen to be migrated. Memory size = $([MATH]::Round($otargetvm.Value / 1024,2))." -ForegroundColor Blue -BackgroundColor Cyan
            # Remove the VM name from the list of ALL VMs so that it does not get migrated more than once.
            $AllClusterVMs = $AllClusterVMs | Where-Object {$_.Name -ne $targetvm}
            $SourceHost = $HostMaxMem
            $TargetHost = $HostMinMem
            Write-Host "Moving $targetvm from $SourceHost to $TargetHost." -ForegroundColor Magenta
            If ($Test.IsPresent)
                {
                Write-Host "VM Name: $targetvm" -ForegroundColor Yellow
                Write-Host "Destination: $TargetHost" -ForegroundColor Yellow
                }
            Move-VM -VM (Get-VM -Name $targetvm) -Destination (Get-VMHost -Name $TargetHost) -VMotionPriority:High | Out-Null
            # Wait 15 seconds for the cluster to stabilise.
            Start-Sleep -Seconds 15
            $i += 1
            }
        }
    # Display the final Cluster situation:
    Write-Host ""
    Write-Host "Final Cluster state:"
    $HostMinMem, $HostMaxMem, $RAM_Max_Val, $RAM_Min_Val, $ClusterIsBalanced = Get-HostMemUsage -Cluster $oCluster
    Write-Host "---------------------------------------------------------------" -ForegroundColor Cyan
    Write-Host "Host with max mem: $HostMaxMem. Memory value = $RAM_Max_Val" -ForegroundColor Cyan
    Write-Host "Host with min mem: $HostMinMem. Memory value = $RAM_Min_Val" -ForegroundColor Cyan
    Write-Host "---------------------------------------------------------------"
    }

Catch
    {
    Write-Host "ERROR: $($PSItem.ToString())" -ForegroundColor Red
    Write-Host "ERROR: $($PSItem.InvocationInfo.PositionMessage)" -ForegroundColor Red
    Write-Host "Some debug info:"
    Write-Host "VM Name: $targetvm" -ForegroundColor Yellow
    Write-Host "Destination: $TargetHost" -ForegroundColor Yellow
    }
```
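Here is a side note on the VM-selection Do/While loop in that code. It is not necessarily related to the Move-VM error. If none of the VMs left on the busiest host are still in `$AllClusterVMs`, the index runs past the end of the sorted list and `$otargetvm` becomes `$null`. The `Where-Object` then never matches, so the loop never exits. Below is a minimal sketch of a bounded alternative. It assumes `Get-VMsFromHost` returns a dictionary of VM name to memory value, which is what `.GetEnumerator() | Sort-Object Value` suggests. `Select-SmallestCandidateVM` is a hypothetical name.

```powershell
# Hypothetical helper: pick the smallest VM on the busiest host that has not been moved yet.
function Select-SmallestCandidateVM {
    [CmdletBinding()]
    param(
        # Assumed shape: VM name -> memory value.
        [Parameter(Mandatory)][System.Collections.IDictionary]$VMsOnHost,
        # Names of VMs that have not been migrated yet (e.g. $AllClusterVMs.Name).
        [Parameter(Mandatory)][AllowNull()][AllowEmptyCollection()][string[]]$RemainingNames
    )
    # Keep only VMs still waiting to be moved, sort by memory, take the smallest.
    # Emits nothing ($null when assigned) if no candidate is left, instead of looping forever.
    $VMsOnHost.GetEnumerator() |
        Where-Object { $RemainingNames -contains $_.Key } |
        Sort-Object -Property Value |
        Select-Object -First 1
}

# Example with made-up data (memory values are illustrative only).
$demoVMs = @{ 'vm-a' = 8192; 'vm-b' = 2048; 'vm-c' = 4096 }
$pick = Select-SmallestCandidateVM -VMsOnHost $demoVMs -RemainingNames @('vm-a', 'vm-c')
"Chosen: $($pick.Key)"   # -> Chosen: vm-c (vm-b is skipped because it was already moved)
```

In the loop, you could call `$otargetvm = Select-SmallestCandidateVM -VMsOnHost $VMsOnHostWithMaxRAM -RemainingNames $AllClusterVMs.Name` and follow it with an `if ($null -eq $otargetvm)` check (for example, one that sets `$nomigration = $false`). That would end the run cleanly instead of hanging.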

OK … without digging too deep into your code …

How do you know exactly which command caused the error when you put all of them into the same try block? And for try/catch to actually catch errors, you have to make sure all commands throw terminating errors. Does Move-VM do this by default?
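Many cmdlets report failures as non-terminating errors. `try`/`catch` does not see those unless the call uses `-ErrorAction Stop` (or `$ErrorActionPreference = 'Stop'`). The demo below uses the built-in `Get-Item` on a temp path that is assumed not to exist, so it runs without vCenter. It does not show whether Move-VM's error is terminating by default. Adding `-ErrorAction Stop` to the `Move-VM` call makes that behavior explicit either way.

```powershell
# A path assumed not to exist, so Get-Item fails.
$missing = Join-Path ([System.IO.Path]::GetTempPath()) 'does-not-exist-12345.txt'

# With -ErrorAction Continue (the default), the error is non-terminating:
# it goes to the error stream (discarded here with 2>$null) and catch never runs.
$caughtDefault = $false
try {
    Get-Item -Path $missing -ErrorAction Continue 2>$null
} catch {
    $caughtDefault = $true
}

# -ErrorAction Stop turns the same error into a terminating one, so catch runs.
$caughtStop = $false
try {
    Get-Item -Path $missing -ErrorAction Stop
} catch {
    $caughtStop = $true
    Write-Warning "Caught: $($_.Exception.Message)"
}

"Continue: caught=$caughtDefault; Stop: caught=$caughtStop"
```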

And the (extensive) use of Write-Host is at least considered bad style and in some cases can cause harm.

Instead you should use Write-Verbose or Write-Debug. This way you can turn on the console output if needed while normally your script does not pollute the console.
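Here is a minimal sketch of the Write-Verbose approach. Adding `[CmdletBinding()]` to a function (or to the script's own `param()` block) gives it the `-Verbose` and `-Debug` common parameters, so progress messages only appear when asked for. `Invoke-RebalanceStep`, `vm01` and `esx02` are placeholders, not part of the original script.

```powershell
# Hypothetical function: [CmdletBinding()] adds -Verbose and -Debug automatically.
function Invoke-RebalanceStep {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][string]$TargetHost
    )
    Write-Verbose "Selected [$VMName] for migration to [$TargetHost]"
    # ... the actual Get-VM / Move-VM work would go here ...
    Write-Debug "About to call Move-VM for [$VMName]"
    [pscustomobject]@{ VM = $VMName; Destination = $TargetHost }
}

# Quiet by default: only the result object is output.
Invoke-RebalanceStep -VMName 'vm01' -TargetHost 'esx02'
# Progress messages appear only when requested.
Invoke-RebalanceStep -VMName 'vm01' -TargetHost 'esx02' -Verbose
```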

Without having any experience with vCenter I will ask again:

The error message is …

So … what is the current state of the object before you run the command? There is obviously something wrong, and it is your job to figure out what it is. :man_shrugging:

And regardless of all that … if your current solution is to restart the script, you could do exactly that inside your catch block … :man_shrugging:
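If restarting is the workaround, one way to do it from inside the catch block is a small retry wrapper that runs a reset step between attempts. This is a hedged sketch. `Invoke-WithReset` is a hypothetical helper, and the thread leaves open whether a disconnect/reconnect clears the bad state or whether a brand-new PowerShell process is needed.

```powershell
# Hypothetical helper: run an action; if it throws, run a reset step and retry.
function Invoke-WithReset {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [Parameter(Mandatory)][scriptblock]$Reset,
        [ValidateRange(1, 10)][int]$MaxAttempts = 2
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        } catch {
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
            # Out of attempts: rethrow the original error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }
            & $Reset
        }
    }
}

# Usage idea (not tested against vCenter; $vCenter and $cred are placeholders):
# Invoke-WithReset `
#     -Action { Move-VM -VM (Get-VM -Name $targetvm) -Destination (Get-VMHost -Name $TargetHost) -ErrorAction Stop } `
#     -Reset  { Disconnect-VIServer -Server $vCenter -Confirm:$false; Connect-VIServer -Server $vCenter -Credential $cred }
```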

I’d recommend splitting your try/catch block into much smaller ones.
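Below is a sketch of one try/catch per step. It assumes VMware PowerCLI is loaded and `Connect-VIServer` has already been run. `Move-SingleVM` is a hypothetical wrapper. Each call gets `-ErrorAction Stop` so the matching catch block fires, and the warning names the command that failed. Returning `$false` lets the calling loop skip that VM and carry on.

```powershell
# Hypothetical wrapper: one try/catch per PowerCLI call, so failures are attributable.
function Move-SingleVM {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][string]$TargetHostName
    )
    try {
        $vm = Get-VM -Name $VMName -ErrorAction Stop
    } catch {
        Write-Warning "Get-VM failed for [$VMName]: $($_.Exception.Message)"
        return $false
    }
    try {
        $destHost = Get-VMHost -Name $TargetHostName -ErrorAction Stop
    } catch {
        Write-Warning "Get-VMHost failed for [$TargetHostName]: $($_.Exception.Message)"
        return $false
    }
    try {
        # Keep the result instead of piping it to Out-Null; it is shown with -Verbose.
        $moved = Move-VM -VM $vm -Destination $destHost -VMotionPriority High -ErrorAction Stop
        Write-Verbose ($moved | Out-String)
    } catch {
        Write-Warning "Move-VM failed for [$VMName] -> [$TargetHostName]: $($_.Exception.Message)"
        return $false
    }
    return $true
}
```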

Is there a reason you can’t use vSphere DRS? It sounds like you may be reinventing the wheel, but perhaps there’s a good reason why that built-in functionality is inadequate for your scenario.


This is what I came here to ask about. This is exactly what DRS is for, but I don’t know VMware licensing well enough to say if/when this is an available feature.

This is a high-level overview of how the script works:

  • Get all VMs in the cluster.
  • Find two hosts using a custom function (Get-HostMemUsage): the one with the lowest used memory and the one with the highest used memory. Call these $TargetHost and $SourceHost.
  • Compile a list of all VMs on the host with the most used memory. Call this $VMsOnHostWithMaxRAM.
  • Of these VMs, find the one with the least used memory (smallest footprint). Call this $otargetvm.
  • While $AllClusterVMs is not empty…
  • – Remove $otargetvm from $AllClusterVMs
  • – Issue the Move-VM command to request vCenter to migrate the VM from its current host to the host $TargetHost. Note: only one VM is migrated at a time.
  • – Wait 15 seconds.
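The custom `Get-HostMemUsage` function isn't shown in the thread, so the following is only a guess at what the host comparison could look like. It assumes host objects with `Name`, `MemoryUsageGB` and `MemoryTotalGB` properties, which PowerCLI's `Get-VMHost` output has. It also assumes that "within $HostBalanceP% of each other" means the gap in percent-of-memory-used between the busiest and least busy host. `Get-HostBalance` is a hypothetical name.

```powershell
# Hypothetical balance check: compare percent-of-memory-used across hosts.
function Get-HostBalance {
    [CmdletBinding()]
    param(
        # Objects with Name, MemoryUsageGB and MemoryTotalGB (e.g. Get-VMHost output).
        [Parameter(Mandatory)][object[]]$VMHosts,
        # Assumed meaning of $HostBalanceP: max allowed gap, in percentage points.
        [double]$BalanceThresholdPercent = 10
    )
    $usage = foreach ($vmHost in $VMHosts) {
        [pscustomobject]@{
            Name        = $vmHost.Name
            UsedPercent = [math]::Round(100 * $vmHost.MemoryUsageGB / $vmHost.MemoryTotalGB, 2)
        }
    }
    $sorted = @($usage | Sort-Object -Property UsedPercent)
    $min = $sorted[0]
    $max = $sorted[-1]
    # One result object instead of five values returned by multiple assignment.
    [pscustomobject]@{
        MinHost    = $min.Name
        MinPercent = $min.UsedPercent
        MaxHost    = $max.Name
        MaxPercent = $max.UsedPercent
        IsBalanced = ($max.UsedPercent - $min.UsedPercent) -le $BalanceThresholdPercent
    }
}
```

In the script this could be called as `Get-HostBalance -VMHosts (Get-VMHost -Location $oCluster) -BalanceThresholdPercent $HostBalanceP`. Returning named properties also makes it harder to mix up the order of values such as `$RAM_Max_Val` and `$RAM_Min_Val`.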

The Try..Catch will thus capture any errors in PowerShell which happen while the loop is vMotioning one VM at a time. This is how I know which VM is in scope.

The current state of the VMs in vCenter is OK. Once the script can no longer vMotion any VMs on the vCenter, I close and re-open the PowerShell session, and the script then continues as if nothing happened. If I keep doing this close/open cycle, eventually all VMs get vMotioned. I’ve also tested vMotion in vCenter itself, and there are no issues.

The fact that vCenter does the vMotions when the script asks it to, with no errors, indicates a problem in the scripting environment.

DRS is not an option. We have around 100 VMs in each cluster. When we need to perform host maintenance, we put the host in maintenance mode. In that case DRS vMotions maybe 20 or so VMs at the same time, which floods our vMotion network and causes PING drops on the VMs. Our customers have very sensitive applications that can handle one or two PING drops but no more. If DRS were to migrate all the VMs at the same time, we would experience outages.

This may be down to our setup; I don’t know, and I cannot change our setup at this point, so I have to live with it until the next refresh.

The script thus vMotions only ONE VM at a time and automates the process.

Oh, we have DRS enabled on our clusters, and in most cases it’s good at balancing one or two VMs at a time. But when we are taking a host out of maintenance, this is where we experience the problems of too many vMotions in parallel.

In a lot of our clusters we have had to resort to setting DRS automation to Manual due to the sensitivity of customer VMs in those clusters.

And the fact that you get an error stating that current state of the OBJECT is an issue indicates that you should try to figure out what the current state of the OBJECT is before you run the command. :man_shrugging: :man_shrugging:

Good thinking @Olaf.

A suggestion would be to add to the script a Get-VM before the Move-VM that outputs all the relevant properties of the VM object - which would hopefully include useful information to troubleshoot.

Also, running your Move-VM with -Verbose might capture more useful information in the log about what that command is doing under the hood.

Getting an understanding of the state of the object before moving might help determine what condition that VM is in that’s causing the issue. Then you could either deal with that condition before moving or skip that VM when you realize it’s in that condition.
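This sketch builds on the suggestion to inspect the object before Move-VM, and on the idea that vSphere rejects operations on an object that is still busy with another task. It is a pre-flight check. `Test-VMReadyToMove` is hypothetical. It assumes task objects with `ObjectId`, `State`, `Name` and `PercentComplete` properties, as PowerCLI's `Get-Task` returns. Verify those property names against your PowerCLI version.

```powershell
# Hypothetical pre-flight check: log basic VM state and report whether
# any unfinished task in the supplied list targets this VM.
function Test-VMReadyToMove {
    [CmdletBinding()]
    param(
        # A VM object, e.g. from Get-VM (needs Name, Id, PowerState, VMHost).
        [Parameter(Mandatory)][object]$VM,
        # Tasks to check, e.g. the output of Get-Task.
        [AllowEmptyCollection()][object[]]$Tasks = @()
    )
    Write-Verbose ("VM [{0}] PowerState={1} VMHost={2}" -f $VM.Name, $VM.PowerState, $VM.VMHost)
    # A task is treated as unfinished unless it ended in Success or Error.
    $busy = @($Tasks | Where-Object { $_.ObjectId -eq $VM.Id -and $_.State -notin 'Success', 'Error' })
    foreach ($t in $busy) {
        Write-Verbose ("  Unfinished task on VM: {0} ({1}, {2}%)" -f $t.Name, $t.State, $t.PercentComplete)
    }
    return ($busy.Count -eq 0)
}
```

A possible usage: `$vm = Get-VM -Name $targetvm; if (Test-VMReadyToMove -VM $vm -Tasks (Get-Task) -Verbose) { <# Move-VM here #> } else { <# skip or retry later #> }`.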


It might not be about the VM per se. Instead, it could be about the OBJECT representing the VM.


If I’ve read the thread correctly the error is:

Operation is not valid due to the current state of the object.

vSphere will give this error if you try an operation on an object (a VM) while it is busy processing a task on that same object. For example, issue the shutdown guest command, then, while the VM is shutting down but still powered on, issue it again and you’ll get the same error. We use Zerto and Nutanix; I see these errors often, and once the initial task completes, the APIs that threw the error try again and complete the task.

So, if your script is just looping and trying again that would explain the error.

I don’t know how to query the tasks and their state, but that would be more reliable…
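One way to query the task instead of guessing with `Start-Sleep` is to start the vMotion with `-RunAsync`, which makes Move-VM return a task object immediately, and then block on that task with `Wait-Task`. This is a sketch that assumes PowerCLI and an open vCenter connection. `Start-VMotionAndWait` is a hypothetical name. As far as I know, a failed task surfaces as an error from Wait-Task, and `-ErrorAction Stop` turns that into something a catch block can handle.

```powershell
# Hypothetical wrapper: start one vMotion asynchronously and wait for that task only.
# Requires VMware PowerCLI and an existing Connect-VIServer session.
function Start-VMotionAndWait {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][string]$TargetHostName
    )
    $vm = Get-VM -Name $VMName -ErrorAction Stop
    $destHost = Get-VMHost -Name $TargetHostName -ErrorAction Stop
    # -RunAsync returns a task object right away instead of waiting for the migration.
    $task = Move-VM -VM $vm -Destination $destHost -VMotionPriority High -RunAsync -ErrorAction Stop
    Write-Verbose "Started vMotion task $($task.Id) for [$VMName]"
    # Block until this specific task finishes; a short Start-Sleep afterwards is still
    # possible if the cluster needs time to settle.
    $null = Wait-Task -Task $task -ErrorAction Stop
    Write-Verbose "Task $($task.Id) for [$VMName] finished"
}
```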

Maybe consider tuning vMotion to limit the maximum number of concurrent operations. That way you can just put a host in maintenance mode and not have to fuss with anything else:
Resource Manager Settings (Broadcom)

I still cannot see any issues on the vSphere side. The error from my script happens on different VMs, and in ALL cases I check vCenter and NO errors/events are logged, AND the VM in question is ALWAYS vMotioned. So if there were INDEED an issue on the vSphere side, there would be:

  • An error/event.
  • The VM would NOT vMotion

Furthermore, as I’ve stated, once I get the error in my PowerShell session, ALL vMotions from the script result in the same error, for EVERY VM. The ONLY way to get the script running again is to close the PS session, start a new one, log back into vCenter and run the script. It then runs for another 80 or so VMs before the error happens again in the script, even though the VMs in vCenter are still vMotioning as the script issues the Move-VM commands.

So in summary:

  • The script’s Move-VM starts the VM vMotion in vCenter.
  • No errors are logged in the vCenter.
  • The script keeps erroring with each successive Move-VM, however the respective VM still gets vMotioned in vCenter without error/events.
  • The only way to get the script working again is to close and re-open the PS command window.

Are you actually reading the answers you get? :man_shrugging:

It does not help you any further to state that there are no errors on the vCenter side.

So - YES - it is probably the script having an issue. :roll_eyes: But we do not have access to your environment. So we cannot test or play around. So you have to debug it yourself.
I’d recommend streamlining your code. Remove all unnecessary output. Wrap single commands in try/catch blocks. Use verbose output where suitable. And check, and maybe log, the state of objects BEFORE you use them.


As Olaf has stated, break things up a bit (and use try/catch blocks). I would also not discard the output with Out-Null (the PowerShell equivalent of redirecting stdout to /dev/null), as that output might give you some clues. Maybe assign the results of the two calls you have nested on a single line to variables, so the command turns into something like:

```powershell
$sourceVM = Get-VM -Name $targetvm
$destHost = Get-VMHost -Name $TargetHost

Move-VM -VM $sourceVM -Destination $destHost -VMotionPriority:High
```

I would even consider looking at each object as it is accessed:

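```powershell
$sourceVM = Get-VM -Name $targetvm
# Format-List -Property * shows every property, not just the default table columns.
$sourceVM | Format-List -Property * | Out-String | Write-Host
$destHost = Get-VMHost -Name $TargetHost
$destHost | Format-List -Property * | Out-String | Write-Host

$moveVM = Move-VM -VM $sourceVM -Destination $destHost -VMotionPriority:High
$moveVM | Format-List -Property * | Out-String | Write-Host
```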

Then all the properties of those three objects will be in your output for review. Hopefully that will give you some clue about why things are misbehaving.

Bonus tip: Look at Start-Transcript and Stop-Transcript for capturing the output - it could be a useful way to review what’s happening.
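Here is a minimal Start-Transcript sketch. The log file name and location (the temp folder) are arbitrary choices. The `finally` block makes sure the transcript is closed even if the rebalancing loop throws.

```powershell
# Log file name and folder are arbitrary; adjust to taste.
$logPath = Join-Path ([System.IO.Path]::GetTempPath()) ('Rebalance-{0:yyyyMMdd-HHmmss}.log' -f (Get-Date))

# Start-Transcript records console output, including Write-Host, until Stop-Transcript.
Start-Transcript -Path $logPath -Append | Out-Null
try {
    Write-Host "Rebalance run started"
    # ... the rebalancing loop would go here ...
}
finally {
    # Runs even if the loop throws, so the transcript is always closed.
    Stop-Transcript | Out-Null
}
Write-Output "Transcript saved to $logPath"
```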