LCM stuck in Busy state after the SetScript of a DSC Script resource returns an error

We are trying to automate an install process on multiple computers (nodes) in a network. We need to use the Script resource (SetScript) because we are running command-line tools as well as an installer that is an exe with no product ID, so it’s not compatible with the Package resource.

We notice that when the SetScript returns an error, the LCM is stuck in a busy state and we cannot re-run our DSC configuration without first running several commands to clear the WMI process. Sometimes errors in the SetScript also cause the DSC job to freeze indefinitely.

Is there an easy way to return the LCM to a ready state after a SetScript returns an error or throws an exception?

Can you reproduce the error without using that exe?

I would not be surprised if the exe installer has spawned one or several child processes that aren’t terminated, leaving your Script resource hanging.

I think it would be good to rule out this particular exe for troubleshooting.
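
One way to test the child-process theory is to run the installer with a time limit and kill the whole process tree if it is exceeded. This is only a minimal sketch: the installer path, the `/S` argument, and the 10-minute limit are placeholders, not values from this thread.

```powershell
# Hypothetical installer path and arguments; substitute the real ones.
$installer = 'C:\media\software\installer.exe'
$timeoutMs = 10 * 60 * 1000   # assumed limit of 10 minutes

$proc = Start-Process -FilePath $installer -ArgumentList '/S' -PassThru
# Cache the handle so ExitCode is populated for a process started without -Wait.
$null = $proc.Handle

# WaitForExit(int) returns $false if the process is still running at the timeout.
if (-not $proc.WaitForExit($timeoutMs)) {
    # taskkill /T also terminates child processes started by the installer.
    & taskkill.exe /PID $proc.Id /T /F | Out-Null
    throw "Installer did not exit within $timeoutMs ms and was terminated."
}

if ($proc.ExitCode -ne 0) {
    throw "Installer exited with code $($proc.ExitCode)."
}
```

Note that this only waits on the main installer process; if the installer exits but leaves children behind, they would have to be found and stopped separately.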

Also, please share:

  • the exe if you can (if it’s public/common)
  • the install arguments
  • the script resource properties you use (minus anything you shouldn’t share)
  • the command you run (I guess Get-DscConfigurationStatus) and the error you get saying it’s busy
  • the commands you run to “reset the LCM in a working state” (that might help indicate “how” it’s stuck)

Thanks :)

Hey Gaelcolas,

- We are using the installer from RabbitMQ’s “Installing on Windows” page. The arguments are /S and /D="<path>". The SetScript of the Script resource looks like this:

$proc = Start-Process "$($Using:MediaHome)\software\rabbitmq\rabbitmq-server-3.8.3.exe" "/S /D=$Using:RabbitMqHome" -Wait:$false -PassThru

It’s hard for us to tell exactly what is causing the hang since it does not happen every time, but I think you are right about the child process.

- There are no other Script resource properties I can think of.

- I believe we check by calling Get-DscLocalConfigurationManager before the script is run to see whether the LCM is in a busy state. In our case we can’t allow the scripts to be rerun unless the LCM is not busy.
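
A check like the one described above could look roughly like this. It is a sketch that assumes the LCMState property reported by Get-DscLocalConfigurationManager is what gets inspected; the 5-minute deadline and 10-second polling interval are arbitrary choices for illustration.

```powershell
# Poll the LCM until it leaves the Busy state or the deadline passes.
$deadline = (Get-Date).AddMinutes(5)   # assumed limit
do {
    $state = (Get-DscLocalConfigurationManager).LCMState
    if ($state -ne 'Busy') { break }
    Start-Sleep -Seconds 10            # assumed polling interval
} while ((Get-Date) -lt $deadline)

if ($state -eq 'Busy') {
    Write-Warning 'LCM is still busy; not starting a new configuration.'
}
else {
    Write-Verbose "LCM state is $state."
}
```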

- The commands we use to reset the LCM are:

  1. Get-Process wmi* | Stop-Process -Force

  2. Restart-Service winrm -Force

  3. Remove-DscConfigurationDocument -Stage Current, Pending, Previous
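
For repeated use, the three steps above can be wrapped in a small helper. This is only a sketch: the function name Reset-LcmState is made up, and it assumes the WMI processes in step 1 are the WmiPrvSE provider hosts. It has to run in an elevated session on the affected node.

```powershell
function Reset-LcmState {   # hypothetical helper name
    [CmdletBinding()]
    param()

    # 1. Stop the WMI provider host processes (assumed to be WmiPrvSE).
    Get-Process -Name 'WmiPrvSE' -ErrorAction SilentlyContinue |
        Stop-Process -Force

    # 2. Restart WinRM, which the LCM uses for its CIM sessions.
    Restart-Service -Name WinRM -Force

    # 3. Remove the current, pending, and previous configuration documents.
    Remove-DscConfigurationDocument -Stage Current, Pending, Previous

    # Return the resulting state so the caller can confirm the LCM is no longer busy.
    (Get-DscLocalConfigurationManager).LCMState
}
```

Calling Reset-LcmState and checking that the returned state is not Busy keeps the required ordering in one place.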

I must run those commands in that order, or else the LCM stays busy for 30 minutes or more (I’ve seen 2 hours). One of my coworkers just mentioned that the LCM seems to stay in the busy state longer when this code is present:

foreach ($user in $ConfigurationData.RabbitMq.Users) {
    Script "AddUserTags_$($user.User)" {
        GetScript  = { @{ Result = '' } }
        SetScript  = {
            # Copy the configured tags into a list of strings.
            $tagList = New-Object Collections.Generic.List[String]
            foreach ($tag in $using:user.Tags) {
                $tagList.Add("$tag")
            }
            # Apply all tags to the user with a single rabbitmqctl call.
            $process = Start-Process -FilePath "$($using:RabbitMqHome)\sbin\rabbitmqctl.bat" -ArgumentList "set_user_tags", "$($using:user.User)", "$($tagList)" -PassThru -Wait -NoNewWindow
            if ($process.ExitCode -ne 0) {
                throw "Add User Tags failed"
            }
        }
        TestScript = {
            $userTags = New-Object Collections.Generic.List[String]
            foreach ($tag in $using:user.Tags) {
                $userTags.Add("$tag")
            }
            foreach ($tag in $userTags) {
                # Look for a list_users line that starts with the user name and contains the tag.
                if (!(& "$($using:RabbitMqHome)\sbin\rabbitmqctl.bat" list_users | Select-String "^$($using:user.User)" | Select-String -SimpleMatch "$tag")) {
                    Write-Verbose "User tag does not exist, creating..."
                    return $false
                }
                else {
                    Write-Verbose "User tag already exists"
                }
            }
            return $true
        }
    }
}

The script here just reads information from a list and runs a command-line tool to add that list data to a database.

Do you think that calling New-Object Collections.Generic.List[String] could have an impact on how long the LCM stays busy? Another thing to note is that the LCM stays busy even if the configuration errors before reaching the script shown above. However, if I remove that script, the LCM recovers from the busy state much faster (4 min vs. 30 min).
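
One way to rule the generic list in or out is to build the rabbitmqctl arguments from a plain array and compare the recovery time. The sketch below uses a made-up user entry in place of the real configuration data, and it is not a claim that the list causes the delay.

```powershell
# Hypothetical user entry standing in for one item of $ConfigurationData.RabbitMq.Users.
$user = @{ User = 'alice'; Tags = @('administrator', 'monitoring') }

# Build the argument array directly, without New-Object Collections.Generic.List[String].
$arguments = @('set_user_tags', $user.User) + $user.Tags

# Show the resulting command line arguments.
$arguments -join ' '
```

The resulting $arguments array can be passed to Start-Process -ArgumentList in place of the list-based version.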

If you have a clean way for us to get the LCM out of the busy state on an error or exception, that would help a lot. Right now we are working around the LCM hang, but if you have any idea how we can stop the hanging, that would be a big help too. Thanks!