Issues with Node Failing Consistency

A little background on the issue.

We have (4) pools of 70+ web servers running IIS 8.5 and Windows 2012 R2. Each of these servers should be configured identically, which I thought was a perfect use case for DSC.

I wrote a custom PowerShell script to provision the VMs in VMware using VSC/PowerCLI from a standardized template containing minimal pre-installed software.

I run the script to provision the machines and everything works perfectly in the cloning process. During bootup, these machines execute a startup script that enables PowerShell remoting, among some other very basic functions specific to our environment.

Once the machines are alive, I connect to each VM via RemotePS and execute (1) script and (1) function.

  • Script – (gist link below) generates the meta.mof file, stored in the d:\scripts\dsc folder, that the local LCM uses to connect to the pull server
  • Function – Set-DscLocalConfigurationManager -Path d:\scripts\dsc -ComputerName {hidden} – applies the meta.mof file to the machine

Some of these machines run their consistency check and finish in the desired state, while others fail it. When I look in the event logs under DesiredStateConfiguration on the VMs that show the consistency failure, I get error messages like the one below.

Notice that the URL in the error has an extra slash in it (///). When I cut and paste that URL as is, it fails. If I remove a single slash (//), as you see in the second URL below, the URL at least connects to the pull server and returns something (although I am not sure exactly what it is supposed to return).

https://lv-dsc-pull.domain.com:8080///PSDSCPullServer.svc/Nodes(AgentId='B52CF336-2C34-11E6-8102-005056A87D2F')/Configurations(ConfigurationName='LV-WEB-D')/ConfigurationContent

https://lv-dsc-pull.domain.com:8080//PSDSCPullServer.svc/Nodes(AgentId='B52CF336-2C34-11E6-8102-005056A87D2F')/Configurations(ConfigurationName='LV-WEB-D')/ConfigurationContent

The weird thing is, if this were the issue, would it not have the same impact on ALL servers? I am new to DSC, so please help.

Gist Link (LCM Config File)

Gist File (XML Output from Pull Server Once I remove extra slash)

Error Message from Event Log

Job {BC2B7A68-2C34-11E6-8102-005056A87D2F} :
This event indicates that failure happens when LCM is trying to get the configuration from pull server using download manager NULL. ErrorId is 0x1. ErrorDetail is Cannot find configuration https://lv-dsc-pull.domain.com:8080///PSDSCPullServer.svc/Nodes(AgentId='B52CF336-2C34-11E6-8102-005056A87D2F')/Configurations(ConfigurationName='LV-WEB-D')/ConfigurationContent on the server.

Is this error from the pull server encountered when DSC does the pull, or when you tried to use the URL directly? You can use 'Update-DscConfiguration -Verbose -Wait' to force DSC to do a pull. Can you share the errors from the pull server generated by running the 'Update-DscConfiguration' cmdlet?
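For reference, a minimal sketch of forcing a pull so the error shows up on the console instead of only in the event log. It assumes an elevated PowerShell session; the node name 'WEB01' is a placeholder and does not come from this thread.

```powershell
# Ask the LCM on the local node to pull from the pull server right now.
# -Wait blocks until the operation finishes; -Verbose streams the LCM output,
# including any "Cannot find configuration" error.
Update-DscConfiguration -Wait -Verbose

# The same pull can be triggered remotely over WinRM.
# Assumption: 'WEB01' is a hypothetical node name.
Update-DscConfiguration -ComputerName 'WEB01' -Wait -Verbose
```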

The error is from when I hit the URL directly, after fixing the triple-slash issue. Which errors do you want from the pull server? From the event log? Please remember I am pretty new to DSC and not fully aware of where all the data is to help troubleshoot.

Just FYI, I wasn’t expecting the URL to work. My point was that with the triple slash it doesn’t work at all, but with the extra slash removed I at least get the error page.

Basically, when a node tries to do its consistency check, it generates the error I posted from the event log. The issue is that all of these VMs are clones of the same template and the scripts that exist on them are exactly the same, but some VMs start and complete the consistency check while others do not. Right now it seems to be about 50/50. I create 10 clones and 3-5 of them will register, start the consistency check, and successfully complete it; the rest register, fail the consistency check, and generate the error I posted from the event log.

You can collect the logs from pull server using:

Get-WinEvent -LogName "Microsoft-Windows-Powershell-DesiredStateConfiguration-PullServer/Operational"

Get-WinEvent -LogName "Microsoft-Windows-ManagementOdataService/Operational"
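If the logs need to be shared, something like the following sketch could export both pull server logs to CSV. The 200-event limit, the selected columns, and the output location under $env:TEMP are arbitrary choices for illustration.

```powershell
# The two pull server logs named above.
$logs = 'Microsoft-Windows-Powershell-DesiredStateConfiguration-PullServer/Operational',
        'Microsoft-Windows-ManagementOdataService/Operational'

foreach ($log in $logs) {
    # Build a file name from the log name (slashes are not valid in file names).
    $file = Join-Path $env:TEMP (($log -replace '[\\/]', '_') + '.csv')

    # Take the most recent events and keep only the columns that matter for triage.
    # -ErrorAction SilentlyContinue avoids an error when a log contains no events.
    Get-WinEvent -LogName $log -MaxEvents 200 -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, LevelDisplayName, Message |
        Export-Csv -Path $file -NoTypeInformation
}
```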

The best way to collect the logs from the target node is the xDscDiagnostics module (GitHub: dsccommunity/xDscDiagnostics), which contains cmdlets for analyzing DSC event logs.
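A rough sketch of what that looks like on a failing node, assuming WMF 5 (for Install-Module) and access to the PowerShell Gallery. The sequence ID used in the last command is only an example; pick the one that Get-xDscOperation lists with a Failure result.

```powershell
# Install and load the diagnostics module on the node.
Install-Module -Name xDscDiagnostics
Import-Module xDscDiagnostics

# List the most recent DSC operations with their result and job ID.
Get-xDscOperation -Newest 10

# Show every event that belongs to one operation.
# Assumption: sequence ID 1 (the most recent operation) is the failed run.
Trace-xDscOperation -SequenceID 1
```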

The URL that you see in the error message could just be a formatting issue, so I wouldn’t rely on it, considering some of your machines are working fine with the same or a similar URL.
Are you trying to register all nodes at once? Try adding some randomization and see if that helps as well.
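One way to add that randomization, as a sketch only: it assumes the registration runs on each node itself (for example from the startup script), and the 0 to 120 second window is an arbitrary choice.

```powershell
# Wait a random number of seconds so cloned nodes do not all register at once.
# Assumption: a 0-119 second window is wide enough for the batch size in use.
$delay = Get-Random -Minimum 0 -Maximum 120
Start-Sleep -Seconds $delay

# Apply the meta.mof from the folder used earlier in the thread.
Set-DscLocalConfigurationManager -Path 'd:\scripts\dsc' -Verbose
```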

Edit
I get consistency check success when the config is out of sync and DSC reapplies settings. If it checks the pull server for a new config and there is nothing new, it still reports failure.

I’ve been seeing similar issues where Get-DscConfigurationStatus reports Failure for Type: Consistency checks, but Success for the Type: Initial check when I kick it off using Update-DscConfiguration. The configuration is still maintained if I remove a piece managed by DSC. I noticed the below on the DSC known issues page:

https://msdn.microsoft.com/en-us/powershell/wmf/limitation_dsc

Get-DscConfigurationStatus returns pull cycle operations as type Consistency

When a node is set to PULL refresh mode, for each pull operation performed, Get-DscConfigurationStatus cmdlet reports the operation type as Consistency instead of Initial.

Resolution: None.
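To compare what the node reports per run, a sketch along these lines may help. It is meant to run in an elevated session on the node and uses only cmdlets that ship with WMF 5.

```powershell
# Show the history of DSC runs with their reported type and outcome.
Get-DscConfigurationStatus -All |
    Select-Object StartDate, Type, Mode, Status |
    Format-Table -AutoSize

# For the most recent run, list any resources that were not in the desired state.
(Get-DscConfigurationStatus).ResourcesNotInDesiredState

# Independently re-test the current configuration against the node.
Test-DscConfiguration -Detailed
```

If Test-DscConfiguration reports the node as in the desired state while the status history shows Failure, that points at the reporting rather than at the configuration itself.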

Jered, now that you have been able to narrow down the issue, can you collect the logs and share them?
Pull server logs using:

Get-WinEvent -LogName "Microsoft-Windows-Powershell-DesiredStateConfiguration-PullServer/Operational"

Get-WinEvent -LogName "Microsoft-Windows-ManagementOdataService/Operational"

The best way to collect the logs from the target node is the xDscDiagnostics module (GitHub: dsccommunity/xDscDiagnostics), which contains cmdlets for analyzing DSC event logs.

BTW what you are experiencing is not a known issue.

Working on setting up the xDscDiagnostics module… here's a dump of log data in .csv format

dscoperational.csv · GitHub

I am still having an issue with node registration. Some nodes register fine; others appear to register but then fail the consistency check and fail to report status back to the pull server. I am not sure why this is happening. I execute many commands via WinRM against the machines to run the DSC commands, such as the pull server registration (LCM) script and Set-DscLocalConfigurationManager d:\scripts\dsc. Sometimes when it executes, it states that the pull server does not have an agent registered with that ID. This has become a pretty big issue. I thought it was related to IPv6, so I disabled it, but that did not solve the problem. Any help with this would be awesome. I have another datacenter to do, with another 140 VMs to create, and the process was painful when I had to hit many of the boxes manually to register them and execute Update-DscConfiguration.

There is a known issue (being fixed in the next release): the pull server ignores the registration of some nodes when it is using the ESENT database and multiple nodes try to register with it simultaneously. There are two ways to fix it:

  1. If somehow you can serialize (not so easy) the registration with the pull server this issue should go away.
  2. Instead of using ESENT database use OLEDB.

What kind of time frame between registrations? Would 10 seconds work? I do register them serially now, but I want to know what the thoughts are on the time between registrations.

How would I switch to OLEDB when using WMF 5?

Registering 10 seconds apart is fine. As long as two (or more) calls are not happening simultaneously, it should work. To switch to OLEDB, update the web.config of the pull server (under C:\inetpub\wwwroot\ps*) with the database connection settings.
PullServerOleDbSettings · GitHub
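For the serialized option, a minimal sketch of registering nodes one at a time with a 10 second gap. The node names are placeholders, and it assumes the folder on the machine running the loop contains a <node>.meta.mof file for each name.

```powershell
# Assumption: hypothetical node names; substitute the real VM names.
$nodes = 'WEB01', 'WEB02', 'WEB03'

foreach ($node in $nodes) {
    # Apply the meta.mof for this node; the call returns once the LCM is configured,
    # so only one registration is in flight at a time.
    Set-DscLocalConfigurationManager -Path 'd:\scripts\dsc' -ComputerName $node -Verbose

    # Leave a gap before the next registration reaches the pull server.
    Start-Sleep -Seconds 10
}
```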