Hi Mike,
I don't think a different Pull Server is necessary here, as its only role is to serve, over HTTP, the MOF it knows to be the current configuration. What you want is just to switch between MOF versions for a given node.
The overall approach to follow here is a release pipeline, as documented in the whitepaper “The Release Pipeline Model” by Steven Murawski & Michael Greene.
To quickly summarise the idea, the pipeline is the automated assembly line that will compile unique (and immutable) artefacts from your changes (in source control), before helping you progressively build trust in them by use of tests, staged releases, gates, or whatever you may need.
With DSC, you have several artefacts used and composed at different stages of the process:
- MOFs & Checksum
- Meta MOFs
- DSC resources & Composites in PSModules
- PowerShell Modules Zipped for Pull Server with checksum
- Certificates
And some others I’m omitting here.
What we implemented at my last customer for similar scale was the following:
1. Separate dependencies in their own pipelines
If your infrastructure needs custom DSC Resources & Composites, separate them into coherent modules and build trust in them (compile, test & release) independently, in PS Module pipelines, using Test-Kitchen for DSC Resources/Configs. This lets you separate the data (parameters) from the code, and handle complexity better by reducing the scope of each change (smaller batch size, hence less risk with each change).
2. Separate Data from Code
As the code has been tested in 1. and is now trusted, you will now 'compose' your infrastructure configuration in a "Control Repository" (aka control repo, which is the Puppet terminology but it's spot on).
It's basically a git repository through which all changes to your infrastructure are funnelled (for collaboration), and which will eventually be applied to your nodes: the starting point of your release pipeline for your whole infrastructure.
That repository is made of all the directives that make up your configuration: "on node x apply resource y of version v.v.v with parameter z".
Traditionally, it's done with DSC configuration files, but there are many ways to do it (e.g. different ways of managing configuration data, abstracting in Composites and so on). The ones I'll mention here are Puppet's Roles & Profiles, Chef's Roles and Runlists, Ansible's Roles & Playbooks, and my DSC Roles & Configurations.
You'll still have a bit of code in there, probably to manage the automation, but only
3. Control the changes, Automatically
Before a change is allowed to be pushed/merged to that control repository, make sure you validate it: Linting, Syntax, security, everything that gives quick feedback to the user before you even start compiling MOFs.
For example, you may want to ensure that no node is set to allow plain text passwords in its MOF.
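To give an idea of what such a gate can look like, here is a minimal sketch. It's in Python only because that's easy to run anywhere; the same check is a few lines of PowerShell. It assumes your configuration data has already been loaded into a dictionary shaped like DSC's ConfigurationData (an AllNodes list of node entries). The function name and the sample nodes are made up for the example.

```python
def nodes_allowing_plaintext(config_data):
    """Return the names of nodes that set PSDscAllowPlainTextPassword to a true value."""
    offenders = []
    for node in config_data.get("AllNodes", []):
        if node.get("PSDscAllowPlainTextPassword", False):
            offenders.append(node.get("NodeName", "<unnamed>"))
    return offenders


# Hypothetical sample data, shaped like a DSC ConfigurationData hashtable.
sample = {
    "AllNodes": [
        {"NodeName": "SERVER01", "Role": "Web"},
        {"NodeName": "SERVER02", "Role": "Sql", "PSDscAllowPlainTextPassword": True},
    ]
}

offenders = nodes_allowing_plaintext(sample)
if offenders:
    # In a real pipeline you would fail the build here instead of printing.
    print("Plain text passwords allowed on:", ", ".join(offenders))
```

Run as a pre-merge check, this fails fast and tells the author which node entry to fix, long before any MOF is compiled.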
4. Compile & Version the Artefacts
Now you'll mix and mash this together, and compile the artefacts listed earlier. This will give you another opportunity for feedback, because if some parameters end up wrong, or you have resources duplicating changes, the DSC Compilation will fail and tell you.
You want to version it, because you want each artefact to be Unique, so that you build trust in a unique (immutable) version, not just in something that changes over time.
So that’s important, and we’re starting to actually solve your problem.
Now you have different versions of MOFs you built over time (amongst other Artefacts):
SERVER01_v0.0.1.MOF
SERVER01_v0.0.4.MOF
SERVER01_v1.2.3.MOF
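If you keep names like these around, you'll want to compare versions numerically rather than as text. A small sketch, assuming the NODE_vMAJOR.MINOR.PATCH.MOF naming shown above (the helper names are mine, not part of DSC):

```python
import re

# Matches names such as SERVER01_v1.2.3.MOF (case-insensitive extension).
MOF_NAME = re.compile(
    r"^(?P<node>.+)_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\.mof$",
    re.IGNORECASE,
)


def parse_mof_name(filename):
    """Split a versioned MOF file name into (node, (major, minor, patch))."""
    match = MOF_NAME.match(filename)
    if match is None:
        raise ValueError(f"Not a versioned MOF name: {filename}")
    version = tuple(int(match.group(part)) for part in ("major", "minor", "patch"))
    return match.group("node"), version


def latest_version(filenames, node):
    """Return the highest version available for a node, or None if there is none."""
    versions = [v for n, v in map(parse_mof_name, filenames) if n == node]
    return max(versions) if versions else None


available = ["SERVER01_v0.0.1.MOF", "SERVER01_v0.0.4.MOF", "SERVER01_v1.2.3.MOF"]
print(latest_version(available, "SERVER01"))
```

Comparing tuples of integers avoids the classic trap where v0.0.10 sorts before v0.0.4 as a string.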
5. Separate Release & Delivery
We actually learned that lesson the hard way.
The Release first:
Once you’ve compiled those artefacts, store them somewhere where it’s easy to find and use.
We just dumped the build artefacts to a file system, laid out like this:
- one folder per released version (e.g. v0.0.1)
- in it, a subfolder per type of artefact: MOF, LCMConfigs (MetaMOF), ZippedModules, RSOP (custom artefact)
- in MOF, a folder for each of the stages we had (more on this later): TEST, STAGE0, STAGE1, STAGE2
- in each stage folder, each server’s MOF & MOF Checksum: TESTSERVER01.MOF…
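To make that layout concrete, here is a small sketch that builds the path of a node's MOF from a version and a stage. The folder and stage names are the ones listed above; the root folder and the function name are placeholders for the example.

```python
from pathlib import PurePath

STAGES = ("TEST", "STAGE0", "STAGE1", "STAGE2")


def mof_path(root, version, stage, node):
    """Build <root>/<version>/MOF/<stage>/<node>.mof for a released artefact."""
    if stage not in STAGES:
        raise ValueError(f"Unknown stage: {stage}")
    return PurePath(root, version, "MOF", stage, f"{node}.mof")


# "releases" is a hypothetical root folder for the artefact store.
print(mof_path("releases", "v0.0.1", "TEST", "TESTSERVER01"))
```

Because the path is fully determined by version, stage and node, any later job can locate an artefact without searching for it.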
Over time that will be a lot of artefacts, but you can now manage your history.
That is the release (making those artefacts available in a ‘feed’).
Now the Delivery (deployment):
That would be a separate ‘process’ (or build job), which has its own rules (don’t deploy on a Friday, or whatever makes sense to you, again, to build trust in the system).
This job would most likely copy (simple file copy) what you want to release to the Pull Server: cp \v1.2.3\MOF\TEST* \pullserver\MOFs
You’d also store delivery metadata somewhere (what has been released, last known good, when it was released, approvals, ticket reference, and whatever helps you).
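As a rough sketch of such a delivery job: copy one stage of one released version to the pull server folder, and append a record to a delivery log. The JSON-lines log, its field names, the ticket reference and the file names in the demo are all assumptions for illustration; the demo runs in a temporary folder rather than against a real pull server.

```python
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def deliver(release_root, version, stage, pull_server_dir, log_file, ticket=None):
    """Copy one stage of one released version to the pull server folder and log it."""
    source = Path(release_root) / version / "MOF" / stage
    target = Path(pull_server_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    record = {
        "version": version,
        "stage": stage,
        "files": copied,
        "delivered_at": datetime.now(timezone.utc).isoformat(),
        "ticket": ticket,
    }
    # One JSON document per line, appended so the history is never rewritten.
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


# Demo with placeholder files in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    stage_dir = base / "releases" / "v1.2.3" / "MOF" / "TEST"
    stage_dir.mkdir(parents=True)
    (stage_dir / "TESTSERVER01.mof").write_text("placeholder MOF content")
    (stage_dir / "TESTSERVER01.mof.checksum").write_text("placeholder checksum")
    result = deliver(base / "releases", "v1.2.3", "TEST",
                     base / "pullserver" / "MOFs", base / "deliveries.jsonl",
                     ticket="CHG-0000")
    delivered = sorted(p.name for p in (base / "pullserver" / "MOFs").iterdir())
    print(delivered)
```

Keeping the copy and the log entry in the same job means the log always reflects what is actually on the pull server.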
How does that solve your problem? If you want to go back to a previous MOF, it’s just a copy job. I’d warn you the same way @Alex Aymonier did: you’re not dealing with immutable infrastructure, so it’s never a rollback but a fix forward using a previous MOF, which may or may not work (but you know this already…).
Another element is how you promote those changes so that you trust them: the promotion process.
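If your delivery metadata records a "last known good" marker, picking the version to copy back is a simple lookup. A minimal sketch, assuming each delivery record is a dictionary with hypothetical "version", "stage" and "status" fields, listed oldest first:

```python
def last_known_good(records, stage):
    """Return the most recent version marked good for a stage, or None."""
    for record in reversed(records):
        if record["stage"] == stage and record.get("status") == "good":
            return record["version"]
    return None


# Hypothetical delivery history, oldest first.
history = [
    {"version": "v0.0.4", "stage": "TEST", "status": "good"},
    {"version": "v1.2.3", "stage": "TEST", "status": "failed"},
]

print(last_known_good(history, "TEST"))
```

The result is only the candidate to redeploy: applying it is still a fix forward with an older MOF, not a true rollback.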
For a given version, say v1.2.3, you will have compiled all nodes’ MOF, for every staging environment. The idea is that where you deploy first should not impact the business if there’s a failure: your first staging environment, usually test.
Then, when you think that deployment is ‘successful’ (your threshold of trust for next stage), you deploy to the following stage.
Our STAGE1 was the Developers’ Test servers: production servers that hundreds of developers were testing on, so an impact there would be annoying and hinder productivity, but with no end-customer impact. So the MTTR requirement wasn’t too strict.
Then Stage 2 and 3 were the production environment and the DR environment respectively.
The point is that you build up trust in that version v1.2.3 until you’re happy for it to go everywhere. If it fails to meet your standard at some point, you try to ‘shift left’ (bring tests and feedback earlier in the process for the next deployments), create v1.2.4, and start the process again. NEVER jump the pipeline.
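The "never jump the pipeline" rule is easy to enforce in the delivery job. A minimal sketch, assuming the stage order TEST, STAGE0, STAGE1, STAGE2 used for the folders earlier, and assuming you track the set of stages a given version has already passed (the function names are mine):

```python
STAGES = ["TEST", "STAGE0", "STAGE1", "STAGE2"]


def next_stage(passed_stages):
    """Return the only stage a version may deploy to next, or None when it is everywhere."""
    for stage in STAGES:
        if stage not in passed_stages:
            return stage
    return None


def can_deploy(stage, passed_stages):
    """A version may only go to the first stage it has not passed yet."""
    return stage == next_stage(passed_stages)


passed = {"TEST"}  # hypothetical state for v1.2.3
print(can_deploy("STAGE0", passed))  # allowed: the next stage in order
print(can_deploy("STAGE2", passed))  # refused: would skip STAGE0 and STAGE1
```

A failed stage simply never gets added to the passed set, so the only way forward is a new version starting again from TEST.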
6. Continually Improve
Bear in mind that the key questions for how to divide those stages are:
- what will help me trust that change?
- what is the risk of failure vs impact potential for that stage?
- how can I go as fast as possible with the above limitations?
And your answers to those questions will evolve over time as your pipeline and configuration evolve, and as you gain visibility into what’s happening to your infrastructure.
You will start fairly small, managing maybe a single role with very limited features, and a slow release cadence. Then you will need to go faster and grow bigger at the same time.
Good luck!