Help with workflows and parallel loops

So I just recently started investigating PowerShell workflows and loops with the -parallel switch. My first thought when I started reading about it was: "This is great. I should be able to write a script that queries every domain controller in our environment for a user's 'lastlogon' attribute, and instead of waiting for each DC to respond in turn (we've got about 80 in our environment), query every DC essentially simultaneously."

However, my first attempt at a script didn't work out the way I wanted. Here's what I tried:

---------------------------begin script-----------------------------------------

workflow querydcs {

param([string[]]$computers)

foreach -parallel ($computer in $computers)

{
get-aduser "MyUserAccount" -server $computer -properties lastlogon | select @{name="DC" ; expression={$computer}},lastlogon
}

}

$computers = get-addomaincontroller -filter * | select -expandproperty name

querydcs $computers

--------------------------------------------end script----------------------------------------------------

(Note: for simplicity’s sake, in this example I’m not converting the lastlogon attribute to human readable form)
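(For completeness, the conversion itself is a one-liner; a minimal sketch, assuming $user is an object returned by Get-ADUser with the lastlogon property requested:)

```powershell
# lastLogon is stored as a Windows filetime (100-nanosecond intervals since 1601).
# Assumes: $user = get-aduser "MyUserAccount" -properties lastlogon
[datetime]::FromFileTime($user.lastLogon)
```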

So, the script appears to work as intended, EXCEPT that my output only includes the "lastlogon" value from the select statement, not the "DC" value, which is just the value of the looping variable.

Output looks something like:

DC:

LastLogon: 5/10/2017 10:50:23 PM

PSComputerName: localhost

PsSourceJobInstanceID: hexhexhexh-hex-hex-hex-hexhexhex

Clearly there’s a fair amount going on under the hood that I don’t understand here (why are PSComputerName and PSSourceJobInstanceId being returned, for instance?). My biggest gripe, though, is that knowing the last time a user authenticated against a random DC is significantly less useful than also knowing what DC he authenticated against. So what am I doing wrong here?

Yeah, a LOT. Workflows aren’t run by PowerShell. They’re run by WWF (Windows Workflow Foundation), and the rules are entirely different.

Before we dive into this, have you considered the lastLogonTimestamp value instead, which is replicated across DCs? So you only have to query one?

Thanks, Don. In 99% of cases I use lastLogonTimestamp, but there are cases where HR wants to know precisely the last time a user touched the network (say, an employee was terminated but the account wasn’t disabled in time), and lastLogonTimestamp is essentially only updated every 14 days. Also, lastLogonTimestamp is a pretty unreliable value, as it can be tripped by merely doing an “effective permissions” check against the user on some random ACL (something that SharePoint does for large collections of users from time to time)!
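(For contrast, because lastLogonTimestamp is replicated, the single-query version is trivial; a sketch, with "MyUserAccount" as a placeholder account name:)

```powershell
# One query against any DC is enough, since lastLogonTimestamp replicates.
get-aduser "MyUserAccount" -properties lastLogonTimestamp |
    select Name, @{Name="LastLogonTS"; Expression={[datetime]::FromFileTime($_.lastLogonTimestamp)}}
```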

You can try using a workflow with the -PSComputerName scheme:

workflow querydc { param($user) get-aduser $user -properties lastlogon }

querydc -PSComputerName $dclist -user $username

Thing is: I’m trying to associate the lastlogon value with a specific domain controller. That code snippet doesn’t look like it would do this.

OK, so this works:

--------------------------------Script-------------------------

workflow querydcs {

param([string[]]$computers)

foreach -parallel ($computer in $computers)

{
InlineScript {
get-aduser "MyUser" -server $using:computer -properties lastlogon | select @{Name="DC" ; expression = {"$using:computer"}},@{Name="LastLogon" ; expression = {[datetime]::FromFileTime($_.lastLogon)}}
}

}

}

$computers = get-addomaincontroller -filter * | select -expandproperty name
querydcs $computers

----------------------------------------end script-----------------------------

But it’s pretty slow. I was expecting this script to return a collection of 80 queries in about the same time it takes to query a single DC, but it doesn’t feel like that. Almost feels like I’m doing a simple, serial loop rather than a bunch of tasks in parallel. Now, due to IPSec rules and distance we do have some machines that take a while to respond. I wonder if the task is merely taking as long as the slowest machine.

So, here’s your problem.

InlineScript - whether explicit, or when used implicitly around a PowerShell command for which a Workflow Activity isn’t available - is always going to launch a new PowerShell instance, and essentially be its own scope. Many of Workflow’s alleged advantages evaporate when you don’t have a Workflow Activity and are instead working with InlineScript. WWF itself will start to throttle parallelism, using an invisible and uncontrollable algorithm, if you start to suck down too much RAM or CPU - and launching multiple PowerShell processes will certainly tip into that at some point. Each process, in your case, also has to load the AD module, which is non-trivial.

That’s why I initially asked if you were hell-bent on using Workflow, or if you just wanted this to work quickly. Workflow is literally the most complex way to do essentially anything, and it involves a completely different execution rule set and environment. It only looks like PowerShell on the surface.

For example, if your domain controllers have Remoting enabled, this would be vastly easier if you just used Invoke-Command, which also offers parallelism, tracks which machine a result came from, and works entirely inside PowerShell. And when I say “vastly,” I mean, like, “one line of code.” Maybe two.

A lot of people get… “tricked” into Workflow. I get it. It’s shiny, and the docs make a lot of big promises. But for what Workflow was supposed to accomplish, the actual implementation was about the worst way Microsoft could have gotten there. Heck, the underlying WWF is basically deprecated, which tells you how committed the .NET team is to it.

E.g.,…

Invoke-Command -ComputerName (get-addomaincontroller -filter * | select -expandproperty name) `
               -ScriptBlock {

                   get-aduser "MyUser" -properties lastlogon | select @{Name="LastLogon" ; expression = {[datetime]::FromFileTime($_.lastLogon)}}

               }

The idea being that this runs ON each DC, locally, as if you’d logged into the console. Remoting will automatically add a PSComputerName to the return value, so you’ll know which DC returned which value. And they’ll run in parallel, and you can control the throttling with -ThrottleLimit (which defaults to 32). Because each DC is querying itself, it ends up with a really fast connection to… well, to itself.

On DCs running 2012 or later, this will just work by default. Earlier versions (back to 2003) would need PowerShell v2 or later, and would need Enable-PSRemoting run to enable Remoting, which is the same thing Workflow’s -PSComputerName parameter would have been using.
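Putting that together into one runnable sketch (variable names like $dcs and $results are mine, and the throttle value is just the default made explicit):

```powershell
$dcs = get-addomaincontroller -filter * | select -expandproperty name

# Runs the query ON each DC in parallel; Remoting stamps PSComputerName on each result.
$results = Invoke-Command -ComputerName $dcs -ThrottleLimit 32 -ScriptBlock {
    get-aduser "MyUser" -properties lastlogon |
        select @{Name="LastLogon"; Expression={[datetime]::FromFileTime($_.lastLogon)}}
}

# The user's true last logon is simply the newest value across all DCs.
$results | sort LastLogon -Descending | select -First 1 PSComputerName, LastLogon
```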

Thanks, Don. Helpful stuff here. I’d hate to bring a server to its knees because I was running 10,000 instances of PowerShell!

You’re right. Workflows do seem to promise unlimited wealth and power (not to mention getting repetitive tasks done more quickly) but I’ll be mindful of their limitations.

Did you actually try this, or is that just a guess?
Did you understand what changes when you use the -PSComputerName parameter?

That code can get lastlogon for a specific user from a list of domain controllers in parallel (via workflow, as you wanted), because it executes the command on the DCs themselves, just like Don’s Invoke-Command. It’s not about “how to choose the latest timestamp from the many returned.”

OK, Max, I get you. Unfortunately your script won’t work for me, because it can only be run under Domain Admin credentials (due to logon restrictions on our DCs). I need a script that can query a DC but doesn’t have to be run on the DC.

So, since Get-ADUser doesn’t seem to be thread safe, the only thing I can suggest is the ADSI + RSJob route.

something like

$dcs | Start-RSJob {
   $dchost = $_
   $ds = New-Object System.DirectoryServices.DirectorySearcher
   $ds.SearchRoot = "LDAP://$($dchost)/DC=yourcorp,DC=com"
   $ds.Filter = "(ANR=YourUserName)"
   [void]$ds.PropertiesToLoad.Add('LastLogon')
   $ds.FindAll() | Foreach-Object {
      [PSCustomObject]@{
        DC        = $dchost
        # lastlogon comes back as a single-element collection of raw filetime; convert it
        LastLogon = [datetime]::FromFileTime($_.Properties['lastlogon'][0])
      }
   }
} | Wait-RSJob | Receive-RSJob
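(A usage note, sketched under the assumption you're on a recent PoshRSJob: Start-RSJob also takes a -Throttle parameter to cap simultaneous runspaces, and finished jobs should be removed when you're done. The scriptblock is the same DirectorySearcher one above, abbreviated here.)

```powershell
# Cap at 20 simultaneous runspaces, collect the results, then clean up the jobs.
$results = $dcs | Start-RSJob -Throttle 20 -ScriptBlock {
    # ...same DirectorySearcher scriptblock as above...
} | Wait-RSJob | Receive-RSJob
Get-RSJob | Remove-RSJob
```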

Thanks, Max. I may look into this. To be honest I was asking the question not so much because I needed to solve that particular problem but because I wanted to better understand the capabilities and limitations of workflows and looping with the parallel switch. That particular problem was just one I thought would make sense to tackle with a workflow.

I’ll have to look into RSJob separately and see how that might be a useful tool. To be honest this is the first I’ve heard of it.

When workflow first appeared, I dreamed it could help me with many things,
but in fact I now use it only in old scripts that I don’t want to change, because they work.

IMHO it can be used only in fire-and-forget scenarios on the local machine, and with the -PSComputerName parameter on remote ones.

PoshRSJob (GitHub: proxb/PoshRSJob, an alternative to PSJobs with greater performance and less overhead, which frees up the console and allows throttling the jobs) or Invoke-Parallel is my choice.