Can I get some good real-world examples of workflows?

by willsteele at 2012-09-09 15:46:38

I am not a sysadmin nor a developer. I manipulate data all day long: some SQL, some folder/file management, some file signature analysis, zipping, etc. All the workflow examples I see are for things like provisioning. Can someone sketch a few examples (they can be totally made up) that relate more to data management? My problems tend to involve custom functions, reading file headers, processing folders (in the thousands/tens of thousands), and putting things in zip files before archiving them. If anyone could show some examples that might make more sense to me, I would appreciate it.
by DonJ at 2012-09-09 16:23:04
Yes, real-world examples are a trifle absent at the moment. Provisioning is the biggie, and not even all the examples I’ve seen of that actually require workflow’s unique capabilities.

Anything you do "in the thousands" might be a starting point. For example, if I needed to process a thousand files, and I wanted it to happen in parallel, and (this is the big part) I wanted it to pick up where it left off if the machine doing the processing died or something… that might lead me to workflow. It’s that last bit that’s the real "selector" for workflow - the ability to suspend and resume.

Another example: You need to process a mess of files, stick them in a ZIP file… but then you need to suspend automation and do some kind of manual review or check. Then, you want to resume automation to do something with the ZIP. That could be a workflow… although I could achieve the same effect more simply in other ways, too, I suspect.
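
As a rough sketch of that pattern (New-ZipFile is just a stand-in for whatever zipping routine you actually use, and the paths are made up):

workflow Invoke-ReviewedArchive
{
    param($SourcePath, $ZipPath)

    # Gather the files and build the ZIP (New-ZipFile is a placeholder)
    InlineScript { New-ZipFile -Source $Using:SourcePath -Destination $Using:ZipPath }

    # Pause here; the workflow job sits in a Suspended state until someone resumes it
    Suspend-Workflow

    # After the manual review, Resume-Job picks up right here
    Copy-Item -Path $ZipPath -Destination '\\server\archive'    # example destination
}

# Kick it off as a job, do the manual review, then resume:
# $job = Invoke-ReviewedArchive -SourcePath C:\Data -ZipPath C:\Temp\data.zip -AsJob
# Resume-Job -Job $job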

In the real world, Server Manager in 2012 is, right now, the only thing using workflow in Microsoft’s world (it’s still new, after all). And that’s for provisioning.

The real magic in workflow, that you can’t get anywhere else, is the ability to checkpoint its progress to disk and resume in the event of interruption. That’s not all workflow can do, but it’s one thing that it alone can do more or less automagically. So ask yourself if you have any processes that could benefit from that feature.
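
A bare-bones illustration of checkpointing, with made-up paths:

workflow Copy-WithCheckpoints
{
    param([string[]]$FilePath)

    foreach ($file in $FilePath)
    {
        Copy-Item -Path $file -Destination 'D:\Staging'

        # Persist progress to disk; if the box dies mid-run, Resume-Job
        # continues from the last checkpoint instead of starting over
        Checkpoint-Workflow
    }
}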
by willsteele at 2012-09-09 19:26:00
The point about suspension/resuming being key to deciding whether to use a workflow is great to know. Thanks for throwing that out there. Definitely putting that in the mental notebook. Most of the time I have worked out my logic/error handling/etc. with smaller test cases before I scale up to the point where workflows might come into play. So, I don’t think I will need to do a lot with the ability to stop/start the process. I guess I had it in mind that I could just take advantage of the workflow framework to handle the queuing and overall organization. Taking the general premise from above, I might do something like this (or at least it’s what I had in mind):

function Analyze-Folder
{
    param(
        $folderpath
    )
    # Some-CustomProcessing is a placeholder for the real per-folder work
    Some-CustomProcessing -Path $folderpath
}

workflow Test-Workflow
{
    param($rootfolder)

    # Where-Object can't be called directly in a workflow, so build the
    # list of folder paths inside an InlineScript
    $folderpaths = InlineScript {
        Get-ChildItem -Path $Using:rootfolder |
            Where-Object { $_.PSIsContainer } |
            Select-Object -ExpandProperty FullName
    }

    foreach -parallel ($folderpath in $folderpaths)
    {
        # InlineScript runs in its own session: values come in via $Using:,
        # and Analyze-Folder has to be available in that session (e.g. from a module)
        InlineScript { Analyze-Folder -folderpath $Using:folderpath }
    }
}

Test-Workflow -rootfolder C:\Data    # example root folder


I am still trying to get my head around this, so what I have laid out may be completely irrelevant. The main way I envisioned getting this to work was to process large folder collections in parallel (i.e., faster and managed), simply repeating the same processing steps on one folder at a time. The way I have managed this (not very well) in the past was to pass each folder to a job and process them individually. From what I can tell, this seems like a perfect fit for workflows. Just not sure if the way workflows really work allows this. Still learning…

I’ll start looking for ways to incorporate the checkpointing. Seems like, in my case, this would be a great way to ensure a given set of folders had been processed and would not need to be rerun if, for some reason, it did not work correctly. As a side note, if I wanted to denote that something was processed, would I just use Add-Member and drop a new field onto the folder I handled, maybe something like .Processed = $true? Maybe another thread?
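
For reference, something like this is what I was picturing (path made up):

$folder = Get-Item -Path C:\Data\Folder001
Add-Member -InputObject $folder -MemberType NoteProperty -Name Processed -Value $true

# The new Processed property only exists on this in-memory object, though;
# nothing gets written back to the folder itself on disk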
by DonJ at 2012-09-09 19:33:05
Organization is all in how you write the script. Queuing can be done with Remoting. Sheer "bulk" isn’t necessarily a definitive pointer to workflow.

But yes - you’ve got the basic idea with your example. It’s not unlike passing them to a job, except that WF will keep track of what’s been done and what hasn’t, so it’s more resumable. The problem with WF is that everything you do has to be translated to WWF-speak, which means, for now, any non-core cmdlets have to be put into an InlineScript, as you’ve done. Those have been less than 100% reliable in my testing, but with a simple enough command they should be fine.
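
In other words, something along these lines, where MyDataTools is a stand-in for whatever module holds your custom command:

workflow Test-NonCoreCommand
{
    param($Path)

    # Non-core commands don't get translated into workflow activities,
    # so they're wrapped in InlineScript; workflow values come in via $Using:
    InlineScript {
        Import-Module MyDataTools
        Some-CustomProcessing -Path $Using:Path
    }
}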

Workflow automatically tracks what’s been completed and resumes interrupted processes. You don’t need to do anything extra, provided that whatever you’re doing can be safely re-run if it was interrupted halfway through.

Just remember: PowerShell doesn’t run workflows. It translates them, and hands them off. So don’t start making assumptions about what’s happening based on your knowledge of PowerShell. PowerShell is just translating, not executing.
by coderaven at 2012-09-09 20:42:27
Over the past week, since I got my hands on the RTM bits and have had time to set up a fresh test lab, I have been working on a workflow, and at the same time I have been trying to put together a post as a show-and-tell to get people familiar with it.

The workflow I am trying to create handles an everyday task that, when done manually, just about everyone misses a step on. Let me give you some background.
There are a large number of users at my organization, and just about every user gets an S: drive, our standard set of shared folders. This drive is hosted with DFS and has access-based enumeration (ABE) enabled. The folders it contains are all grouped together in bundles and placed on different actual shares depending on the required performance or other needs. If we just had permissions at the root level, \\domain\dfsroot\share\departmentalfolder (departmentalfolder being the root where users start accessing files), it would all be pretty easy. But in any of the root-level folders, HR for example, there are many subfolders that grant different users different access, and there are even folders in there that may be public access, linking up into SharePoint or something crazy like that. Setting these subfolder permissions can be complicated if you don’t know what you are doing. Since both the DFS namespace and the file system itself use ABE, you can grant someone access to a folder down in the structure, but they would not be able to browse down to it unless they know the exact path.

What has to happen here is a list of things:
1. A request in our ticketing system needs to be created.
2. A new AD group needs to be created.
3. The new group needs to be nested into the group that maps the S: drive, since permissions are set there.
4. Initial members need to be added to the group.
5. The DFS allow-read permission needs to be set.
6. The permissions required for the new secured folder need to be set.
7. Each folder in the tree that users must traverse to reach the new folder needs permissions set so they can do so.
8. The newly created request needs to be closed if all goes well.

In the past I have created this as a regular PowerShell script and it works, but I think it would fit nicely into a workflow.
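
As a very rough skeleton, I am picturing something like this - New-HelpDeskTicket and Close-HelpDeskTicket are placeholders for whatever our ticketing system exposes, Set-FolderTreePermissions stands in for the DFS/NTFS permission logic, and the group name is made up:

workflow New-SecuredFolder
{
    param($FolderPath, $GroupName, [string[]]$InitialMembers)

    # 1. Open a request in the ticketing system (placeholder cmdlet)
    $ticket = InlineScript { New-HelpDeskTicket -Subject ('New secured folder: ' + $Using:FolderPath) }

    # 2-4. AD work: create the group, nest it into the S: drive mapping group, add initial members
    InlineScript {
        Import-Module ActiveDirectory
        New-ADGroup -Name $Using:GroupName -GroupScope DomainLocal
        Add-ADGroupMember -Identity 'S-Drive-Mapping-Group' -Members $Using:GroupName
        Add-ADGroupMember -Identity $Using:GroupName -Members $Using:InitialMembers
    }

    # 5-7. DFS read permission plus NTFS permissions down the folder tree (placeholder)
    InlineScript { Set-FolderTreePermissions -Path $Using:FolderPath -Group $Using:GroupName }

    # 8. Close the request if everything worked
    InlineScript { Close-HelpDeskTicket -Id $Using:ticket }
}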

Here is what some of the documentation states are considerations for when to use a workflow:
- You need to perform a long-running task that combines multiple steps in a sequence.
- You need to perform a task that runs on multiple devices.
- You need to perform a task that requires checkpointing or persistence.
- You need to perform a long-running task that is asynchronous, restartable, parallelizable, or interruptible.
- You need to run a task on a large scale, or in high availability environments, potentially requiring throttling and connection pooling.

The workflow I want to create fits the first two items in the list - long-running and more than one device. When creating the workflow, I will have my central event server handle the little stuff like creating the ticket and working with AD. The file permissions would be better suited to run on the server sharing the files, so there is less network traffic. Also, those permissions must be set one at a time, because I will be starting from the bottom and moving up. I could work top-down setting permissions, but either way, if I start setting permissions on one folder and its parent at the same time, there could be some issues. These folders could be megabytes or terabytes in size, taking one minute to many hours per level.

I hope this helps. I am trying to get some good starter stuff together and will get it posted out here soon.
by DonJ at 2012-09-10 07:02:25
Yeah, I have some issues with the documentation’s guidance at this point. It’s a bit overreaching.

Long-running and more-than-one-device isn’t necessarily workflow. You could do that with remoting (which will parallelize operations across multiple devices, although it will not parallelize what each device does) and background jobs (for the long-running aspect).
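
Just as an illustration (computer names made up, and Some-CustomProcessing standing in for whatever does the real work):

$job = Invoke-Command -ComputerName FILE01, FILE02, FILE03 -ScriptBlock {
    # Runs on each server in parallel, as a background job on this machine
    Get-ChildItem -Path D:\Data -Directory |
        ForEach-Object { Some-CustomProcessing -Path $_.FullName }
} -AsJob

# Check in on it whenever you like
Receive-Job -Job $job -Keep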

Your task list consists mainly of very small individual transactions, each of which can be accomplished quite rapidly. I don’t think I’d tackle it as a workflow. Keep in mind that workflows are exponentially more complex to design, write, troubleshoot, and debug.

Because your tasks are each so small and quick-running, I don’t think you’d need task parallelization. Also, your tasks need to come in an explicit sequence. I could see you wanting to make these restartable, but how many times have you had this sequence interrupted, and had to hand-troubleshoot to see what had been done and what hadn’t been?

Now, the permissions-setting bit in particular I could see. You’ve got a lot of folders on which to set permissions, and you’d certainly gain benefit from parallelizing that. You’d want it to be self-resuming, too, since if it got interrupted halfway through it’d be a real pain to figure out how far it’d gotten. So make just that bit a workflow. Kick it off from the main script.
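
Loosely sketched, with Set-FolderPermission standing in for your actual ACL logic and the group name made up:

workflow Set-TreePermissions
{
    param([string[]]$FolderPath, $GroupName)

    foreach -parallel ($folder in $FolderPath)
    {
        # Custom ACL logic isn't a core activity, so it runs in InlineScript
        InlineScript { Set-FolderPermission -Path $Using:folder -Group $Using:GroupName }
    }
}

# From the main (non-workflow) script, run it as a job so it can be
# suspended with Suspend-Job and resumed with Resume-Job:
# $folders = Get-ChildItem -Path \\server\share\HR -Recurse -Directory |
#     Select-Object -ExpandProperty FullName
# Set-TreePermissions -FolderPath $folders -GroupName 'HR-Readers' -AsJob

Adding Checkpoint-Workflow calls inside the workflow gives you finer-grained persistence if you find you need it.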
by willsteele at 2012-09-10 07:11:47
Ok, I see what you’re getting at. Yeah, the parallelization of specific tasks in the list of actions above is what would really benefit from the workflow. I have frequently swallowed 40 GB of memory running tons of jobs, say one per folder, on thousands of folders. Even with a distributed workload, it ultimately bogs down and just takes some time. But such is the nature of my little beast.

And, as you noted, it seems that the complexity really is exponential. Workflows seem very multidimensional in terms of design and implementation compared to the types of things I have tackled, which are deep in one dimension but very limited in overall complexity. I’ll keep playing with them to see if I can find places to really use these. I think my environment may just be too small at this point, and my team is not far enough along with PS to think in the multiple dimensions necessary to see potential applications yet. Back to it…