Request for code review - sas.ps1

by scottbass at 2012-09-18 06:08:16

Hi,

Here’s another request for code review. Again, this code works. There’s no need to run it, but if you do want to, you could just change the Invoke-SAS function to say sleep 10.

What I’m most interested in here is parameter processing and validation techniques, and the way I’m handling parameter vs pipeline input.

This script is more of a work in progress. It’s about 80% done, but I would like to add asynchronous processing as well. I’d use Start-Job for that, but from everything I’ve read, getting the return code from a process started as a job is problematic.

This script will be used to submit SAS programs both interactively (from the command line) as well as via a scheduler. For the scheduler, it is critical that the return code from SAS be propagated back to the calling process. The calling code does a $Host.SetShouldExit($LastExitCode), which seems to be the magic incantation to get error code propagation to work.
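For reference, a minimal sketch of that calling pattern (the script path and program name are placeholders):

[code2=powershell]# Scheduler-invoked wrapper: run the script, then propagate
# the exit code back to the calling process.
& .\sas.ps1 -sysin 'C:\Temp\saspgm1.sas'
$Host.SetShouldExit($LastExitCode)
exit $LastExitCode[/code2]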

Thanks in advance for any tips to tighten up or otherwise improve this code.

Regards,
Scott
by DonJ at 2012-09-22 07:12:04
While this does work, add [CmdletBinding(SupportsShouldProcess=$true)] just prior to the Param keyword. That’ll turn on -WhatIf and -Confirm - you shouldn’t code those manually as you’ve done. You also get all the common parameters for free.

-WhatIf/-Confirm then get coded as:

if ($pscmdlet.shouldprocess($target)) {
# do your dangerous thing here
}
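For example, a minimal declaration might look like this (assuming sysin stays your only mandatory parameter):

[code2=powershell][CmdletBinding(SupportsShouldProcess=$true)]
param (
    # SAS program(s) to run; accepts pipeline input
    [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
    [string[]]$sysin
)[/code2]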


$target should represent what you’re doing; the name of your script will describe its action - meaning it should have a cmdlet-style name, like "Do-Something."

I’d generally set up defaults in one step, rather than creating an empty variable and calling methods:

$defaults = @{'key'='value';
'otherkey'='othervalue'}

Just reads less C#-ish.

I’m a bit weirded out by what looks like a reliance on variables created by a calling scope - $saslev, for example. It’s a poor programming practice. Any data your script needs ought to be passed in via a parameter - it’s much cleaner to set default values on them, for example. If this is just something you can’t control, then it is what it is, but the overall script would be cleaner if it didn’t have that external dependency. This is really an overall comment - the script is, for me, harder to read because each function doesn’t define parameters for all of the input they rely upon.

I usually use Write-Warning or Write-Error instead of Throw, but it’s a personal thing. I spend a lot of time teaching folks who are a bit reluctant to become programmers, and Throw is more programmer-y. You’re clearly a bit more comfortable programming, which is why the keyword exists ;).

Print-Vars should be Show-Vars or Get-Vars. Ditto with the other functions; verbs should always come from the list at http://msdn.microsoft.com/en-us/library … 28(v=vs.85).aspx. "Print" also implies "piece of paper," which this doesn’t do.

June will get upset if I don’t mention how nice comment-based help would be ;).
by scottbass at 2012-09-22 18:46:56
Thanks for the input. Further comments:

[code2=powershell]if ($pscmdlet.shouldprocess($target)) {
# do your dangerous thing here
}[/code2]

Sorry for being dense: what does "…should represent what you’re doing" mean? Should this be:

[code2=powershell]if ($pscmdlet.shouldprocess("Invoke-SAS")) {
# do your dangerous thing here
}[/code2]

or

[code2=powershell]if ($pscmdlet.shouldprocess(???)) {
Invoke-SAS
}[/code2]

(I’ll Google this as well, or perhaps I should just re-visit your book)

The Test-Path variable:… {$variable = $null} construct I got from a Google post; it deals with undefined variable references when Set-StrictMode is set.
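For reference, a minimal sketch of that guard, using $saslev from the earlier discussion:

[code2=powershell]Set-StrictMode -Version Latest

# Under strict mode, reading an undefined variable throws,
# so give the reserved variable a value before it's referenced.
if (-not (Test-Path variable:saslev)) { $saslev = $null }[/code2]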

I admit referencing variables from the calling environment is not typical, but here are the design reasons: this will be an "uber-script" used for both scheduled jobs (we’re evaluating JAMS, BTW) and interactive processing (from the console window). The only required parameter is sysin, which can be a command-line parameter, read from the pipeline, or, cooler still, come from a control file (gc control_file.txt | sas.ps1 or sas.ps1 -sysin (gc control_file.txt)). All other parameters default to values for production processing.

The precedence of parameter setting is (a sketch appears after the list):

* If a reserved variable has been set in the calling environment, use it.
* If a parameter was specified on the command line, use it (either setting the parameter or overriding an environment variable).
* Otherwise, set default values for the parameters.
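A sketch of that precedence for a single parameter; -log and the reserved name $saslog are illustrative:

[code2=powershell]param ([string]$log)

if ($PSBoundParameters.ContainsKey('log')) {
    # command line wins; $log is already bound
}
elseif (Test-Path variable:saslog) {
    # reserved variable set in the calling environment
    $log = $saslog
}
else {
    # production default: timestamped log
    $log = "sas_$(Get-Date -Format yyyyMMddHHmmss).log"
}[/code2]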

During SAS program development, it is typical to 1) set the SAS level to Lev2, which sets the SAS internal environment, and 2) override the default timestamped log - we don’t want hundreds of logs, plus we can just have the log open in Textpad and re-read the file after each run. Setting the variables in the console window ensures that I won’t accidentally run the program in the production environment, if I have a brain freeze later and forget to put the right options on the command line. I’m happy to consider a better approach, but those are my business requirements.

It sounds like it’s 50/50 whether to use Write-Warning or Write-Error instead of Throw. All my Throws would be fatal errors, and I guess I like the default behavior of Throw printing out the line number and statement that caused the error. Unless there’s a compelling reason (and I’m quite happy to hear them), I’ll leave this part of the script as is.

Further development includes multi-threaded processing by making the -Wait option on Start-Process optional. I think I can keep that multi-threaded processing but also include dependencies by using a control file. My vision is something like this:

control_file.txt:

C:\Temp\saspgm3.sas, C:\Temp\saspgm1.sas, C:\Temp\saspgm4.sas, C:\Temp\saspgm2.sas (files must be separated by commas)
C:\Temp\saspgmA.sas
C:\Temp\saspgm5.sas, C:\Temp\saspgm7.sas, C:\Temp\saspgm6.sas
C:\Temp\saspgmB.sas, C:\Temp\saspgmC.sas, C:\Temp\saspgmD.sas

If I can work out how to have one file per line, with comma/no comma marking the end of a group, slurp this file into memory, then process it as below, that would be better. IOW, the file would look like:

C:\Temp\saspgm3.sas,
C:\Temp\saspgm1.sas,
C:\Temp\saspgm4.sas,
C:\Temp\saspgm2.sas
C:\Temp\saspgmA.sas
C:\Temp\saspgm5.sas
, C:\Temp\saspgm7.sas (another approach)
, C:\Temp\saspgm6.sas
C:\Temp\saspgmB.sas,
C:\Temp\saspgmC.sas
, C:\Temp\saspgmD.sas

I’d prefer this approach, but have to figure out how to process them in groups.
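For what it’s worth, here’s one untested sketch of that grouping, assuming a comma on either side of a line boundary means the group continues:

[code2=powershell]# Read the control file and split it into groups of programs.
$lines   = Get-Content control_file.txt | ForEach-Object { $_.Trim() } | Where-Object { $_ }
$groups  = @()
$current = @()
for ($i = 0; $i -lt $lines.Count; $i++) {
    $current += $lines[$i].Trim(',').Trim()
    $groupEnds = (-not $lines[$i].EndsWith(',')) -and
                 ($i -eq $lines.Count - 1 -or -not $lines[$i + 1].StartsWith(','))
    if ($groupEnds) {
        $groups += ,$current   # unary comma keeps each group one element
        $current = @()
    }
}[/code2]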

Given the first version of the file, I think (???) with the right looping constructs, it would be relatively easy to code this as (a sketch follows the list):

* read a line from the control file
* $files = gci $_, where $_ is the comma-separated list of files (and could contain wildcards if the file naming convention allowed this)
* loop over the list of files
* start-process without the -wait option
* wait for that list of files to run
* capture the return codes from all those runs (very important)
* if all ran ok, rinse and repeat
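Something along these lines, perhaps; sas.exe and its arguments are placeholders for however Invoke-SAS really launches SAS:

[code2=powershell]# One group: launch every program without -Wait, then wait for
# all of them and capture every exit code.
$procs = foreach ($file in $files) {
    Start-Process -FilePath sas.exe -ArgumentList "-sysin `"$file`"" -PassThru
}
$procs | Wait-Process
$failed = $procs | Where-Object { $_.ExitCode -ne 0 }[/code2]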

So, the above would run as:

* Run saspgm3, saspgm1, saspgm4, saspgm2 in parallel, with the script waiting for all to complete
* Run saspgmA if above programs ran successfully, with the script waiting for all to complete
* Run saspgm5, saspgm7, saspgm6 in parallel if above program ran successfully, with the script waiting for all to complete
* Etc.

I also want to add an -abort (or -continue) switch to control whether the flow aborts on an error, or runs all programs regardless of upstream error. This would just depend on how critical an upstream error is to downstream processing.
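Building on the $failed collection from the sketch above (the -ContinueOnError name is just a placeholder), the check after each group might be:

[code2=powershell]# Stop at the first failed group unless told to keep going.
if ($failed -and -not $ContinueOnError) {
    Write-Error "Group failed; aborting remaining groups."
    break
}[/code2]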

I fully intend to create voluminous comment-based help…once development is finished :wink: So tell June to be patient :smiley:

Thanks again…
by DonJ at 2012-09-23 07:01:46
This is where http://manning.com/jones4 would be handy for ya ;).

The "target" of ShouldProcess isn’t the action you’re taking, it’s the target of that action. The file you’re deleting, the computer you’re rebooting, whatever. Tells the user what the script is about to modify.