Dynamically Building the Where conditions

by adegutis at 2013-04-03 09:35:04

I am trying to build a Where condition using parameters passed to the script.

[string]$StartDate = $args[0]
[string]$EndDate = $args[1]
for ($cnt = 2; $cnt -le $args.count; $cnt++) {
$Phonenumber = $Phonenumber + $args[$cnt]
}

$PhonenumberSearchString = ' { $_.substring(0,8) -ge "' + $StartDate +'" -and $_ -le "' +$EndDate +'" -and ($_ -like "' + $Phonenumber[1] + '"'
for ($cnt = 0; $cnt -le $Phonenumber.count-2; $cnt++) {
write-host 'Phone Number:' $Phonenumber[$cnt]
add-content $Logfile "Phone Number: $Phonenumber $Phonenumber[$cnt]"
$PhonenumberSearchString = $PhonenumberSearchString + ' -or $_ -like "' + $Phonenumber[$cnt] + '"'
}
$PhonenumberSearchString = $PhonenumberSearchString + ')}'

get-content $File | where { Invoke-Expression -Command $PhonenumberSearchString } | out-file -append $Outputfile


When it runs, the Where condition is ignored, so the output is everything from the source file.
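
A likely explanation, offered as a minimal sketch rather than a confirmed diagnosis: the string passed to Invoke-Expression contains its own braces, so evaluating it returns a ScriptBlock object instead of running the test. Where-Object then sees a non-null object, treats it as true, and passes every line. The names $filterText, $passed, and $kept below are made up for illustration, and numbers stand in for the lines of the file.

```powershell
# The string includes braces, as in the script above.
$filterText = ' { $_ -gt 5 } '

# Evaluating it yields a ScriptBlock object; the test inside never runs.
$returned = Invoke-Expression -Command $filterText

# A ScriptBlock object is non-null, so Where-Object keeps every item.
$passed = @(1..10 | Where-Object { Invoke-Expression -Command $filterText })

# Converting the test text (without braces) into a real scriptblock applies the filter.
$filter = [scriptblock]::Create('$_ -gt 5')
$kept = @(1..10 | Where-Object $filter)

"Invoke-Expression kept $($passed.Count) items; the real scriptblock kept $($kept.Count)"
```

Building a scriptblock from text should only be done with trusted input, since whatever the text contains will be executed.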
by ArtB0514 at 2013-04-03 10:43:40
Try removing the quotes from the $phonenumbersearchstring definition and fully defining it within the for loop and deleting the Invoke-Expression from the Where clause.
Not tested, so more debugging is going to be needed:

[string]$StartDate = $args[0]
[string]$EndDate = $args[1]
for ($cnt = 2; $cnt -le $args.count; $cnt++) {$Phonenumber = $Phonenumber + $args[$cnt]}
for ($cnt = 0; $cnt -le $Phonenumber.count-2; $cnt++) {
write-host 'Phone Number:' $Phonenumber[$cnt]
add-content $Logfile "Phone Number: $Phonenumber $Phonenumber[$cnt]"
$PhonenumberSearchString = {$_.substring(0,8) -ge $StartDate -and $_ -le $EndDate -and ($_ -like "$Phonenumber[1]" -or $_ -like "$Phonenumber[$cnt]")}
get-content $File | where $PhonenumberSearchString | out-file -append $Outputfile
}
by adegutis at 2013-04-03 11:26:06
Thanks for your input. I have not tried your suggestions yet, but before I added Invoke-Expression to the Where condition (that is, when I passed the variable straight to Where, the way you are suggesting), I was getting this error:

Where-Object : Cannot bind parameter 'FilterScript'. Cannot convert the " { $_.substring(0,8) -ge "20080101" -and $_ -l
e "20101231" -and ($_ -like "0001234567" -or $_ -like "1001234567" -or $_ -like "2001234567" -or $_ -like "0000
88879
" -or $_ -like "2001234567" -or $_ -like "3001234567" -or $_ -like "4001234567" -or $_ -like "0001234567"
)}" value of type "System.String" to type "System.Management.Automation.ScriptBlock".
At O:\archive-extract-z.ps1:265 char:45
+ get-content $File | where $PhonenumberSearchString | out-file -append $Output
file
+ CategoryInfo : InvalidArgument: (:) [Where-Object], ParameterBindingException
+ FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.WhereObjectCommand


It is my understanding that you cannot pass a string as part of a command for security reasons, so that arbitrary code cannot be executed.

Also, the way I read your for loop, it would make multiple passes over the file, one for each parameter passed, which would be too slow since these are 1 TB source files; hence the goal of reading each file only once.
by ArtB0514 at 2013-04-03 12:48:34
Notice the error message:
Cannot convert the "very-long-string" value of type "System.String" to type "System.Management.Automation.ScriptBlock".
You need to remove the quote marks and make a scriptblock to perform the test.

You also need to enclose items like $Phonenumber[1] in an evaluation block "$($Phonenumber[1])" when you want them evaluated inside a string.

Also, you should probably look for a better way to form your filter. For example, trying to filter on dates as strings may not give you the results you want. And long groups of -OR tests can usually be made shorter by using regular expressions and -MATCH, and more readable by careful use of parentheses.
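
One way to act on this advice, as a minimal sketch rather than a tested solution: keep the filter as a real scriptblock and fold the phone numbers into a single -match pattern, so nothing has to be assembled as a string. The sample lines and the values of $StartDate, $EndDate, and $PhoneNumbers are invented for illustration. Comparing the first eight characters as strings assumes every line starts with a yyyyMMdd date.

```powershell
# Hypothetical inputs; in the real script these would come from parameters.
$StartDate    = '20090101'
$EndDate      = '20101231'
$PhoneNumbers = '1201234567', '3001234567'

# Join the numbers with | (regex OR); Escape guards against special characters.
$pattern = ($PhoneNumbers | ForEach-Object { [regex]::Escape($_) }) -join '|'

# A real scriptblock: no quotes around it, so Where-Object can bind it directly.
$filter = {
    $date = $_.Substring(0, 8)
    $date -ge $StartDate -and $date -le $EndDate -and $_ -match $pattern
}

# Made-up sample lines standing in for Get-Content $File.
$lines = @(
    '20080101 0000 0001234567 1201234567 other text'
    '20090211 0000 8001234567 8101234567 other text'
    '20100408 0000 3001234567 1201234567 other text'
    '20101121 0000 4001234567 4111234567 other text'
)

$matched = @($lines | Where-Object $filter)
$matched
```

Note that the sketch compares only the eight-character date prefix against both bounds. Comparing the whole line against $EndDate, as in the earlier code, would likely exclude lines dated exactly on the end date, because the full line sorts after the bare date string.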
by nohandle at 2013-04-04 01:29:10
Hi, I am trying to get a notion of what your code is supposed to do. But when I pass 10 10 10 10 as arguments I get 20 in the phone number variable after the first for loop, and then it dies on me with "Unable to index into an object of type System.Int32". So I guess I am using it wrong. Giving the script proper named parameters would greatly improve its usability for people who do not have any clue how to use it.

Here is an example of what interface I assume would fit the function:
param (
[parameter(Mandatory=$true, Position=0, valueFromPipeline = $false)]
[datetime]$StartDate,
[parameter(Mandatory=$true, Position=1, valueFromPipeline = $false)]
[datetime]$EndDate,
[parameter(Mandatory=$true, valueFromPipeline = $false, ValueFromRemainingArguments= $true)]
[string[]]$PhoneNumbers
)

"StartDate: $StartDate"
"EndDate: $EndDate"
"PhoneNumbers $($PhoneNumbers -join ', ')"

Executed with: 2.2.2012 3.3.2013 123456 123456 123465 as arguments
StartDate: 02/02/2012 00:00:00
EndDate: 03/03/2013 00:00:00
PhoneNumbers 123456, 123456, 123465


[quote="ArtB0514"]Also, you should probably look for a better way to form your filter. For example, trying to filter on dates as strings may not give you the results you want. And long groups of -OR tests can usually be made shorter by using regular expressions and -MATCH, and more readable by careful use of parentheses.[/quote]
Totally agree.
by adegutis at 2013-04-04 05:49:05
The data being read (the source file) is just a simple ASCII text file, so there really aren’t any dates. I am using the string-formatted date to try to quickly eliminate rows/lines of text to check.

I was not familiar with the ValueFromRemainingArguments= $true option in the parameter section, so I may go back to using parameters again.

Regular expressions are one of my weaknesses, but I will look into using them with -MATCH.

I am uncertain on how to remove the quote marks and make a scriptblock. I will research this and give it a try once I wrap my brain around how to do that within the loop of the arguments or parameters.

Thanks for this feedback.
by nohandle at 2013-04-04 06:00:20
[quote="adegutis"]The data being read (the source file) is just a simple ASCII text file, so there really aren’t any dates. I am using the string-formatted date to try to quickly eliminate rows/lines of text to check.[/quote]
Ok, as I said I was just guessing what the interface might be. The main point is: use the named parameters to avoid $args.

[quote="adegutis"]Regular expressions are one of my weaknesses, but I will look into using them with -MATCH.[/quote]
I am still unsure what your script is supposed to do. If you give an example of the input data, an example of the output, and a description in plain English of what happens in between, then I can most likely help you create one or propose a better way.
by adegutis at 2013-04-04 07:14:01
I am given date ranges and phone numbers to match inside text files. There may be anywhere from 1 to many phone numbers (a recent request had 22 phone numbers).

Here’s a mock of the data with the date always being the 1st "column" and the phone numbers in the 3rd and 4th columns (caller and recipient, respectively). In some of the archive files the phone numbers may not be the 3rd and 4th columns but in a different position.

20080101 0000 0001234567 1201234567 lots of other text follows on this line…
20090211 0000 8001234567 8101234567 lots of other text follows on this line…
20100408 0000 3001234567 1201234567 lots of other text follows on this line…
20101121 0000 4001234567 4111234567 lots of other text follows on this line…

These files are large, being between 700 MB and 1 GB.

My script was originally written to search for one phone number in the date range provided. Now I am trying to modify it to dynamically accept any number of phone numbers to search for. Since these files are large, I am trying to check each line in one pass to see if the date falls inside the date range provided and if any of the phone numbers exists in that line.

Hope this helps.
by nohandle at 2013-04-04 07:34:59
So basically you have a file of data ordered by date. From that file you grab a sub-section limited by the start and end date (is the timespan more likely to be long or short?).
In this subsection of data you are trying to look up one or more numbers.

Is it important to know which number was found on the line, or is just outputting the matching line enough?
by adegutis at 2013-04-04 07:39:18
The timespan can be long. I’ve had requests for one month and others for 3 years.

It is not important to know which phone number was found on the line. Just a new output file containing all the lines that contained any of the phone numbers in the date range.
by nohandle at 2013-04-04 08:29:08
So, assuming the data I deliver are already the subsection of the file: is this enough to provide the core filtering functionality?
I am not taking any special care of the date and the 0000 column; they won’t match because their length differs from that of a phone number.
$data = "20080101 0000 0001234567 1201234567
20090211 0000 8001234567 8101234567
20100408 0000 3001234567 1201234567
20101121 0000 4001234567 4111234567" -split "`n"


$Lookup = "1201234567","3001234567","8888888888"
# | means "or" in regular expressions
$pattern = $Lookup -join "|"

$data | Select-String $pattern

20080101 0000 0001234567 1201234567
20100408 0000 3001234567 1201234567
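
Putting this pattern together with the date range discussed earlier might look like the sketch below, which reads the file once and writes the matching lines to a new file. It is only a sketch under assumptions: the temporary file paths and sample values are hypothetical stand-ins for the real archive and parameters, and every line is assumed to start with a yyyyMMdd date.

```powershell
# Hypothetical sample file standing in for one archive file.
$tempDir    = [System.IO.Path]::GetTempPath()
$File       = Join-Path $tempDir 'archive-sample.txt'
$Outputfile = Join-Path $tempDir 'archive-matches.txt'
@(
    '20080101 0000 0001234567 1201234567 other text'
    '20090211 0000 8001234567 8101234567 other text'
    '20100408 0000 3001234567 1201234567 other text'
    '20101121 0000 4001234567 4111234567 other text'
) | Set-Content -Path $File

# Made-up search values.
$StartDate = '20090101'
$EndDate   = '20101231'
$Lookup    = '1201234567', '3001234567', '8888888888'
$pattern   = $Lookup -join '|'

# Single pass: match the numbers first, then keep lines inside the date range.
Select-String -Path $File -Pattern $pattern |
    ForEach-Object { $_.Line } |
    Where-Object { $d = $_.Substring(0, 8); $d -ge $StartDate -and $d -le $EndDate } |
    Set-Content -Path $Outputfile

$result = @(Get-Content -Path $Outputfile)
$result
```

Select-String emits match objects rather than plain strings, which is why the sketch takes the Line property before testing the date prefix and writing the output.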