Breaking text file into separate arrays

by AevnsGrandpa at 2013-04-12 11:59:23

I know this is out there somewhere and after search on a few things and scrolling through the first 5 pages of items on the forum I gave up and am just posting.

I have a text file with multiple lines. Each line is laid out like this;
Company ActivationName ActivatedBy Duration

There are a couple of things I want to do with the data but I am getting stumped at the first point. After I read in the text file with a Get-Content and assign it to a variable how do I within a ForEach statement break up each line and assign it to its own variable which is an array. I have the following;


$raw = Get-Content("c:\dialer items\sqlreports\test.txt")
foreach ($activation in $raw){
$company = $activation.Substring(0,50)
$activationname = $activation.Substring(51,150)
$activatedby = $activation.Substring(202,50)
$duration = $activation.Substring(253,10)
}


This works but the ForEach keeps overwriting the $company, $activationname, etc. variables till they just contain the last element in the $raw array. How do I make the variables in the foreach that I am using the substring on, an array in of of itself?

Once I do this then I need to sort out each unique string in $company and total up the $duration for each of the unique $companies. Make sense? Then I will output it all again into something I can print, probably a HTML file.

Thanks, and maybe byt the time I get a reply I’ll have figured it out. (That’s the way it usually goes)

Jeff
by mjolinor at 2013-04-12 12:18:56
There’s lots of ways do do this.

I’m kind of partial to this:
http://gallery.technet.microsoft.com/scriptcenter/New-PSObjectFromMatches-87d8ce87

Which will let you do this:

$regex = '(.{50})(.{150})(.{50})(.{10})'

Get-Content "c:\dialer items\sqlreports\test.txt" |
New-PSObjectFromMatches -Pattern $regex -Property $null,Company,ActivationName,ActivatedBy,Duration
by AevnsGrandpa at 2013-04-12 12:34:58
Wow I didn’t think about creating new objects. I’ll try this out…

Where can I find the help for what the various formatting symbols and positions mean in the $regex variable? I have seen some strange looking this that others have done.

Also why the $null variable at the begining of the property list?

JEff
by mjolinor at 2013-04-12 12:43:20
get-help about_regular_expressions

‘(.{50})(.{150})(.{50})(.{10})’

says:
Group 1 is the first 50 characters
Group 2 is the next 150 characters
Group 3 is the next 50 characters
Group 4 is the next 10 characters.

Group 0 will always be the entire match.

New-PSObjectFromMatches will create an object from the groups, assigning property names specified in -Property,
so :

$null,Company,ActivationName,ActivatedBy,Duration

will ignore group 0 then make groups 1-4 Company, ActivationName, ActivatedBy, and Duration respectively.
by AevnsGrandpa at 2013-04-12 12:43:20
So I downloaded that ps1 file of this new cmdlet. Do I run it ahead of needing to use it? Also don’t I need to run the Get-Content into a variable? And what does the new cmdlet give me? Are the properties at the end the new objects thata re created and thus hold the matches from the source text file.

This looks a bit above my head…!

Jeff
by mjolinor at 2013-04-12 12:48:12
You do need to run it ahead of using it (it’s just a function). You can dot souce it from the file, or you can paste it into your profile, and then it will automatically get loaded whenever you start a powershell session. If you’re using the ISE, it has it’s own profile and you’ll need to paste it in there, too.

to dot source a script file :
. c:\scripts\new-psobjectfrommatches.ps1
by AevnsGrandpa at 2013-04-12 13:25:20
Thanks and I’ll get into it probably on Monday, about time to go and head home. Have a good weekend.
by RandomAdmin at 2013-04-14 11:31:39
Would this work?

$raw = Get-Content("c:\dialer items\sqlreports\test.txt")
foreach ($activation in $raw){
$company = $company + $activation.Substring(0,50)
$activationname = $activationname + $activation.Substring(51,150)
$activatedby = $activatedby + $activation.Substring(202,50)
$duration = $duration + $activation.Substring(253,10)
by AevnsGrandpa at 2013-04-15 08:00:49
Now that is what I think would be better and much simpler. I see, keep ading in the next substring. I’ll try that here in a few minutes. Should have thought of that one.

Jeff
by AevnsGrandpa at 2013-04-15 08:10:48
I tried it but it just gives me a real long string that I still have to break up into the individual parts. What I need to do is take all the individual substrings in a given position into a new array. Then I can more easily do some grouping and sorting.

Thanks for the help though.

Jeff
by Nobody at 2013-04-15 08:30:06
Can you post a sample of the input data? do the fields contain spaces? Is each field really a static number of characters?
by AevnsGrandpa at 2013-04-15 10:53:22
Here is part of the test file I am using;

CompanyName ActivationName ActivatedByName Duration in Seconds
-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- -------------------
Alpha P&C Test Delivery NXT Activation PCTestDelivery 16
Alpha P&C Test Delivery NXT Activation PCTestDelivery 10
Associate Mid-Atlantic Test - Swope Scheduled Activation 60
Associate ST - Ed and Scott Test CVZR 32
Charlie EN4 Thursday HMD3 102
Charlie EN4 Thursday HMD3 110

Sorry this is so huge, the field size I cannot change. So as you can see I will have a variety of different ‘Company’ names. What I am wanting to do is take all the durations from a certain company name and do a sum of them. I figured if I could break the text file into 4 arrays all with the same indexes when I could do a foreach and add the durations till the company changes and go from there. I don’t have all the logic done as I cannot get it all separated out the way I think I need to. Thanks for the help.

Jeff
by AevnsGrandpa at 2013-04-15 10:54:57
Just as a note, when I copied and pasted the data into the quick reply field it had all the spacing in it. Then after it posted it removed all the space so pretend the fields are much bigger but I guess that really doesn’t matter anyway. The dashed lines show the actual length of each field.

So to better explain for example
Company ActivationName ActivatedBy DurationInSeconds
Charlie EN4-Thursday HMD3 10

The spacing is strange you might not know what goes in each field.
by mjolinor at 2013-04-15 11:19:13
[quote]What I am wanting to do is take all the durations from a certain company name and do a sum of them.[/quote]

It sounds like this would be much easier to manage if you were creating objects from the data. Then you can filter, sort, sum etc. on the object properties.

Did you get the new-psobjectfrommatches function to work?
by Nobody at 2013-04-15 13:15:55
To roll with what you started in your first post, I think you can do this.


$raw = get-content -path "testdata.txt"
$myData = @()

foreach ($activation in $raw){
$myData += New-Object psobject -property @{
company = $activation.Substring(0,50)
activationname = $activation.Substring(51,150)
activatedby = $activation.Substring(202,50)
duration = $activation.Substring(253,10)
}
}

$myData|fl company, activationname, activatedby, duration


mjolinor, would the new-psobjectfrommatches function be more effecient on a large number of records?
by mjolinor at 2013-04-15 13:27:01
[quote]mjolinor, would the new-psobjectfrommatches function be more effecient on a large number of records?[/quote]

Probably not. It is a lot easier to code with - you just need the regex and the list of property names you want to use. There’s some built-in debug to show you what’s being matched to help sort out the regex if you need it.
by scottbass at 2013-04-18 21:06:35
[quote="AevnsGrandpa"]I know this is out there somewhere and after search on a few things and scrolling through the first 5 pages of items on the forum I gave up and am just posting.

I have a text file with multiple lines. Each line is laid out like this;
Company ActivationName ActivatedBy Duration

There are a couple of things I want to do with the data but I am getting stumped at the first point. After I read in the text file with a Get-Content and assign it to a variable how do I within a ForEach statement break up each line and assign it to its own variable which is an array. I have the following;


$raw = Get-Content("c:\dialer items\sqlreports\test.txt")
foreach ($activation in $raw){
$company = $activation.Substring(0,50)
$activationname = $activation.Substring(51,150)
$activatedby = $activation.Substring(202,50)
$duration = $activation.Substring(253,10)
}


This works but the ForEach keeps overwriting the $company, $activationname, etc. variables till they just contain the last element in the $raw array. How do I make the variables in the foreach that I am using the substring on, an array in of of itself?

Once I do this then I need to sort out each unique string in $company and total up the $duration for each of the unique $companies. Make sense? Then I will output it all again into something I can print, probably a HTML file.

Thanks, and maybe byt the time I get a reply I’ll have figured it out. (That’s the way it usually goes)

Jeff[/quote]

Hi,

You don’t really say how you want to access this information after you’ve read it in, so we can only guess the best approach.

You can use custom objects, which are pretty cool, but perhaps a hash would also meet your needs?

# simulate reading in your data
$raw =
"A B C D",
"E F G H",
"I J K L",
"M N O P",
"Q R S T"

# create an array to contain the results
$results = @()

# process each row
$regex = '(.{1}) (.{1}) (.{1}) (.{1})'
foreach ($activation in $raw){
# new hash
$hash=@{}
$activation -match $regex | Out-Null
$hash.Add("Company", $matches[1])
$hash.Add("ActivationName",$matches[2])
$hash.Add("ActivatedBy", $matches[3])
$hash.Add("Duration", $matches[4])
$results+=$hash
}

# display the results
foreach ($activation in $results) {
$activation | Format-Table -AutoSize
}
by scottbass at 2013-04-18 21:19:16
These links may also help:
http://technet.microsoft.com/en-us/libr … 76900.aspx
http://blogs.msdn.com/b/powershell/arch … otals.aspx
http://blogs.msdn.com/b/powershell/arch … bject.aspx
by AevnsGrandpa at 2013-04-19 08:15:40
Thanks guys. Will try both approches. The output will go to a CovertTo-HTML as a franment and then put together with all the company results to a final HTML report.

Jeff
by AevnsGrandpa at 2013-04-19 08:28:20
They both do what I believe will help me then sort by 1 property and count another.

Interestingly enough, both also show nothing in the Duration field or property when ran… I have verified the position in the line when it is read…Would it be read in as a INT variable or a string? I want it as a number so I can perform math functions on it.

Again thanks, I am learning and not real up on hash tables and creating a new object with the perperties I want…Cool though.

Jeff
by scottbass at 2013-04-19 15:47:38
Hi,

Let’s see if I understand your process correctly:

* You have a text file with 4 column delimited fields. IOW, each field begins in a particular column, with spaces to pad data into the correct column
* Using Powershell, you want to read that text file into 4 separate variables
* Each line in the file becomes a record in some type of collection (array, hash, dictionary, custom object, etc).
* The first three fields are character, the fourth is numeric (integer)
* You want to summarize the fourth field (duration in seconds) by [what? CompanyName? All three fields?]
* The summarization will collapse your data based on your grouping variables
* You want to produce a final report in HTML

In SQL, you would state this such as:

select CompanyName, sum(DurationInSeconds) as SumDurationInSeconds from whatever group by CompanyName

-or-

select CompanyName, ActivationName, ActivatedByName, sum(DurationInSeconds) as SumDurationInSeconds from whatever group by CompanyName, ActivationName, ActivatedByName

Is this a correct definition of your problem statement?

Scott
by AevnsGrandpa at 2013-04-22 07:32:25
Yes Scott that is fairly close. I took one of the suggestions and then took the $mydata array into a for next loop and used the switch statement on the company name to separate out the summing up of the durations for each company. Now I have a new variable, one for each company that hold the sum of the durations for that company. Now I am working on getting it out of powershell to something (HTML or whatever) I can create reports for management for. So right now I am ok but will post back if I get stuck or need help.

Jeff
by Nobody at 2013-04-22 08:49:17
This is something I was playing around with previously.


$raw = get-content -path "testdata.txt"
$myData = @()

foreach ($activation in $raw){
$myData += New-Object -type psobject -property @{
company = $activation.Substring(0,50)
activationname = $activation.Substring(51,150)
activatedby = $activation.Substring(202,50)
duration = $activation.Substring(253,10)
}
}


$grouped = $myData|group-object -property company
$newobject = @()
foreach ($company in $grouped){
$newObject += New-Object -type psobject -property @{
company = "$($company.name)"
count = "$($company.count)"
durationTotal = $company.group|measure-object -property duration -sum|%{$_.sum}
}

}
#$newObject|ft count, company, durationTotal -auto

#convertto-html
$newObject|convertto-html -property count, company, durationTotal -title "Duration totals by company" > DurationTotalsReport.html
by AevnsGrandpa at 2013-04-22 10:26:25
Thanks, I actually have it working and decided that since I needed to be able to bring it back into Excel to create charts I just create a text file and use Add-content to put the company name and duration total separated by a comma through another for next loop.

Love doing this stiff, really help me to learn Power shell.

Jeff
by Nobody at 2013-04-22 10:57:51
It might be worth mentioning that you can import log file directly into excel and set fixed width columns. Not nearly as fun as messing with powershell, but would be a quick and easy solution.
by AevnsGrandpa at 2013-04-23 11:58:01
I was doing that but it was more of a pain sorting by the one column and then putting the SUM function in to cover just the individual duration amounts. And yes this is more fun.

Jeff
by eisenbergz at 2013-04-24 00:08:32
What does the final working code look like, if it is ok to show it?