by Matt Hitchcock at 2013-04-03 19:28:19
Hiby mjolinor at 2013-04-03 20:41:24
I’m having some difficulty with manipulating some information into a text file. Basically, I receive some file in one format that I need to evaluate and convert into a different format through an automated script. The CSV file starts out as:
EmployeeID,AD Account UPN,Firstname,Lastname
001,mh@domain1.com,Matt,Hitchcock
001,matty@domain2.com,Matt,Hitchcock
001,matthew@domain3.com,Matt,Hitchcock
001,hitch@domain4.com,Matt,Hitchcock
001,hitch@domain5.com,Matt,Hitchcock
EmployeeID is not a unique value, basically this is for a migration project where users have accounts in different domains. I need to re-work this into a format that Quest Quick Connect can use to stamp employee numbers onto the accounts in different Forests, so I end up with:
EmployeeID,Domain1UPN,Domain2UPN,Domain3UPN,Firstname,Lastname,AllOtherAccounts
001,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"
The AllAccounts column exists because all accounts not matched to one of the domains we are looking for is populated into a multi-valued attribute for reference.
The issue I am having is that I while my script works fine to import the file, build an object for each employee, identify what we want and put the object into an array to output to a CSV at the end, the import file I receive is now 111,000 lines long as is taking an immense amount of time to process.
My logic so far is (roughly explained) as follows:
1. Import the CSV File into an array $Records
2. Loop through the Array and for each employee number not in the arry $UniqueEmployeeIDs, add it (so we get a count of unique employees)
3. For each $UniqueEmployeeID, go through the $Records array and where an object has the same employee ID, pull it into the $CurrentEmployee array to work with it
4. Build a new Object for the User and create attributes with values i.e. Name = Domain1UPN, Value = mh@domain1.com. etc.
5. Add the new Object to the $ProcessedObjects array and move to the next $UniqueEmployeeID and repeat
6. Output $ProcessedObjects into a CSV
Step 3 is where I hit the problem, I am using this on what is now a 111,000 array! (it was fine when it was smaller!):
[Array]$CurrentEmployee = $Records | Where-Object {$.EmployeeID -like $UniqueEmployeeID}
I have the $UniqueEmployeeIDs array sorted into Numerical order. Is there a way that I can find the first matching object in the array, when I hit a match, start adding the objects to the $CurrentEmployee array then once I hit an object that does not match i.e. the employeeID is greater, stop processing and move onto the next $UniqueEmployeeID? Rather than evaluating every object in the array?
It’s difficult to share the script as it has a lot of environment specific information which I can’t really post online, I am working to change it all so I can post it, but in the meantime I guess I wanted to ask is this possible? To go through an array until you match an object then do something until they objects stop matching and move onto the next? I am thinking Do While and Do Until’s would make this possible but just can’t see the code in my head!
I am thinking something like the following but am yet to try:
foreach ($i in $Records) {
while ($i.EmployeeID -notlike $Employee.EmployeeID) {
Continue
Do Until ($i.EmployeeID -like $Employee.EmployeeID) {
$CurrentEmployee += $i
Do Until ($i.EmployeeID -notlike $Employee.ID)
}
}
}
Thanks
I’d be tempted to forget the import-csv and just deal with it as text lines, splitting the data fields out of the text.by Matt Hitchcock at 2013-04-03 22:46:31
If it’s sorted data, you should be able to trigger the output of a completed object and starting the creation of a new one based on the EmployeeID changing from the last line that was read.
Thanks for the reply. Not importing the CSV sounds better but I don’t completely follow what you mean, how would I do what you’re suggesting?by nohandle at 2013-04-04 03:08:05
[quote="Matt Hitchcock"]Not importing the CSV sounds better but I don’t completely follow what you mean, how would I do what you’re suggesting?[/quote]by nohandle at 2013-04-04 03:18:08
I suppose this is what he means:
If the data are already sorted by unique identifier, you are just a step away from grouping the data to create bulks from which you can generate the result -> effectively skipping the "get me list of unique employee ids and then create arrays of user data by looking them up in the huge table". (in this case the best candidate for uniques identifier is obviously the employee ID).
In attempt to implement this solution I came up with this pipeline, where i dumbly follow the structure of the data (rely on the data being properly sorted). It processes 150000 lines in about 25 seconds$delimiter = ";"
Get-Content .\book1.csv | foreach -Begin {
$skipHeader = $true
$userIdLength = 3
$userData = @()
} -Process {
#using first to skip header
if (-not ($skipHeader))
{
$lineCurrent = $
#to avoid as much searching as possible I rely highly on the structure of the data
$userIdCurrent = $lineCurrent.Substring(0,$userIdLength)
#create groups based on user id
if ($userIdCurrent -eq $userIdPrevious)
{
$userData += $lineCurrent
}
else
{
#output the group as array
if ($userData) {,$userData}
$userData = @()
$userIdPrevious = $userIdCurrent
$userData += $lineCurrent
}
}
else
{
$skipHeader = -not $skipHeader
}
}-End {
#make sure you output the last group
,$userData
} |
foreach {
$groupCurrent = $
$EmployeeId,$Domain1UPN,$Firstname,$Lastname = $groupCurrent[0] -split $delimiter,$null,'simplematch'
$Domain2UPN = ($groupCurrent[1] -split $delimiter,3,'simplematch')[1]
$Domain3UPN = ($groupCurrent[2] -split $delimiter,3,'simplematch')[1]
#process rest of the groups to all other accounts
$groupCurrentCount = $groupCurrent.Count
if ($groupCurrentCount -gt 3)
{
$allOtherAccountsTemp = @()
for ($index = 3; $index -lt $groupCurrentCount; $index++)
{
$allOtherAccountsTemp += ($groupCurrent[$index] -split $delimiter,3,'simplematch')[1]
}
$AllOtherAccounts = $allOtherAccountsTemp -join ','
}
"$EmployeeID,$Domain1UPN,$Domain2UPN,$Domain3UPN,$Firstname,$Lastname,"$AllOtherAccounts
""
$EmployeeId = $Domain1UPN = $Domain2UPN = $Domain3UPN = $Firstname =$Lastname = $AllOtherAccounts = $allOtherAccountsTemp =@()
}
Of course more logic can be placed in the second part (second foreach) if needed.
note: When I tested the runtime I piped the output to out-Null.
This is my input:by mjolinor at 2013-04-04 04:15:13PS C:\users\ja07602\desktop> Get-Content .\book1.csv
EmployeeID;AD Account UPN;Firstname;Lastname
001;mh@domain1.com;Matt;Hitchcock
001;matty@domain2.com;Matt;Hitchcock
001;matthew@domain3.com;Matt;Hitchcock
001;hitch@domain4.com;Matt;Hitchcock
001;hitch@domain5.com;Matt;Hitchcock
002;mh@domain1.com;Matt;Hitchcock
002;matty@domain2.com;Matt;Hitchcock
002;matthew@domain3.com;Matt;Hitchcock
002;hitch@domain4.com;Matt;Hitchcock
002;hitch@domain5.com;Matt;Hitchcock
003;mh@domain1.com;Matt;Hitchcock
003;matty@domain2.com;Matt;Hitchcock
003;matthew@domain3.com;Matt;Hitchcock
003;hitch@domain4.com;Matt;Hitchcock
003;hitch@domain5.com;Matt;Hitchcock
This is my output:001,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"
002,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"
003,mh@domain1.com,matty@domain2.com,matthew@domain3.com,Matt,Hitchcock,"hitch@domain4.com,hitch@domain5.com"
Another possible method - not tested but should work.by nohandle at 2013-04-04 05:03:14$inputfile = <path to input file>
$outputfile = <path to output file>
$UserObj = $null
$NewUser = # script block to clean up and output user object and set up for next user
{
if ($UserObj){
$UserObj.AllOtherAccounts = $UserObj.AllOtherAccounts -replace ',$',''
$UserObj
}
$UserObj =
[PSCustomObject]@{
EmployeeID = $null
Domain1UPN = $null
Domain2UPN = $null
Domain3UPN = $null
Firstname = $null
Lastname = $null
AllOtherAccounts = $null
}
$DomainCount = 1
}
&{
Get-Content $inputfile |
select -skip 1 |
foreach {
$parts = $.split(',')
if ($parts[0] -ne $UserObj.EmployeeID) {.$NewUser}
Switch ($DomainCount)
{
1 {
$UserObj.EmployeeID = $parts[0]
$UserObj.Domain1UPN = $parts[1]
$UserObj.Firstname = $parts[2]
$UserObj.Lastname = $parts[3]
$DomainCount++
Break
}
2 {
$UserObj.Domain2UPN = $parts[1]
$DomainCount++
Break
}
3 {
$UserObj.Domain3UPN = $parts[1]
$DomainCount++
Break
}
4 {
$UserObj.AllOtherAccounts += $parts[1] + ','
}
}
}
.$NewUser # output last object
} | export-csv $outputfile -notype
Looks very elegant.by mjolinor at 2013-04-04 06:12:42
but it does not work for me.
Stepping through the code, unfortunately I get several error.
Error regarding the new user label.
After removing it, I get few errors about calling method on null-values expression originating from the newuser scriptblock.
This does not look right.
$UserObj.Domain1UPN = $patrs[1]
$UserObj.Domain1UPN = $parts[2]
and this in what I get in the output csv:False,"False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Object","7"
also the object content does not look right but maybe I just broke your code:Name Value
---- -----
AllOtherAccounts
Firstname Hitchcock
Domain1UPN Matt
Lastname
Domain2UPN Matt
Domain3UPN Matt
EmployeeID 001
[quote]by nohandle at 2013-04-04 06:54:38
Looks very elegant.
[/quote]
Thatnks. That’s what I was going for.
You’re right, that Domain1UPN is busted.
There may be some more detail bugs besides that. As I said, I didn’t have time to test that yet, but wanted something the OP could easily follow and debug/modify.
I’ve got another script I need to go work on (one of those ASAP deals), but I patched the obvious bugs. I’ll come back and do some testing later if there’s still issues.
[quote="mjolinor"]$UserObj.Domain1UPN = $patrs[1][/quote]by mjolinor at 2013-04-04 09:30:24
Still a typo there.
OK. Finally got a chance to get back to this.by nohandle at 2013-04-04 09:49:19
Should work better now.
Gave it quick test and the output is still like this:by mjolinor at 2013-04-04 10:00:20"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+V
alueCollection","System.Object","7"
I’m using this for inputdata:by nohandle at 2013-04-04 10:06:10
EmployeeID,AD Account UPN,Firstname,Lastname
001,mh@domain1.com,Matt,Hitchcock
001,matty@domain2.com,Matt,Hitchcock
001,matthew@domain3.com,Matt,Hitchcock
001,hitch@domain4.com,Matt,Hitchcock
001,hitch@domain5.com,Matt,Hitchcock
002,mh@domain1.com,Matt,Hitchcock
002,matty@domain2.com,Matt,Hitchcock
002,matthew@domain3.com,Matt,Hitchcock
002,hitch@domain4.com,Matt,Hitchcock
002,hitch@domain5.com,Matt,Hitchcock
003,mh@domain1.com,Matt,Hitchcock
003,matty@domain2.com,Matt,Hitchcock
003,matthew@domain3.com,Matt,Hitchcock
003,hitch@domain4.com,Matt,Hitchcock
003,hitch@domain5.com,Matt,Hitchcock
And getting this result:
EmployeeID : 001
Domain1UPN : mh@domain1.com
Domain2UPN : matty@domain2.com
Domain3UPN : matthew@domain3.com
Firstname : Matt
Lastname : Hitchcock
AllOtherAccounts : hitch@domain4.com,hitch@domain5.com
EmployeeID : 002
Domain1UPN : mh@domain1.com
Domain2UPN : matty@domain2.com
Domain3UPN : matthew@domain3.com
Firstname : Matt
Lastname : Hitchcock
AllOtherAccounts : hitch@domain4.com,hitch@domain5.com
EmployeeID : 003
Domain1UPN : mh@domain1.com
Domain2UPN : matty@domain2.com
Domain3UPN : matthew@domain3.com
Firstname : Matt
Lastname : Hitchcock
AllOtherAccounts : hitch@domain4.com,hitch@domain5.com
Here’s the raw content of the output file:
"EmployeeID","Domain1UPN","Domain2UPN","Domain3UPN","Firstname","Lastname","AllOtherAccounts"
"001","mh@domain1.com","matty@domain2.com","matthew@domain3.com","Matt","Hitchcock","hitch@domain4.com,hitch@domain5.com"
"002","mh@domain1.com","matty@domain2.com","matthew@domain3.com","Matt","Hitchcock","hitch@domain4.com,hitch@domain5.com"
"003","mh@domain1.com","matty@domain2.com","matthew@domain3.com","Matt","Hitchcock","hitch@domain4.com,hitch@domain5.com"
Then something is broken on my side probably. Hope it works for the OP. Thanks for lecture in code eleganceby mjolinor at 2013-04-04 10:10:20
I was going more for "example" than "lecture", and hoping I came up with a good example.by Matt Hitchcock at 2013-04-08 00:35:06
Thanks for the input, there is some great stuff I can use here, I’ll have a play and update with how I get on.by Matt Hitchcock at 2013-04-10 02:10:43
Appreciate the help.
Hiby nohandle at 2013-04-10 02:23:10
Just wanted to say thanks, this worked great! I modified it a bit after it has gathered all lines for a single user to run all of the UPN’s against a list of naming conventions to determine the domain that they belong to and whether the account has been verified (there is a prefix in the import file) and it works like an absolute charm. 111,000 rows of data re-worked into an importable CSV in just a few minutes!
Thanks for your help, it was a much more efficient way of doing it.
Great to hear back from you.by Matt Hitchcock at 2013-04-10 03:14:55
Which approach you used btw mjolinor’s or mine?
[quote="mjolinor"]Another possible method - not tested but should work.by Matt Hitchcock at 2013-04-10 03:16:47$inputfile = <path to input file>
[/quote]
$outputfile = <path to output file>
$UserObj = $null
$NewUser = # script block to clean up and output user object and set up for next user
{
if ($UserObj){
$UserObj.AllOtherAccounts = $UserObj.AllOtherAccounts -replace ',$',''
$UserObj
}
$UserObj =
[PSCustomObject]@{
EmployeeID = $null
Domain1UPN = $null
Domain2UPN = $null
Domain3UPN = $null
Firstname = $null
Lastname = $null
AllOtherAccounts = $null
}
$DomainCount = 1
}
&{
Get-Content $inputfile |
select -skip 1 |
foreach {
$parts = $_.split(',')
if ($parts[0] -ne $UserObj.EmployeeID) {.$NewUser}
Switch ($DomainCount)
{
1 {
$UserObj.EmployeeID = $parts[0]
$UserObj.Domain1UPN = $parts[1]
$UserObj.Firstname = $parts[2]
$UserObj.Lastname = $parts[3]
$DomainCount++
Break
}
2 {
$UserObj.Domain2UPN = $parts[1]
$DomainCount++
Break
}
3 {
$UserObj.Domain3UPN = $parts[1]
$DomainCount++
Break
}
4 {
$UserObj.AllOtherAccounts += $parts[1] + ','
}
}
}
.$NewUser # output last object
} | export-csv $outputfile -notype
Thanks for this, will try to test it also. It looks a lot nice than mine does
I modified yours (nohandle), I was going to test both and see what worked but I ran through yours and saw straight away how I needed to modify it so just went with it