Need to match column data across CSV's.

I need to match data from CSV1 and pull matching data from CSV2 according to Value 2 in CSV1.

I can easily set the values, but am unsure of the logic I’d use to find the Value2 of Value1 in CSV1 , and accordingly search CSV2 for a matching Value2 next to Value1, then pull Value4 and pipe it to CSV1 on that row.

 

 

Example CSV:

CSV1

Value1 | Value 2

1 2

1 3

1 2

 

CSV2

Value1 | Value 2 | Value 3

1 2 4

1 3 5

1 2 4

 

I need to move the matching Value3 from CSV2 to CSV1 for each row. (Value 3 is updated constantly, so simply merging the CSV files is not an option.)

Value 1 is a large number, again a variable so I need to match that with its corresponding Value2 in CSV1, find the matching value1 in CSV with a matching value2, then export value3 from that row in CSV2, and append to the row in CSV1.

 

I’ll first set the values in CSV1 - right? Then what? Head scratch

[pre]import-csv $CSVPath | foreach-object { $Value1 = $_.Column1 $Value2 = $_.Column2 }[/pre]
 

Hmmm … I don’t know if others here understand better what you’re trying to do but in both of your CSV files the first and the third row are identical. How do you tell the difference between the two? Shouldn’t they have a unique attribute?

And could you please try to post less white space? Thanks.

Please provide a set of representative data with less abstraction (use actual column headers not value1,value2 with actual data pointing in the example to what the script needs to achieve)

Sorry, only double spacing is allowed in my editor for some reason. The first and third rows are in fact separate data sets. Value 1 will be a variable between 0 and lets say 999,999. Hence I’m not doing an “if -eq” cause that would be nuts. Value 3 will have maybe 3 potential values, 5 at most. So the data I posted above is in fact representative, and not abstract. For argument’s sake, however, let’s do this instead.

Value 1 | Value 2

976543 2

976545 3

976544 2

976543 1

CSV2:

Value1 | Value2 | Value3

976543 2 4

976545 3 5

976544 2 4

976543 1 0

976543 3 1

976544 5 6

I need to match value 1 in CSV 2, where Value 1 and Value 2 match, then export the Value 3 data for that row, and append to a new column in CSV1.

CSV 2 contains a very specific number of Value 1 rows, 1-5 for each possible variant of value 2 for each value 1. So let’s say 976543 has 4 entries. One of those has 2 as value 2. I need the value 3 on that row. 976545 has 3 possible entries. I need the value 3 from the row in CSV2 that contains a 3 as value2.

Our value 3 is in fact dependent on value 2, but the data is broken up in such a way we must always match value 1 AND 2 to ensure we get the correct value 3. To simplify this, imagine value 1 is an item number, say , and we’re matching upc, or barcode or something against other values. Does that make more sense?

 

 

 

You should have formatted it as code! :wink:

I’m still pretty unsure if I got what you need but if I got it right you could start with something like this:

$CSVData01 = @’
“Value1”, “Value2”
976543, 2
976545, 3
976544, 2
976543, 1
‘@ | Convertfrom-Csv
$CSVData02 = @’
“Value1”,“Value2”,“Value3”
976543,2,4
976545,3,5
976544,2,4
976543,1,0
976543,3,1
976544,5,6
'@ | Convertfrom-Csv

foreach ($DataSet01 in $CSVData01) {
$DataSet02 = $CSVData02 | Where-Object { $DataSet01.Value1 -eq $.Value1 -and $DataSet01.Value2 -eq $.Value2 }
[PSCustomObject]@{
Value1 = $DataSet01.Value1
Value2 = $DataSet01.Value2
Value3 = $DataSet02.Value3
}
}

Would this work instead? I am working with 40,000 and 500 rows, respectively.

`$CSV1 =gc import-csv C:\path\to\CSV1\`
`$array1=@()`

`$csv2= import-csv C:\path\to\csv2\`
`$array2=@()`

`$CSV1 | foreach ($DataSet01 in $array1) {`
`$DataSet02 = $array2 | Where-Object { $DataSet01.Value1 -eq $_.Value1 -and $DataSet01.Value2 -eq $_.Value2 }`
`[PSCustomObject]@{`
`Value1 = $DataSet01.Value1`
`Value2 = $DataSet01.Value2`
`Value3 = $DataSet02.Value3`
`}`

`} | out-file C:\path\to\final\csv.csv -notypeinformation -append`

-Edit, no it does not work.
When I DO get output, the new CSV contains (Note: I have modified “Value” columns to their actual names. No actual values are writing.)

#TYPE System.Management.Automation.PSCustomObject		
Value1	Value2	Value3

Why do you enclose every single line in Mosquito poop? (`) :wink:
What should the code line #1 in your code do in your opinion?
Do you debug your own code? (run it line by line in the ISE or in VSCode?
Please explain for me whatfor you use the code line #2 and #5?
In your code line #7 you pipe something to a foreach statement. That does not make sense at all.
Please do yourself a favor make a step back and start with learning the very basics of Powershell from scratch. You save your self from a lot of wasted time and frustration.
Did you run my code example? If you ran it - did it do what you expect it to do?

I’m not sure how to be more clear on this. I need to match two fields in one CSV against another CSV, pull a value from the matched row, and write that third value to another CSV. When I run this, I expect to be able to write each line out (Matched values with third value on the same line) , or to append it to another CSV (The third value only for each row.)

I found an issue while debugging due to formatting on one of my CSV’s. That was resolved. I am piping the array into a foreach as foreach statements are stored and processed as arrays anyways, so it should be able to handle it. Is that not the case? Correct me if wrong.

Your code example is not useful,

    as it manually defines values that are not static
, and were only being used as examples. The process you used, however, I can recognize, and was attempting to repurpose it to fit my needs. You seem to be misunderstanding the fact that I am not going to enter 40,000 values into my code daily to run this, it is reading them from a CSV. Your example neither took that into account nor demonstrated a clear example of it. HOWEVER, I did see you putting the objects into arrays, which is a good idea, therefore I ran with it.
But thank you for the advice on going back to the basics. That’s why I’m posting, and asking for help, after all. No?

By the way, My code has mosquito poop due to me trying to figure out how to use pre tags. That’s the format leftovers of switching from format->code to pre tags. Mosquito poop leftovers.

The gist is this - I am being as clear as possible.

  1. Export SQL Table to CSV. Search data for 2 items, match those against another CSV. Get value from matched line. Write matched line value(s) to another CSV. These are big CSV’s. CSV’s have roughly 50 columns each, and one has 40,000 plus rows.
    2.Avoid entering value data into script. Data is in flux, and will change daily. Avoid entering 40,000 lines of values into script daily.

I found an issue while debugging due to formatting on one of my CSV’s. That was resolved. I am piping the array into a foreach as foreach statements are stored and processed as arrays anyways, so it should be able to handle it. Is that not the case? Correct me if wrong.
The process you used, however, I can recognize, and was attempting to repurpose it to fit my needs. You seem to be misunderstanding the fact that I am not going to enter 40,000 values into my code daily to run this, it is reading them from a CSV. Your example neither took that into account nor demonstrated a clear example of it. HOWEVER, I did see you putting the objects into arrays, which is a good idea, therefore I ran with it.
But thank you for the advice on going back to the basics. That’s why I’m posting, and asking for help, after all. No? :wink:

By the way, My code has mosquito poop due to me trying to figure out how to use pre tags. That’s the format leftovers of switching from format->code to pre tags. Mosquito poop leftovers! :slight_smile:

Also, gc is a garbage collection command. It’s cleaning up before executing this portion of my script. I’m running this as a single job in a much larger process, so I take care to clean up after each section/at the beginning of each section. That’s a placeholder, and should not have been in my response example. It’s not “Get-Content” as you may be assuming, and was not run as such.

There is a distinct difference between a foreach statement and the Foreach-Object cmdlet. You cannot pipe something to a foreach statement (what you did in your code).

Do your CSV files have 40,000 columns or rows? I understood that one CSV file has 2 columns and the second one has 3. If yes - do they have headers as well? Could you post these header names without revealing any secrets?

  1. Gotcha. Should I use a foreach-object on my array, then?

  2. The first CSV has exactly 50 columns, and 44,780 rows. This will vary (Row count, not column count.) Headers are present, yes. We are using headers on column B and column K as value1 and value2 respectively. These headers are reading correctly in $array1 when called, as are the other 48 headers.
    Second CSV has ~17 rows, and only 837 columns. We are matching $CSV1.B and $CSV1.K against $CSV2.B and $CSV2.E , respectively. Once matched, we look on the line of the match in $CSV2 to obtain $CSV2.K < —this is what we want.
    We then want to write $CSV2.K to another place, preferably append it to a new column in a CSV with the original $CSV1 content.

-FYI , I have split the multi-sheet CSV2 into a set of files, and am adding them to an array together, as you can see on line 3./Edit no, you cannot see that as the wrong code was pasted. LOL.
Here it is.

$CSV1 = import-csv C:\Users\blah\SQLExport.csv
$array1=@()

$csv2= Import-Csv -Path  (Get-ChildItem -Path C:\Users\blah\groupedCSVs\ -Filter '*.csv').FullName
$array2=@()

 #Change to foreach-object?    $csv1 | foreach {
    $DataSet02 = $csv2 | Where-Object { $CSV1.ItemNumber -eq $_.ItemNumber -and $CSV1.SoldQty -eq $_.Sold }
    [PSCustomObject]@{
        Value1 = $CSV1.ItemNumber
        Value2 = $CSV1.SoldQty
        Value3 = $CSV2.TargetData
    }
}

$Csv1 | export-csv C:\Users\blah.csv


#Manual runs for testing -ignore please.
Clear-Variable csv1
Clear-Variable csv2
$array1.clear
$array2.clear
clear-variable array1
clear-variable array2   

I’m pretty sure my issue is with matching for CSV2.K , and then writing it out where I want it.

 $csv1 | export-CSV
is a placeholder, and is not being used. I am piping the end of the foreach statement to export-CSV, and it is writing out a psautomation object, as well as value1, value2, value3. This is improved from where we were, as it now looks to be iterating, which is good, but I’m unsure of why it’s writing out the object itself instead of the value. EX:
#TYPE System.Management.Automation.PSCustomObject
Value1 Value2 Value3
System.Object System.Object System.Object
System.Object System.Object System.Object
System.Object System.Object System.Object

[quote quote=173203]1. Gotcha. Should I use a foreach-object on my array, then?

  1. The first CSV has exactly 50 columns, and 44,780 rows. This will vary (Row count, not column count.) Headers are present, yes. We are using headers on column B and column K as value1 and value2 respectively. …[/quote]
    Where should we know all this??? Read your first post again and compare it to your last one - that’s a complete different task. :-/
    You could try a calculated property like this …
    $CSV1 = Import-Csv -Path ‘C:\Users\blah\CSVFile1.csv’
    $CSV2 = Import-Csv -Path ‘C:\Users\blah\CSVFile2.csv’

$CSV1 |
Select-Object -Property *,
{
Name = ‘AdditionalColumn’;
Expression = {
$CurrentRow = $_
($CSV2 |
Where-Object {
$CurrentRow.ColumnB -eq $.ColumnB -and
$CurrentRow.ColumnK -eq $
.ColumnE
}).ColumnK
}
}

In your last post …
Lines #2 and #5 are completely unnecessary … drop them! … and do not do this anymore in the future. :wink:
In line #4 you use a query to provide the CSV file for the Import-CSV. This query might return more than one CSV file. Are you aware of this?

  1. Remove the Arrays? Alright, I can use the variables only. For anyone who wants to know why this is, I actually looked into this, and by default import-csv inputs the data into an array, so there is no need. (This is better to know.)

  2. Yes, I am aware it will return -ALL- CSV files in the directory. This is intended. My static data (CSV2) is split into multiple CSV’s. I have also verified it returns all data, as intended. This was done during my development of the script as searching across multiple sheets was proving annoying.

My first post contains my goal. “Value 1 is a large number, again a variable so I need to match that with its corresponding Value2 in CSV1, find the matching value1 in CSV(2) with a matching value2, then export value3 from that row in CSV2, and append to the row in CSV1.” I am sorry if it was difficult to understand, I am not good with CSV functions in powershell.

No need to re-invent the wheel.
On the lighter side of things, the code it improves on is hilariously notated. Enjoy reading the Github repository!
https://www.powershellgallery.com/packages/Join-Object/2.0.1