Compare columns between 2 files and delete non common columns and data

Hello,

I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop though the data files and check if the columns are the same in the metadata file. If the columns in both match, no action to is to be taken. If Data file has more columns than metadata file, then those columns should be deleted from incoming data file.

ex

Data file is ps_job.dat

“empid”|“name”|“deptid”|“zipcode”|“salary”|“gender”
“1”|“Tom”|“10”|“11111”|“1000”|“M”
“2”|“Ann”|“20”|“22222”|“2000”|“F”

Meta data file is ps_job_metadata.dat

“empid”|“name”|“zipcode”|“salary”

I want my output to be

“empid”|“name”|“zipcode”|“salary”

“1”|“Tom”|“11111”|“1000”
“2”|“Ann”|“22222”|“2000”

k, welcome to Powershell.org. Please take a moment and read the very first post on top of the list of this forum: Read Me Before Posting! You’ll be Glad You Did!.

When you post code, error messages, sample data or console output format it as code, please.
In the “Text” view you can use the code tags “CODE”, in the “Visual” view you can use the format template “Preformatted”. You can go back edit your post and fix the formatting - you don’t have to create a new one.
Thanks in advance.

This forum is for scripting questions rather than script requests. We do not write customized and ready to use scripts or solutions on request.

What have you tried so far? We expect you to make an own attempt to get your task done or to solve your problem. If you have done so already please document here what exactly you have done and show your code. Then we probably might be able to help you step further.

When you crosspost the same question at the same time to different forums you should at least post links to the other forums along with your question to avoid people willing to you help making their work twice or more.

https://stackoverflow.com/questions/63198313/compare-columns-between-2-files-and-delete-non-common-columns-using-powershell

Thanks

$outputFile = “C:\Users\Abdul\Desktop\Script_test\ps_job_mod.dat”

$sample = Import-Csv -Path “C:\Users\Abdul\Desktop\Script_test\ps_job.dat” -Delimiter ‘|’

$metadataLine = Get-Content -Path “C:\Users\Abdul\Desktop\Script_test\ps_job_metadata.txt” -First 1

$desiredColumns = $metadataLine.Split(“|”).Replace(“`”“,”")

$sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter ‘|’

Well at least he shared his solution from his cross posting. I was able to edit his formatting there. Oh well here is just another way in case someone needs it. I’m going to guess it’s much faster than the select-object method on very large datasets.

Please note the phrase BACKTICKGOESHERE - replace it with the backtick character. I’m not sure how to escape it in this forum.

$jobfile    = "ps_job.dat"
$metafile   = "ps_job_metdata.dat"
$outputfile = "some_file.csv"

$meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|').Trim('"')

Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {

    Select($line,$meta)
    {
        $meta | foreach{$this.add("$_",(iex "BACKTICKGOESHERE$line.$_"))}
    }
    ColumnSelector($line,$meta)
    {
        $this.select($line,$meta)
    }
}

import-csv $jobfile -Delimiter '|' | 
    foreach{[pscustomobject]([columnselector]::new($_,$meta))} | 
        Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'

Output

Get-Content $outputfile
"empid"|"name"|"zipcode"|"salary"
"1"|"Tom"|"11111"|"1000"
"2"|"Ann"|"22222"|"2000"

There has to be a simpler solution.

Provided you want to keep those curly quotes and your code page supports all the characters, you can do the following:

# Create array of properties delimited by |
$headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers