Validating data prior to changing values in CSV file

Hello,

I have a CSV file that contains a list of Windows operating systems that need to be updated often using new folder names created during the update process.

For example, when Windows 11 Pro x64 21H2 22000.318 is updated offline, the updated version is placed into a new folder. That new folder name will either have an updated Build number, or a timestamp appended to the original name automatically: For example: Windows 11 Pro x64 21H2 22000.556 or Windows 11 Pro x64 21H2 22000.556 2203140144.

I was able to figure out how to grab those new folder names, and sort them in order before plugging them into the CSV file using the following code:

### Update CSV with new folder names ###

# Global Variables
$OSMedia = "D:\OSD\Builder\OSMedia\"
$csv_file = "D:\OSD\Builder\List-SourceNames.csv"
$csv_file_old = "D:\OSD\Builder\List-SourceNames-OLD.csv"

# Backup CSV File
Copy-Item $csv_file -Destination $csv_file_old -Force

# Prompt for latest number of folders to get
$num = Read-Host "Enter number of latest folders"

# Create $latest folder names array (excluding "build" folders)
$latest = (Get-ChildItem $OSMedia -Exclude "build*" | Where { $_.PSIsContainer } | Sort CreationTime -Descending | Select -First $num | Select-Object Name).Name
$latest = $latest | Sort-Object

# Replace osd_names in CSV with $latest folder names
$osd_names = Import-CSV "$csv_file_new"
$num = 0
foreach ($osd_name in $osd_names)
    {
    $osd_name.osd_name = $osd_name.osd_name.replace($osd_name.osd_name,$latest[$num])
    $num = $num+1
    }
$osd_names | Export-CSV "$csv_file" -Force -NoTypeInformation -Encoding ASCII

Iā€™m currently working with six operating systems. If all six are updated, then using my above script will work perfectly. However, if I have fewer than six, I need to update the CSV file manually because the above script doesnā€™t know how to tell a given value to only replace another given value.

Iā€™m wondering if I could use the first part of the folder names that never change, such as Windows 11 Pro x64 21H2. Based on this value, the script would (hypothetically) know to replace Windows 11 Pro x64 21H2 22000.318 with Windows 11 Pro x64 21H2 22000.556.

CSV column contents:

osd_name
Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1566
Windows 10 Pro for Workstations x64 21H2 19044.1288
Windows 11 Pro for Workstations x64 21H2 22000.318
Windows 11 Pro x64 21H2 22000.318
Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000
Windows Server 2022 Datacenter x64 Dev 22509.1000

Iā€™m playing around with the -split option, but Iā€™m not sure this will help me achieve my objective.

$split_names = "Windows 11 Pro x64 21H2 22000.556"
$a = $split_names.split(".")[0]
$a

Output: Windows 11 Pro x64 21H2 22000

Is there a way that I can use a portion of this string Windows 11 Pro x64 21H2 22000.318 or another value in my csv file to tell PowerShell what value to replace it with?

Old Name: Windows 11 Pro x64 21H2 22000.318
New Name: Windows 11 Pro x64 21H2 22000.556

I have a column that I use for the base name for each OS, such as Windows 11 Pro.

Could someone point me in the right direction so I can take a stab at it?

Thank youā€¦

Updated thought:

Perhaps I could use an IF statement along with the -like and -replaceā€¦?
My PowerShell is running updates now, so it will be tied up for a few hours (itā€™s Patch Tuesday), but when I get a chance later, it would be nice to know if Iā€™m on the right track with that thought.

Iā€™m not sure if I fully understood what youā€™re trying to do but I think something like this would work:

$data = @'
osd_name
Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1566
Windows 10 Pro for Workstations x64 21H2 19044.1288
Windows 11 Pro for Workstations x64 21H2 22000.318
Windows 11 Pro x64 21H2 22000.318
Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000
Windows Server 2022 Datacenter x64 Dev 22509.1000
'@

$csv = $data | ConvertFrom-Csv

$latest = @(
    'Windows 10 Pro for Workstations x64 21H2 19044.3000'
    'Windows Server 2022 Datacenter x64 Dev 22509.1500'
)

foreach ($update in $latest) {
    
    $update -match '(.+)(\d{5}\.\d+$)'
    $entry = ($csv.osd_name | Where-Object {$_ -like "$($matches[1])*"}) 
    $index = $csv.osd_name.IndexOf($entry)
    $csv[$index].osd_name = $csv[$index].osd_name -replace '(\d{5}\.\d+$)',$matches[2]

}

What it does:

The regular expression uses two capture groups
1:(.+)
2: (\d{5}\.\d+$)
which populates the automatic variable $matches with 2 entries (actually more, but weā€™ll ignore the details). $matches[1] will refer to everything up to the 5 digit number, $matches[2] will refer to the 5 digit number to the end of the string.

We append the wildcard character ā€˜*ā€™ to the first match, to find the entry in the CSV; then we get the entryā€™s index number so we can update it directly.

Finally, we look for the same pattern in the entry to find and replace the version number.

Hello @matt-bloomfield,

Thanks for taking a look into this for me. I ran your code, and it came back with:

True
True

I updated your $latest array with my values:

$latest = @(
    'Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586'
    'Windows 10 Pro for Workstations x64 21H2 19044.1586'
    'Windows 11 Pro for Workstations x64 21H2 22000.556'
    'Windows 11 Pro x64 21H2 22000.556'
    'Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000 2203140144'
    'Windows Server 2022 Datacenter x64 Dev 22509.1000 2203140148'
)

And it came back with:

True
True
True
True
False
False

I modified your code some more so that it would import my CSV file and import the $latest content as well to see if any changes would be made to the CSV file.

$data = Import-CSV "D:\OSD\Builder\List-SourceNames-NEW.csv"
$latest = Get-Content "D:\OSD\Builder\OSMedia-Latest_.txt"
foreach ($update in $latest) {
    $update -match '(.+)(\d{5}\.\d+$)'
    $entry = ($csv.osd_name | Where-Object {$_ -like "$($matches[1])*"}) 
    $index = $csv.osd_name.IndexOf($entry)
    $csv[$index].osd_name = $csv[$index].osd_name -replace '(\d{5}\.\d+$)',$matches[2]
}

Because Iā€™m importing the CSV file in place of the $data array, I tested this without the $csv = $data | ConvertFrom-Csv line.

I then replaced your $latest array with mine by importing the contents from the Text file using $latest = Get-Content "D:\OSD\Builder\OSMedia-Latest_.txt" as seen in the code above.

After running this, I still get the same output with 4 True(s) and 2 False(s). Interestingly, I didnā€™t see any changes to the CSV file.

Because I didnā€™t use your $csv = $data | ConvertFrom-Csv, I replaced all of the $csv variables to $data:

foreach ($update in $latest) {
    $update -match '(.+)(\d{5}\.\d+$)'
    $entry = ($data.osd_name | Where-Object {$_ -like "$($matches[1])*"}) 
    $index = $data.osd_name.IndexOf($entry)
    $data[$index].osd_name = $data[$index].osd_name -replace '(\d{5}\.\d+$)',$matches[2]
}

I get the same results as Iā€™m sure is expected.

I still need to absorb the topic of regular expressions. Itā€™s going to take me some time for it to really sink in. Iā€™ll look for some online videos that show how it works. I read Microsoftā€™s about_Regular_Expressions, but it seems a bit oversimplified considering I canā€™t figure out how to apply it to my situation. Itā€™s also where I came across -split which I was hoping would be my saving grace. :slight_smile:

Any thoughts on why the CSV file isnā€™t getting any changes? Iā€™m also curious about those two False values being returned.
Iā€™ll try to figure out how I can experiment with a single string, which I think may help me grasp this concept faster.

As for my end goal. Iā€™ll provide an image for a better visual.

Based on the data that weā€™re comparing in your sample script, I need to take the values from the OSMedia-Latest_.txt file and use that to replace the values in the CSV file under osd_name. Although they are in perfect order in the Text file, there will be times when I have fewer than 6 entries and probably in a different order. The CSV file is always in the same order though.

Thank you for your sample code. I will continue to play with and revisit the topic of learning how to better understand PowerShellā€™s regular expressions.

Thank youā€¦

I have a few thoughts on this.

  1. You are way overcomplicating this.
  2. You should never have to manually input how many new folders.

First thing, letā€™s get our list of OSMedia. If there are more folders here other than the ā€œnewā€ folders - you should figure out how to properly identify the new ones. (could easily use the existing csv to compare to entire list to find which are new, or perhaps use a datetime as the filter) Iā€™m going to work off the assumption that the folders in this directory are the folders that you need to match your csv.

$OSMedia = "D:\OSD\Builder\OSMedia\"
$latest = Get-ChildItem $OSMedia -Exclude "build*" -Directory |
    Sort-Object -Property Name | Select-Object -ExpandProperty Name

Now we get our CSV data

$csv_file = "D:\OSD\Builder\List-SourceNames.csv"
$osd_names = Import-CSV "$csv_file"

Now process the new folders, updating the csv if there is a match

foreach($name in $latest){
    Write-Host Processing new folder $name -ForegroundColor Cyan

    # take the folder name and replace everything from the space and build number to the end
    # Windows 11 Pro for Workstations x64 21H2 22000.318 becomes Windows 11 Pro for Workstations x64 21H2
    $basename = $name -replace '\s\d{5}.+$'

    # try to find a line in the csv that matches this basename
    $match = $osd_names | Where-Object osd_name -match $basename
    if($match.count -gt 1){
        Write-Warning "More than one csv entry matched $basename"
    }
    elseif($match.count -eq 1){
        Write-Host "Updating CSV entry $($match.osd_name) to $name" -ForegroundColor Cyan
        $match.osd_name = $name
    }
    else{
        Write-Host No entry in CSV matching $basename
    }
}

Now you can look at your csv to confirm itā€™s changed before export

$osd_names | Format-Table

Hello @krzydoug,

This is pretty amazing work with the ā€œProcessing New Folders to Matchā€. I was analyzing the $basename = $name -replace '\s\d{5}.+$' portion and it makes sense after cross-referencing Microsoftā€™s about_Regular_Expressions.

I can see where the $basename becomes the truncated name after stripping the space (\s), the first 5 numbers (\d{5}) of the Build, along with the dot (.+) and if I recall correctly, the ($) represents everything to the end of the string; or is it (+$) that does that?

One thing to note is that although the code for processing the New Folders to Match is producing the correct truncated names, they donā€™t seem to want to match. The output after running your code shows they match, but the output is returning:

No entry in CSV matchingā€¦

Processing new folder Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586
No entry in CSV matching Windows 10 Enterprise LTSC 2021 x64 21H2
Processing new folder Windows 10 Pro for Workstations x64 21H2 19044.1586
No entry in CSV matching Windows 10 Pro for Workstations x64 21H2
Processing new folder Windows 11 Pro for Workstations x64 21H2 22000.556
No entry in CSV matching Windows 11 Pro for Workstations x64 21H2
Processing new folder Windows 11 Pro x64 21H2 22000.556
No entry in CSV matching Windows 11 Pro x64 21H2
Processing new folder Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000 2203140144
No entry in CSV matching Windows Server 2022 Datacenter Desktop Experience x64 Dev
Processing new folder Windows Server 2022 Datacenter x64 Dev 22509.1000 2203140148
No entry in CSV matching Windows Server 2022 Datacenter x64 Dev

I get the same results when using $osd_names and $latest as arrays versus importing the data:

# OLD Values: CSV File Column 'osd_name'
$osd_names = (
    'Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1566',
    'Windows 10 Pro for Workstations x64 21H2 19044.1526',
    'Windows 11 Pro for Workstations x64 21H2 22000.493',
    'Windows 11 Pro x64 21H2 22000.493',
    'Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000',
    'Windows Server 2022 Datacenter x64 Dev 22509.1000'
)

# NEW Values: Latest Folders
$latest = (
    'Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586',
    'Windows 10 Pro for Workstations x64 21H2 19044.1586',
    'Windows 11 Pro for Workstations x64 21H2 22000.556',
    'Windows 11 Pro x64 21H2 22000.556',
    'Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000 2203140144',
    'Windows Server 2022 Datacenter x64 Dev 22509.1000 2203140148'
)

I noticed that $osd_names | Format-Table outputs the old vaues from the CSV file.

PS C:\Windows\system32> $osd_names | Format-Table
Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1566
Windows 10 Pro for Workstations x64 21H2 19044.1526
Windows 11 Pro for Workstations x64 21H2 22000.493
Windows 11 Pro x64 21H2 22000.493
Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000
Windows Server 2022 Datacenter x64 Dev 22509.1000

I noticed that the elseif($match.count -eq 1) is what Iā€™m expecting to happen considering that means that the $match.count is equal to only 1 object in the CSV file. When I call on $match.count, it shows a value of 0.

PS C:\Windows\system32> $match.count
0

Is osd_name the correct property name?

Correct. osd_name is the Property name for the list of OS Names in the CSV file.
I double checked by importing the CSV file and running:

$osd_names | gm

Which outputs in part, the following:

Name         MemberType   Definition                                                                                                    
----         ----------   ----------                                                                                                    
Equals       Method       bool Equals(System.Object obj)                                                                                
GetHashCode  Method       int GetHashCode()                                                                                             
GetType      Method       type GetType()                                                                                                
ToString     Method       string ToString()                                                                                             
osd_builds   NoteProperty string osd_builds=D:\OSD\Builder\OSBuilds                                                                     
osd_import   NoteProperty string osd_import=D:\OSD\Builder\OSImport                                                                     
osd_media    NoteProperty string osd_media=D:\OSD\Builder\OSMedia                                                                       
osd_name     NoteProperty string osd_name=Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1566                                           

Thanksā€¦

What if you try changing it to

Where-Object osd_name -match ā€œ$basenameā€

Or

Where-Object {$_.ā€™osd_nameā€™ -match $basename}

@krzydoug,

Thanks for double checking this, but unfortunately both of those changes result in the same output.

Processing new folder Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586
No entry in CSV matching Windows 10 Enterprise LTSC 2021 x64 21H2
...

The order should not matter, for each updated value in $latest, the script checks if thereā€™s a matching value in the csv data.

My code updates the ā€˜copyā€™ of the csv file (my variable $csv) in memory, which you can view by putting $csv at the bottom to see the changes have been made.

To update the actual file, you would need to use Export-CSV.

To suppress the True/False output, you can redirect the output:

$update -match '(.+)(\d{5}\.\d+$)' | Out-Null

Can you share the full code you made after adapting my version?

Hello @matt-bloomfield,

I ran your code again and then called $csv to see what changes were being stored in memory. It turns out those two False values were for the two Server 2022 operating systems which show the old values, while the other 4 show the updated values.

It appears the $update -match '(.+)(\d{5}\.\d+$)' is having issues between the Client and Server operating systems due to their naming just prior to the -matching criteria:

  • The True values end with 21H2
  • The False values end with DEV; no number at the end.
$csv

True
True
True
True
False
False

osd_name                                                            
--------                                                            
Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586                 
Windows 10 Pro for Workstations x64 21H2 19044.1586                 
Windows 11 Pro for Workstations x64 21H2 22000.556                  
Windows 11 Pro x64 21H2 22000.556                                   
Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000
Windows Server 2022 Datacenter x64 Dev 22509.1000

As for the modifications to your code; those were done just for testing. The only one that produces similar results when I imported my CSV file was the following:

Note: The only changes made were omitting $csv = $data | ConvertFrom-Csv and changing all $csv entries to $data.

$data = Import-CSV "D:\OSD\Builder\List-SourceNames-OLD.csv"
$latest = Get-Content "D:\OSD\Builder\OSMedia-Latest-1.txt"

foreach ($update in $latest) {
    
    $update -match '(.+)(\d{5}\.\d+$)'
    $entry = ($data.osd_name | Where-Object {$_ -like "$($matches[1])*"}) 
    $index = $data.osd_name.IndexOf($entry)
    $data[$index].osd_name = $data[$index].osd_name -replace '(\d{5}\.\d+$)',$matches[2]
}
$data
  • I tested the ordering by disordering my $latest array values and they lined up correctly as you stated.
  • No issues with exporting, except the two False values failing to match.
  • I piped Out-Null as you suggested, but will leave it commented out for the time being.
    $update -match '(.+)(\d{5}\.\d+$)' #| Out-Null

Thank youā€¦

OK, I missed that you sometimes have the date at the end of the latest version. That means that this:

$update -match '(.+)(\d{5}\.\d+$)

wonā€™t match the update because $ asserts the end of the line.

Removing the $ should resolve the problem:

$data   = Import-CSV E:\Temp\Files\List-SourceNames-OLD.csv
$latest = Get-Content E:\Temp\Files\latest.txt

foreach ($update in $latest) {
    
    $update -match '(.+)(\d{5}\.\d+)' | Out-Null
    $entry = ($data.osd_name | Where-Object {$_ -like "$($matches[1])*"}) 
    $index = $data.osd_name.IndexOf($entry)
    $data[$index].osd_name = $data[$index].osd_name -replace '(\d{5}\.\d+)',$matches[2]

}

$data

Output:

src_name                                            osd_name
--------                                            --------
Windows 10 Enterprise LTSC 2021                     Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586
Windows 10 Pro for Workstations                     Windows 10 Pro for Workstations x64 21H2 19044.1586
Windows 11 Pro for Workstations                     Windows 11 Pro for Workstations x64 21H2 22000.556
Windows 11 Pro                                      Windows 11 Pro x64 21H2 22000.556
Windows Server 2022 Datacenter (Desktop Experience) Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1200
Windows Server 2022 Datacenter                      Windows Server 2022 Datacenter x64 Dev 22509.1000

Hello @matt-bloomfield,

Dropping the $ certainly changed the results for the two Server entries to True.
Using the original arrays:

$csv

True
True
True
True
True
True

osd_name                                                            
--------                                                            
Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586                 
Windows 10 Pro for Workstations x64 21H2 19044.1586                 
Windows 11 Pro for Workstations x64 21H2 22000.556                  
Windows 11 Pro x64 21H2 22000.556                                   
Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000
Windows Server 2022 Datacenter x64 Dev 22509.1000  

Unfortunately, the output for the two servers at the bottom of the list have their (new value) Timestamps removed.

It appears from your output that you are using my Source Names for the CSV input instead of the osd_nameā€™sā€¦?

Just curious. I provided those in the screenshot to see if it was possible to just match those with the $latest values considering they donā€™t have all that extra information behind it.

Would that make it easier or worse?

Or do you think the value should be compared at the " x64" instead of the build numberā€¦?

I tried the following to no avail:
$update -match '\s[x]\d{2}' comes back True but returns all of the old values, but for some reason truncates the last one in the list to: Windows Server 2022 Datacenter x64 Dev

Using the $ at the end returns all False, so no changes are applied.
$update -match '\s[x]\d{2}$'

I thought I could start with the space \s, followed with the letter [x] and the trailing 2 digits \d{2}.

Considering there is only one ā€œxā€, I thought I could use that as the starting point instead of the build numbers dot.

Even after reading through this article on how RegEx works, itā€™s terribly confusing still.
PowerShell Match | How do Match Operators work in PowerShell? (educba.com)

Try removing the $ from my suggestion. Regular expressions are very specific so if you show some text but are working with different text, you can expect suggestions not to work.

Hi @krzydoug ,

I dropped the $ from your code as requested. Unfortunately, the results were the same:
$basename = $name -replace '\s\d{5}.+'

Processing new folder Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1586
No entry in CSV matching Windows 10 Enterprise LTSC 2021 x64 21H2
...

$osd_names | Format-Table Outputs the old values still:

Windows 10 Enterprise LTSC 2021 x64 21H2 19044.1566
Windows 10 Pro for Workstations x64 21H2 19044.1526
Windows 11 Pro for Workstations x64 21H2 22000.493
Windows 11 Pro x64 21H2 22000.493
Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000
Windows Server 2022 Datacenter x64 Dev 22509.1000

Is it possible to modify the -match or -replace criteria to truncate it before the " x64" instead of the build.#?

Iā€™m still reading up on that and playing around with some experiments, but nothing promising so far.

Then we either have the wrong property names, wrong values, or something different that we canā€™t see.

Perhaps we can slim down your original code and me ruling out the Import-CSV used to populate the $osd_names array (old values) and the Get-Content used to populate the $latest array (new values).

# OLD Values: CSV File Column 'osd_name'
$osd_names = (
    'Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000',
    'Windows Server 2022 Datacenter x64 Dev 22509.1000'
)

# NEW Values: Latest Folders
$latest = (
    'Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000 2203140144',
    'Windows Server 2022 Datacenter x64 Dev 22509.1000 2203140148'
)

foreach($name in $latest){
    Write-Host Processing new folder $name -ForegroundColor Cyan

    # take the folder name and replace everything from the space and build number to the end
    # Windows 11 Pro for Workstations x64 21H2 22000.318 becomes Windows 11 Pro for Workstations x64 21H2
    $basename = $name -replace '\s\d{5}.+'

    # try to find a line in the csv that matches this basename
    $match = $osd_names | Where-Object osd_name -match ā€œ$basenameā€
    if($match.count -gt 1){
        Write-Warning "More than one csv entry matched $basename"
    }
    elseif($match.count -eq 1){
        Write-Host "Updating CSV entry $($match.osd_name) to $name" -ForegroundColor Cyan
        $match.osd_name = $name
    }
    else{
        Write-Host No entry in CSV matching $basename
    }
}

$osd_names | Format-Table

Using just the two values with timestamps, do you think starting the match with the " x64" portion in the name would help? Iā€™m still trying to wrap my head around RegEx and all the rules to figure out how I can do that, but so far no luck.

Iā€™m reading here at the moment: Regular Expression Language - Quick Reference | Microsoft Docs

Damn if it ainā€™t extremely confusingā€¦ but Iā€™m determined to make some sense out ofā€¦ eventually. :grimacing:

Interestingly enough, I was able to get the -match to truncate from the " x64" to the end using either of these first two (RegEx) expressions. The third one didnā€™t trim anything at all.

$basename = $name -replace '\s[x]\d{2}.+$'
$basename = $name -replace '\s[x]\d{2}.+'
$basename = $name -replace '\s[x]\d{2}'

Output:

Processing new folder Windows Server 2022 Datacenter Desktop Experience x64 Dev 22509.1000 2203140144
No entry in CSV matching Windows Server 2022 Datacenter Desktop Experience
Processing new folder Windows Server 2022 Datacenter x64 Dev 22509.1000 2203140148
No entry in CSV matching Windows Server 2022 Datacenter

Unfortunately, this didnā€™t make any difference with the $match.count which always remains at 0, which means it will never be -eq 1 here:

elseif($match.count -eq 1){
        Write-Host "Updating CSV entry $($match.osd_name) to $name" -ForegroundColor Cyan
        $match.osd_name = $name
    }