Find and replace strings with REGEX and incremental replacement string

Hello, I want to find and replace multiple strings (matching multiple patterns), some of which appear on the same line. Each replacement value should be incremental, so that the original and replaced values can be tracked in a separate file. I need some help getting my code to do that.

An example input is:

<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv4d-zxcv56</requestId>
<requestId>1234qw-12qw9x-123456</requestId> Stevie Wonder <messageId>1234qw-12qw9x-123456</msg

Desired output:

<requestId>Request-1</requestId>Ace of Base Order: Request-2<something else...
<requestId>Request-3</requestId>
<requestId>Request-4</requestId> Stevie Wonder <messageId>Request-4</msg

And the code I am working on:

@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
'@ | Set-Content $log -Encoding UTF8

$requestId = @{
    Count   = 1
    Matches = @()
}

$tmp = Get-Content $log | foreach { $n = [regex]::matches((Get-Content $log),'\w{6}-\w{6}-\w{6}').value
    if ($n) 
    {
        $_ -replace "$n", "Request-$($requestId.count)"
        $requestId.count++
    } $_ }
$tmp | Set-Content $log

 

 

$log = 'E:\Temp\log.txt'

@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
'@ | Set-Content $log -Encoding UTF8

$tmp = Get-Content $log -Raw

$results = $tmp | Select-String '\w{6}-\w{6}-\w{6}' -AllMatches | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value

$count = 1

foreach ($result in $results) {

    $tmp = $tmp -replace $result,"Replace-$count"
    $count++

}

$tmp | Set-Content $log -Encoding UTF8
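As a possible alternative to looping over the matches, the sketch below does the replacement in a single pass with `[regex]::Replace` and a script block used as the match evaluator. It is only an illustration under a few assumptions: the sample text is held in a variable instead of the log file, and the names `$text`, `$map`, `$evaluator` and `$newText` are made up for the example. Repeated values get the same number because the map remembers what it has already seen.

```powershell
# Sample text; in the thread this would come from Get-Content $log -Raw
$text = @'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
'@

# Ordered map (hypothetical name): original value -> replacement value
$map = [ordered]@{}

# Called once per match; returns the text that replaces the match
$evaluator = {
    param($match)
    $original = $match.Value
    if (-not $map.Contains($original)) {
        # The next number is the current size of the map plus one
        $map[$original] = "Request-$($map.Count + 1)"
    }
    $map[$original]
}

$newText = [regex]::Replace($text, '\w{6}-\w{6}-\w{6}', $evaluator)
$newText
```

Afterwards `$map` holds the original/replacement pairs in the order they were first met, so it can double as the change log.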

 

That appears to be xml. Rather than attempting to manipulate a string, why not use xpath or xml dot notation?

Hi Rob, no it is not XML. Its big log file with a lot of data, some of it with some structure.

Thanks Matt! How would you modify the script if there are two or more patterns? Here they are reportId (\w{19}) and keyID (\w{32}). And how can I export the hash table of “original” and “replaced” values to a file (e.g. “qwerty-qwer12-qwer56 : Request-1”)?

@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
'@ | Set-Content $log -Encoding UTF8

Select-String should accept an array of strings for the Pattern, although that didn’t seem to work. However, you can put two patterns in the regex which is what I’ve done.

I also added a Select-Object -Unique so that the count matches the replacement value if there’s more than one of the same value.

To keep the changes, I opted to use a custom object and export it to a CSV file as it’s easy to read.

$log = 'E:\Temp\log.txt'
$changeLog = 'E:\Temp\ChangeLog.csv'
 
@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
'@ | Set-Content $log -Encoding UTF8
 
$tmp = Get-Content $log -Raw
 
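# The lookbehind (?<=<keyID>) requires the tag before the value but keeps it out of the match, so the tag survives the replacement
$results = $tmp | Select-String -Pattern '(\w{6}-\w{6}-\w{6})|((?<=<keyID>)\w{32})' -AllMatches | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value | Select-Object -Unique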
 
$count = 1
 
foreach ($result in $results) {
 
    $tmp = $tmp -replace $result,"Replace-$count"
    [PSCustomObject] @{
        'Replacement Value' = "Replace-$count"
        'Original Value'    = $result
    } | Export-csv $changeLog -NoTypeInformation -Append -NoClobber
    $count++
 
}
 
$tmp | Set-Content $log -Encoding UTF8
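If the pairs are kept in a hash table, as the question suggests, the export could look roughly like the sketch below. The map contents, the `$map`, `$rows` and `$lines` names and the temp-folder path are assumptions made for illustration; only `Export-Csv` and the column names come from the script above.

```powershell
# Hypothetical map of original -> replacement values built during the replacement
$map = [ordered]@{
    'qwerty-qwer12-qwer56' = 'Request-1'
    'Q2we45-Uj87f6-gh65De' = 'Request-2'
}

# Hypothetical output path in the temp folder; adjust as needed
$changeLog = Join-Path ([System.IO.Path]::GetTempPath()) 'ChangeLog.csv'

# Turn each key/value pair into an object with the same columns as above
$rows = foreach ($entry in $map.GetEnumerator()) {
    [PSCustomObject]@{
        'Replacement Value' = $entry.Value
        'Original Value'    = $entry.Key
    }
}

# Write the whole file in one call instead of appending row by row
$rows | Export-Csv -Path $changeLog -NoTypeInformation

# The plain "original : replacement" text form mentioned in the question
$lines = $rows | ForEach-Object { '{0} : {1}' -f $_.'Original Value', $_.'Replacement Value' }
$lines
```

Writing the file once at the end also avoids reopening it on every loop iteration.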

 

Thanks Matt for the prompt reply. However I have 5 more patterns to replace. There should be some sort of loop for the Select-String for each pattern and I need to keep some identification per type. My target is to achieve the following result after running the code:

<requestId>RequestId-1</requestId>Ace of Base Order: RequestId-2<something else...
<requestId>RequestId-3</requestId>
<requestId>RequestId-4</requestId> Stevie Wonder <messageId>RequestId-4</msg
<reportId>ReportId-1</msg:reportId>something here.,. <requestId>RequestId-3</requestId>
reportId>ReportId-2</msg:reportId> uraaa 123 <keyID>KeyId-1</msgdc

Do you have an idea how to achieve that? Many thanks in advance!

In that case, I would match the tag as well then use multiple counters:

$log = 'E:\Temp\log.txt'
$changeLog = 'E:\Temp\ChangeLog.csv'
 
@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
<reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
'@ | Set-Content $log -Encoding UTF8
 
$tmp = Get-Content $log -Raw
 
$results = $tmp | 
    Select-String -Pattern '(<ReportId>\w{19})|(<RequestId>\w{6}-\w{6}-\w{6})|(<KeyID>\w{32})' -AllMatches | 
        Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value | Select-Object -Unique
 
$reportCount  = 1
$requestCount = 1
$keyCount     = 1

foreach ($result in $results) {
 
    if ($result -like '<ReportId>*') {

        $find = ($result -replace '<ReportId>')
        $replace = "ReportID-$reportCount"
        $tmp = $tmp -replace $find,$replace
        $reportCount++

    }

    if ($result -like '<RequestId>*') {

        $find = ($result -replace '<RequestId>')
        $replace = "RequestID-$requestCount"
        $tmp = $tmp -replace $find,$replace
        $requestCount++

    }
    
    if ($result -like '<KeyID>*') {

        $find = ($result -replace '<KeyID>')
        $replace = "KeyID-$keyCount"
        $tmp = $tmp -replace $find,$replace
        $keyCount++

    }

    [PSCustomObject] @{
        'Replacement Value' = $replace
        'Original Value'    = $find
    } | Export-csv $changeLog -NoTypeInformation -Append -NoClobber
 
}
 
$tmp | Set-Content $log -Encoding UTF8
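With more patterns to come, the repeated if blocks could be driven from a table instead. The following is a rough sketch, not a drop-in replacement: the `$rules` table, the label spellings and the variable names are assumptions, the tags are matched with the exact casing used in the sample, and lookbehinds are used so that only the value (not the tag) is matched. Each label gets its own counter because the `$seen` table is reset per rule.

```powershell
$text = @'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
<reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
'@

# Hypothetical rule table: label used in the replacement -> pattern for the bare value
$rules = [ordered]@{
    'RequestId' = '\b\w{6}-\w{6}-\w{6}\b'
    'ReportId'  = '(?<=<reportId>)\w{19}'
    'KeyId'     = '(?<=<keyID>)\w{32}'
}

# Original value -> replacement, collected across all rules
$changes = [ordered]@{}

foreach ($label in $rules.Keys) {
    # Values already numbered for this label; a fresh table restarts the count at 1
    $seen = @{}
    $evaluator = {
        param($match)
        if (-not $seen.ContainsKey($match.Value)) {
            $seen[$match.Value] = "$label-$($seen.Count + 1)"
            $changes[$match.Value] = $seen[$match.Value]
        }
        $seen[$match.Value]
    }
    $text = [regex]::Replace($text, $rules[$label], $evaluator)
}

$text
```

Adding another pattern then means adding one line to `$rules` and leaving the loop alone.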

 

Perfect solution, Matt! Thanks a lot. Can I ask one last question on this example? Can the “export” section depend on a variable, so that it runs when exporting is wanted (e.g. $export=Y) and is skipped otherwise (e.g. $export=N)? The export might affect server performance, which is why I am considering this option. Many thanks for your support!

For sure. Just add the variable at the top of the script:

$export = $true #change to $false to turn off exporting

Then modify the export section:

if ($export) {

    [PSCustomObject] @{
        'Replacement Value' = $replace
        'Original Value'    = $find
    } | Export-csv $changeLog -NoTypeInformation -Append -NoClobber

}


 

 

Dear Matt, I noticed that with the new pattern (<RequestId>\w{6}-\w{6}-\w{6}) the “Order: Q2we45-Uj87f6-gh65De<” value is not detected. I tried removing <requestId> from the Select-String pattern, but then I got errors. What should I modify? It is OK that OrderId, RequestId and messageId match the same pattern; each of them should return RequestId-x. I am also struggling to replace “user” (user 1, user-1 and user_1 are the same user) with “customer”. Can you help with these two points, please? Here is the updated input:

@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
<reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
Abcd <requestId>1234qw-12qw12-123456</requestId> abcdef ole ole Order: zxcvbn-zxcv12-zxcv56 abracadabra <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
User_2
User-2
User_1
User_12
User-12
User 12
User 9
'@

I recommend using regex101.com to test and build your regular expressions.

It’s not clear whether you want the RequestId and the Order matched in the same string.

<requestId>\w{6}-\w{6}-\w{6}.+Order:\s\w{6}-\w{6}-\w{6}

Would match the whole string up to the end of the order number.

(Order:\s\w{6}-\w{6}-\w{6})

Would match just the order portion.

For the users, you just need to find a regular expression that matches your examples.

User.\d+

Would match User, followed by any character, followed by 1 or more digits.
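The patterns can also be tried directly in PowerShell before they go into the script. A small sketch follows; the sample strings are taken from the input above, except for 'Username', which is added here only to show a non-match, and the variable names are illustrative.

```powershell
$samples = 'User_2', 'User-12', 'User 9', 'abc User_12', 'Username'

# -match is true when the pattern is found anywhere in the string
$hits = $samples | Where-Object { $_ -match 'User.\d+' }
$hits

# A capture group pulls just the id out of the Order portion
$line = 'Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...'
if ($line -match 'Order:\s(\w{6}-\w{6}-\w{6})') {
    $orderId = $Matches[1]
}
$orderId
```

'Username' is not returned because the character after "User" must be followed by at least one digit.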

 

 

Thank you Matt for the prompt feedback. I tried the suggested pattern, but I probably did something wrong in the foreach part, as it didn’t capture everything:

 Select-String -Pattern '(<ReportId>\w{19})|(<requestId>\w{6}-\w{6}-\w{6}.+Order:\s\w{6}-\w{6}-\w{6})|(KeyID>\w{32})' -AllMatches

Regarding RequestId, OrderId and messageId: yes, I want to capture them with the same pattern. I wonder if it would be possible to drop the <RequestId> and Order: prefixes and write the pattern in this form (though I am not sure how to modify the foreach part):

Select-String -Pattern '(<ReportId>\w{19})|(\w{6}-\w{6}-\w{6})|(KeyID>\w{32})' -AllMatches

Lastly, for “User”, the pattern below looks like a good one, but I cannot get it to work:

^User(.*)(\d{1,2}$)

or that one:

((?<=User\-?_?\s?)\d{1,2})

Here is my new test input:

@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
<reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
Abcd <requestId>1234qw-12qw12-123456</requestId> abcdef ole ole Order: zxcvbn-zxcv12-zxcv56 abracadabra <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
User_2
User-2
User_1
abc User_12
User-12
User 12
xyz User 9
'@

And my updated desired output:

@'
<requestId>RequestId-1</requestId>Ace of Base Order: RequestId-2<something else...
<requestId>RequestId-3</requestId>
<requestId>RequestId-4</requestId> Stevie Wonder <messageId>RequestId-4</msg
<reportId>ReportId-1</msg:reportId>something here.,. <requestId>RequestId-3</requestId>
<reportId>ReportId-2</msg:reportId> uraaa 123 <keyID>KeyID-1</msgdc
Abcd <requestId>RequestId-4</requestId> abcdef ole ole Order: RequestId-3 abracadabra <keyID>KeyID-1</msgdc
Customer-1
Customer-1
Customer-2
abc Customer-3
Customer-3
Customer-3
xyz Customer-4
'@

Thanks in advance for your help.

You can just add the order part to the existing part that replaces the request id.

The users are a bit trickier because you have an inconsistent format. I would make that consistent first as it makes the find and replace simpler.

$log = 'E:\Temp\log.txt'
$changeLog = 'E:\Temp\ChangeLog.csv'
 
@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
<reportId>plmkjh8765FGH4rt6As</msg:reportId>something here.,. <requestId>zxcvbn-zxcv12-zxcv56</requestId>
<reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
Abcd <requestId>1234qw-12qw12-123456</requestId> abcdef ole ole Order: zxcvbn-zxcv12-zxcv56 abracadabra <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
User_2
User-2
User_1
abc User_12
User-12
User 12
xyz User 9
'@ | Set-Content $log -Encoding UTF8
 
$tmp = Get-Content $log -Raw

$tmp = $tmp -replace 'User.','User-'
 
$results = $tmp | 
    Select-String -Pattern '(<ReportId>\w{19})|(<RequestId>\w{6}-\w{6}-\w{6})|(<KeyID>\w{32})|(Order:\s\w{6}-\w{6}-\w{6})|(User-\d+)' -AllMatches | 
        Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value | Select-Object -Unique
 
$reportCount  = 1
$requestCount = 1
$keyCount     = 1
$userCount = 1

foreach ($result in $results) {
 
    if ($result -like '<ReportId>*') {

        $find = ($result -replace '<ReportId>')
        $replace = "ReportID-$reportCount"
        $tmp = $tmp -replace $find,$replace
        $reportCount++

    }

    if ($result -like '<RequestId>*' -or $result -like 'Order:*') {

        $find = ($result -replace '(<RequestId>)|(Order:\s)')
        $replace = "RequestID-$requestCount"
        $tmp = $tmp -replace $find,$replace
        $requestCount++

    }
    
    if ($result -like '<KeyID>*') {

        $find = ($result -replace '<KeyID>')
        $replace = "KeyID-$keyCount"
        $tmp = $tmp -replace $find,$replace
        $keyCount++

    }

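    if ($result -like '*User*') {

       # \b keeps User-1 from also matching the start of User-12,
       # without consuming the character that follows the number
       $find = $result
       $replace = "Customer-$userCount"
       $tmp = $tmp -replace "$find\b",$replace
       $userCount++

    }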

    [PSCustomObject] @{
        'Replacement Value' = $replace
        'Original Value'    = $find
    } | Export-csv $changeLog -NoTypeInformation -Append -NoClobber
 
}
 
$tmp | Set-Content $log -Encoding UTF8
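For the users specifically, another option is to skip the normalising step and number the users by their digits in one pass. This is only a sketch under assumptions: the separator is taken to be one of `-`, `_` or a space (as in the sample), and `$customers` and `$evaluator` are names invented for the example.

```powershell
$text = @'
User_2
User-2
User_1
abc User_12
User-12
User 12
xyz User 9
'@

# User number (as text) -> customer label
$customers = @{}

$evaluator = {
    param($match)
    # The named group n holds only the digits, so User_2 and User-2 share a key
    $number = $match.Groups['n'].Value
    if (-not $customers.ContainsKey($number)) {
        $customers[$number] = "Customer-$($customers.Count + 1)"
    }
    $customers[$number]
}

# (?i) ignores case, [-_ ] is the separator, (?<n>\d+) captures the number
$text = [regex]::Replace($text, '(?i)\bUser[-_ ](?<n>\d+)\b', $evaluator)
$text
```

Because the key is the number alone, the three spellings of user 12 all map to the same customer.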

 

 

Big Thanks Matt.

These days I am quite busy adapting your solution to my real data :).

Is it possible to reduce the pattern for requests and orders to just

'(\w{6}-\w{6}-\w{6})'

This would cover all requestIds and Orders. How would the “foreach” part have to be modified?
I tried the following code, but with no success:

foreach ($result in $results) {

    if ($result -match '\w{6}-\w{6}-\w{6}') {

        $find = $result
        $replace = "requestId-$requestCount"
        $tmp = $tmp -replace $find,$replace
        $requestCount++

    }
}

That modification worked OK for me and matched your desired output.