Substrings are wonderful thing

First I would like to thank all of the guys who post here especially Curtis. I know you don’t see me ask any questions here and that’s because I read every thread on this board and most of the things I run into have been answered before and I just go back to a previous thread.

So I had an interesting request from my coworkers The notorious phone number attribute in AD. we are in the process of switching over to a new system and the fields required are country code then the number like this +44 4323454645

because these fields can be populated by end users you can imagine the filter required to clean this up. so I started writing the code I have found out that some people where kind enough to put the country code in there for me. well that’s great and all but for the code to work I would need to a have uniform clean data so I was going to need to trim the country code off of the front of the string well how in the heck do you do this?

he is how. I created a CSV file that had the country code in one column and the country abbreviation in another. boom now I have somthing to compare to because country codes range from 1-3 numbers I cannot trim by count I had to trim by value. so as each aduser account passes through my loop I am able to select the correct country code based off an Ad attribute we have that contains there country abbreviation.

$telcode = $Codes | ?{$_.ABBR -like $user.c}

Now the fun stuff so now that I have the number I can simply trimstart the country code out right? Nope for some crazy reason if a country code is say 44 and the number is 44449837456 it will trim all the 4’s out damn you powershell.

So in short remembering a post by Curtis a while back I was able to use a substring to trim out the number of characters based on the number of characters of the country code from the start and subtract the number of characters of the country code to mark the end of the string

$Cleannumber = $Cleannumber.Substring(($TelCode.Ccode.Length + 1),($Cleannumber.Length - ($TelCode.Ccode.Length +1)))

out of 15,000 records I only have to clean up about 400 manually that passed through the filter

once again I would like to thank you guys for your help. Although you don’t see me ask you guys are a big part of my day.

If any one is interested I can post the entire code.

This is really better suited for RegEx than for substring.

Please provide a sample of your files, to include header lines. Sanitize as necessary. Do NOT use made up data.

@mark-hammonds, thanks for you kind words. I believe that this is exactly what @donj and the other founders of had in mind when they launched the site. I two have learned a great deal by participating in the forums here. PowerShell is a wonderful tool to be explored.

On you phone number challenge. Here is another approach you might find interesting. As I understand from reading your post. Some, but not all, phone numbers have the country code. Unless your are using and If statement and checking the length of your phone number first using .length, your substring could drop the first 2 numbers of your 10 digit phone number. Additionally, if the + sign, you need to trim 3 characters off the front, and if there is a space in-between, you need to trim 4 characters. One alternative approach you could take is to use RegEx to return just the 10 consecutive digits from the end of the number.

In the example below I have the same phone number 7 different ways, some with prefix, some without, some with leading or trailing spaces, some with no spaces, some with + sign, some with none, etc, etc, etc. I use -match operator and apply a RegEx expression against the phone number after first triming the whitespace from the beginning and end of the number. The RegEx pattern uses a labeled capture (a technique I learned from @bobmccoy in this very forum:, and matches 10 digits at the end of the string. -match automatically populates the $matches variable with the matches to the RegEx pattern. Since we label the pattern, we can easily get that match by name.

"+44 4323454645", " 4323454645", "44 4323454645", "4323454645", "+44 4323454645 ", "+444323454645", "444323454645" |
ForEach-Object {
    $_.trim()  -match "(?'BaseNum'\d\d\d\d\d\d\d\d\d\d$)"



I am using regex as well there are so many things I have to fiter out that may or may not be there like trunk code (o) after the country code leading 0’s if there is not country code a 0 with out the () that some people may have added. some people add x9832 for extension some use ext0984 some in other countries use al9348 it gets really strange. I will definitely try what you have posted. im always learning

there are hundreds of patterns I have to scan for. This just cleans the data I have not added the code to update the attribute yet.

Set-StrictMode -Version Latest
$output = @()
$SkippedAccounts = @()
$TrashValue = @()
$users = Get-ADUser -SearchBase "OU=_Employees,OU=XXX,DC=XXX,DC=XXX" -Filter * -Properties c,telephoneNumber,sAMAccountName | ?{$_.telephoneNumber -ne $null -and $_.c -ne $null}
$Codes = Import-Csv -Path C:\temp\Country.csv # Document with country codes
$Filter1 = "(0)","Mobex", "Tel","PBX", "Rijswijkoffice" # Filters out trash
$Filter2 = "local","extension","Officedial","office","orext","Extn","EXT","ext","Ext","loc","EX","xt","ex","al","or","x" # Filters to replace with ;ext=
Foreach($user in $users){
    $telcode = $Codes | ?{$_.ABBR -like $user.c}
        if(!($TelCode -eq $null)){
            $Cleannumber = $user.telephoneNumber.Replace(" ", "")
            if($Cleannumber -like "*/*"){$Cleannumber = $Cleannumber.Split("/")[0]}
            if($Cleannumber -like "*,*"){$Cleannumber = $Cleannumber.Split(",")[0]}
            $Cleannumber = $Cleannumber -replace "[^0-9a-zA-Z+()]"
            [string]$Clean1 = $filter1 | ?{$Cleannumber -clike "*$($_)*"}  
            if(!($Clean1 -eq "")){
                $Clean1 =  $clean1.Split(" ")[0]
                $Cleannumber = $Cleannumber.Replace($Clean1, "")
            [string]$Clean2 = $filter2 | ?{$Cleannumber -clike "*$($_)*"}  
            if(!($Clean2 -eq "")){
                $Clean2 =  $clean2.Split(" ")[0]
                $Cleannumber = $Cleannumber.Replace($Clean2, ";ext=")
            $Cleannumber = $Cleannumber -replace "[()]"
            if(!($Cleannumber -eq $null)){
                if(!(($user.telephoneNumber -replace "[^0-9+]").Length -lt 8)){
                while($Cleannumber[0] -eq "0"){$Cleannumber = $Cleannumber.TrimStart("0")} #This Trunk Code Must be dropped when dialing internationaly -- if 0's proceed number
                if($Cleannumber.Substring(0,($TelCode.Ccode.Length)) -eq $TelCode.Ccode){$Cleannumber = "+$($Cleannumber)"}
                if(!($Cleannumber -like "*+*")){$Cleannumber  = ("+$($TelCode.Ccode)" + "$($Cleannumber)")}
                if(!($Cleannumber -like "*;ext=*" )){$Cleannumber = $Cleannumber -replace "[^0-9+]"} 
                        if(!($Cleannumber.Substring(0,$TelCode.Ccode.length) -eq $TelCode.Ccode)){
                            if(!($Cleannumber -eq "" -or $Cleannumber.Length -lt 8)){ 
                                Write-Host "User $($user.SamAccountName) Updated $($user.telephoneNumber) to $($Cleannumber) $($user.c)"
                                if($Cleannumber.Substring(0,$TelCode.Ccode.length + 1) -eq "+$($TelCode.Ccode)"){
                                    $Cleannumber = $Cleannumber.Substring(($TelCode.Ccode.Length + 1),($Cleannumber.Length - ($TelCode.Ccode.Length +1)))
                                    while($Cleannumber[0] -eq "0"){$Cleannumber = $Cleannumber.TrimStart("0")} #This Trunk Code Must be dropped when dialing internationaly -- if 0's come after country code
                                    $Cleannumber = ("New +$($TelCode.Ccode) " + $Cleannumber)
                                    $Cleannumber = ("Fix " + $Cleannumber)
                                $vcheck = 0
                                $var2 = $Cleannumber.Replace(" ", "")
                                if(!($user.telephoneNumber -like "*+*")){$var1 = $TelCode.Ccode + $user.telephoneNumber}
                                Else{$var1 = $user.telephoneNumber}
                                if(($var1 -Replace("[^0-9]")) -eq ($var2 -replace("[^0-9]"))){$vcheck = 1}
                                $output += New-Object PSObject -Property([ordered]@{UserID = $user.SamAccountName;OldNumber = "Old $($user.telephoneNumber)";NewNumber = $($Cleannumber); CountryCode = $user.c;Vcheck = $vcheck})
                                #not enough characters Write code to clear feild
                                $TrashValue += New-Object PSObject -Property([ordered]@{UserID = $user.SamAccountName;OldNumber = $user.telephoneNumber;NewNumber =  $Cleannumber; CountryCode = $user.c})}                           
                    #Trash in Number Field  Write code to clear feild
                    $Cleannumber = ("New +$($TelCode.Ccode) " +$Cleannumber.TrimStart("+$($TelCode.Ccode)"))
                    $TrashValue += New-Object PSObject -Property([ordered]@{UserID = $user.SamAccountName;OldNumber = "Old $($user.telephoneNumber)";NewNumber =  $Cleannumber; CountryCode = $user.c})}
                # if new number field is blank
                Write-Host "User $($user.SamAccountName) Skipped Country ID $($user.c) not found in file!"
                $SkippedAccounts += New-Object PSObject -Property([ordered]@{UserID = $user.SamAccountName;OldNumber = "Old $($user.telephoneNumber)";CountryCode = $user.c})} 
            #country code not found
            Write-Host "User $($user.SamAccountName) Skipped Country ID $($user.c) not found in file!"
            $SkippedAccounts += New-Object PSObject -Property([ordered]@{UserID = $user.SamAccountName;OldNumber = "Old $($user.telephoneNumber)";CountryCode = $user.c})} 

also the length of the number can be from 8 -17 numbers depending on country or if they have an extension so there is no absolute length

+33 (0) 47 11 68 31 (or ext.1351)
+31 70 35 7904 Rijswijk office
+39 02 6196017 / Houston 8326197103
+40 244.36.600 ext.173
+44113276415 (internet link)
+49 5141 806 787 or 348
82-2-3210-1496 (Dir)
+966 (0)1 3 86 1617 Ext. 286
Nigeria: 0810 345 9633
PBX – 55 12 227-4400
Tel. 52.93.316.9219 al 22
VOIP 713460826
993 3 39 01 51 / 52
868-66-3663 / 868-66-6124
780-463-868 x6823
22tel34 45 2345
+44 local 203049586 
710-317-2117  EXT. 6
58 282 424897 or  58 282 426615
54 11 4314 520 / 540 ext 270
403-570-227 local 557
404-291-4814 extension 2245

Sample of CSV file

Ccode	ABBR	Country
1	US	United States
1	CA	Canada
20	EG	EG Egypt
212	MA	Morocco
213	DZ	Algeria
216	TN	Tunisia
218	LY	Libya
220	GM	Gambia
221	SN	Senegal

Wow, that is truly ugly. (the Data, not the Code) :slight_smile: I’m actually very interested in what Bob comes up with. I’ve seen some pretty impressive stuff from him.

yeah I it is worse in some places like I said my script fix 14,600 numbers out of 15,000 i’m sure there are simpler ways to implement it but i’m not that familiar with regex. With some of the stuff in the attribute adding a filter will break other parts of the code. this is about as clean as I can get it.

at first I had each word that needed to be filtered on a separate line. Then I though why not have all my filtered words in a variable and roll through the variable for matches if a match is found remove it.

[string]$Clean1 = $filter1 | ?{$Cleannumber -clike “$($_)”}
if(!($Clean1 -eq “”)){
$Clean1 = $clean1.Split(" ")[0]
$Cleannumber = $Cleannumber.Replace($Clean1, “”)

the spit is in case it triggers more then one match to use the first one. like extension, then ext, then ex, then x, so a number like 404-291-4814 extension 2245 will match x,ex and ext because it triggered a match with extension first that will be the one used.

OK, this is way more hairy than originally suggested. Here is my solution based on the original set of data you proposed above. And it did indeed solve all of those use cases. However, by the time I got ready to post it, your additional data made it way more complicated.

Maybe this will get you to the 80% solution, the rest having to be hand-coded. At any rate take a look at it and see if you can figure out where I was going.

$pattern1 = "^00(\d+)\s(\d+)$"; $replace1 = '+$1 $2'
$pattern2 = "^\+(\d+)\s\(0\)(\d+)$"; $replace2 = '+$1 $2'
$pattern3 = "^(\d+)$"; $replace3 = '+44 $1'
$pattern4 = "^(\d+)\sOffice\s(\d+)$"; $replace4 = '+44 $1;ext=$2'
$pattern5 = "^\+(\d+)\s0\s(\d+)\s(\d+)\s(\d+)$"; $replace5 = '+$1 $2$3$4'
$pattern6 = "^(\d+)x(\d+)$"; $replace6 = '+44 $1;ext=$2'
$pattern7 = "^(\d{3})(\d+)/.+$"; $replace7 = '+$1 $2'
$pattern8 = "^(\d{2})(\d+);.+$"; $replace8 = '+$1 $2'
$pattern9 = "^(\d{2})(\d+)ex(\d+)$"; $replace9 = '+$1 $2;ext=$3'
$pattern10 = "^(\d{2})(\d+) al (\d+)$"; $replace10 = '+$1 $2;ext=$3'
$pattern11 = "^my\soffice\s(\d+)\s(\d+)\s(\d+)$"; $replace11 = '+1 $1$2$3'

Get-Content -Path C:\Ephemeral\numbers.txt | foreach {
    $test = $PSItem.ToString().Trim()
    Switch -Regex ($test)
        $pattern1 { $update = $test -replace $pattern1, $replace1; break }
        $pattern2 { $update = $test -replace $pattern2, $replace2; break }
        $pattern3 { $update = $test -replace $pattern3, $replace3; break }
        $pattern4 { $update = $test -replace $pattern4, $replace4; break }
        $pattern5 { $update = $test -replace $pattern5, $replace5; break }
        $pattern6 { $update = $test -replace $pattern6, $replace6; break }
        $pattern7 { $update = $test -replace $pattern7, $replace7; break }
        $pattern8 { $update = $test -replace $pattern8, $replace8; break }
        $pattern9 { $update = $test -replace $pattern9, $replace9; break }
        $pattern10 { $update = $test -replace $pattern10, $replace10; break }
        $pattern11 { $update = $test -replace $pattern11, $replace11; break }
        default { $update = $test }
    Write-Verbose "$test changed to $update"

The rules for translation are setup in the pattern/replace pairs at the start of the script. There is no magic to the current order in which the patterns are evaluated. But if you can, you probably do want to evaluate them in the most to least popular order.

There are definitely more refined ways to do this, however, this seemed like a one-time effort to standardize.

man I like Regex can you suggest a book to learn it? I love how simple that is all tho the regex looks like gibberish to me lol

I had to do this for my main filter

$Filter2 = "local","extension","Officedial","office","orext","Extn","EXT","ext","Ext","loc","EX","xt","ex","al","or","x"

            [string]$Clean2 = $filter2 | ?{$Cleannumber -clike "*$($_)*"}  
            if(!($Clean2 -eq "")){
                $Clean2 =  $clean2.Split(" ")[0]
                $Cleannumber = $Cleannumber.Replace($Clean2, ";ext=")

The tool I use most often is RegEx Buddy ( It was well worth the $39 and you can run it on multiple machines if you are the exclusive user, or multiple users on a single macine.

there are relatively few patterns I uses a lot. The others I have to do a little digging for.

There are also several sites that will test regex for you.

The problem is that you are asking the computer to recognize patterns. This is something that humans doe fairly well even in the event of inconsistent and varying patterns. Computers on the other had do not deal with ambiguity very well. For instance, from your original data (now gone from this post) …

239439658459/ 392432932923
935495849394; 324934945944

One yielded a 2-digit country code and the other a 3-digt CC. So I based the RegEx recognition of the CCs based on the delimiter (slash or semi-colon). You have to “stick it” to something.

I have the following O’Reilly books:

  • Mastering Regular Expressions
  • Regular Expressions Cookbook
  • Regular Expressions Pocket Reference

However, with the online references and RegEx Buddy I can usually figure out what I want.

I just bought

Beginning Regular ExpressionsFeb 4, 2005
by Andrew Watt

PowerShell and the attendant RegEx capability that bubbles up from the .NET implementation is an unbeatable combination for text manipulation.