PowerShell Regex Help

I am trying to convert phone numbers from various formats to a single format.

$phone = Read-Host "Enter Phone Number"

Switch -Regex ($phone) {
    '^\+(\d{11})' { 
        Write-Host "Match 1"
        $fmtPhone = $phone -replace $pattern, '+$1 ($2) $3-$4 x$5'
    }
    '^\+(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)' {
        Write-Host "Match 2"
        $fmtPhone = $phone -replace $pattern, '+$1 ($2) $3-$4 x$5'
    }
    '^(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)' {
        Write-Host "Match 3"
        $fmtPhone = $phone -replace $pattern, '+$1 ($2) $3-$4 x$5'
    }
    '^\+(\d) (\d{3})-(\d{3})-(\d{4}) x(\d{4})' { 
        Write-Host "Match 4"
        $fmtPhone = $phone -replace $pattern, '+$1 ($2) $3-$4 x$5'
    }
    default {
        Write-Host "No Match"
    }
}
Write-Host "Formatted Number: " $fmtPhone

If I provide a number like +12345678901, then I get:
Match 1
Formatted Number: +1 (234) 567-8901

If I provide a number like +12345678901;ext=2345, then I get:
Match 2
Formatted Number: +1 (234) 567-8901 x2345

If I provide a number like 123445678901;ext-2345, then I get:
Match 3
Formatted Number: +1 (234) 567-8901 x2345

If I provide a number like +1 234-567-8901 x2345, then I get:
Match 4
Formatted Number: +1 234-567-8901 x2345

I can’t figure out why it matches my regex pattern but does not replace it in the right format.
I have searched several Regex resources and I believe that the pattern works because it is triggering the correct Switch condition.

Thank you,
Scott

What’s in $pattern?


As Olaf asked, we need to know what your regex in $pattern is before we can offer any advice on it.
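A minimal sketch of one way to avoid a separate $pattern variable altogether. Inside a switch -Regex clause, PowerShell populates the automatic $Matches variable with the groups captured by that clause's own pattern, so the output can be built straight from $Matches instead of running -replace with a second pattern. The function name Format-Phone is hypothetical, and only the two unspaced formats from the question are handled.

```powershell
# Hypothetical helper: formats '+12345678901' and '12345678901;ext=2345' style input
function Format-Phone {
    param([string]$Phone)

    switch -Regex ($Phone) {
        # country, area, prefix, line number, extension
        '^\+?(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)$' {
            # $Matches holds the groups captured by THIS clause's pattern
            return '+{0} ({1}) {2}-{3} x{4}' -f $Matches[1], $Matches[2], $Matches[3], $Matches[4], $Matches[5]
        }
        # country, area, prefix, line number (no extension)
        '^\+?(\d)(\d{3})(\d{3})(\d{4})$' {
            return '+{0} ({1}) {2}-{3}' -f $Matches[1], $Matches[2], $Matches[3], $Matches[4]
        }
        default { return $null }
    }
}

$out = '+12345678901', '+12345678901;ext=2345', '12345678901;ext=2345' | ForEach-Object { Format-Phone $_ }
$out
```

Because both patterns are anchored with $, a number with an extension can only satisfy the first clause, and return leaves the function after the first hit.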


First, note that your example 3 is an invalid phone number (it has an extra 4). Second, I feel you’ve overcomplicated this. The beauty of regex is that it’s so flexible and versatile. I’ve adjusted the sample numbers a bit.

# list of sample numbers stored in an array
$cases = @'
+12345678901
+14325678901;ext=2345
13425678901;ext-5432
+1 423-567-8901 x119
'@ -split '\r?\n'

# the single magic regex pattern
$pattern = '^\+?(?<Country>\d)\s?(?<Area>\d{3})(\s|-)?(?<Prefix>\d{3})(\s|-)?(?<Suffix>\d{4})(.*?(?<Extension>\d{3,4}))?'

# process each number and extract/format the desired data
foreach($case in $cases){
    if($case -match $pattern){
        "+$($Matches.Country) ($($matches.Area))-$($Matches.Prefix)-$($Matches.Suffix) x$($Matches.Extension)"
    }
}

Here is the output

+1 (234)-567-8901 x
+1 (432)-567-8901 x2345
+1 (342)-567-8901 x5432
+1 (423)-567-8901 x119

Now, if you really do have some numbers without extensions and you don’t want that trailing x, a simple if statement takes care of it.

foreach($case in $cases){
    if($case -match $pattern){
        if($matches.Extension){
            "+$($Matches.Country) ($($matches.Area))-$($Matches.Prefix)-$($Matches.Suffix) x$($Matches.Extension)"
        }
        else{
            "+$($Matches.Country) ($($matches.Area))-$($Matches.Prefix)-$($Matches.Suffix)"
        }
    }
}

Updated output

+1 (234)-567-8901
+1 (432)-567-8901 x2345
+1 (342)-567-8901 x5432
+1 (423)-567-8901 x119
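Since the two branches differ only in the extension, a variant is to compute the optional " xNNNN" suffix first and format the number once with the -f operator. This is a sketch that reuses the pattern above unchanged; it is not part of the original reply.

```powershell
$pattern = '^\+?(?<Country>\d)\s?(?<Area>\d{3})(\s|-)?(?<Prefix>\d{3})(\s|-)?(?<Suffix>\d{4})(.*?(?<Extension>\d{3,4}))?'
$cases = '+12345678901', '+14325678901;ext=2345', '+1 423-567-8901 x119'

$out = foreach ($case in $cases) {
    if ($case -match $pattern) {
        # Empty string when the optional Extension group did not capture anything
        $ext = if ($Matches.Extension) { " x$($Matches.Extension)" } else { '' }
        '+{0} ({1})-{2}-{3}{4}' -f $Matches.Country, $Matches.Area, $Matches.Prefix, $Matches.Suffix, $ext
    }
}
$out
```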

Show-off! :winking_face_with_tongue: :winking_face_with_tongue: :wink: :face_blowing_a_kiss: :face_blowing_a_kiss:


I love regular expressions. I am still lacking in skill with them, so I typically jump at any chance to improve.

I’m just kidding. :wink: :face_blowing_a_kiss: … and I’d strongly disagree that you’re …

I think you’re ahead of the vast majority of the PowerShell coders out there.

(Your statement sounds like a mild case of impostor syndrome :wink: )


I totally agree. Krazy Doug has helped me in the past with oddball regex matches, and I am very grateful :slight_smile:

From what I found while searching, I thought $pattern would automatically hold the Switch condition; maybe this is where I went wrong.

For the first condition '^\+(\d{11})', I assumed that $pattern = '^\+(\d{11})' because the script output the number correctly.

I changed $pattern to match the pattern that the Switch condition was evaluating, and it all works.

Correction: it snagged a bit.

I could have numbers like these:

+12345678901
(123) 456-7890 x 1234
+1 (123) 456-7890
+1 123-456-7890
+1 123-456-7890 x1234
+1 123 456 7890
+11234567890;ext=1234
tel:=11234567890;ext=1234
1234567890
123-456-7890

No matter what format it finds, I always want the output to be:
w/o Ext: +1 (123) 456-7890
w/ Ext: +1 (123) 456-7890 x1234

I am not dealing with any international numbers (e.g., +44, +49).
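Given those requirements (US numbers only, one fixed output format), a compact sketch is to split off the extension, keep only the digits of the main number, and apply a single -replace. ConvertTo-UsPhone is a hypothetical name, and the assumption that an extension always follows "x" or ";ext=" comes from the examples listed above, not from any phone-number standard.

```powershell
# Hypothetical helper. Assumes US numbers only and that any extension follows 'x' or ';ext='.
function ConvertTo-UsPhone {
    param([string]$Phone)

    # Split off the extension at the first 'x' or ';ext=' (-split is case-insensitive)
    $main, $ext = $Phone -split 'x|;ext=', 2

    # Keep only digits; a valid main number is 10 digits, optionally preceded by country code 1
    $digits = $main -replace '\D', ''
    if ($digits -notmatch '^1?\d{10}$') { return $null }

    $formatted = $digits -replace '^1?(\d{3})(\d{3})(\d{4})$', '+1 ($1) $2-$3'

    # $null -replace ... yields an empty string, so numbers without an extension skip this
    $extDigits = $ext -replace '\D', ''
    if ($extDigits) { $formatted += " x$extDigits" }
    $formatted
}

foreach ($n in '(123) 456-7890 x 1234', '+1 123 456 7890', 'tel:=11234567890;ext=1234', '123456790') {
    '{0,-28} -> {1}' -f $n, (ConvertTo-UsPhone $n)
}
```

Anything that does not reduce to 10 digits (or 11 starting with 1) comes back as $null, so invalid input is easy to filter out instead of being passed through unchanged.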

??? What? Didn’t you try the code before?

How??? The replace pattern references groups the search pattern does not have!!! :man_shrugging:
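To spot this kind of mismatch, a small sketch (not from the thread) can list the numbered references in a replacement string that the search pattern does not define. It uses the .NET Regex.GetGroupNumbers() method and only looks at plain $n references, not ${name} ones. In .NET, a $n that names a nonexistent group is copied into the result as literal text rather than raising an error, which is why this kind of mistake can go unnoticed.

```powershell
# The Match 1 condition paired with the replacement string from the original question
$search      = '^\+(\d{11})'
$replacement = '+$1 ($2) $3-$4 x$5'

# Group numbers the pattern actually defines (0 is always the whole match)
$defined = ([regex]$search).GetGroupNumbers()

# Numbered references such as $1, $2 ... that appear in the replacement text
$referenced = [regex]::Matches($replacement, '\$(\d+)') | ForEach-Object { [int]$_.Groups[1].Value }

# References the search pattern cannot fill
$missing = @($referenced | Where-Object { $_ -notin $defined })
"Missing groups: $($missing -join ', ')"
```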

Show the code you’re using!!

Please share the code!

Did you actually read all replies you’ve got?

Updated code: I incorporated krzydoug’s suggestion at the end to see how his code compares to mine.

$phone = Read-Host "Enter Phone Number"
$fmtPhone = ""

Switch -Regex ($phone) {
    '^\+(\d{11})' { 
        Write-Host "Match 1"
        $fmtPhone = $phone -replace '^\+(\d)(\d{3})(\d{3})(\d{4})', '+$1 ($2) $3-$4'
    }
    '^\+(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)' {
        Write-Host "Match 2"
        $fmtPhone = $phone -replace '^\+(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)', '+$1 ($2) $3-$4 x$5'
    }
    '^(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)' {
        Write-Host "Match 3"
        $fmtPhone = $phone -replace '^(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)', '+$1 ($2) $3-$4 x$5'
    }
    '^\+(\d) (\d{3})-(\d{3})-(\d{4}) x(\d{4})' { 
        Write-Host "Match 4"
        $fmtPhone = $phone -replace '^\+(\d) (\d{3})-(\d{3})-(\d{4}) x(\d{4})', '+$1 ($2) $3-$4 x$5'
    }
    '^\((\d{3})\) (\d{3})-(\d{4}) x(\d{4})' { 
        Write-Host "Match 5"
        $fmtPhone = $phone -replace '^(\d{3}) (\d{3})-(\d{4}) x(\d{4})', '+$1 ($2) $3-$4 x$5'
    }
    '^\+(\d) (\d{3}) (\d{3})-(\d{4}) x(\d{4})' { 
        Write-Host "Match 6"
        $fmtPhone = $phone -replace '^\+(\d) (\d{3}) (\d{3})-(\d{4}) x(\d{4})', '+$1 ($2) $3-$4 x$5'
    }
    default {
        Write-Host "No Match"
    }
}
Write-Host "Formatted Number: " $fmtPhone


$pattern = '^\+?(?<Country>\d)\s?(?<Area>\d{3})(\s|-)?(?<Prefix>\d{3})(\s|-)?(?<Suffix>\d{4})(.*?(?<Extension>\d{3,4}))?'
Write-Host "Regex Number: "

if($phone -match $pattern){
    if($matches.Extension){
        "+$($Matches.Country) ($($matches.Area)) $($Matches.Prefix)-$($Matches.Suffix) x$($Matches.Extension)"
    }
    else{
        "+$($Matches.Country) ($($matches.Area)) $($Matches.Prefix)-$($Matches.Suffix)"
    }
}

Here is a sampling of the different telephone number formats I found, followed by the result of running each one:

(123) 456-7890 x1234
+1 (123) 456-7890
+1 (123) 456-7890 x1234
+1 (123) 456-7890 x.1234
+1 123 456 7890 x1234
+1 123-456-7890 x1234
+1 123 456 7890
+11234567890
+11234567890;ext=1234
11234567890;ext=1234
123456790
123-456-7890
tel:+1123456790;ext=1234
+  +1 (123) 4567890 x.1234
+1 +11234567890
11234567890

Enter Phone Number: (123) 456-7890 x1234
Match 5
Formatted Number: (123) 456-7890 x1234
Regex Number:

Enter Phone Number: +1 (123) 456-7890
No Match
Formatted Number:
Regex Number:

Enter Phone Number: +1 (123) 456-7890 x1234
No Match
Formatted Number:
Regex Number:

Enter Phone Number: +1 (123) 456-7890 x.1234
No Match
Formatted Number:
Regex Number:

Enter Phone Number: +1 123 456 7890 x1234
No Match
Formatted Number:
Regex Number:
+1 (123) 456-7890 x1234

Enter Phone Number: +1 123-456-7890 x1234
Match 4
Formatted Number: +1 (123) 456-7890 x1234
Regex Number:
+1 (123) 456-7890 x1234

Enter Phone Number: +1 123 456 7890
No Match
Formatted Number:
Regex Number:
+1 (123) 456-7890

Enter Phone Number: +11234567890
Match 1
Formatted Number: +1 (123) 456-7890
Regex Number:
+1 (123) 456-7890

Enter Phone Number: +11234567890;ext=1234
Match 1
Match 2
Formatted Number: +1 (123) 456-7890 x1234
Regex Number:
+1 (123) 456-7890 x1234

Enter Phone Number: 11234567890;ext=1234
Match 3
Formatted Number: +1 (123) 456-7890 x1234
Regex Number:
+1 (123) 456-7890 x1234

Enter Phone Number: 123456790
No Match
Formatted Number:
Regex Number:

Enter Phone Number: 123-456-7890
No Match
Formatted Number:
Regex Number:

Enter Phone Number: tel:+1123456790;ext=1234
No Match
Formatted Number:
Regex Number:

Enter Phone Number: + +1 (123) 4567890 x.1234
No Match
Formatted Number:
Regex Number:

Enter Phone Number: +1 +11234567890
No Match
Formatted Number:
Regex Number:

Enter Phone Number: 11234567890
No Match
Formatted Number:
Regex Number:
+1 (123) 456-7890

You know you just changed the precondition, don’t you?

… I used your list of input numbers with a loop and a PSCustomObject to make the output easier on the eyes …

$phoneList = (
    '(123) 456-7890 x1234',
    '+1 (123) 456-7890',
    '+1 (123) 456-7890 x1234',
    '+1 (123) 456-7890 x.1234',
    '+1 123 456 7890 x1234',
    '+1 123-456-7890 x1234',
    '+1 123 456 7890',
    '+11234567890',
    '+11234567890;ext=1234',
    '11234567890;ext=1234',
    '123456790',
    '123-456-7890',
    'tel:+1123456790;ext=1234',
    '+  +1 (123) 4567890 x.1234',
    '+1 +11234567890',
    '11234567890'
)

foreach ($phone in $phoneList) {
    # reset for every number so a previous result is not reported for a non-matching input
    $fmtPhone = $null

    Switch -Regex ($phone) {
        '^\+(\d{11})' { 
            $MatchNr = 1
            $fmtPhone = $phone -replace '^\+(\d)(\d{3})(\d{3})(\d{4})', '+$1 ($2) $3-$4'
        }
        '^\+(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)' {
            $MatchNr = 2
            $fmtPhone = $phone -replace '^\+(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)', '+$1 ($2) $3-$4 x$5'
        }
        '^(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)' {
            $MatchNr = 3
            $fmtPhone = $phone -replace '^(\d)(\d{3})(\d{3})(\d{4});ext=(\d+)', '+$1 ($2) $3-$4 x$5'
        }
        '^\+(\d) (\d{3})-(\d{3})-(\d{4}) x(\d{4})' { 
            $MatchNr = 4
            $fmtPhone = $phone -replace '^\+(\d) (\d{3})-(\d{3})-(\d{4}) x(\d{4})', '+$1 ($2) $3-$4 x$5'
        }
        '^\((\d{3})\) (\d{3})-(\d{4}) x(\d{4})' { 
            $MatchNr = 5
            $fmtPhone = $phone -replace '^(\d{3}) (\d{3})-(\d{4}) x(\d{4})', '+$1 ($2) $3-$4 x$5'
        }
        '^\+(\d) (\d{3}) (\d{3})-(\d{4}) x(\d{4})' { 
            $MatchNr = 6
            $fmtPhone = $phone -replace '^\+(\d) (\d{3}) (\d{3})-(\d{4}) x(\d{4})', '+$1 ($2) $3-$4 x$5'
        }
        default {
            $MatchNr = 0
        }
    }

    $pattern = '^\+?(?<Country>\d)\s?(?<Area>\d{3})(\s|-)?(?<Prefix>\d{3})(\s|-)?(?<Suffix>\d{4})(.*?(?<Extension>\d{3,4}))?'
    $RegexNumber =
    if ($phone -match $pattern) {
        if ($matches.Extension) {
            "+$($Matches.Country) ($($matches.Area)) $($Matches.Prefix)-$($Matches.Suffix) x$($Matches.Extension)"
        }
        else {
            "+$($Matches.Country) ($($matches.Area)) $($Matches.Prefix)-$($Matches.Suffix)"
        }
    }
    [PSCustomObject]@{
        InputNumber     = $Phone
        MatchNr         = $MatchNr
        FormattedNumber = $fmtPhone
        RegexNumber     = $RegexNumber
    }
}

Output looks like this:

InputNumber                MatchNr FormattedNumber         RegexNumber
-----------                ------- ---------------         -----------
(123) 456-7890 x1234             5 (123) 456-7890 x1234
+1 (123) 456-7890                0
+1 (123) 456-7890 x1234          0
+1 (123) 456-7890 x.1234         0
+1 123 456 7890 x1234            0                         +1 (123) 456-7890 x1234
+1 123-456-7890 x1234            4 +1 (123) 456-7890 x1234 +1 (123) 456-7890 x1234
+1 123 456 7890                  0                         +1 (123) 456-7890
+11234567890                     1 +1 (123) 456-7890       +1 (123) 456-7890
+11234567890;ext=1234            2 +1 (123) 456-7890 x1234 +1 (123) 456-7890 x1234
11234567890;ext=1234             3 +1 (123) 456-7890 x1234 +1 (123) 456-7890 x1234
123456790                        0
123-456-7890                     0
tel:+1123456790;ext=1234         0
+  +1 (123) 4567890 x.1234       0
+1 +11234567890                  0
11234567890                      0                         +1 (123) 456-7890

What if multiple regex filters are not the answer?

$phoneList = (
    '(123) 456-7890 x1234',
    '+1 (123) 456-7890',
    '+1 (123) 456-7890 x1234',
    '+1 (123) 456-7890 x.1234',
    '+1 123 456 7890 x1234',
    '+1 123-456-7890 x1234',
    '+1 123 456 7890',
    '+11234567890',
    '+11234567890;ext=1234',
    '11234567890;ext=1234',
    '123456790',
    '123-456-7890',
    'tel:+1123456790;ext=1234',
    '+  +1 (123) 4567890 x.1234',
    '+1 +11234567890',
    '11234567890'
)

foreach ($phone in $phoneList) {
    # split the phone on the 'x' character
    $parts = $phone -split 'x', 2
    # remove all non-digit characters from the first part
    $mainNumber = ($parts[0] -replace '\D', '')
    # clear any extension left over from the previous number, then,
    # if there is a second part, remove all non-digit characters from it
    $extension = $null
    if ($parts.Count -gt 1) {
        $extension = ($parts[1] -replace '\D', '')
    }
    # if the main number has 10 digits, add the 1 country code
    if ($mainNumber.Length -eq 10) {
        $mainNumber = "1$mainNumber"
    }
    # format the main number as +1 (123) 456-7890
    if ($mainNumber.Length -eq 11) {
        $formattedMainNumber = "+$($mainNumber.Substring(0, 1)) ($($mainNumber.Substring(1, 3))) $($mainNumber.Substring(4, 3))-$($mainNumber.Substring(7, 4))"
    } else {
        Write-Warning "Invalid phone number: $phone"
        continue
    }
    # if there is an extension, add it to the formatted number
    if ($extension) {
        $formattedMainNumber += " x$extension"
    }
    $output = [PSCustomObject]@{
        Original = $phone
        Formatted = $formattedMainNumber
    }
    write-output $output
}

Two of the numbers in your list ('123456790' and '+1 +11234567890') are invalid by the logic of this code; those might have to be handled differently.
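One way to handle the two rejected inputs is to normalize the digits before the length check. The sketch below assumes a policy that is not from the thread: a 12-digit string starting with "11" is treated as a country code typed twice and one copy is dropped, while a 9-digit string is rejected because the missing digit cannot be recovered. Get-UsDigits is a hypothetical name; it returns the 11-digit string or $null and is meant for the part before any extension.

```powershell
# Hypothetical helper and policy, assumed for illustration only:
#  - 12 digits starting with "11": country code typed twice, drop one copy
#  - 10 digits: prepend the country code 1
#  - anything that does not end up as 1 followed by 10 digits is rejected ($null)
function Get-UsDigits {
    param([string]$Number)

    # Strip everything that is not a digit
    $digits = $Number -replace '\D', ''

    if ($digits.Length -eq 12 -and $digits.StartsWith('11')) {
        $digits = $digits.Substring(1)
    }
    if ($digits.Length -eq 10) {
        $digits = "1$digits"
    }

    if ($digits -match '^1\d{10}$') { $digits } else { $null }
}

'+1 +11234567890', '123456790', '123-456-7890' | ForEach-Object {
    '{0} -> {1}' -f $_, (Get-UsDigits $_)
}
```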

It’s best if you provide all the pertinent information up front.

We can simplify the regex pattern and match more of these examples.

$cases = @'
(123) 456-7890 x1234
+1 (123) 456-7890
+1 (123) 456-7890 x1234
+1 (123) 456-7890 x.1234
+1 123 456 7890 x1234
+1 123-456-7890 x1234
+1 123 456 7890
+11234567890
+11234567890;ext=1234
11234567890;ext=1234
123456790
123-456-7890
tel:+1123456790;ext=1234
+  +1 (123) 4567890 x.1234
+1 +11234567890
11234567890
'@ -split '\r?\n'

$pattern = '^\D*?(?<Country>\d?)\D*?(?<Area>\d{3})\D*?(?<Prefix>\d{3})\D*?(?<Suffix>\d{4})(\D*?(?<Extension>\d{3,4}))?\D*?$'

# process each number, try and extract/format the desired data. Output is collected in the $results variable
$results = foreach($case in $cases){
    $matched, $formatted = if($case -match $pattern){
        Write-Host "$($case) matches" -ForegroundColor Green

        # case matched, write true, will collect in the $matched variable
        $true

        # if no country code present, set country code to 1 for consistency
        if(-not $matches.Country){$matches.country = 1}

        # output the formatted phone number, will collect in the $formatted variable
        if($matches.Extension){
            "+$($Matches.Country) ($($matches.Area))-$($Matches.Prefix)-$($Matches.Suffix) x$($Matches.Extension)"
        }
        else{
            "+$($Matches.Country) ($($matches.Area))-$($Matches.Prefix)-$($Matches.Suffix)"
        }
    }
    else{
        Write-Host "$($case) does not match" -ForegroundColor yellow
        
        # case did not match, write $false, will collect in the $matched variable
        $false
    }
    
    [PSCustomObject]@{
        Original       = $case
        PatternMatched = $matched
        Formatted      = $formatted
    }
}

# output the results 
$results

I added a Write-Host call to show which examples matched and which didn’t.

You can see the two that didn’t match aren’t valid numbers. The first has only 9 digits; the second has 12 digits (a duplicated country code). All the rest matched and were output as objects containing the original number, whether it matched the pattern, and the formatted result.

To get just the formatted numbers, extract the value of the Formatted property. This can be done in many ways; here are a few:

# member enumeration via dot notation
$results.Formatted 

# Select-Object 
$results | Select-Object -ExpandProperty Formatted

# ForEach-Object
$results | ForEach-Object -Property Formatted

Any of these three should output:

+1 (123)-456-7890 x1234
+1 (123)-456-7890
+1 (123)-456-7890 x1234
+1 (123)-456-7890 x1234
+1 (123)-456-7890 x1234
+1 (123)-456-7890 x1234
+1 (123)-456-7890
+1 (123)-456-7890
+1 (123)-456-7890 x1234
+1 (123)-456-7890 x1234
+1 (123)-456-7890
+1 (112)-345-6790 x1234
+1 (123)-456-7890 x1234
+1 (123)-456-7890

Also note I added a country code of 1 to those that did not have a country code, for consistency.
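To make the single pattern above easier to read and maintain, the same regex can be written in .NET free-spacing mode: the inline (?x) option makes the engine ignore unescaped whitespace and treat text after # as a comment. This is a sketch; the groups and quantifiers are unchanged from the one-line version.

```powershell
# Same groups and quantifiers as the one-line pattern, laid out with comments
$pattern = @'
(?x)
^ \D*?                              # skip leading non-digits such as + or tel:
(?<Country> \d? )                   # optional one-digit country code
\D*? (?<Area>   \d{3} )             # area code, with or without parentheses
\D*? (?<Prefix> \d{3} )             # three-digit exchange
\D*? (?<Suffix> \d{4} )             # four-digit line number
( \D*? (?<Extension> \d{3,4} ) )?   # optional extension after x, x. or ;ext=
\D*? $                              # only non-digits may follow
'@

foreach ($case in '+1 (123) 456-7890 x.1234', 'tel:+1123456790;ext=1234', '123456790') {
    if ($case -match $pattern) {
        '{0} -> Area {1}, Prefix {2}, Suffix {3}, Extension {4}' -f $case, $Matches.Area, $Matches.Prefix, $Matches.Suffix, $Matches.Extension
    } else {
        '{0} -> no match' -f $case
    }
}
```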


Here’s another take on parsing US phone numbers. If there are many numbers to process, pre-compiled regex objects could speed things up. Using them, though, means calling .NET directly instead of using ‘native’ PowerShell operators.

$numbers = @'
(123) 456-7890 x1234
+1 (123) 456-7890
+1 (123) 456-7890 x1234
+1 (123) 456-7890 x.1234
+1 123 456 7890 x1234
+1 123-456-7890 x1234
+1 123 456 7890
+11234567890
+11234567890;ext=1234
11234567890;ext=1234
123456790
123-456-7890
tel:+1123456790;ext=1234
+  +1 (123) 4567890 x.1234
+1 +11234567890
11234567890
'@ -split '\r?\n'

# IgnoreCase so 'X' is treated like 'x'; Compiled trades a one-time setup cost for faster matching
$options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Compiled'

# normalizes the telephone number
# Remove everything except digits and 'x' (";ext=" becomes just "x", and "tel:" just disappears)
$regexclean     = [regex]::new('[^\dx]', $options)
# get the 10 digit phone number (optionally preceded by the country code 1)
$regexnumber    = [regex]::new('1?(\d{3})(\d{3})(\d{4})', $options)
# get the extension
$regexext       = [regex]::new('x(\d{1,5}$)', $options)

foreach ($number in $numbers) {
    $clean = $regexclean.Replace($number, '')
    $match = $regexnumber.Match($clean)
    if ($match.Success){
        $formatted = "+1 ({0}) {1}-{2}" -f $match.Groups[1].Value, $match.Groups[2].Value, $match.Groups[3].Value
        $match = $regexext.Match($clean)
        if ($match.Success){
            $formatted = "{0} x{1}" -f $formatted, $match.Groups[1].Value
        }
    } else {
        $formatted = "Invalid number: {0}" -f $number
    }

    Write-Output $formatted
}