Match (part of) string to an array of (sub)strings

Hi,

I’ve been wondering how to best deal with this.

For example, if I have these:

$CNs = 'domain.root/OU1', 'domain.root/OU2'
$CN = 'domain.root/OU2/SubOU'

What I’ve been doing is this:

$CNs.ForEach({ if ($CN -match $PSItem) { write-host "$PSItem matches" } })

Since in real life I often deal with arrays of more than 50k objects, I find this sub-optimal, and I was wondering whether there is a better way to match against the array directly instead of iterating through it.

In fact, here’s a sample of the real code I’m currently working on. The table comes from a spreadsheet I’m importing. It contains empty lines because it’s maintained by another team, so I check for those and skip them. I grab the index of the matching table row to use later in the script, to populate cmpObj with certain properties from the table…

$businessEntityDefaultTable.CN.ForEach({
        if (-not [string]::IsNullOrWhiteSpace($_)) {
            if ($cmpObj.CanonicalName -match $_) {
                $indexBusinessEntityDefaultTable = $businessEntityDefaultTable.CN.IndexOf($_)
            }
        }
    })

Thanks!

Depending on the complexity and number of patterns, instead of using a loop you could try combining the patterns into one:

$CNs = 'domain.root/OU1', 'domain.root/OU2'
$CN = 'domain.root/OU2/SubOU'

if($CN -match ($CNs -join '|')) { $Matches[0]}
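One caveat with joining the raw strings: each entry is treated as a regex, so the `.` in `domain.root` matches any character, and a blank entry (like the empty spreadsheet rows mentioned above) adds an empty alternative that matches every input. Here is a minimal sketch of one way to guard against both, assuming the entries are meant as literal path prefixes. The names `$cleanCNs`, `$escaped`, `$pattern`, `$matched` and `$index` are just illustrative, and the anchoring is my assumption about the intent, not something from the original post.

```powershell
$CNs = 'domain.root/OU1', '', 'domain.root/OU2'   # includes a blank row on purpose
$CN  = 'domain.root/OU2/SubOU'

# Drop blank entries, then escape each one so it is matched literally.
$cleanCNs = $CNs | Where-Object { -not [string]::IsNullOrWhiteSpace($_) }
$escaped  = $cleanCNs | ForEach-Object { [regex]::Escape($_) }

# Assumption: an entry should match from the start of the name and end
# at a path boundary (a '/' or the end of the string).
$pattern = '^(?:' + ($escaped -join '|') + ')(?=/|$)'

if ($CN -match $pattern) {
    $matched = $Matches[0]                        # the entry text that matched
    $index   = [array]::IndexOf($CNs, $matched)   # position in the original array
}
```

Note that -match is case-insensitive while [array]::IndexOf compares case-sensitively, so the index lookup only works when the casing in $CN and in the table agree.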

That’s an excellent suggestion. It works quite nicely and for smaller arrays it’s a lot faster! I’ll check later today how it fares with larger arrays.

Thank you so much! 🙂

@cythraul - to answer your comment about matching on a string without a loop: I believe $Matches only holds the first match in this situation. I tried different strategies on a tab-delimited string to see if I could return multiple instances, but each time I got only the first match. I even tested versions of the | ForEach-Object { $Matches } loop.
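For what it's worth, when every match inside a single string is needed, the .NET method [regex]::Matches returns all of them rather than just the first. A small sketch, assuming a made-up tab-delimited line; the sample data and the names `$line`, `$found` and `$values` are hypothetical.

```powershell
# Hypothetical tab-delimited input with two entries that fit the pattern.
$line = "domain.root/OU1`tdomain.root/OU2/SubOU`tother.root/OU9"

# [regex]::Matches returns a collection with one Match object per hit.
$found  = [regex]::Matches($line, 'domain\.root/OU\d+')

# Pull out the matched text of each hit.
$values = $found | ForEach-Object { $_.Value }
$values
```

Each Match object also carries an Index property with the position of the hit inside the string.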

So I’ve adapted a similar reply to fit your post. The output is a list of all array members that match the specified criteria. I tacked on an identity value for easy identification (i.e., “I pull the index of the table to use later”).

Procedure

  • The canonical names are stored in an array.
  • The search/ match criteria are stored in a variable that is formatted for regex matching.
  • A custom class object, CanonicalName, provides the blueprint for the data.
    • Preference: inside the class, CanonicalName($q1,$q2) is a constructor (ctor). I use ctors on small classes to keep loops clean.
  • An ArrayList object (available by short name thanks to the using namespace declaration) stores the results of the match.
  • A for loop iterates over the array indexes and executes an if conditional.
  • The if conditional tests the array member at the current index against the match criteria.
  • When true, a variable instantiates the CanonicalName class with the index and the name, and the variable is added to the array list.

Regex, which is what -match uses, is very efficient for matching. At first, regex was unintelligible to me, but a cheat sheet, a book, and practice have paid off.

Demo Code

using namespace System.Collections;

$canonicalNames = @("domain.com/Administration/country4/salary/Navision/Server1",
"domain.com/Administration/country3/salary/Navision/2017/Server2",
"domain3.com/SomethingElse/country2/salary/Navision/2019/Server3",
"domain.com/Administration/country2/salary/Server4",
"domain5.com/NotAdministration/country4/salary/saladShooter/yada3/yada2/yada1/Server5",
"domain.com/Administration/country2/Compliance/Bob, Billy",
"",
"domain2.com/Administration/country2/Compliance/Doe, Jane",
"domain.com/Administration/country4/HR/Server33",
"",
"domain5.com/Administration/country3/HR/Server11",
"domain.com/Administration/country2/HR/Server55",
"domain.com/Partners/Gold/BigWig, Senior",
"domain.com/Partners/Silver/BigWig, Junior")

$matchCriteria = "^domain\.com/(\bAdministration\b|Partners(?=/\bGold\b))";

Class CanonicalName
{
    [int] $_id
    [string] $CanonicalName;

    CanonicalName($q1,$q2)
    {
        $this._id = $q1;
        $this.CanonicalName = $q2;
    }
}

$results = [ArrayList]::new();
for ($i = 0; $i -lt $canonicalNames.Count; $i++)
{
    if($canonicalNames[$i] -match $matchCriteria)
    {
        $x = [CanonicalName]::new($i,$canonicalNames[$i]);
        [void]$results.Add($x);
    }
}

cls
$results;

Match criteria

  • Item must begin with domain.com/.
  • Item must continue with the word Administration or Partners, the latter only when it is followed by /Gold.

Preference: I use \b word boundaries to reduce the chance of partial-word matches (for example, Administration matching inside a longer word).
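To illustrate what the boundary buys you, here is a small sketch comparing a pattern with and without \b. The sample entry and the variable names are made up for the illustration.

```powershell
$withBoundary    = '^domain\.com/\bAdministration\b'
$withoutBoundary = '^domain\.com/Administration'

# Hypothetical entry where "Administration" is only part of a longer word.
$sample = 'domain.com/AdministrationOld/Server1'

$looseResult  = $sample -match $withoutBoundary   # True: the prefix alone is enough
$strictResult = $sample -match $withBoundary      # False: \b needs the word to end here

"without \b: $looseResult"
"with \b:    $strictResult"
```

An entry such as domain.com/Administration/Server1 passes both patterns, since the / after Administration counts as a word boundary.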

Results

7 out of 14 items matched the criteria.
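If the index is not needed, -match can also be applied to the array itself: with a collection on the left-hand side it returns the matching elements instead of a Boolean. A minimal sketch on a shortened, illustrative subset of the names; `$sample` and `$hits` are hypothetical names.

```powershell
# Shortened subset of the demo data, for illustration only.
$sample = @(
    'domain.com/Administration/country4/HR/Server33',
    '',
    'domain5.com/Administration/country3/HR/Server11',
    'domain.com/Partners/Gold/BigWig, Senior',
    'domain.com/Partners/Silver/BigWig, Junior'
)
$matchCriteria = '^domain\.com/(\bAdministration\b|Partners(?=/\bGold\b))'

# With an array on the left, -match acts as a filter.
# @() keeps the result an array even when only one element matches.
$hits = @($sample -match $matchCriteria)
$hits
```

The trade-off is that the original positions are lost and $Matches is not populated, so the for loop is still the way to go when the index matters.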

Remarks

Should you adapt the demo code for your own purposes, play around with the match criteria. For example, I set up the Partners only-when-followed-by /Gold rule to demonstrate the lookahead assertion. As written, this pattern would not fit your domain.root/OU2 and domain.root/OU2/SubOU case.
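As a starting point for the domain.root/OU2 case, something along these lines might work. It is only a sketch, assuming that an OU and everything beneath it should match while similarly named siblings should not; the candidate list is made up.

```powershell
# Matches domain.root/OU2 itself and anything below it.
# The lookahead requires a '/' or the end of the string after OU2.
$matchCriteria = '^domain\.root/OU2(?=/|$)'

# Hypothetical candidates for illustration.
$candidates = 'domain.root/OU2',
              'domain.root/OU2/SubOU',
              'domain.root/OU20/SubOU',
              'domain.root/OU1'

$hits = @($candidates -match $matchCriteria)
$hits
```

Without the lookahead, domain.root/OU20/SubOU would match as well, because OU2 is a prefix of OU20.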

See also

Regular Expressions Cheatsheet