PowerShell: select characters instead of lines

Hi people!

I am trying to crawl some websites to determine their postal number.
So far it's working fine, as long as the postal number is “alone” on a line in the InnerHtml.
Normally it looks something like this:

postalcode: “95135”
Now, my output here would be 95135 as desired

However, sometimes it looks like this:
postalcode: “95135” whatever: “436363” somethingelse:“77334”
That gives me the output 9513543636377334.

Does anyone have an idea how to tell PowerShell to get just the 8 characters after postalcode, instead of the entire line?

$Site = "domain.com"
$Request = Invoke-WebRequest -URI $Site
$imsorted = $Request.AllElements | Where-Object {$_.InnerHtml -like "*postalCode*"} |
Sort-Object { $_.InnerHtml.Length } | Select-Object InnerText -First 1

# Strip everything that is not a digit
$sortingnumbers = $imsorted -replace "[^0-9]"

Write-Host "Data: " $sortingnumbers

Getting the first X characters is pretty straightforward using Substring or a regex:

PS C:\Users\rasim> $innerHtml = '95135 436363 77334'
PS C:\Users\rasim> $innerHtml.Substring(0,5)
95135
PS C:\Users\rasim> $innerHtml -match '^[0-9]{5}'
True
PS C:\Users\rasim> $matches


Name                           Value
----                           -----
0                              95135

Hi Rob!

Ah, I have tried that, but I need to determine the “starting point” of the characters I need.

The value I am looking for is often somewhere in the middle of a long string.

For example, a line could be:
conentinformation detailing delieveries for the department.???","address":{"@type":"postalAddress","streetAddress":"mainstr 7","addressLocality":"newyork","postalCode":"5900","addressCountry":"US"},"url":"https://test.com

So the part I need, which in this case is just “5900”, is still eluding me.
Can I somehow make postalCode the “starting point” for the substring I want?

That appears to be JSON, so you should be able to use ConvertFrom-Json. Alternatively, use Invoke-RestMethod, which will automatically parse it into an object, so you can just get .postalCode
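If the text is only a fragment and will not parse as JSON, the “starting point” idea from the question can be sketched with a regex anchored on the key name. This is a minimal sketch, not a definitive solution: the sample string and the names `$line` and `$postalCode` are illustrative assumptions, and it assumes straight quotes around the key and the value.

```powershell
# Sample text modelled on the line quoted above (straight quotes assumed).
$line = 'deliveries for the department.","address":{"@type":"postalAddress","streetAddress":"mainstr 7","addressLocality":"newyork","postalCode":"5900","addressCountry":"US"},"url":"https://test.com'

# Capture only the digits that directly follow the "postalCode" key.
# -match is case-insensitive by default, so "postalcode" would match too.
$postalCode = $null
if ($line -match '"postalCode"\s*:\s*"(?<code>\d+)"') {
    $postalCode = $Matches['code']
}
$postalCode
```

Because the pattern names the key explicitly, other numbers on the same line are ignored instead of being concatenated.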

Hi again Rob!

If I try to use ConvertFrom-Json, I just get output where the postalcode section is not present.
If I run this:

          
            write-host "Trying to get postalcode"
            $invoker = Invoke-WebRequest "air-tech.dk" 
            $postalcheck = $invoker.tostring() -split "[`r`n]" | select-string "postalCode" | Select-Object -First 1 | ConvertFrom-Json
            $postalcheck = $postalcheck # -replace "[^0-9]"

            write-host "This is the postal data: "
            write-host "-- -- --"
            write-host  $postalcheck
         

My output is this:

Trying to get postalcode
This is the postal data: 
-- -- --
@{@context=https://schema.org; @type=LocalBusiness; name=AIR-TECH.dk ApS; description=Det er vigtigt at lufte ud, men det er bare ikke altid, at det helt kan fjerne den d??rlige lugt og usunde partikler fra rygning, stearin
lys, br??ndeovn, fugt, inkontinens, skimmelsvamp og andet. I stedet kan den bedste l??sning faktisk v??re en luftrenser til hjemmet.

???AIR-TECH's luftrenser MAC500s renser indeklimaet ved hj??lp af en speciel lampe, der laver ultraviolet lys. Det sker helt uden brug af kemikalier eller filtre.??????; address=; url=https://www.air-tech.dk; telephone=+454
0519933; email=air-tech@air-tech.dk; additionalType=http://www.productontology.org/id/Air_purifier; sameAs=System.Object[]}

PS C:\Users\Dern> 
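The `address=` entry in that output looks empty, but that appears to be only how the nested object is rendered when the whole result is written out as one string. The parsed object should still carry the address fields. A minimal sketch, using a reduced sample with the same nesting (the names `$sample`, `$parsed` and `$postal` are assumptions for illustration):

```powershell
# Reduced sample with the same nesting as the JSON on the page.
$sample = '{"name":"AIR-TECH.dk ApS","address":{"@type":"postalAddress","postalCode":"5900","addressCountry":"DK"}}'
$parsed = $sample | ConvertFrom-Json

# address is itself an object, so drill into it with dot notation.
$postal = $parsed.address.postalCode
$postal
```

Applied to the script above, that would mean reading `$postalcheck.address.postalCode` rather than writing out `$postalcheck` as a whole.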

If I don't add ConvertFrom-Json, I get this instead:

Trying to get postalcode
This is the postal data: 
-- -- --
	{"@context":"https://schema.org","@type":"LocalBusiness","name":"AIR-TECH.dk ApS","description":"Det er vigtigt at lufte ud, men det er bare ikke altid, at det helt kan fjerne den d??rlige lugt og usunde partikler fra rygn
ing, stearinlys, br??ndeovn, fugt, inkontinens, skimmelsvamp og andet. I stedet kan den bedste l??sning faktisk v??re en luftrenser til hjemmet.\r\n\r\n???AIR-TECH's luftrenser MAC500s renser indeklimaet ved hj??lp af en sp
eciel lampe, der laver ultraviolet lys. Det sker helt uden brug af kemikalier eller filtre.??????","address":{"@type":"postalAddress","streetAddress":"Mj??lbyvej 7","addressLocality":"Rudk??bing","postalCode":"5900","addres
sCountry":"DK"},"url":"https://www.air-tech.dk","telephone":"+4540519933","email":"air-tech@air-tech.dk","additionalType":"http://www.productontology.org/id/Air_purifier","sameAs":["https://www.facebook.com/MAC500.dk/","htt
ps://www.proff.dk/firma/air-tech/rudk%C3%B8bing/maskiner-og-udstyr/GMFZMTI10PO/","https://www.krak.dk/air-tech+rudk%C3%B8bing/66807822/firma"]}

I tried something else instead: converting to CSV to try to catch it there.

But that is also not quite working :(

Script:

           write-host "Trying to get postalcode"
            $invoker = Invoke-WebRequest "air-tech.dk" 
            #$postalcheck = $invoker.tostring()
            #$postalcheck = $postalcheck # -replace "[^0-9]"

            $postalcheckcsv = $invoker | ConvertTo-Csv -UseCulture -NoTypeInformation | Select-Object "postalCode"

            write-host "This is the postal data: "
            write-host "-- -- --"
            write-host  $postalcheckcsv

Output:

Trying to get postalcode
This is the postal data: 
-- -- --
@{postalCode=} @{postalCode=}

PS C:\Users\Dern> 
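One likely reason for the two empty `@{postalCode=}` entries: ConvertTo-Csv emits plain strings (a header line and a data line), and strings have no postalCode property to select. The sketch below illustrates this with a hypothetical `$record` object rather than the real web response; selecting the property before converting keeps the value.

```powershell
# Hypothetical object standing in for already-parsed data.
$record = [pscustomobject]@{ name = 'AIR-TECH.dk ApS'; postalCode = '5900' }

# ConvertTo-Csv returns strings: one header line and one data line.
$csvLines = $record | ConvertTo-Csv -NoTypeInformation
$csvLines[0].GetType().Name          # String

# Strings have no postalCode property, so selecting it yields empty values.
$empty = $csvLines | Select-Object postalCode

# Selecting the property first, then converting, keeps the value.
$kept = $record | Select-Object postalCode | ConvertTo-Csv -NoTypeInformation
$kept
```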

Hi again :)

I tried something quite different that seemed to work, and it's also simpler in my opinion:

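            Write-Host "Trying to get postalcode"
            $invoker = Invoke-WebRequest "test-domain.dk"

            # Work on the response body as text; the response object itself has no Split method
            $InputString = $invoker.Content
            # Split on commas so each JSON key/value pair becomes its own element
            $Parts = $InputString.Split(",")
            $isolatedpostal = $Parts | Select-String "postalCode"
            # Strip everything that is not a digit
            $strippedpostal = $isolatedpostal -replace "[^0-9]"

            Write-Host $strippedpostal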

JSON can be parsed to an object and then flattened into a CSV. This is your JSON:

$json = @'
{
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "AIR-TECH.dk ApS",
    "description": "Det er vigtigt at lufte ud, men det er bare ikke altid, at det helt kan fjerne den d??rlige lugt og usunde partikler fra rygning, stearinlys, br??ndeovn, fugt, inkontinens, skimmelsvamp og andet. I stedet kan den bedste l??sning faktisk v??re en luftrenser til hjemmet.\r\n\r\n???AIR-TECH's luftrenser MAC500s renser indeklimaet ved hj??lp af en speciel lampe, der laver ultraviolet lys. Det sker helt uden brug af kemikalier eller filtre.??????",
    "address": {
      "@type": "postalAddress",
      "streetAddress": "Mj??lbyvej 7",
      "addressLocality": "Rudk??bing",
      "postalCode": "5900",
      "addressCountry": "DK"
    },
    "url": "https://www.air-tech.dk",
    "telephone": "+4540519933",
    "email": "air-tech@air-tech.dk",
    "additionalType": "http://www.productontology.org/id/Air_purifier",
    "sameAs": [
      "https://www.facebook.com/MAC500.dk/",
      "https://www.proff.dk/firma/air-tech/rudk%C3%B8bing/maskiner-og-udstyr/GMFZMTI10PO/",
      "https://www.krak.dk/air-tech+rudk%C3%B8bing/66807822/firma"
    ]
  }
'@ | ConvertFrom-Json

#Dot notation to get Postal Code
$json.address.postalCode

Try simply changing Invoke-WebRequest to Invoke-RestMethod and then use dot notation to get to the postal code.
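Invoke-RestMethod returns a parsed object only when the response itself is JSON. If the URL returns an HTML page with the JSON embedded in it, one possible approach is to pull the JSON out first and then parse it. The sketch below assumes the data sits in a `<script type="application/ld+json">` element, which is an assumption suggested by the schema.org content and not confirmed in this thread; `$html`, `$pattern`, `$data` and `$postalFromPage` are illustrative names, and the sample markup stands in for a real page body.

```powershell
# Stand-in for the page body; in practice this might come from
# (Invoke-WebRequest $url).Content.
$html = @'
<html><head>
<script type="application/ld+json">
{"@type":"LocalBusiness","address":{"@type":"postalAddress","postalCode":"5900","addressCountry":"DK"}}
</script>
</head><body></body></html>
'@

# (?s) lets . span newlines; the lazy .*? stops at the first closing tag.
$pattern = '(?s)<script[^>]*type="application/ld\+json"[^>]*>(?<json>.*?)</script>'

$postalFromPage = $null
if ($html -match $pattern) {
    # Parse the captured JSON text and use dot notation on the result.
    $data = $Matches['json'] | ConvertFrom-Json
    $postalFromPage = $data.address.postalCode
}
$postalFromPage
```

Parsing the JSON this way avoids guessing which digits on the page belong to the postal code.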


Hi Rob!

Thanks for your input.

I ended up with this solution for isolating the postal number:

$Responserest = Invoke-RestMethod -Uri $DomainName

$fourdigitvalues = ($Responserest | Select-String -Pattern "(?<!\d)(\d{4})(?!\d)" -AllMatches ).Matches.Value | Select-Object -Unique

Afterwards I will match the candidates against actual postal numbers to further isolate the correct ones :)
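That follow-up matching step could look roughly like the sketch below. The sample text and the `$validPostalCodes` list are hypothetical; a real list would come from an official postal code source.

```powershell
# Hypothetical response text and reference list, for illustration only.
$text = 'Mjolbyvej 7, 5900 Rudkobing, tlf +4540519933, since 2015'
$validPostalCodes = '5900', '8000', '2100'

# Same pattern as in the solution above: exactly four digits,
# not part of a longer run of digits.
$candidates = ($text | Select-String -Pattern '(?<!\d)(\d{4})(?!\d)' -AllMatches).Matches.Value |
    Select-Object -Unique

# Keep only the candidates that appear in the reference list.
$confirmed = @($candidates | Where-Object { $validPostalCodes -contains $_ })
$confirmed
```

Here the phone number is skipped because it is a longer run of digits, and the year is found as a candidate but dropped by the reference list.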