Searching for Keyword on WebSite

Hi there,
I’m currently running a script that checks certain websites (self-hosted and owned by us) for some keywords, but it is not that fast. Maybe you have an idea how to speed it up a bit.

# Load SiteContent
$ProgressPreference = "SilentlyContinue"
$SiteContent = (Invoke-WebRequest $SiteLoc -UseBasicParsing -Headers @{"Cache-Control"="no-cache"}).Content
foreach ($word in $Keywordlist){
    if ($SiteContent.Contains($word)){
        $objproperties=@{
            Site = $SiteLoc
            Keyword = $word
            Reason = "new"
        }
        $global:site_text_found += New-Object psobject -Property $objproperties
        $global:KeyWordFoundOnSite = "Found"
    }
}

Cheers Shorty

Shorty,
Welcome to the forum. :wave:t4:

Hmmm … your code looks convoluted and confusing to me. What does your $KeywordList look like? And what output would you expect? How about using Select-String?

I’d start with something like this:

$SiteContent = (Invoke-WebRequest $SiteLoc -UseBasicParsing -Headers @{"Cache-Control" = "no-cache" }).Content
$SiteContent | Select-String -Pattern $Keywordlist

Hey Olaf,
thanks :slight_smile: . $KeywordList is an array of strings.

$keywordList = @("word1","word2","word3","word4")

I also tested it with Select-String, but it looks like it is a bit slower. The script needs to process around 4000 sites.

You could have mentioned that before. If you want to know where in the script the bottleneck is, you should measure it. I’d assume the Invoke-WebRequest takes some time. And if you have more than one site to fetch, you’re pretty much out of luck if you cannot speed that part up. :man_shrugging:t4:
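A rough sketch of how such a measurement could look, using Measure-Command. The helper name Measure-Step and the stand-in page content are assumptions made up for illustration so the example runs offline; they are not part of the original script.

```powershell
# Hypothetical helper: runs a script block and reports how long it took.
function Measure-Step {
    param(
        [Parameter(Mandatory)] [string]      $Name,
        [Parameter(Mandatory)] [scriptblock] $Action
    )
    $elapsed = Measure-Command -Expression $Action
    [pscustomobject]@{
        Step         = $Name
        Milliseconds = [math]::Round($elapsed.TotalMilliseconds, 2)
    }
}

# Stand-in for a downloaded page so the sketch runs without network access.
$SiteContent = 'lorem word2 ipsum ' * 50000
$Keywordlist = @('word1', 'word2', 'word3', 'word4')

$timings = @(
    # The original approach: one String.Contains call per keyword.
    Measure-Step -Name 'Contains loop' -Action {
        foreach ($word in $Keywordlist) { $null = $SiteContent.Contains($word) }
    }
    # The suggested approach: one Select-String call for all keywords.
    Measure-Step -Name 'Select-String' -Action {
        $null = $SiteContent | Select-String -Pattern $Keywordlist -SimpleMatch
    }
)
$timings | Format-Table -AutoSize
```

To time the download itself, the Invoke-WebRequest line from the script can be wrapped in the same way, for example Measure-Step -Name 'Download' -Action { Invoke-WebRequest $SiteLoc -UseBasicParsing }. Comparing the numbers should show whether the keyword check matters at all next to the network time.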

Hey Olaf,
ok :frowning: . I know that Invoke-WebRequest will take some time. But is there no way to check $SiteContent for all keywords at the same time?

Hmmm … isn’t that exactly what Select-String does when you provide the keyword array directly to it?
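A small offline sketch of that behaviour, using a made-up string in place of a downloaded page. The -SimpleMatch and -Quiet switches are optional extras not used elsewhere in this thread.

```powershell
$Keywordlist = @('word1', 'word2', 'word3', 'word4')
# Made-up text standing in for (Invoke-WebRequest ...).Content.
$SiteContent = 'some page text that mentions word3 somewhere'

# One call checks every keyword; -SimpleMatch treats them as literal text, not regex.
$match = $SiteContent | Select-String -Pattern $Keywordlist -SimpleMatch

# -Quiet returns $true on a hit and nothing otherwise, hence the [bool] cast.
$found = [bool]($SiteContent | Select-String -Pattern $Keywordlist -SimpleMatch -Quiet)

$match.Pattern   # the keyword that matched
$found
```

Note that Select-String is case-insensitive by default, unlike the String.Contains call in the original script; add -CaseSensitive if the old behaviour is wanted.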

Ah, I overlooked that :smiley: - will test it. :wink:

Hey Olaf,
looks good - thanks for that. But one other question: does it make sense to check word by word, in this way, which specific word from the list was detected on the site?

$Keywordlist=@("word1","word2","word3")
$SiteContent = (Invoke-WebRequest $SiteLoc -UseBasicParsing -Headers @{"Cache-Control" = "no-cache" }).Content
if ($SiteContent | Select-String -Pattern $Keywordlist){
    # at least one of the words was found
    foreach ($word in $Keywordlist){
        $WordWasFound = ($SiteContent | Select-String -Pattern $word)
        if ($WordWasFound){
            # add it to list
        }
    }
}
else {
   # move on to next site
   continue
}

That’s a question only you can answer. :man_shrugging:t4: I’d probably use the output Select-String already provides and parse this instead of running it again and again and again for every single keyword.

But how should I then know if one specific word was found on the site?
If I do it this way, I run a preliminary check first, and only if something exists on the site do I need to check for the specific word. Or do you know a better way?

I’m not sure if I understand. :thinking:

The output of Select-String contains information about all found patterns, their line number, the complete line it was found on and some more if you like. I’d use this output for further steps rather than running Select-String on each individual key word again - that’s what I meant.
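One possible way to do that in a single pass, sketched under some assumptions: the keywords are joined into one regex alternation and -AllMatches is used, so the Matches property lists every hit. The sample content and the example.com URL are placeholders; the Site/Keyword/Reason properties mirror the ones from the original script.

```powershell
$Keywordlist = @('word1', 'word2', 'word3', 'word4')
$SiteLoc     = 'https://example.com'   # placeholder URL
# Made-up content standing in for a downloaded page.
$SiteContent = "<p>word2 and word4</p>`n<p>WORD2 again</p>"

# Join the escaped keywords into one alternation so a single pass reports every hit.
$pattern = ($Keywordlist | ForEach-Object { [regex]::Escape($_) }) -join '|'
$result  = Select-String -InputObject $SiteContent -Pattern $pattern -AllMatches

# Matching is case-insensitive by default, so normalize before de-duplicating.
$foundWords = @()
if ($result) {
    $foundWords = @($result.Matches |
        ForEach-Object { $_.Value.ToLowerInvariant() } |
        Sort-Object -Unique)
}

# Build one record per keyword that was actually found.
$site_text_found = foreach ($word in $foundWords) {
    [pscustomobject]@{ Site = $SiteLoc; Keyword = $word; Reason = 'new' }
}
$site_text_found
```

Collecting the loop output into a variable also avoids the repeated += on a global array, which copies the array on every addition.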

I think select-string is working like it should but if you combine it with Invoke-WebRequest it doesn’t work well. If you run this command

$SiteContent = (Invoke-WebRequest "https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-string?view=powershell-5.1" -UseBasicParsing -Headers @{"Cache-Control" = "no-cache" }).Content
$SiteContent | Select-String -Pattern "about_ActivityCommonParameters"

Then the output is the whole HTML file, not only the found patterns, and I think that is my problem :wink: because I can’t see whether the pattern has been found.

Maybe you have an idea for that.

$SiteContent = (Invoke-WebRequest "https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-string?view=powershell-5.1" -UseBasicParsing -Headers @{"Cache-Control" = "no-cache" }).content -split "\n"

This way you split the big bulky chunk into individual lines. :wink:

$SiteContent | Select-String -Pattern "about_ActivityCommonParameters" | Select-Object -Property *

Now you can see the individual pieces of information. :wink:
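The same idea as an offline sketch, with a made-up HTML snippet instead of the downloaded page, and only a few of the MatchInfo properties selected:

```powershell
# Made-up HTML standing in for (Invoke-WebRequest ...).Content.
$html = "<html>`n<body>`n<a href='#'>about_ActivityCommonParameters</a>`n</body>`n</html>"

# Split into lines so each match reports only the line it was found on.
$lines = $html -split '\r?\n'

$hits = $lines |
    Select-String -Pattern 'about_ActivityCommonParameters' |
    Select-Object -Property LineNumber, Pattern, Line
$hits
```

Splitting on '\r?\n' rather than "\n" also strips a trailing carriage return when the server sends Windows-style line endings.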

You made my day :slight_smile:
I found something similar but didn’t think about the -Property parameter.

$Keywordlist=@("about_ActivityCommonParameters")
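# Split on CRLF or LF explicitly; .Split([Environment]::NewLine) behaves differently between Windows PowerShell 5.1 and PowerShell 7
$SiteContent = ((Invoke-WebRequest "https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-string?view=powershell-5.1" -UseBasicParsing -Headers @{"Cache-Control" = "no-cache" }).Content).Split([string[]]@("`r`n", "`n"), [StringSplitOptions]::None)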
$SiteContent | Select-String -Pattern $Keywordlist | Select-Object -Property *