PowerShell help for extracting specific links from a list of websites.


I’m hoping someone can help me adapt a script that works with one site so that it works with another.

This is what I have working for one site:

$InputLinksFile = "c:\temp\InputLinks.txt"
$OutputLinksFile = "C:\temp\OutputLinks.txt"
$InputLinks = @()

$BasePage = "https://www.fanfiction.net/tv/Buffy-The-Vampire-Slayer/?&srt=2&lan=1&r=10&p="
[int]$FirstPageNumber = 600
[int]$LastPageNumber = 601
$CurrentPageNumber = $FirstPageNumber

# Make a list of all the pages we want to input, counting from FirstPageNumber to LastPageNumber
while ($CurrentPageNumber -le $LastPageNumber) {
	$InputLinks += "$BasePage$CurrentPageNumber"
	# Move on to the next page number, otherwise the loop never ends
	$CurrentPageNumber++
}

# If you want to manually input a list of pages instead, remove # in front of the next line:
#$InputLinks = Get-Content -Path $InputLinksFile

ForEach ($InputLink in $InputLinks) {
	# Fetch the entire page. Get links in page with ().Links. Page is compressed with gzip, so we'll have to account for that
	$InputPageLinks = (Invoke-WebRequest -Uri $InputLink -Headers @{"Accept-Encoding"="gzip"}).Links
	# Filter the link list to only contain links with the sequence "/1/" in it.
	$FilteredOutputLinks = $InputPageLinks | Where-Object {$_.href -like "*/1/*"}
	# The provided links are relative and not absolute, so we need to add the domain name to the output
	foreach ($OutputLink in $FilteredOutputLinks) {
		$FinalLink = "https://fanfiction.net$($OutputLink.href)"
		Out-File -Append -FilePath $OutputLinksFile -InputObject $FinalLink
	}
	# Reset the per-page link list before fetching the next page
	Clear-Variable InputPageLinks
}

Example link from the new site: Buffy the Vampire Slayer (TV) - Works | Archive of Our Own
And this is the type of link that needs to be extracted: The Wish - DragonLdy - Buffy the Vampire Slayer [Archive of Our Own]

I’m hoping someone can help me.


You would need to post an example link here; without it, it’s impossible to help. It looks like you may have tried and the forum filtered it out, so perhaps try again and use the code insert.

Example link for new site: TOLKIEN J. R. R. - Works & Related Fandoms - Works | Archive of Our Own

Example link for a story: Shield Queen - Chapter 1 - SexiestLamp - The Lord of the Rings - All Media Types [Archive of Our Own]

I’ve now edited the first post so that the two links are visible.
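
Not tested against the live site, but here is a minimal sketch of how the filtering step from the first post might be adapted. It rests on assumptions that need checking against the real pages: that story links on the new site are relative hrefs of the form "/works/<digits>", and that the domain is archiveofourown.org. Select-WorkLink is a hypothetical helper name, not something from the original script.

```powershell
# Hypothetical helper: keep only hrefs that look like a story's main page
# and turn them into absolute links.
# ASSUMPTION: story links are relative and look like "/works/<digits>".
# ASSUMPTION: the site's domain is archiveofourown.org.
function Select-WorkLink {
    param(
        [string[]]$Href,
        [string]$BaseUrl = "https://archiveofourown.org",
        [string]$Pattern = '^/works/\d+$'
    )
    $Href |
        Where-Object { $_ -match $Pattern } |   # drop tag, bookmark and comment links
        Select-Object -Unique |                 # the same story may be linked twice
        ForEach-Object { "$BaseUrl$_" }         # relative -> absolute
}

# Made-up sample hrefs, only to show what the filter keeps
$Sample = @("/works/123", "/works/123/bookmarks", "/tags/Example/works", "/works/123", "/works/456")
Select-WorkLink -Href $Sample
# https://archiveofourown.org/works/123
# https://archiveofourown.org/works/456
```

Inside the ForEach loop from the first post, something like `Select-WorkLink -Href $InputPageLinks.href` could then stand in for the `-like "*/1/*"` filter and the line that builds `$FinalLink`. The anchored regular expression is used instead of a wildcard so that longer paths under the same prefix are left out; if the real hrefs turn out to look different, only `$Pattern` should need changing.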