PowerShell help: extracting specific links from a list of websites


I’m hoping someone can help me adapt a script that works with one site so that it works with another.

This is what I have working for one site:

$InputLinksFile = "C:\temp\InputLinks.txt"
$OutputLinksFile = "C:\temp\OutputLinks.txt"
$InputLinks = @()

$BasePage = "https://www.fanfiction.net/tv/Buffy-The-Vampire-Slayer/?&srt=2&lan=1&r=10&p="
[int]$FirstPageNumber = 600
[int]$LastPageNumber = 601
$CurrentPageNumber = $FirstPageNumber

# Make a list of all the pages we want to input, counting from FirstPageNumber to LastPageNumber
while ($CurrentPageNumber -le $LastPageNumber) {
	$InputLinks += "$BasePage$CurrentPageNumber"
	# Move to the next page; without this the loop never ends
	$CurrentPageNumber++
}

# If you want to manually input a list of pages instead, remove # in front of the next line:
# $InputLinks = Get-Content -Path $InputLinksFile

ForEach ($InputLink in $InputLinks) {
	# Fetch the entire page and get its links with ().Links. The page is compressed with gzip, so we have to account for that
	$InputPageLinks = (Invoke-WebRequest -Uri $InputLink -Headers @{"Accept-Encoding"="gzip"}).Links
	# Filter the link list to only contain links with the sequence "/1/" in them
	$FilteredOutputLinks = $InputPageLinks | Where-Object {$_.href -like "*/1/*"}
	# The provided links are relative, not absolute, so we need to add the domain name to the output
	foreach ($OutputLink in $FilteredOutputLinks) {
		$FinalLink = "https://fanfiction.net$($OutputLink.href)"
		Out-File -Append -FilePath $OutputLinksFile -InputObject $FinalLink
	}
	Clear-Variable InputPageLinks
}
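The script makes links absolute by gluing the domain in front of each `href`, which only works while every href starts with `/`. A slightly more robust sketch lets .NET's `System.Uri` resolve each href against the site instead. `ConvertTo-AbsoluteLink` is a hypothetical helper name, not an existing cmdlet, and `[System.Uri]::new` assumes PowerShell 5 or later.

```powershell
# Hypothetical helper (not a built-in cmdlet): resolve an href against a base URL.
# Relative hrefs such as "/s/123/1/Title" get the site prepended;
# hrefs that are already absolute are returned unchanged.
function ConvertTo-AbsoluteLink {
    param(
        [string]$BaseUrl,
        [string]$Href
    )
    # The Uri(Uri baseUri, string relativeUri) constructor applies standard relative-URL rules
    ([System.Uri]::new([System.Uri]$BaseUrl, $Href)).AbsoluteUri
}
```

Inside the inner loop, the line building `$FinalLink` could then become `$FinalLink = ConvertTo-AbsoluteLink -BaseUrl "https://www.fanfiction.net" -Href $OutputLink.href`.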

Example link from the new site: https://archiveofourown.org/tags/Buffy%20the%20Vampire%20Slayer/works
And this is the type of link that needs to be extracted: https://archiveofourown.org/works/13345065
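A minimal sketch of the two site-specific parts that would change for the new site, under two assumptions not stated in the post: that tag listings are paginated with a `?page=N` query parameter, and that story links on a listing page are relative hrefs of the exact form `/works/<digits>`. `Get-Ao3PageUrls` and `Get-Ao3StoryLinks` are hypothetical helper names.

```powershell
# Sketch only. Assumptions (not confirmed in the original post):
#  - tag listings are paginated with "?page=N"
#  - story links are relative hrefs of the form "/works/<digits>"

# Hypothetical helper: build one listing URL per page number
function Get-Ao3PageUrls {
    param(
        [string]$TagWorksUrl,   # e.g. https://archiveofourown.org/tags/Buffy%20the%20Vampire%20Slayer/works
        [int]$FirstPage,
        [int]$LastPage
    )
    # The range operator yields every page number from FirstPage to LastPage
    foreach ($Page in $FirstPage..$LastPage) {
        '{0}?page={1}' -f $TagWorksUrl, $Page
    }
}

# Hypothetical helper: keep only story links and make them absolute
function Get-Ao3StoryLinks {
    param(
        [string[]]$Hrefs,
        [string]$BaseUrl = 'https://archiveofourown.org'
    )
    # "^/works/\d+$" skips chapter, bookmark and tag links,
    # then the domain is prefixed and duplicates are dropped (first occurrence kept)
    $Hrefs |
        Where-Object { $_ -match '^/works/\d+$' } |
        ForEach-Object { "$BaseUrl$_" } |
        Select-Object -Unique
}
```

To plug these into the original script, `$InputLinks` could come from `Get-Ao3PageUrls -TagWorksUrl "https://archiveofourown.org/tags/Buffy%20the%20Vampire%20Slayer/works" -FirstPage 1 -LastPage 2`, and the filtering and prefixing lines inside `ForEach` could be replaced with `Get-Ao3StoryLinks -Hrefs (Invoke-WebRequest -Uri $InputLink).Links.href`, appending the results to `$OutputLinksFile` as before. A short `Start-Sleep` between requests may help if the site starts refusing rapid requests.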

I’m hoping someone can help me.


You would need to post an example link here; without one, it’s impossible to help. It looks like you may have tried and the forum filtered it out, so perhaps try again using the code insert.

Example link for the new site: https://archiveofourown.org/tags/TOLKIEN%20J*d*%20R*d*%20R*d*%20-%20Works%20*a*%20Related%20Fandoms/works

Example link for a story: https://archiveofourown.org/works/13427472

I’ve now edited the first post so the two links are visible.