Download Files from MP3 Blog?

by Ricky Bobby at 2012-08-19 02:44:31

Hi,

I’m fairly new to PowerShell, and I’ve no real background in coding or VBScript for what it’s worth, but am a pretty quick learner all the same. I have dabbled with PowerShell now and then over the years as an IT person and in the past month or two have started getting into it a lot more wondering why I’ve been avoiding it for so long when it’s so damn cool! In any case, as both a learning exercise, and due to actually wanting this, I’m looking for a little assistance as I’m stuck.

Basically, there’s a blog that posts tracks in mp3 or wav format that I never have time to visit regularly nor listen to all of them. I’d love to auto-download any new posts (and all past posts for that matter) which can later be synced to my music library and mobile player for listening at my leisure.

So, I started digging in and with a little sleuthing, I found that I could get all of the download links with:

$iwr = Invoke-WebRequest http://lagasta.com
$iwr.Links | select href | where {$_.href -like "*download"}


which returned the first/main page of the blog like this:

href
----
http://soundcloud.com/smile-recordings/new-found-land-wings-edit/download
http://soundcloud.com/thehouseofdisco/monitor-66-ambient-blackbird/download
http://soundcloud.com/splendour/marathon-hannulelauri-remix/download
http://soundcloud.com/soul-button/mercury-ft-robert-owens/download
....


So far, so great! Now, with a bit more sleuthing, I saw that "Start-BitsTransfer" was probably what I was after, so I tried piping it and got this:

PS> $iwr.Links | select href | where {$_.href -like "*download"} | foreach {start-bitstransfer $_ C:\Users\RickyBobby\Music\LaGaSta}

Which gave me:

start-bitstransfer : An incorrect value is specified in the Source parameter or in the Destination parameter. Verify
that the directory and file names in the Source and Destination parameters are correct.
At line:1 char:73
+ $iwr.Links | select href | where {$_.href -like "*download"} | foreach {start-bi ...
+ ~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Start-BitsTransfer], ArgumentException
+ FullyQualifiedErrorId : System.ArgumentException,Microsoft.BackgroundIntelligentTransfer.Management.NewBitsTransferCommand


Hmmmm. So I stepped back for a second and just tried one of the URLs to simplify things:

Invoke-WebRequest http://soundcloud.com/smile-recordings/new-found-land-wings-edit/download | Start-BitsTransfer -Destination C:\Users\RickyBobby\Music\LaGaSta

Which started and completed a transfer (awesome!) and then threw the following error (grrrr!) with nothing in the specified folder either:

Start-BitsTransfer : The input object cannot be bound to any parameters for the command either because the command does not take pipeline input or the input and its properties do not match any of the parameters that take pipeline input.

Double hmmm. Ok, let’s just see what Start-BitsTransfer does with it since it’s new enough to me anyway:

Start-BitsTransfer -Source http://soundcloud.com/smile-recordings/new-found-land-wings-edit/download -Destination C:\Users\RickyBobby\Music\LaGaSta

That downloaded a file called "download" to the directory specified. Ummmm, ok. We’re close. Sorta.

Then I visited the link in a browser just to see what I got - which was a "save file" dialog with the file name pre-populated - as expected. Nothing special there really. Comparing sizes from the PowerShell download with the browser download, all looked to be the same. So, basically, PowerShell doesn’t know what it transferred, just that it was told to do so it appears. Right? I then went through the help file and examples on Start-BitsTransfer but didn’t see anything that helped me much or pointed me in the right direction in relation to extracting the file name or type and saving it as such. I’ve found examples of people doing it with direct linked filenames, but in this case . . . nothing so far.

Annnnd, that’s where I’m stuck. Any pointers on what the errors mean (google wasn’t helpful really) and how I achieve what I’m after?

Thanks in advance,
Ricky Bobby
by DonJ at 2012-08-19 08:09:35
You need to do this in a few steps.


$iwr = Invoke-WebRequest http://lagasta.com
$links = $iwr.Links | select href | where {$_.href -like "*download"} | select -expand href


Then run through them.


foreach ($link in $links) {
    $pieces = $link -split '/'
    $target = "C:\Your\Path\$($pieces[-2]).mp3"
    start-bitstransfer -source $link -destination $target
}


$pieces ends up containing an array of all the URL components, which are originally separated by a /. I’m assuming the next-to-last component is the filename - but you can construct $target however you like. The [-2] gets the next-to-last piece, for example. Notice how I added in an MP3.
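
To make the split concrete, here is a small sketch using one of the links from the first post. The names $last and $nextToLast are only for illustration, and the folder is the same placeholder path as above.

```powershell
$link   = 'http://soundcloud.com/smile-recordings/new-found-land-wings-edit/download'
$pieces = $link -split '/'

# Negative indexes count back from the end of the array.
$last       = $pieces[-1]   # the trailing "download" segment
$nextToLast = $pieces[-2]   # the track name, used here as the file name

# Inside double quotes the backslash is a literal character, so "\$name" is fine.
$target = "C:\Your\Path\$nextToLast.mp3"
```

If a link has a different shape (no trailing "download" segment, say), [-2] will pick up the wrong piece, so it is worth checking what $pieces actually contains for each kind of link.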

The error is because you were attempting to pipe something to Start-BitsTransfer that it can’t bind. It only takes pipeline input by property name (Source and Destination), and the objects you piped in don’t carry those properties.

You’re not dealing with PowerShell; you’re dealing with the Background Intelligent Transfer Service (BITS), and commanding it via PowerShell. BITS isn’t a Web browser - while IE or Firefox may be able to pre-populate a download dialog, based on header information sent from the Web server, BITS doesn’t. With BITS, you need to supply everything.
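
For reference, the header a browser typically uses to pre-populate that dialog is Content-Disposition. The following is only a sketch of pulling a file name out of such a header value: the function name Get-FileNameFromHeader is made up, the sample header is invented for illustration, and whether this particular server sends the header (or answers a header-only request) is an assumption you would need to check.

```powershell
function Get-FileNameFromHeader {
    param([string]$ContentDisposition)

    # Matches both filename="track.mp3" and filename=track.mp3
    if ($ContentDisposition -match 'filename="?([^";]+)"?') {
        return $Matches[1].Trim()
    }
    return $null   # no file name present in the header value
}

# Invented sample value, not captured from a real response.
$sample = 'attachment; filename="New Found Land - Wings (Edit).mp3"'
$name   = Get-FileNameFromHeader $sample

# In practice the value might come from something like the line below,
# assuming the server responds to HEAD requests:
# (Invoke-WebRequest $link -Method Head).Headers['Content-Disposition']
```

If the header is missing, the function returns $null and you are back to building the name from the URL.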
by Ricky Bobby at 2012-08-19 09:13:12
Thanks Don!

Will make more sense to me after I’ve had some coffee, heh, just woke up. Also, just realized where I’d seen your name before, I’ve been reading YOUR book ("in a month of lunches" - not finished, about half way through). Really helpful stuff, so double thanks!
by DonJ at 2012-08-19 09:33:33
You’re very welcome ;). Glad it’s proving helpful!
by Ricky Bobby at 2012-08-19 09:41:16
So, I tried it as-is and it works! Cool, cool.

Now I just need to figure out how to do more than the first page (want the whole blog), and then how to somehow check for things it’s already downloaded so as to avoid duplicating it. Ultimately it’ll be something I’ll schedule to run daily or maybe weekly.
by DonJ at 2012-08-19 09:50:23
So, I’d write a function that takes the URL of a page. It should do what you’re already doing - extract the download links, maybe check to see if the file exists locally, and kick off the transfer. But it should also parse for links to other pages - and then submit those to itself. So, calling itself recursively. You’ll want to put some logic on how deeply it goes, of course, but that’s doable.
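
A rough sketch of that shape, not a finished crawler. The function name Get-BlogDownloadLink, the -GetHrefs hook, the depth limit of 3 and the '/page/\d+/$' pattern for "other pages" are all assumptions, and it assumes the hrefs on the page are absolute URLs.

```powershell
function Get-BlogDownloadLink {
    param(
        [string]$Url,
        [int]$Depth = 0,
        [int]$MaxDepth = 3,
        # Hypothetical hook that returns the href strings found on a page.
        [scriptblock]$GetHrefs = {
            param($u)
            (Invoke-WebRequest -Uri $u).Links | ForEach-Object { $_.href }
        },
        # Pages already visited, shared across the recursive calls.
        [hashtable]$Visited = @{}
    )

    # Stop when we are too deep or have already seen this page.
    if ($Depth -gt $MaxDepth -or $Visited.ContainsKey($Url)) { return }
    $Visited[$Url] = $true

    $hrefs = @(& $GetHrefs $Url)

    # Emit the download links found on this page.
    $hrefs | Where-Object { $_ -like '*download' }

    # Follow links to other blog pages by calling ourselves again.
    foreach ($next in ($hrefs | Where-Object { $_ -match '/page/\d+/$' })) {
        Get-BlogDownloadLink -Url $next -Depth ($Depth + 1) -MaxDepth $MaxDepth `
            -GetHrefs $GetHrefs -Visited $Visited
    }
}
```

The output is just a stream of links, so the Test-Path check and the Start-BitsTransfer call can stay in a separate loop that consumes it.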
by Ricky Bobby at 2012-08-22 00:21:04
So I’ve made a little headway with it, at least the file check - I think.

It now looks like so (points off for ugly, I know):

$iwr = Invoke-WebRequest http://lagasta.com
$links = $iwr.Links | select href | where {$_.href -like "*download"} | select -expand href
foreach ($link in $links) {
    $pieces = $link -split '/'
    $target = "C:\Users\RickyBobby\Music\LaGaSta\$($pieces[-2]).mp3"
    # Only download files that are not already in the folder
    If (-not (Test-Path -Path $target))
    {
        start-bitstransfer -source $link -destination $target
    }
}


To test, I then deleted one of the files in the local directory and re-ran the script. Initially I thought it wasn’t working, but realized the file I had chosen had moved off of the main page. Tried again with a file on the main page, and good to go.
Cool, cool.

So, two issues now:

1. I’m getting this error in the script window, though it doesn’t seem to affect anything. "start-bitstransfer : HTTP status 403: The client does not have sufficient access rights to the requested server object."

2. I’m getting the dreaded Windows Security Warning saying "To allow this website to provide information personalized for you, will you allow it to put a small file (called a cookie) on your computer?". I figured it was an IE or Security Center setting, but so far no luck on shutting it up. Clicking it once is ok, but when I try to go scan the whole site, it wants to do it on every page. I did some digging on how many potential pages, and we’re talking 190+. That makes it a deal killer, especially if it’s going to be automated.

Anyway, what I found on the good old internets (not came up with) for scanning the blog pages was:

$blogUrl = "http://www.lagasta.com"
## \d+ rather than \d so that page numbers above 9 also match
$archiveLinkPattern = '/page/\d+/$'
$nextPageText = "Older Entries"

## Get the page
$r = Invoke-WebRequest $blogUrl

## Extract the archives links
$links = $r.Links | Where-Object href -match $archiveLinkPattern |
    Foreach-Object href

## Go through each archive page
foreach($link in $links)
{
    do
    {
        ## Get the archives for that month
        $month = Invoke-WebRequest $link

        ## Find the link to "Older Entries"
        $link = $month.Links | ? innertext -match $nextPageText |
            Foreach-Object href | Select-Object -First 1

    ## Keep on doing this while we find an "Older Entries" link
    } while($link)
}


Running that appears to work, but I’m not sure, as I got bored clicking "yes" for each cookie notification. How do I shut that off? And beyond that, does it look solid in terms of digging through blog pages? It obviously all has to fit together at the end, but . . .
by DonJ at 2012-08-22 07:51:13
So, a 403 error is sent by the server. Not much you can do on your end except deal with it however you like.

Keep in mind that most Web sites expect to be dealing with a Web browser - not BITS. BITS isn’t a Web browser, and it doesn’t provide a lot of the functionality that browsers do - like automated cookie management. So you’re essentially having to write your own Web browser… which is, necessarily, gonna get complicated. I don’t know of any way to suppress that prompt or auto-accept it. It likely is something from IE or Windows itself, but that’s not stuff I dig into a lot myself.
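
One thing that may be worth trying for the prompt: by default Invoke-WebRequest hands the page to the Internet Explorer engine to build the parsed DOM, and the -UseBasicParsing switch skips that step. The sketch below assumes the cookie prompt comes from that parsing step, which has not been confirmed for this site; the function name Get-PageHref is made up.

```powershell
function Get-PageHref {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Url
    )

    # -UseBasicParsing returns the response without Internet Explorer DOM parsing.
    $page = Invoke-WebRequest -Uri $Url -UseBasicParsing

    # The Links collection is still populated; emit just the href strings.
    $page.Links | ForEach-Object { $_.href }
}

# Usage (not run here, since it needs the network):
# Get-PageHref 'http://www.lagasta.com' | Where-Object { $_ -like '*download' }
```

A likely trade-off is that basic parsing exposes less per link (the raw HTML and its attributes rather than the full DOM properties), so a filter on innertext such as the "Older Entries" match may need to look at outerHTML instead.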
by Ricky Bobby at 2012-08-22 08:01:43
Ah, ok. I’ll see what I can find and report back if there’s any "cure".

Otherwise looks to be solid though, yes?
by willsteele at 2012-08-22 08:06:26
Being a little lazy minded here, but you can get those button clicks handled for you by PowerShell. Check out WASP for some cool interactive, desktop/PowerShell functionality. I am sure there are other ways to do this, but being familiar with modules like WASP is never a bad thing.
by Ricky Bobby at 2012-09-02 01:56:30
Thanks, I did check that out, but haven’t gotten to implementing it yet. First and foremost I’m trying to expand the links that it searches for downloading. Currently it’s just seeing links that end in "/download" - which is pretty much always SoundCloud, but the blog posts link to other places as well.

Second, still working on the "how" of where it’s breaking, but the script is now selectively downloading stuff it seems as the variable "target" below is apparently not always correct with the Xpath portion:


$iwr = Invoke-WebRequest http://www.lagasta.com/category/mpfree/
$links = $iwr.Links | select href | where {$_.href -like "*download"} | select -expand href
foreach ($link in $links) {
    $pieces = $link -split '/'
    $target = "C:\Users\Jk\Music\LaGaSta\$($pieces[-2]).mp3"
    # Only download files that are not already in the folder
    If (-not (Test-Path -Path $target))
    {
        start-bitstransfer -source $link -destination $target
    }
}


I’ve been successful (sorta) with changing [-2] to [-1] - but that doesn’t seem to be a constant.

Ultimately, while doing some internets learning I came across this example where he uses wget, which reads to be exactly what I’m looking for: http://veen.com/jeff/archives/000573.html - but I’d love to do it in PowerShell if possible, largely as a learning exercise, but also . . . who doesn’t love PowerShell? :)

Oh, and also during my search/internets learning, I came across something called DownloadString(), should I be looking into that instead of bitstransfer maybe? One other slight issue I’ve found is that it will grab any link above and make it an mp3, when sometimes they’re wav files - I suspect that’s just an issue with defining the variable above, but . . . . Small problem, but still, going for complete automation here. :)
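
On the last two points, a sketch of one way the target name could be built so that neither the [-2]/[-1] choice nor the extension is hard-coded. Everything here is an assumption for illustration: the function name Get-TargetPath, the rule "use the last URL segment unless it is the literal word download", and the idea that a Content-Type value (for example from a header-only request) tells mp3 and wav apart. As for DownloadString(), that is a method of System.Net.WebClient that returns the response body as text, so it suits fetching pages; for saving audio files the same class has DownloadFile().

```powershell
function Get-TargetPath {
    param(
        [string]$Link,
        [string]$Folder,
        [string]$ContentType = 'audio/mpeg'   # assumed default when nothing is known
    )

    # Split the URL and drop empty segments (e.g. from "//" or a trailing slash).
    $pieces = @(($Link -split '/') | Where-Object { $_ })

    # SoundCloud-style links end in "download"; otherwise take the last segment.
    $name = if ($pieces[-1] -eq 'download') { $pieces[-2] } else { $pieces[-1] }

    # If the segment already carries an extension, keep it as-is.
    if ([System.IO.Path]::GetExtension($name)) { return "$Folder\$name" }

    # Hypothetical mapping from content type to extension.
    $ext = switch -Wildcard ($ContentType) {
        'audio/*wav*' { '.wav' }
        default       { '.mp3' }
    }
    "$Folder\$name$ext"
}

$a = Get-TargetPath -Link 'http://soundcloud.com/splendour/marathon-hannulelauri-remix/download' -Folder 'C:\Music'
$b = Get-TargetPath -Link 'http://example.test/files/some-track' -Folder 'C:\Music' -ContentType 'audio/x-wav'
```

The example.test link is invented; the SoundCloud link is one of those listed in the first post.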