I think it best to start with an update of how things are going with my new find (see last post). Strawberry Perl has been running FINE now for almost a month. I have installed a bunch of different modules and have run a lot of scripts using it and it is running flawlessly so far.
That was a lead-in to my next update. I have paid for a membership to a web site so that I can access its information. The site provides PDF documents (A LOT of them) containing stamp albums created by the site owner. I am a philatelist, and not having to spend, literally, thousands of dollars on stamp albums makes a collector a very happy person.
Well, I said there were a lot of files and I was not kidding. There are a little over 200 countries in the world, and each country is split up into multiple PDF files, each containing a section of the full album (this is done for a reason). To be exact, there are 2,137 files.
Now, wanting to get the most out of my $20 (yes, it's pretty cheap), I figured that instead of trying to download ALL of the files by hand, I would try to write a Perl script that would do the job for me. So that is just what I did. I read up on the different modules that I could use and decided on the WWW::Mechanize module written by Andy Lester.
It took me about 2 days to get the script coded, working and tested. The O'Reilly book "Spidering Hacks" definitely led me down the right path with its hack(s) on WWW::Mechanize. I have to credit the author for the examples, as I used some of his code in the script as well. Most of that time went into working out the typos and little mistakes. On top of that, since I had never used this module before, I had to figure out how to do the authentication that was required in order to download the files.
I did some searching around the internet and found a page with a couple of examples of how to do authentication, one of which used WWW::Mechanize. This was PERFECT. It worked like a charm, and before I could smile I was downloading all 2,137 files from the site.
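For anyone curious what this kind of script looks like, here is a minimal sketch. It is not my actual script. The URL, username and password are made-up placeholders, and it assumes the site uses HTTP Basic authentication and links to the PDFs directly from a single index page. A site with a login form would need `submit_form` instead of `credentials`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use File::Basename qw(basename);

# Placeholder values (assumptions, not the real site's details).
my $index_url = 'https://albums.example.com/members/index.html';
my $username  = 'my_user';
my $password  = 'my_password';

# autocheck => 1 makes Mechanize die on any failed request,
# so a bad login or a missing file stops the run instead of
# silently saving an error page.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Assumption: the site uses HTTP Basic authentication.
# The two-argument form supplies these credentials for every realm.
$mech->credentials( $username, $password );

# Fetch the page that lists the album files.
$mech->get($index_url);

# Collect every link on that page whose URL ends in .pdf.
my @pdf_links = $mech->find_all_links( url_regex => qr/\.pdf$/i );
printf "Found %d PDF links\n", scalar @pdf_links;

for my $link (@pdf_links) {
    my $url  = $link->url_abs;            # absolute URI object
    my $file = basename( $url->path );    # local file name

    # Skip files left over from an earlier run.
    next if -e $file;

    print "Fetching $file\n";

    # ':content_file' writes the response body straight to disk.
    $mech->get( $url, ':content_file' => $file );

    sleep 1;    # be polite to the server
}
```

The `next if -e $file` check means the script can be re-run after an interruption without downloading everything again, which matters when there are a couple of thousand files.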
I must say, Perl is a very complex and, at times, complicated language, but if there is one thing I can say, it is that I LOVE PERL!!!!
1 comment:
The author of the Mechanize hacks in Spidering Hacks is the same guy who did the WWW::Mechanize module. I think there might be one or two non-Lester hacks in Spidering Hacks, but the main ones are mine.
I'm just glad you were able to find a copy.