Thursday, December 20, 2007
Adding to the Repertoire
In the past 8 months I have grown from a semi-beginner hobby coder into, in my very modest opinion, a moderately intermediate Perl developer. I have a solid grounding in the Perl basics, as well as enough insight to know how to find out about modules, which ones to use, and how to use them.
That isn't all though. Back in May of this year (2007), I joined the thescripts.com development forum, and after only 3 months on the forum, helping out others and learning all that I could, I was put forward to the membership to become a Moderator. On my birthday I was promoted to moderator of the Perl forum on thescripts. I have since taken on moderation duties in several other forums (about 9 forums in all right now), but I maintain Perl as my "home" forum.
As you can see from the above background, I have gotten myself to a slightly comfortable place in my Perl career. Not at all to a point where I will stop learning, but instead, to a point where I feel comfortable enough about my knowledge that I can now take on learning another language.
There was a toss-up as to which language to learn next. I was trying to decide between Ruby, which is extremely popular; Python, also extremely popular and, from what I had heard and read, easy to learn; and something like C#. I know, why C#, a Micro$oft language? Well, it may be, but you can code C# with the Mono project on Linux, and it's somewhat interesting.
I thought about it for a little while, trying to decide. C# was actually ousted from the lineup first. I had done some VB at one point and even touched some Java in one of my college courses, and I wasn't all that impressed. So, that truly helped me make that decision.
Ruby is a very interesting programming language itself. It was touted as powerful and quite popular. Plus, the added benefit of Rails for making web development with Ruby much quicker.
Well, I had to decide, so I chose Python. I didn't have an overwhelming reason why; it's just that it appealed, almost called, to me. I liked that everyone said it was pretty easy to learn and that the user base was growing steadily. Besides, that would leave me with Ruby as the next one to learn.
So, for almost a week now I have been trudging along at a steady pace through one of my Python books, learning the basic ins and outs. So far, they were right, it is easy to pick up, but I am not going to count my chickens yet. I will give myself some time to get a lot more into the language, but I am quite confident that I will have a similar love for Python that I do for Perl. If that is the case, then I will be a doubly happy person.
Thursday, December 06, 2007
Miscellaneous Updates
I think it best to start with an update of how things are going with my new find (see last post). Strawberry Perl has been running FINE now for almost a month. I have installed a bunch of different modules and have run a lot of scripts using it and it is running flawlessly so far.
That was a lead-in to my next update. I have a web site that I pay a membership fee to so that I can access its information. The web site provides PDF documents (A LOT of them) containing stamp albums created by the site owner. I am a philatelist, and not having to spend, literally, thousands of dollars on stamp albums makes a collector a very happy person.
Well, I said there were a lot of files and I was not kidding. There are a little over 200 countries in the world and each country is split up into multiple pdf files, each containing sections of the full album (this is done for a reason). To be exact, there are 2,137 files.
Now, wanting to get the most out of my $20 (yes, it's pretty cheap), I figured that instead of trying to download ALL of the files by hand, I would try to write a Perl script that would do the job for me. So, that is just what I did. I read up on the different modules that I could use and decided upon the WWW::Mechanize module, written by Andy Lester.
It took me about 2 days to get the script coded, working, and tested. The O'Reilly book "Spidering Hacks" definitely led me down the right path with its hack(s) on WWW::Mechanize, and I have to credit the author for the examples, as I used some of his code in the script as well. Beyond working out the typos and little mistakes, since I had never used this module before, I also had to figure out how to do the authentication that was required in order to download files.
I did some searching around the internet and found a link to a page with a couple of examples of how to do authentication, one of them using WWW::Mechanize. This was PERFECT. It worked like a charm, and before I could smile I was downloading all 2,137 files from the site.
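For the curious, the shape of the script was roughly the sketch below. Everything specific here (the site URL, the login form field names, the file list) is a placeholder of my own invention, since the real details belong to the membership site and the login form will differ from site to site:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# NOTE: all URLs and form field names below are made-up placeholders.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Log in first, so the session cookie is carried along for the downloads.
$mech->get('http://www.example.com/login');
$mech->submit_form(
    form_number => 1,
    fields      => {
        username => 'my_user',
        password => 'my_pass',
    },
);

# Then fetch each PDF in turn, saving it under its own file name.
my @files = qw( albums/country1-part1.pdf albums/country1-part2.pdf );
for my $path (@files) {
    ( my $name = $path ) =~ s{.*/}{};    # strip the directory portion
    $mech->get( "http://www.example.com/$path", ':content_file' => $name );
    print "saved $name\n";
}
```

The real script built its file list from the site's index pages rather than hard-coding it, of course.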
I must say, Perl is a very complex and, at times, complicated language, but if there is one thing I can say, it is that I LOVE PERL!!!!
Tuesday, November 13, 2007
No More Perl Monopoly for ActiveState
In the past few months I have found a few modules that I wanted/needed to use in a project, and they could not be found anywhere in PPM (the Perl Package Manager, which is what ActiveState Perl uses). If you investigate further, you will find that PPM does not use CPAN directly (as I think it should) but instead defaults to the ActiveState PPM repository. There are a couple of other repositories out there as well, but they only add a few hundred packages (modules) to the mix.
Well, I did some inquiring to see if anyone knew of a complete repository to use with PPM, but instead was thrown a link to something completely new: Strawberry Perl! My curiosity piqued and being completely intrigued, I decided to download this new offering and see if it actually worked.
From what I was told, Strawberry Perl uses the CPAN interface that most of us are already familiar with (or quite intimate with) from the Unix world. That said, for those of you who remember (no matter how vaguely) the initial CPAN setup, the configuration manager looks for the existence and paths of specific utilities in order to be able to do its job.
Well, I made a list of the utilities that it requires and have provided those below for your reference:
The following utilities are located at the gnuwin32 website (or use the gnuwin32 listing page):
- bzip2
- gzip
- tar
- unzip
- wget
- less (my favorite pager)
Whatever base path you extract the first utility to is the one you should then use for extracting the other 5 utilities, as they share the same internal directory structure (ie: a bin directory, etc.)
Here are other utilities that you will need (each is a link to its URL):
- curl
- lynx
- ncftpget (and ncftp) - Both are installed with the same single download. Be sure and grab the client software (version 3.2.1 as of this blog entry).
- gpg
To start the configuration, fire up the CPAN shell from a command prompt:

perl -MCPAN -e shell

That will kick it off. I simply took all of the defaults, with the exception of the utility paths that were not detected automatically; those I entered manually.
Once I had completed the software installations and gone through the configuration, I entered the following command from within the CPAN shell environment:
install Bundle::CPAN
This confirmed to me that everything had worked, as it installed beautifully. I then installed a few modules that I needed and ran a few scripts left over from when I had ActiveState Perl on the machine (yes, ActiveState was uninstalled before installing Strawberry Perl; sorry for not mentioning that earlier).
Anywho, I hope that those of you who code Perl on Windows will do your due diligence, go against what used to be a Perl/Windows monopoly, and switch from the limitations of ActiveState over to the freedom and (IMHO) superiority of Strawberry Perl.
Monday, October 29, 2007
Organize......UPDATE
The reason for this update, though, is not just to let people know about it again, but also to let you know that Incollector has now released its 1.0 version.
If you are wanting to organize those bits of data you have scribbled down on post-its and papers, need a place to store all your serial numbers, or have code snippets you need to save, then this is the place to store it all. The database is exportable and can be imported between the Linux and Windows versions.
Download it, install it, play with it, and enjoy it. It is an incredible tool!
Friday, September 28, 2007
It's been that long?
Well, here is my update. I have been particularly busy this month between my daughter's birthday, projects at work, and the discovery of the online learning tools here at work. That has been of particular interest to me, since it is the same system I was using to learn a bunch of topics at my last job (before my access was pulled when I resigned).
Well, I have finished up the Perl courses on CGI that I had been taking and am working to become a lot more familiar with that aspect of Perl coding. It is rather handy, when it comes to web development, to understand the inner workings of CGI. Having that down pat is a huge advantage in the job market as well. The next courses that I am taking are all about XML. That is another thing I have been wanting to learn more about, and to learn to parse with Perl.
Unfortunately, they do not have any courses on Perl DBI but I have a book I am reading/studying that is teaching me what I need to know. Once I get all that figured out I can go ahead and start working on a website that will help me carve the CGI and DBI knowledge in stone into my brain.
I had a project placed into my lap earlier this week and was told that it was something that needed to be done in short order. Ok, I have no problem with that. I was also told that there was another project for the same client that could be cloned to get this new project up and running quicker. Now, having a project to clone is great, so long as the code in that other project isn't all but unmaintainable. I emailed my colleague the other day to say that, between coding the project from scratch and trying to figure out the cloned code, I would rather code it from scratch, as it would take less time and also be correct.
My colleague came back and gave me the thumbs up to just code it and said that he usually didn't touch the old code because of its unmaintainability. (Then WHY would you tell me to use it if you knew it was like that?) Thankfully my colleague is understanding and flexible (not to mention easy going about things, so long as things happen and get done). Well, after a day of mulling over my keyboard yesterday I completed the initial coding of the project and even whipped out an estimate for another job that is coming down the pipe.
Anywho, that's what's been going on.
Tuesday, September 11, 2007
A post about Perl?
Please know though, that this IS a geek blog and not limited to Perl alone. I did just want to let everyone know that I am still doing Perl most of the time. In fact the amount I do has increased, if it can that is.
Anywho, keep watching.
Sunday, September 09, 2007
How to save that flash file you really like
Well, one thing that I always hate is not knowing how to do something that really should be trivial. One of those things that I resolved earlier on was how to save a Flash file from a website to your hard drive. Believe it or not, it's really not that difficult.
NOTE: I am only going to show you how to do this in Firefox as I refuse to fire up that crap program called IE. If you want to know how to do it in there, then you figure it out. I don't use it and I am not going to use it even just to find this out.
That said, the first thing you should do is load the page containing the Flash animation that you wish to download. Once you have done that, you have two choices of how to proceed, which I will explain here.
The less geeky/nerdy way:
Once the page is loaded, in your Firefox menu bar, go to "Tools -> Page Info". This will bring up a box with a few tabs. One of the tabs is "Media". Click on it. You will see anywhere from one to a bunch of files listed in the Address section of the Media tab. What you want to do is scan through and click on the ".swf" file that you are after. Then, once you have done that, simply click on the "Save As" button, select a place to save it on your computer, and click "Save".
Believe it or not, that's it.
The more geeky/nerdy/power user way:
Again, once the page is loaded, you want to open a new tab (Ctrl-T for those not in the know). Then, in that tab type the following in the URL bar: about:cache?device=disk
This will bring up a list of EVERYTHING that is in the Firefox cache on your computer. Simply browse it for the file you want. It will have the path to the website it is hosted on. All you have to do is right click and click on "Save as" and save the file wherever you want on your computer.
Again, it is that easy.
So, the next time someone asks about saving flash files, you can prod them in the right direction on how to do it.
Friday, September 07, 2007
The New Printer
So, I remembered the Sunday ads and that Best"Try" had an HP LaserJet on sale for $99 that printed 15 ppm and was relatively small. So, last night we went to look at it. Sure, it was quaint, and I was about to see if they had any in stock when the wife said, "Hey, check this one out, it's even a copier." Well, far be it from me to not look at another model. It was a laser printer, copier, and color scanner all in one. It was nice, and it was a "Brother".
We tested it and it seemed to work great, printing 20 ppm. So, we talked it over and decided to go with it. Sure, it was double the price of the other one, but we could now use it to print copies of pages from the kids' books for them to write and color on without destroying the originals. Plus, printing was quicker and a breeze.
Well, we got it home, I hooked it up last night, and we were printing in only a few minutes. Gosh, I just love new toys. Now I have to hook my Linux laptop up to it and see if I can get it printing.
Friday, August 31, 2007
New Status: Moderator
I have been actively posting and participating in the Perl forum on thescripts.com since about May of this year. I have only posted one or two questions myself, but I spend 95+% of my time on there answering others' questions, as some of them challenge me to research and learn before answering.
Well, I made a bid to the main Moderator for the Perl forum about a week and a half ago to join him in the duties. Well, he put me in for it the other day and this morning I was delighted to find that I had been awarded the honor.
This is GREAT!! I have since been cleaning up posts, moved one posting, and am just all around having a blast with my newfound moderator status. No, I am in no way abusing it; I am trying to keep up on all the goings-on so as to keep the forum running as smoothly as possible.
Happy Birthday to ME!!!
Tuesday, August 21, 2007
Opinions, opinions
You may get a couple of responses in the beginning, where people tell you what IDE they use, the pros, the cons, and why they think it is so wonderful, but then it starts. You get this multitude of fascist dictator types who absolutely insist that "there is no better IDE than (insert editor here), and all other editors are crap in comparison!". You even have the old-school folks, some of whom can remember creating punch cards, who believe that command-line editors or vi are the best editors.
If you are one of those that is getting ready to ask that time-(de)tested question of "Which IDE is better for ...?", then just DON'T!
Here is what I believe, and no, I am not going to go and follow the masses, preaching what I think is the best editor. Instead I am going to sum it up with this..... try them all. Download and install a number of editors. Play with them, write code with them, debug with them, get to know them. While you are doing this, take notes on what you like and dislike about each one. Then, when you are done, compare all of your notes. You have to not only look at the notes, but also think to yourself, "Will I still like this editor in 6 months? A year?" The answer may very well be, "I don't know."
I am old-school unix. I believe that the command line rules and vi is the best day-to-day editor. All of the coding that I have learned has been by hand. I prefer not to learn with a fancy, shmancy do-it-all-for-you editor as I won't learn anything. I like learning a new language in an editor like vi because I get to debug my code by hand and not rely on a program to tell me what is wrong. This allows me to assess the errors and get my coding (by hand) down to a science. After I am more than comfortable, then I migrate to a more comfortable editor that will save me time.
While vi will always have a place in my heart and my editing world (being the first editor I used on Unix), I must say that I have leaned toward ActiveState's Komodo for my day-to-day coding in Perl (and other misc languages, including HTML). Yes, some will tell you it is a beast, and clunky and slow. Personally, it takes a minute to start up, but after that, I don't have any issues. I don't have this insatiable need to have my editor at my fingertips within a nanosecond of clicking on the link to launch it. I am patient enough that I can wait the 30 or so seconds that it takes to launch. I use it because I like its syntax highlighting, code sense (hints, kind of like Micro$oft's IntelliSense), and overall comfortable feel.
That my friends, is what I think the key is..... comfort! You have to pick an editor that you like and not listen to the skewed views of the mass critics out there.
In a posting to the Boston Linux User's Group, Uri Guttman wrote, "so my main point is that coders need to be smarter about their analysis, architecture and design and less caught up in tools like IDE's and syntax highlighting. you can have the greatest IDE in the world and still come up with crappy code and solutions. whereas a good software design can be written and debugged with any set of tools."
That is one of the best statements that I have read on the subject and it is something that I have believed in for some time. If you aren't able to write good code and be able to debug it thoughtfully, then no editor in the world is really going to help you!
Happy coding!
Tuesday, August 14, 2007
Things that I learned yesterday
For instance, I was working on a script last week that took a file and pulled out of it every line containing a string that should not have been there. Well, my script moved the offending lines to another file for "safe keeping", while outputting the good lines to their own file. To make sure that everything worked correctly, I had to balance the files to ensure that ONLY the offending lines were removed and all other lines ended up in the new file.
So, I delved into the File::Util module, which has a function called line_count() that takes a file as input and returns the number of lines in the file. What I discovered was that the function worked fine with the first file processed (the original file), but on each subsequent file (the offending-lines file and the new output file), the counts were totally off, even to the point of the offending-lines file's count being zero (0).
So, I emailed the developer who produced the module to get his advice and see if there was an issue with the module. After he did his typical tests and did not discover anything wrong, he came back to ask me to ensure a couple of things:
1. That I ran the close() function on each file handle before actually acting upon the file that each file handle was referencing. Well, this was definitely an issue. I had the close routines after everything was said and done. So, I moved them up so that each file handle is closed before doing the line count.
2. He asked me to turn off buffering for I/O. I was a little new to that and asked him to explain further. He said that all I had to do was set the special variable "$|" to any true value:

ie: $| = 1;
$|++;

Setting "$|" turns on autoflush for the currently selected output handle (STDOUT by default): instead of letting data sit in Perl's output buffer, anything printed goes straight out to the handle. (To autoflush a specific file handle, you select() it first, or use the autoflush() method from IO::Handle.) This ensures that all data actually reaches the file when you expect it to, rather than whenever the buffer happens to fill. One other note: set this at the beginning of your script so that everything after it is affected.
So, after modifying the autoflush variable and closing all file handles first, the function worked just fine and counted everything perfectly.
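To see why the counts were coming back short, here is a small demonstration of the same trap in plain core Perl (the file name and the little line_count helper are just for illustration; the real script used File::Util's line_count()):

```perl
use strict;
use warnings;

# A stand-in for File::Util's line_count(): count lines by reading the file.
sub line_count {
    my ($file) = @_;
    open my $in, '<', $file or return 0;
    my $n = 0;
    $n++ while <$in>;
    close $in;
    return $n;
}

open my $out, '>', 'demo.txt' or die "open: $!";
print {$out} "line $_\n" for 1 .. 3;

# The data may still be sitting in Perl's output buffer at this point,
# so counting now can come back short, even zero.
my $before = line_count('demo.txt');

close $out or die "close: $!";          # flushes everything to disk
my $after = line_count('demo.txt');     # now the count is reliable

print "before close: $before, after close: $after\n";
unlink 'demo.txt';
```

Closing (or autoflushing) the write handle before counting is what makes the numbers balance.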
Many, many thanks to Tommy Butler, the author of the File::Util module on CPAN. Without his help, I would probably still be scratching my head over the issue. Now though, I have a bit more knowledge and experience with which to draw upon with my next project.
Wednesday, August 01, 2007
Checking for duplicates
In my last post, I mentioned one-liners and that I was working with some code that was rather puzzling to figure out. Well, I figured it out, with the help of Learning Perl, 3rd Edition. I have said it many times before and I will say it again: as much as the Camel book is famed as the "Bible of Perl", I tend to keep the Learning Perl book much closer to my keyboard.
The one-liner that I was working on figuring out was as follows:
perl -e '$count=0; while (<>) {if (! ($var{$_}++)) {print $_; $count++;}} warn "\n\nRead $. lines.\nTook union and removed duplicates, yielding $count lines.\n"' ./file1 ./file2.txt > ./combined.txt
This code is supposed to take the two files (file1 and file2) and combine them into one file (combined.txt), removing any duplicate entries along the way. What puzzled me was: HOW IS IT DOING IT? Yes, if you are wondering, it does work. Any Perl gurus out there are already nodding their heads, as they probably already know how.
The magic of this code is in the "$var{$_}++". The first time a given line is read, $var{$_} does not exist yet, so its value is undef, which is false; the ! flips that to true, the line gets printed, and the ++ bumps the hash value up to 1. The next time the same line comes along, $var{$_} is now a true value, the negated test is false, and the line is skipped. It's a little confusing, I know, but it works, and it is how it was designed to work. Personally, it's a great, short idiom for removing duplicates.
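Unrolled into a short script with some placeholder data standing in for the two input files, the same trick looks like this:

```perl
use strict;
use warnings;

# Placeholder data standing in for the lines of file1 and file2.
my @input = ("apple\n", "pear\n", "apple\n", "plum\n", "pear\n");

my %seen;
my @unique;
for my $line (@input) {
    # $seen{$line}++ returns the OLD count: false (undef/0) the first
    # time a line appears, true every time after that.
    push @unique, $line unless $seen{$line}++;
}

print @unique;   # apple, pear, plum: each line once, in original order
```

That push-unless line is the whole union-minus-duplicates logic; everything else in the one-liner is just the line counting for the warn message.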
If you still have questions, I recommend you look at the example on page 153 of Learning Perl, 3rd Edition. Yes, I know they are up to 4th Edition, but I have my 3rd edition copy with me at the moment.
Happy Coding!!
Tuesday, July 31, 2007
Bring on the one liners!
Since getting this job, I have written a couple of scripts for work here that are a few hundred lines long and others that are only a couple hundred lines in length. Not too shabby and they work great.
Recently though (starting last week), I have been working on a Production issue. I did a fix for the issue but have since then been learning how the scripts work. The project is mainly shell scripts, but one of the scripts that is called is a ".sh" file that only contains... yup, you guessed it... a Perl one liner. This one liner is supposed to compare two files and return the lines that aren't duplicates.
Well, in my examination, I will be darned if I can find anywhere that the one-liner is actually doing a comparison. To me, it looks like the script simply outputs the first file and then appends the second file.
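For what it's worth, a one-liner that genuinely compared the two files would need to remember the first file's lines before reading the second. Unrolled into script form with made-up data (I can't show the real files), the idea is something like this:

```perl
use strict;
use warnings;

# Placeholder data standing in for the two files being compared.
my @file1 = ("alpha\n", "beta\n");
my @file2 = ("beta\n", "gamma\n");

# Remember every line of the first file...
my %in_first;
$in_first{$_}++ for @file1;

# ...then keep only the second file's lines that were never seen there.
my @only_in_second = grep { !$in_first{$_} } @file2;
print @only_in_second;   # just "gamma"
```

Nothing like that hash lookup appears in the production one-liner, which is why I say it is appending rather than comparing.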
Anywho, that issue aside, during all the research that I have been doing, I have had the opportunity to really delve into how to do one-liners in Perl. It is really rather interesting. Granted, it's not my favorite way to code Perl, but for the quick, dirty job that takes seconds, it is definitely the way to go.
Tuesday, July 24, 2007
Heads up --> Opinion Ahead
"Ok", I said, "so what is the issue?" I guess that one of his colleagues made the comment that Perl just isn't the way to go for such an application. And I quote: "The database interface is 'eh' and the graphical capabilities are not that good either. Perl is really only good for text modifications."
I somewhat choked. Sure, text is what Perl was designed for, but to make a statement so completely unfounded, without first seeing what has changed with the language, is just lame!
It seems that this guy hadn't touched ANY Perl in well over 10 years. Ok, 'nuff said. That means that:
#1. He really hasn't seen Perl lately
#2. He really hasn't seen Perl lately
#3. Perl 10+ years ago and Perl today are two completely different beasts.
Yes, I know #1 and #2 are the same; that was completely on purpose. I listened to the list of requirements for the application and they sound totally feasible for Perl. I cannot divulge them here as it is work related (I am sure you understand), but believe me, Perl wouldn't have many, if any, issues. Granted, the coding would be a little intense, having to use Tk, DBI, and other assorted modules, but aside from that, it sounds like it would be a sweet application.
The big plus for Perl is that it is a language approved and used inside the organization, so no having to get approval for it. That should make them lean in the right direction.
Thursday, July 19, 2007
An Enigma machine on eBay....Oh My!
It seems that someone in Italy is trying to cash in on a bit of history by offering one of these machines to the public. If this turns out to be a true Enigma machine, then this is an incredible find. I can only imagine how lucky the winning bidder will be feeling.
Here is the link: http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=270146164488
Granted, in my opinion, something as monumental as the Enigma machine definitely belongs in a museum. My only hope is that whoever buys it does not keep it locked away, but instead puts it on display for others to see and enjoy.
Of course, if you happen to be a hobby cryptographer, and a handy one at that, then you could always build your own Enigma machine in your workshop. That is definitely an interesting prospect for a project. Hmmm.
Anywho, enjoy the auction. It will be interesting to see how high the bidding goes.
Friday, July 06, 2007
Watch that punctuation
I am working on a script to automatically download ALL of CPAN. (I know.... WHY? Because I want to, that's why. Plus, my laptop isn't always able to get online, and when I need a module, if I have it readily available, I can manually install it.)
Basically, the script fetches a copy of the CPAN modules list, and I then parse the paths to the modules, including the module names, out of the HTML that is returned. Here is the section of code that was bugging me:
if ( $line =~ m/\.gz/ )
{
    @elements = split( /"/, $line );    # the path is the first double-quoted value
    print PATHS ("$elements[1]\n");
}
I was really wanting this to work (and the code above does), but when I IM'd Merlyn (Randal Schwartz), he noticed that I was missing a double quote in the print path. Man, after a full day of coding, I don't blame myself for missing that one. Thanks, Merlyn, for the second set of eyes on that one!
Once I have the code completed I will post it here.
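In the meantime, here is a self-contained version of that parsing step, run over a couple of made-up listing lines (the real script loops over the fetched HTML, of course, and the paths below are invented for illustration):

```perl
use strict;
use warnings;

# Made-up sample lines in the style of an HTML module listing.
my @lines = (
    '<a href="authors/id/A/AB/ABC/Some-Module-1.00.tar.gz">Some-Module-1.00.tar.gz</a>',
    '<a href="authors/id/X/XY/XYZ/Other-Thing-2.31.tar.gz">Other-Thing-2.31.tar.gz</a>',
    '<a href="index.html">back to index</a>',
);

my @paths;
for my $line (@lines) {
    if ( $line =~ m/\.gz/ ) {
        my @elements = split /"/, $line;    # the href value is the first quoted chunk
        push @paths, $elements[1];
    }
}

print "$_\n" for @paths;
```

Splitting on the double quote is crude but effective here: element 0 is everything before the href value, element 1 is the path itself.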
Tuesday, July 03, 2007
[TAOREs] File or Directory from a listing
I was working on a script a couple of weeks ago where the script had to FTP into a server, grab a directory listing from a specific directory, and then send an email to a distribution list if a file was found in the directory.
Seems pretty straightforward, but I quickly realized that this was an opportunity for me to get a little practice writing regex's. The remote server was Windows, so when the FTP connection was made and the "dir" command was run, the output had the usual FTP banter, but also a line similar to the following when a file was found:
drwxr-xr-x 7 Administrators group 0 Jun 04 22:07 myfile.txt
This is the typical output format for FTP on a Windows machine. Yes, it is the same format as the long listing provided by an "ls -l" on a Unix machine. So, knowing that, I quickly set out to write a regular expression to match just the "myfile.txt" file name in that line.
Here is the regular expression that I came up with to match the file name in that line:
m/^.+\s+\d\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+\d{1,2}:\d+\s+(\w+\.\w+)$/
This is pretty straight forward (if you know something about regex's). Let me break it down with the regex formatted slightly differently:
m/              # start the match
  ^             # start from the beginning of the line
  .+            # the permissions field: any character, one or more times
  \s+           # one or more spaces
  \d            # a single digit (the link count)
  \s+           # one or more spaces
  \w+           # word characters (the owner)
  \s+           # one or more spaces
  \w+           # word characters (the group)
  \s+           # one or more spaces
  \d+           # one or more digits (the size)
  \s+           # one or more spaces
  \w+           # word characters (the month)
  \s+           # one or more spaces
  \d+           # one or more digits (the day)
  \s+           # one or more spaces
  \d{1,2}       # a digit, at least once but up to twice (the hour)
  :             # a literal colon
  \d+           # one or more digits (the minutes)
  \s+           # one or more spaces
  (             # start the subexpression group for capturing
    \w+         # word characters (the file name)
    \.          # a period; yes, it needs escaping, as the period is a regex metacharacter
    \w+         # word characters (the extension)
  )             # end the subexpression group for capturing
  $             # match the end of the line
/x              # end the regex; the /x modifier is what allows the whitespace and comments
I know, it seems a bit messier, but this is one way of writing a regular expression so that you can annotate what is happening. Personally, I like the first version better.
Now, with this regex, if your file (or directory) name is in a different format than something like "myfile.txt", then you will have to edit the regex to reflect that difference, or risk the code not working.
In the above though, the part of the regex that is enclosed in ( ) will be placed in the special variable $1 so that its value can be referenced elsewhere in the script.
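To see it in action, here is the regex run against the sample listing line from above:

```perl
use strict;
use warnings;

# The sample directory-listing line from the post.
my $line = 'drwxr-xr-x   7 Administrators group             0 Jun 04 22:07 myfile.txt';

# In list context, the match returns the captured group (or undef).
my ($name) =
    $line =~ m/^.+\s+\d\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+\d{1,2}:\d+\s+(\w+\.\w+)$/;

print defined $name ? "matched: $name\n" : "no match\n";   # matched: myfile.txt
```

Assigning the match in list context is just one way to pick up the capture; using $1 after a successful match, as described above, works equally well.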
The Art Of Regular Expressions
Now, I am ALWAYS learning and can usually be found with a computer book somewhere in my vicinity for most of the day. I never miss a chance to learn something new if I can help it. That said, though, I will still admit that there is nothing like some of the training you can find that gets right down to the nitty gritty without beating around the bush. This is especially true when you are trying to get a solid base in the basics of whatever you are learning.
The CBT's that my last job provided me access to did just that. Not only did they give me a really good understanding and grounding in Perl basics, but they also gave a really good lesson on Regular Expressions.
Regex's (as they are also affectionately referred to) can be some of the most elusive topics to get a grasp on in Perl. I am no expert by any stretch of the imagination, but the course that I took gave me an incredible head start into the world of regex's.
Before that course, I knew a fair amount about using pattern matching in Unix. I was quite adept at shell scripting and could write an awk script or two to do what I needed to. But, when it came to Perl's regex capabilities, I was lacking. ***BAM*** Then came that course. It truly opened my eyes to the overwhelming beauty and seductive power that Perl regex's have.
With a little "to-the-point" training and guidance, I think anyone can easily get a grounding in Perl regex's. In fact, I think I am going to add on to this blog and start a column regarding regex's. I will not only post some mini lessons on regex's, designed to further your knowledge of how to assemble them, but also examples of regex's that I have written, ones that work for the purpose they were created for. Hopefully all of that will provide you a new reference point when trying to use regex's.
I think I will call the new column "TAOREs" (for The Art Of Regular Expressions). I hope that you enjoy and also contribute any suggestions that you may have as well.
Monday, June 11, 2007
The Art of the Automated Download
In the case of the site that I am referring to, there are a little over 2000 files that I wanted to grab. The thing is, I didn't want to have to sit there and "right-click-save-as" for each and every one, as that would have taken days to complete. So, noticing that all of the files had actual URLs that led right to them, I looked at the page source. Then it hit me. Perl!
So, I copied the source and quickly drummed up a regular expression which grabs all the URLs of all of the pdf files on the page. After I grabbed the URLs, I put together some code which quickly went out and grabbed each and every one in turn and saved it to my hard drive.
This all sounds simple and such, but for someone like me who still considers himself a novice in the Perl world, it did take a couple of hours of research. First I tried to use the "WWW::Mechanize" module and was able to retrieve a complete list of the pdf files and their paths, but not the actual files. I tried other packages as well, delving into LWP itself, but I could not for the life of me get the code to actually download the files.
I found "lwp-download" and gave it a shot. Wow! It looked like it was working, up until the 904th document, where it died. I couldn't figure out what was going on. Why was it dying at the same point every time? Well, I did eventually figure it out and was able to download ALL of the 2000-plus files to my hard drive. I couldn't believe it, but the routine I used only took a few minutes to download all of the files (granted, I have 15 Mb FiOS for my internet, so please bear that in mind).
Just as an FYI (and you can see this in the code), I used the LWP::Simple::getstore() routine to download the files. It was a lot easier than going through the process of figuring out why my WWW::Mechanize code wasn't working, believe me. I will figure that module out later, but for now, this did exactly what I wanted.
This is probably a bit much and others would more than likely have a better way to do it, but here it is. Here is the code I used to parse HTML code for its links and download them.
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use LWP::Simple;
#############################################
# Read the entire HTML file into an array, line by line, so that we
# can parse out the information we need one line at a time.
#############################################
open(my $code_fh, "<", "/home/jlk/development/perl/stampAlbums/code.txt")
    or die "Cannot open code.txt: $!";
my @code = <$code_fh>;
close($code_fh);
#############################################
# Start with a fresh (empty) URL list file.
#############################################
my $listfile = "/home/jlk/development/perl/stampAlbums/files.txt";
unlink($listfile) if -e $listfile;
foreach my $line (@code)
{
    ############################################
    # The following code takes the site's HTML (in this case, the
    # stampalbums.com download page) and parses out all of the
    # download URLs. Note: the regex in the original posting was
    # mangled by HTML escaping; matching an href that points at a
    # .pdf file is a reconstruction of its intent.
    ############################################
    if ($line =~ m/href="[^"]+\.pdf"/i)
    {
        my @splitLine = split(/"/, $line);
        ##############################################
        # Append each URL to the list file, one per line.
        ##############################################
        open(my $files_fh, ">>", $listfile)
            or die "Cannot open $listfile: $!";
        print $files_fh "$splitLine[1]\n";
        close($files_fh);
    }
}
###########################################
# Open the file containing all of the URLs
###########################################
open(my $urls_fh, "<", $listfile) or die "Cannot open $listfile: $!";
###########################################
# Do the download of the files
###########################################
foreach my $url (<$urls_fh>)
{
    chomp $url;
    my $localdir = "/home/jlk/development/perl/stampAlbums/albumPages/";
    my ($filename, $directories, $suffix) = fileparse($url);
    LWP::Simple::getstore($url, $localdir . $filename);
}
close($urls_fh);
Wednesday, June 06, 2007
Further IP Validation information
I had posted a question at one point to thescripts.com Perl forum, regarding an issue I was having. One of the more knowledgeable users over there (he goes by Miller), enlightened me as to a CPAN module that does the same thing without re-inventing the wheel. Please don't think I regret writing my script. On the contrary, I am very happy I did as I learned a couple of lessons in doing so.
Anywho, the module mentioned above validates IP addresses just as my script from yesterday does, it just goes about it slightly differently.
I thank Miller for his gracious input and the link to the CPAN module!!
Tuesday, June 05, 2007
IPv4 Address validation
Well, there are two criteria that IP addresses need to really meet:
1) They need to have 4 octets, each octet containing from 1 to 3 digits.
2) Each octet's digits must make up a number between 0 and 255.
With that in mind, I went directly after the first of the two, validating the format of the IP address entered. The regex that I initially came up with is listed in #1 and the other two regex's came from the "Mastering Regular Expressions" book:
#1 m/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
#2 m/\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?/
#3 m/\d(\d\d?)?\.\d(\d\d?)?\.\d(\d\d?)?\.\d(\d\d?)?/
What is so great about Perl is TIMTOWTDI, which is how we can have three different regex's that arrive at exactly the same solution.
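As a quick sanity check that the three patterns really are interchangeable, the sketch below runs a few made-up inputs past each. Note that I have anchored each pattern with ^ and $ here (an addition of mine) so a partial match like "1234.5.6.7" can't slip through:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three equivalent patterns from above, each anchored so the
# whole string must be in dotted-quad form.
my @patterns = (
    qr/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/,
    qr/^\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?$/,
    qr/^\d(\d\d?)?\.\d(\d\d?)?\.\d(\d\d?)?\.\d(\d\d?)?$/,
);

# Sample inputs invented for illustration.
foreach my $input ("10.0.0.1", "1234.1.1.1", "a.b.c.d") {
    my @results = map { $input =~ $_ ? "match" : "no match" } @patterns;
    print "$input: @results\n";
}
```

All three patterns agree on every input: "10.0.0.1" matches, while "1234.1.1.1" and "a.b.c.d" do not.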
You may be asking yourself, what do I do with the regex(s) above? Well, you can take them and use them in your code to validate an entered IP address, or put it into a loop to test an entire file full of IP's to test their validity.
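Here is one way the loop idea could look. This is just a sketch of mine (the sample addresses are invented; in practice they would come from a file), combining the format check with the 0-255 octet test from the two criteria above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Candidate addresses to validate; in a real script these lines
# would be read from a file with open() and chomp'ed.
my @candidates = ("192.168.1.1", "256.1.1.1", "10.0.0", "8.8.8.8");

foreach my $ip (@candidates) {
    # First check the dotted-quad format, capturing each octet;
    # then make sure every octet is in the 0-255 range.
    if ($ip =~ m/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        && $1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255) {
        print "$ip: valid\n";
    } else {
        print "$ip: invalid\n";
    }
}
```

The short-circuiting && means the octet comparisons only run when the match succeeded, so $1 through $4 are always defined when they are tested.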
Here is an example of how to incorporate the above regex(s) into some code. Please know that this is only how I do it and may differ considerably from how you implement it:
#!/usr/bin/perl
use strict;
use warnings;
###########################################
# Note: the original listing lost its angle brackets (and the bodies
# of its branches) to HTML escaping; the STDIN read and the print
# statements here are reconstructions of the obvious intent.
###########################################
print "Enter an IP address: ";
my $ipaddr = <STDIN>;
chomp($ipaddr);
if ($ipaddr =~ m/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/)
{
    if ($1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255)
    {
        print "$ipaddr is a valid IP address\n";
    }
    else
    {
        print "$ipaddr has an octet greater than 255\n";
    }
}
else
{
    print "$ipaddr is not in dotted-quad format\n";
}
Happy Perl Coding!!!
Monday, June 04, 2007
Update on curing the hunger
Over the weekend, I listened to an older Perlcast that Josh McAdams made, where he interviewed Randal Schwartz. Whether that was made before or after Stonehenge started supporting Perlcast is for them to know. Either way, in the interview, Josh asked Randal his opinion on Perlcast, and what he said was that he actually really liked Perlcast and the idea of reporting the Perl community news for all to listen to. He also mentioned that it wasn't trying to do Perl instruction over an audio feed, which Randal did not find any use in.
Hearing that I thought longer and harder about what I was leaning towards and realized that he is completely and totally correct. You have to see code to learn it, that is why there are so very many resources and documents on the web and also so very many books on the topic of Perl.
I am one of those people who, when I come across something I want to know more about, or if I somehow think of an idea for a pet project, write it down somewhere. These scribblings are actually gathered into lists. Yes, I do eventually get to them, but sometimes it takes a while. Usually if I need something and it's on a list, or if I get bored, I tend to reference my list(s) for something to do. I took a look this weekend at my list of pet projects and did see a couple on there that would afford me the opportunity to delve into some modules that I have been wanting to play with. So, that is where I will turn to cure my hunger pangs, instead of trying to create a podcast that wouldn't be all that useful. (Sure, a video podcast would do the trick, but I have neither the resources nor the hosting to hold the files, so that is definitely out of the question at the moment.)
Anywho, back to work at learning Perl. Happy Monday everyone!!!
Friday, June 01, 2007
The hunger and thirst for everything Perl
Thankfully, the new job that I started a couple of weeks ago has afforded me nothing but the opportunity to code in Perl. Most of what I do is Perl coding and I am happy as a clam. I have learned so much in the last two weeks that I am beside myself at how quickly I have adjusted to the learning curve. Before starting the job I had obtained brian d foy's "Student Workbook" that accompanies the Llama book (Learning Perl). The Llama book has questions and exercises at the end of every chapter, but the Student Workbook contains more exercises to assist with getting you into a better Perl state of mind.
Well, having completely turned my 'learning' attentiveness towards the realm of Perl, I have been thirsting for everything that I can get my hands on to satisfy my burgeoning hunger for Perl knowledge. One of the resources I turned to in hopes of "data....input" was podcasts. Upon searching, though, I only found one true podcast related to Perl, and that is "Perlcast". Don't get me wrong, Perlcast is absolutely AWESOME, even having a very prominent member of the Perl community, Randal Schwartz, as the roving reporter reporting all the Perl news that's fit to report. Other than reporting the latest Perl news, Perlcast also does interviews with members of the Perl community and segments from conferences.
For me though, this just isn't enough. I want MORE! I have searched a fair amount of the web and unfortunately cannot find any other Perl-related podcasts. What was I hoping for? Believe it or not, I was hoping for a podcast on Perl that covered a different topic in each episode. Maybe go over topics such as:
- Scalars
- Arrays
- Hashes
- Modules (creating)
- many different episodes that each time choose a different module and go over it in detail
- Regular Expressions
- and so on, and so forth
Sure, some out there are probably saying, "So why aren't you producing these podcasts?" I have thought of that and, believe it or not, I am still thinking about it. My biggest issue would be where to host it, as I would need enough space to hold all of the podcast files. Yes, this area would be new to me and I don't have much expertise in creating these, so please bear with me. If anyone has any information, such as software recommendations for creating podcasts (OSS preferably), and also a good place to host or share them, that would be great.
No, this is not an "I will do it", this is more of an "I will look into the possibility". This would benefit me as well, as I am still up and coming. I guess there is no better way to cure your hunger than to make your own meal(s).
Wednesday, May 23, 2007
What a difference a job makes
Things with my old job had steadily changed. The workload steadily increased (which isn't a bad thing), but with it came the stress of that increase. Clients would come into the queue and, inevitably, the higher-ups associated with a project would push my manager to assign it immediately, because they had waited too long to get us involved to implement the project on a normal timeline. The stress level really counts for a good percentage of my reason for leaving. Not to mention that after our most recent merger, which ended up with the company we bought literally taking over the place, a whole slew of cutbacks were made in order to save everywhere they could. Now, cutting back is part of the large corporate environment and it's expected. But when they are cutting and cutting, and then hiring a 'ba-zillion' middle managers because their organization believes in the "how many tiers can our organization have" approach, with the "too many chiefs, not enough indians" side effect that goes along with it, then you have to raise a questionable eyebrow.
Sure, stopping the purchase of things like tissues and other goodies that made life a little easier is piddly, because you can supply those yourself. But when things in the organization change so much that you are watching people walk out the door every week, even those with 20+ years, you get to thinking about #1.
So, I did and I took a look at what was out there. Believe it or not, it wasn't very long before I started going to some interviews, but the interview I was most anxious for was for the position I am in now.
I went from doing client implementations at a large bank, on an e-commerce platform with solutions such as AS2, HTTPS, SFTP, FTPS and a couple of other miscellaneous solutions, to working AT a client site for a company that does most of that client's tech work. What I am doing now is secure FTP development using Perl.
The biggest pluses to my new job are that not only am I working with what so far seems to be a great bunch of people, but I am coding in Perl almost every day. I have come to really like Perl in the last couple of years, and having a job that allows me to code in it is just INCREDIBLE!!!
Granted, there are those out there who do it every day and are either tired of it or are jaded by their development experiences, and they have told me not to be so excited about it, but I cannot help it. Try to remember back when you were excited about it, when nobody could stop you from talking about it or keep you from your keyboard or the Perl forums. That is the state I am in now, just loving what I am doing.
Wednesday, May 16, 2007
Of coding Perl in relation to Ksh scripting
In ksh scripting, you can do something like the following:
##### begin code #####
myVar=this
myNextVar=${myVar}andthat
print $myNextVar
##### End Code #####
The output of this would be: thisandthat
The point above is that you can enclose the variable name (minus the dollar sign) in curly braces; it is interpolated to its value, and whatever comes after the closing brace is appended to that value.
Now, the exact shell syntax doesn't carry over to Perl as-is, since curly braces are used for other things there. So, I did some thinking and figured that I could use the concatenation operator to solve my problem. Here is the above code as I would have written it in Perl:
##### Begin Perl Code #####
my $var = "this";
my $nextVar = "$var" . "andthat";
print $nextVar;
##### End Perl Code #####
The output of the above Perl code is exactly the same as the shell script, but as you can see, we have to put things together in a slightly different way.
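Incidentally, Perl does support a curly-brace form of its own inside double-quoted strings, much like the ksh version. A quick sketch comparing the two approaches (variable names invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $var = "this";

# Perl interpolates ${var} inside double quotes, much like ksh does:
my $braced = "${var}andthat";

# ...which is equivalent to the concatenation approach:
my $concat = $var . "andthat";

print "$braced\n";
print "$concat\n";
```

Both lines print thisandthat, so which form to use mostly comes down to taste.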
Happy Perl coding!
Monday, May 07, 2007
What drives OSS?
How many of us have searched for a product that does what we want, only to find that the software that does EXACTLY what we want is so far out of our price range that you would have to take out a loan to afford it? So, that said, isn't cost another thing that drives OSS? Most of the software out there is free; if you want paid support, that is where the $$$ come in, or you find an online community. That is the advantage OSS has over commercial software: even though you could pay for support if you wanted to, the community is large enough that you can normally find the help you need in a forum.
So it is boredom AND cost, in my opinion, that drive OSS software developers. Granted, I have searched for a product and not found the features I would want. If I were a full-fledged OSS developer, I would take to developing my own software in that case, leading to a third, albeit less immediate, source of drive, but still one that needs mentioning.
All in all, it is not just one thing, but many things that drives OSS developers, just like innovation.
Saturday, April 28, 2007
Module Installation(s): UPDATE
Well, Dan was kind enough to respond (and quickly, I might add), telling me that it didn't look like my system had the OpenSSL development libraries and headers installed.
Having recently switched to Ubuntu, I have been finding out how much I truly took for granted with distros like Suse that let you pre-select the software to install. Ubuntu just installs a base system and lets you configure it from there. It is one thing that would lead a newbie to Linux to absolutely drink, but it doesn't bother me; I am just learning what the OS needs.
Anywho, I installed the dev libraries for OpenSSL, deleted the unpacked module from the "~/.cpan/Build" directory and did a re-installation of the module. This time, after the libraries and headers were installed, the installation went flawlessly.
I did get curious though and tried to do the same to the OpenSSL module ( even though the documentation is.... well..... less than adequate) and it failed miserably as expected. Yes, I have given up hope for that package and suggest nobody else try to install it until it gets some MAJOR updates done.
So, my thanks go out to Dan Sully for his assistance with my missing libraries and headers! Thanks Dan!
Perl Module Installation Woes and Lessons
Thankfully for the lot of us, Perl modules are kept in one standard repository called CPAN (which stands for Comprehensive Perl Archive Network), located at cpan.org. You can use the search page on CPAN to search for Perl modules related to your specific task.
For instance, say you are working with X509 based keys, you could search for X509 modules on CPAN. Funny, but that topic is what led me to write this post. I am working on a script that will examine X509 certificates and parse them for their expiration date, returning the certificates that expire within a certain time frame given by the user.
Upon my first searching for x509 on cpan, I came across the "OpenSSL" module, which includes a number of sub modules for dealing with different aspects of OpenSSL. One specific module is "OpenSSL::X509".
To install a Perl module, there are typically two routes that you can take:
1. Using the command line, issue the command:
perl -MCPAN -e 'install ModuleName'
replacing ModuleName with the name of the module you want to install.
2. Using the cpan shell. To access this, type "cpan" on the command line and hit enter. This will bring you to the cpan prompt, where you simply type:
cpan> install ModuleName
Please be warned that with both of the above methods, if you have never run either of them before, you will need to run through the initial configuration that comes up in order to use them. The configuration asks a fair number of questions, but with the exception of a couple of them, I simply took the defaults.
***Note: I am running on Ubuntu, and since a lot of stuff is done using sudo, I typically put sudo at the beginning of either of the two above commands, as cpan tends to install things where only root can create items. So, it is recommended that you use it, unless your installation directory is a place you have full access to.
Now, after the cpan interface is configured, you may be warned to install the Bundle::CPAN module. This is a necessity as it has a number of updates to the different packages it comes standard with, so install it before continuing.
When installing the OpenSSL module, I ran into a whole slew of errors. After looking around for a while and not finding the files it was complaining about, I emailed the Perl Beginners mailing list with the issue to see if they could help. Well, Tom Phoenix was first to respond and informed me that modules below version 1.0 are typically considered pre-releases; the module I was installing was a pre-release. Not only that, you should look at all the documentation of any Perl module you are trying to install. The README of this module was haphazardly written, laced with cursing, and did not seem complete. I examined the README at the suggestion of Mumia W on the Perl Beginners mailing list.
All in all, the module is not even close to finished and is completely unusable. So I began a search again and came across Crypt::OpenSSL::X509, written by another author, but, ended up with the same issues and same errors. I believe it may have been written around the other one but is at an even lower version number than OpenSSL.
So, after more research and reading, I came across Crypt::X509. The module looked to be extremely well documented with examples and explanations. Not only that, it was a module specifically for parsing x509 certs. I did the install and there were no issues.
So, my most important bit of advice, before actually installing a Perl module, is to do your research. Read up on all the modules your search turns up and try to find one that is well documented and does almost, if not exactly, what you want to do. Crypt::X509 is only version 0.32, but at least it is fully functional from what I can see (from the installation). Only coding will tell, I guess.
Thanks to Tom Phoenix and Mumia W on the Perl Beginners mailing list for all the input on this issue. It really helped guide me to an answer.
If you are a beginner to Perl, I highly recommend that you join the Beginner's mailing list. It is quite a handy resource!!!
Thursday, April 26, 2007
Ah yes, now I remember..... I have a blog.
It ends up being a mixture of work, kids and generally life that are the causes, but I am dealing and doing what I have been wanting to.... update my blog.
Well, some of you may remember that I was playing with Ruby a short time ago. Well, the back burner (on an extremely low flame) is where that language headed. I found myself truly delving head and feet first into Perl. I have always had a soft spot for Perl and love the language for its power and versatility. I have always wished I could just do more with it.
Well, I finally started about 1 1/2 months ago by taking some of the online Perl courses that my work offers. They are really good at giving you a solid base, and I plan on taking that to the next level (and the next, and the next).
I will elaborate in my next post where I am going technically, right now, I must recharge. Night!!!
Saturday, March 24, 2007
What...Where....... I'm Back.......
What have I been doing? Well, going stir crazy would probably sum it up pretty well at the moment. Between work, family and other craziness, it's been since January 15th that I last posted. I thought about it yesterday and couldn't believe it had been so long. **hangs head**
On an update note, I was learning Ruby, but since my work has an online training site again, I have signed up for a bunch of Perl courses. I love Perl, and since structured training is a better way to learn than OYO (On Your Own) for some things, I am quite pleased that they have the courses available that they do. I have put my Ruby learning on hold so that I can take the courses I signed up for.
The courses are actually quite good and give a lot of examples and coding tips/tricks. Yes, the information is in books like the Camel book and the Llama book, but without the extraneous stuff. I have been going through the courses learning the straight-up basics, and also reading the Llama book on the side. Learning the syntax and how to use the control structures, subroutines and functions ahead of time makes it a lot easier to take in what the Llama book is putting forth, without questions flowing through my mind about how something actually works or what its syntax is, since those questions are already answered.
I just finished the part of the course that covered Regular Expressions and I must say that I understand them a LOT better now than I did before. Now, delving into the Mastering Regular Expressions book is more enjoyable because I have a basic understanding of how to assemble a regex to match what I need to. Granted, I am no professional regex'er, but I can wing it while I get more experience doing so.
The training site also has other interesting courses, such as a number of XML courses that I will definitely be taking. So, Ruby is kind of back-burnered until I finish the other training.
Where have I been? He he, either at work or at home watching the kids while my wife is at work. And to think that this is all due to bills. Once we get all those pesky credit cards paid off, she won't have to work so much, but that is still off in the distance.
With any luck, I will now have the time to make more regular updates to this blog, and maybe even post some useful code snippets that I have picked up.
Monday, January 15, 2007
Oh look.....an update.
So, a belated Happy New Year to everyone and yes, its about time that I continued my postings.