Tuesday, August 14, 2007

Things that I learned yesterday

If there is one thing about being a geek that keeps me going every day, it is the saying, "You learn something new every day!" Why? Because it is so true. Being a Perl developer has me writing more code than I can keep up with, and I love it. But the best part is that, through all of the coding I am doing, I seem to learn at least one new thing with each program I produce. Now, "1 new thing" seems pretty low, and don't worry, it is; lately it has been a few new things each time. Still, learning even 1 new thing every day keeps the mind in good shape.

For instance, I was working on a script last week that took a file and picked out the lines containing a string that should not have been there. My script moved said offending lines to another file for "safe keeping", while outputting the good lines to their own file. To make sure that everything worked correctly, I had to balance the files to ensure that ONLY the offending lines were removed and all other lines ended up in the new file.
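To make the idea concrete, here is a minimal sketch of that kind of split-and-balance script. It is not the original script: the split_lines() helper, the file names, and the OFFENDING marker are all made up for illustration, and it uses only core Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: lines matching $pattern go to $bad_path, all other
# lines go to $good_path. Returns the (total, good, bad) line counts.
sub split_lines {
    my ($in_path, $good_path, $bad_path, $pattern) = @_;

    open my $in_fh,   '<', $in_path   or die "Cannot read $in_path: $!";
    open my $good_fh, '>', $good_path or die "Cannot write $good_path: $!";
    open my $bad_fh,  '>', $bad_path  or die "Cannot write $bad_path: $!";

    my ($total, $good_count, $bad_count) = (0, 0, 0);
    while (my $line = <$in_fh>) {
        $total++;
        if ($line =~ $pattern) {
            print {$bad_fh} $line;
            $bad_count++;
        }
        else {
            print {$good_fh} $line;
            $good_count++;
        }
    }

    # Close every handle so the output is on disk before anything
    # else inspects the files.
    close $in_fh;
    close $good_fh or die "Cannot close $good_path: $!";
    close $bad_fh  or die "Cannot close $bad_path: $!";

    return ($total, $good_count, $bad_count);
}

# Demo with made-up data in a temporary directory.
my $dir   = tempdir(CLEANUP => 1);
my $input = "$dir/input.txt";
open my $out_fh, '>', $input or die "Cannot write $input: $!";
print {$out_fh} "keep 1\n", "OFFENDING 2\n", "keep 3\n";
close $out_fh or die "Cannot close $input: $!";

my ($total, $good, $bad) =
    split_lines($input, "$dir/good.txt", "$dir/bad.txt", qr/OFFENDING/);

print "total=$total good=$good bad=$bad\n";
print "Balanced!\n" if $total == $good + $bad;
```

The balance check is just that the original count equals the good count plus the offending count. Counting while splitting is only one way to do it; re-counting the finished files is the other, and that is the route that led to the rest of this story.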

So, I delved into the File::Util module, which has a function called line_count() that takes a file as input and returns the number of lines in it. What I discovered was that the function worked fine with the first file processed (the original file), but on each subsequent file (the offending lines file and the new output file), the counts were totally off, to the point that the offending lines file came back with a count of zero (0).

So, I emailed the developer who produced the module to get his advice and see if there was an issue with the module. After he did his typical tests and did not discover anything wrong, he came back to ask me to ensure a couple of things:

1. That I ran the close() function on each file handle before actually acting upon the file that each file handle was referencing. Well, this was definitely an issue. I had the close routines after everything was said and done. So, I migrated them to close the file handle(s) before doing the line count.

2. He asked me to turn off buffering for I/O. I was a little new to that and asked him to explain further. He said that all I had to do was to set the variable "$|" to any true value:

i.e.: $| = 1;
or: $|++;

This tells Perl to flush output as soon as it is written instead of holding it in a memory buffer first. One thing to keep in mind: $| only applies to the currently selected output handle (STDOUT unless you select() another one), so a file handle you opened yourself needs to be selected first or have autoflush turned on for it directly. Flushing right away ensures the data is actually in the file when something else reads it, and nothing sits around in the buffer in the meantime. Also, one other note: set it near the beginning of your script, before the first write, so that all of the output is affected.
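Here is a small sketch of why the counts can come out wrong. It is only an illustration under a few assumptions: count_lines() is a hypothetical stand-in for a line counter (it is not File::Util's line_count()), the file names are made up, and the per-handle autoflush() method comes from the core IO::Handle module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Handle;    # provides the autoflush() method on file handles

# Hypothetical stand-in for a line counter: re-open the file and count.
sub count_lines {
    my ($path) = @_;
    open my $in_fh, '<', $path or die "Cannot read $path: $!";
    my $count = 0;
    while (my $line = <$in_fh>) {
        $count++;
    }
    close $in_fh;
    return $count;
}

my $dir = tempdir(CLEANUP => 1);

# Case 1: default buffering. A few short lines usually stay in Perl's
# buffer until the handle is flushed or closed.
my $buffered = "$dir/buffered.txt";
open my $fh, '>', $buffered or die "Cannot write $buffered: $!";
print {$fh} "line $_\n" for 1 .. 3;
my $before_close = count_lines($buffered);    # typically too low here
close $fh or die "Cannot close $buffered: $!";
my $after_close = count_lines($buffered);     # everything is on disk now

# Case 2: autoflush turned on for this one handle.
my $flushed = "$dir/flushed.txt";
open my $auto_fh, '>', $flushed or die "Cannot write $flushed: $!";
$auto_fh->autoflush(1);
print {$auto_fh} "line $_\n" for 1 .. 3;
my $while_open = count_lines($flushed);       # counted before close
close $auto_fh or die "Cannot close $flushed: $!";

print "buffered, before close: $before_close\n";
print "buffered, after close:  $after_close\n";
print "autoflush, still open:  $while_open\n";
```

Of the two fixes, closing the handle before counting is the one that matters most for a script like this. Per-handle autoflush helps when a file has to stay open while something else reads it, at the cost of more frequent writes.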

So, after modifying the autoflush variable and closing all file handles before counting, the function worked just fine and output everything perfectly.

Many, many thanks to Tommy Butler, the author of the File::Util module on CPAN. Without his help, I would probably still be scratching my head over the issue. Now, though, I have a bit more knowledge and experience to draw upon for my next project.

This work is licensed under a Creative Commons Attribution 3.0 License.