
Friday, October 06, 2023

Pointing CPAN To Your Local Mirror

 I have been out of the Perl development arena for quite a while (about 10 years or so, to be brutally honest).  That's what happens when your career takes you in a different direction.  

Recently I rediscovered my desire to code in Perl after reading some postings by a friend, which made me realize that I had been missing it.

Anywho, that brings me to the reason for this article.  I am re-learning some things, and one of those things is setting up your own, local CPAN repository.   There are times when you may not have an internet connection, but want to work nonetheless.  This is where having a mirror of CPAN on your local machine is a good thing.

I am not going to go into how to create the mirror, because there is quite the plethora of documentation and write-ups on that topic.  Instead, I am going to write a quick bit on actually pointing to your local mirror.

Now, it matters not where you have it, but for the sake of this article, let's say that you have the mirror located at /tmp/cpan_mirror  (yes, I am assuming that you are on Linux.  If you are on Windows, I hate to say it but you are on your own as I don't touch the thing).

The first thing you need to do is drop into the cpan cli.  You can do this simply by typing cpan at your command line and hitting enter.  You'll be dropped to the cpan prompt:

 cpan[?]>

Once there, you'll need to do the following things.  I will create a list of tasks first, and then put the commands after. 

  • check the urllist and see what other mirrors are listed
  • remove all other mirrors (saving the url's aside if you wish to reset them later)
  • add your local repo path
  • commit your changes
  • reload cpan

Ok, so here are the commands:

$ cpan       (to drop into the cpan prompt)
cpan[?]> o conf urllist
cpan[?]> o conf urllist pop   <--- run this until all the urls are out of the list
cpan[?]> o conf urllist "file:///tmp/cpan_mirror/"
cpan[?]> o conf commit
cpan[?]> reload cpan

After that, you should be able to install modules at the cpan prompt or using the cli.  You can test that it's pointing to your local repo by turning off your wifi and ensuring you cannot get to the internet.

To reset it back to internet based mirrors, just follow the same procedure, adding the mirror you like to use in place of your local path, and without the double quotes.
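For example, to point back at the main CPAN site (the URL here is just one choice; substitute whatever mirror you prefer):

    cpan[?]> o conf urllist pop
    cpan[?]> o conf urllist http://www.cpan.org/
    cpan[?]> o conf commit
    cpan[?]> reload cpan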

Enjoy!  And TMTOWTDI!!

 

 

Wednesday, March 01, 2017

I give you.... PyBlueprint

About 3 years ago (or so), I created a bash script for starting a new Python project.  I was tired of setting up each new project by hand and wanted a quicker way to do it as a time saver.   That endeavor yielded a script that I have used for the last 3 years.

While the script works fine from a bash perspective, I was never quite happy with it in that state.  So, I embarked on re-writing it in Python.  I didn't get the chance to work on it full force right away, but over the last week or so, I have made an immense amount of progress.  So much so, that I am ready to announce the project for people to download, play with, and hopefully enjoy.

Overview:

PyBlueprint's whole purpose is to create a base project directory for new Python projects.  Some of the features that the script has are:

    - Creates a project directory for you and populates it with a base set of files (script, README, etc)
    - Initializes a new git repo (or not, if that is your choice)
    - Creates a new Github or Bitbucket repository for you to push your code to

I could be overestimating, but I would say that this project saves me about 5-10 minutes of initial project setup, so that I can get to coding my projects quicker.

I know that not everyone works in the same manner and we each have our own requirements.  I just ask that if you are a Python developer, please give this a shot and see if it works for you.  If you have suggestions, I am open to them; please just open an issue in the project.

Project Link:  https://bitbucket.org/numberwhun/pyblueprint

UPDATE: I realized a short time ago that I still had the repository set to be private.  I have changed it to be a public repo.  Sorry about that!

Monday, November 23, 2015

Run Pip Against Your Local Pypi Server

A couple of weeks ago I posted about creating your own pypi mirror.  Being the anal type and wanting a complete mirror (in case I needed a slightly older version of any module), I gave a quick tutorial on how to do this using the bandersnatch module.

Hopefully you have had time to download everything and have it sitting nicely in a directory.  I say hopefully, as it was 180 GB when I downloaded it and I know that not everyone has a fast connection.

As a note, anything done from this point is done inside of a virtual environment.  If you haven't used virtual environments and are not familiar with them, I highly suggest you read about them as they have worthy benefits.

Now that you have your mirror created you need to put it to good use.  To do this there are a couple of things you need to do:
  • install a module for running a pypi server
  • edit the appropriate config file(s) to point to the new repo
  • start the pypi server
  • install a module while configured to point to your new repo
For the first piece, you will need to install a module that provides you a pypi server.  There are a few such modules out there, so feel free to research them and find one that works for you.  For our basic purposes (as I don't currently need much beyond serving the mirror), I am using the 'pypiserver' module.  You can install that with:

    $ pip install pypiserver

Once the server is installed, you will want to start it with the following command:

    $ pypi-server -p 7777 ~/pypi/packages

That will start the server on:  http://localhost:7777/

You now need to make some config changes.  At a minimum, you need to change ~/.pip/pip.conf: modify 'index-url' in the 'global' section of the config file to point to 'http://localhost:7777/'.   (I do suggest you remember where your config file was pointing before you change it.  It's always a good idea to make a backup copy of the config file before you make your edits.)
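For reference, here is a minimal sketch of what the relevant part of ~/.pip/pip.conf looks like after the edit (the port matches the server command above):

    [global]
    index-url = http://localhost:7777/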

Now that you have things configured and have the server running, it's time to turn off your internet connection, do a 'pip install' of a module, and see if it works.
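For example (Flask here is just an arbitrary module to test with; anything that exists in your mirror will do):

    $ pip install flask

If that succeeds with the network off, your pip is pulling from the local mirror.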



Tuesday, November 10, 2015

Creating A Pypi Mirror

One of the things I have read a lot of people like to do, is to create a local pypi mirror.  That way, when the real pypi is on the fritz or you don't have internet access, you can still install modules and work on your pet projects.

I worked through a bunch of different modules for creating a pypi mirror, and most of them seemed to make a mirror of the most recent versions of modules.  Which would be fine, except I am anal and wanted as complete of a mirror as I could get. 

So, after playing around with a number of different modules, I discovered Bandersnatch.  At first it looked promising (claiming that the mirror would be about 120 GB).  Considering that the module documentation was probably written (and not necessarily updated) a couple of years ago, I could only imagine what that number is now.

I followed the installation instructions from the above link, installed the module (in a virtual environment), got it running, and let it run to completion (which was about 3-4 hours later).  I did a df of the directory and BLAM!, a little over 180 GB of modules.  Just WOW!!  Now that is the mirror I was looking for.
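For anyone who wants the short version, the flow I used boiled down to something like this (a sketch; as I recall, the first run writes out a default config file whose mirror directory you can edit, so check the bandersnatch docs for the specifics):

    $ pip install bandersnatch
    $ bandersnatch mirror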

I still want to play with some of the other methods (as having a mirror of the most recent set of modules is also handy), but this is definitely what I was looking for.  Hopefully this information allows others to create their own Pypi mirror as well.  I would say quickly, but that will depend on the internet connection you are using.

Wednesday, July 30, 2014

Python Virtualenv Setup: Updated

I used the script I created to start a new project and realized very quickly that something wasn't right.  I noticed that the autoenv wasn't working, while it had previously.  I checked other projects I used it with and it worked fine, so I started digging through the code.

Then I saw it.  With all the other projects, there was either a github or bitbucket repository created.  For this project I opted for neither.  Unfortunately, I completely forgot that even if I don't create a remote repository, I still need to initialize the project as a virtualenv.

So, I decided to factor the github and bitbucket portions into functions and moved the final step outside of the code that creates the repos.  I tested it and voila!!!  Worked like a champ.

So, if you downloaded previously, please update your checked out version so you have the latest copy.


Thursday, May 22, 2014

Update: Python Virtualenv Setup Project

I have been doing a bit of work the last day or so on my script for setting up a python development environment, called "Python Virtualenv Setup".  When I first brought up the project on here several posts ago, I mentioned that I was planning support for auto-creating a bitbucket repository during the script's run.

Well, I not only added support for creating a bitbucket repository, but I added support for github as well.  So, the two popular and competing services are now supported.  I even went so far as to make it so that if you create one of the remote repos, the script also does a 'git init' in the project directory for you, and provides you the command for setting your origin so you can push your code to your new, remote repository.

I feel like Tim Robbins in Antitrust, "God, I love this stuff!".

Monday, May 12, 2014

A Couple Of virtualenv Notes

A little while ago I posted about starting a python project, with a link to a bitbucket project of mine that can be used to create a virtual environment (and have the environment auto activate upon entering the project directory).  I wanted to post just a couple of things that I learned, one of which bit me a touch when working on a recent python script.

I was working on said script and kept getting errors that made no sense, telling me that the module I was working with was not installed.  I called 'bull', as I knew I had installed it in the environment using pip.  Well, I thought I had.  What bit me is that I did not remember that when you are working in a python virtual environment, you NEED to use the pip and python executables installed under the environment (in the environment's bin directory).  If you don't, and just call pip or python, you will get the system version by default.

This was brought even more to my attention when I ran 'pip freeze' to see what modules I had installed, and the list was looooooong.  I quickly realized that I was not using what I thought I was.  So, always make sure you reference the correct executables.   Of course, one gotcha to remember is that once you develop your application, if you move it out of the virtual environment, you need to fix any paths that point to the python interpreter.
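A quick sanity check I have started using is to ask the shell which executables I am actually getting.  Inside a properly activated environment you should see the environment's bin directory, not the system paths (the paths below are just an example layout):

    $ which python pip
    /home/jlk/projects/myproject/bin/python
    /home/jlk/projects/myproject/bin/pip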

Consider that a lesson learned for me.  Hopefully it will allow you to forego learning it too painfully (or just annoyingly).

Sunday, April 13, 2014

Free Programming Books and Links

There is no question, Free is just one of those words in the English language that, when seen, makes people take notice.

I stumbled across a link to some Free programming books and links that I feel just needs sharing.  Now, before you get too excited, the links on that page take you either to an online version of a book, or to a page with a tutorial or article.

To be honest, I have found a bunch of good information in here since finding it and hope that you do as well.

Saturday, March 29, 2014

Starting A Python Project And Enabling Git For Source Control

Sorry for the long title, but it encompassed what the article is about.

I have been doing a lot with Python in order to better learn the language.  As part of my learning, I read about the virtues of virtualenv.  For those who aren't aware, virtualenv is this sweet toy that makes a project directory an environment, and as such, puts an installation of Python into that environment.  You can install modules and such using the commands under that environment and they won't affect the python installation that is on your computer itself.  It's even better because if you screw up and want to start all over, it's a matter of simply deleting the directory and starting over.

But, to do so, you need to:

- create the directory
- run the virtualenv command in the directory
- activate the environment each time you go into the directory and want to work
- deactivate the environment when you want to leave the directory and stop working on the project.
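In command form, that boils down to something like this (the directory name is just an example; I create the environment in the project directory itself):

    $ mkdir myproject
    $ cd myproject
    $ virtualenv .
    $ source bin/activate
    ... hack away ...
    $ deactivate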

For the last two pieces, there is a nice bit of automation that exists called autoenv.  It's easy to install and cake to set up.  What makes it so excellent is that, when set up properly, cd'ing into the project directory activates the environment and cd'ing out of the project directory deactivates it.  It's flippin' sweet!
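For the curious: autoenv works off a file named .env in the project root, which it executes when you cd in.  Assuming the layout above (the environment living in the project directory itself), a minimal .env is just one line:

    source ./bin/activate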

Even though that level of automation exists for the {de}activation, I thought it would be nice to have a bit of automation for simply starting a project.  Something that sets up the project directory and even creates the necessary bits in the directory for autoenv to work.  So, I sat down and did it with a little bit of bash.  

You can view the project, called Python Virtualenv Setup, and even download the code.  I have been doing some refining on the script, so if you decide to play with it, please check back now and again for any updates.  I opened an enhancement ticket under issues, as the script currently assumes a specific project directory structure already exists.  I am going to make it a touch more dynamic and have it check for that directory and ask for one if it doesn't exist.

If you decide to play with the script and find any issues/errors or know of some enhancements, then please feel free to open a ticket under the issues tab.  If it's an enhancement request, I will definitely take it under advisement, but please know that the script, as it is written, does what it was designed to do.  I haven't put any thought into further expansion of its duties, but it's also not out of the realm of possibility.

As the title suggested, I have been using git for source control in my projects.  As you can also tell from the project link above, I am using bitbucket as well for remote hosting of the code.  I know, you're probably asking "Why not Github?".  Well, I did a bit of homework on this, as you can well imagine.  I do have an account on Github and follow a number of projects.  I even have a couple of things I have posted up there.  But in my review of both Github and Bitbucket (among others), Bitbucket was the only one that allowed not only unlimited storage, but also unlimited public AND private repositories, all under the free account.  That was quite attractive from a money conscious mind, I have to say.

Ok, back to what I was actually getting at, and that was that I wanted to cover a bit of the quick basics of how to start a project in git, especially for those who are just starting out with it.

Like starting any project, you want to make sure you at least have your project directory set up with a "README.md" file. Both git and bitbucket will read this file and use it as your project's page.  I would make sure that you put in there everything about the project, including things like what it's about, how to use it, and even any examples and insight.   The people who download your code will be relying on it for answers.  Don't forget to put installation instructions, no matter how rudimentary you think they might be.  (Thinking about that, I should do that for the above project.)
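If you want a starting point, a bare-bones README.md along those lines might look like this (the section names are just a suggestion):

    # My Project

    A short description of what the project is about.

    ## Installation

    How to install it, no matter how rudimentary.

    ## Usage

    How to use it, with an example or two.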

After you have the directory ready, you are going to go into the directory and issue this command:

    git init

That will initialize the project with git and enable source control.   The next thing to do is to add all the files (or file) that you have created.  You can do that with the following:

    git add .

Once you are ready, you will want to do your initial commit and get your files under source control.  Usually when committing files, you list everything you changed or added; that way you can look through the log and find the revision you need.  For the initial commit, you can comment just that:

    git commit -m "Initial commit message"

The -m allows you to add your comments in double (or single) quotes.  Listing no files after the closing quote will commit all files that are pending.   If you are at all unsure about what you have touched and want a quick recap, use:

    git status

That will print out what is pending and other useful information.

Ok, at this point, you have initialized your repo, added files, and checked them into source control.  What you may not realize, though, is that all of this has taken place on your local machine and has not yet been pushed to any remote servers.  Why?  That's how git works.  To get the files to a remote server, you have to tell it where they go, and then you will need to push them.

As said before, I am using bitbucket.  So, this example is for that site.  This is how you would tell git that you want this specific project to be pushed to bitbucket:

    git remote add origin git@bitbucket.org:<username>/<repository>.git

Note:  In order to get this information, you will have to have already created the project on bitbucket.  Believe it or not, bitbucket will give you the actual above command so you know what to specify.  They are nice like that.

Once you have the project defined and the origin added, you should then be able to do the following to push the project up to bitbucket:

    git push -u origin --all

If everything was already set up correctly on bitbucket, then that should work just fine for an initial push, and all pushes thereafter for the project would simply be a matter of issuing a "git push" in the project directory.  If this is the first time you are using your keys, or if there are problems, then you might see something like this:

    Permission denied (publickey).
    fatal: Could not read from remote repository.

This simply means that there is an issue with your connection to bitbucket.  It could be any number of things.  The first thing is to check what identity you're passing.  You can check that with:

    ssh-add -l

If that returns nothing, then you aren't passing anything, and that isn't good.  Since we are using ssh to push the files up to bitbucket, you are definitely going to have to make sure that you have already added your ssh public key to your account in bitbucket.  If you haven't, then do so.

After that, you'll have to add your key to the ssh agent and then test the connection from the project directory:

    ssh-add ~/.ssh/ssh_keyname
    ssh -T git@bitbucket.org

If you have done everything you need to, then you should see output that looks like this:

    Identity added: /Users/xxxxxx/.ssh/ssh_keyname (/Users/xxxxxx/.ssh/ssh_keyname)
    logged in as xxxxxxxx.

Obviously, identities have been changed to protect the innocent, but you get the idea.  You can now check that you have an identity added to ssh with which to connect:

    ssh-add -l

That should output your key if it was added correctly.  You should now be able to go ahead and issue the initial push command above to add all your files to bitbucket.org.  If you are still having issues, I would certainly suggest you take any errors you are getting and plug them into Google and see what comes up for results.  

Saturday, July 23, 2011

Thoughts on HTML5

Let me preface this post by stating that I am one of those coders that believes in doing things the right way. It is absolutely appalling to come across messy, uncommented code that leaves you scratching your head trying to figure out exactly what it does. It's just as appalling to come across a website, look at the code, and discover that they used tables for the layout of the site. To top it off, none of their tags are closed, making people with my coding beliefs shudder.

I know there are people out there reading this who say "So what? What's the big deal?". There are standards out there for a reason. Granted, they are a bit more enforced when you use the XHTML standard, as you are forced to close your tags, but that still doesn't stop people from using tables for the layout of their site.

I have been looking at HTML5 as a means for creating my new website and am finding that there are things I do and don't like about it currently. One of the things that I am nonplussed by, but am going to have to live with, is that the people responsible for HTML5 regressed and decided not to enforce the need to close all of your tags, as XHTML required.

I found that enforcement to be a good thing, forcing coders to make their code a bit nicer and actually pay attention to the details. By opening this up and allowing people to use their own styling choice, HTML5 is going to make supporting someone else's code a bit of a headache.

That point aside, I am finding that to code in HTML5, you have to add a lot of checks into your code to see if certain new add-ons are supported in the browser that is accessing your site. For instance, with forms, they have added a lot of new input types which let supporting browsers react more intuitively to those fields (like date pickers or color pickers, or even the email type that tells a mobile browser to configure its keyboard specifically for email addresses). Unfortunately, at this time, the only browser out there that supports all of the forms additions is Opera. And while Opera supports a lot of HTML5, not a lot of people use it; its use is dwarfed by that of Firefox and Chrome.

I have been really thinking about whether I want to use HTML5 and write the code to do the tests, but that really isn't going to be an option going forward. HTML5 is out there and is here to stay. The support in browsers will only continue to increase, but the concern is going to be users. There are too many people out there using older versions of browsers. Listen, people (you know who you are): just because it's working for you doesn't mean it's right. You are not only forcing a lot of developers to code around the fact that you refuse to update, you are actually not getting the proper experience out of a lot of websites that the rest of us are.

I guess I will just have to suck it up, code it once and reuse it.

Monday, April 12, 2010

Catching up.... and a link

As any code monkey and/or geek can tell you, work can certainly cause you to not have time for much else.  Having gone back to the company I was at almost 3 years ago (which I love working for, but had to leave to realize that), I find that I am incredibly busy, but it's a good thing.

When I was here before, I was doing client implementations work.   I was one of the two original members of the team who ended up staying around the longest and became the expert on the system and the solutions we provided.  When I came back this past October (after 7 1/2 months unemployed), I came back on the other side of the wall and am now doing Production support for the platform I was implementing on before.  

Don't get me wrong, things have changed a fair amount (while some things remained the same or had gotten worse).  To say that I really like my new position would be a complete understatement.... I LOVE IT!!!  It is always amazing to be able to walk into a job and already know the platform (for the most part).  I just had to learn the intricacies of how my new group works.  

Recently, after having a talk with my boss, it was determined that we needed a database of our own with a web interface to it so that we can all access it, and our boss can run his metrics. Far too often do we have to look in a plethora of different databases and sites to try and find all the information that we need. Then, even when we do find it, it is not always easy to get that information in one general place where it can be referenced again later.  This project will change that.  It will be one place where we can put all the information we gather and have, so that we can reference it and even update it as needed.  

That said, the project that I mentioned back in my last post has been modified to be this project.  So far, I have gotten a second machine out of it (a nice quad processor machine with 4 GB of RAM) and a third flat screen monitor for my desk (the first two are attached to the dock for my laptop).  I have the machine installed with Ubuntu and configured with Apache2, MySQL, Subversion, and Bugzilla.

The initial data that will be populated into the database table(s) is coming from a couple of spreadsheets that I was sent.  I am right now writing some Perl code to not only parse the Excel spreadsheets (that wasn't too awful to do, especially using the Spreadsheet::ParseExcel module), but also populate the data into the table(s).   I am being careful about this code because I want to be able to use it to produce a script for periodic updates.  It will need to check for data already existing so as not to duplicate entries.
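To give a flavor of the parsing side, here is a minimal sketch using Spreadsheet::ParseExcel (the file name is a placeholder, and the print is standing in for the database insert my real code does after its duplicate check):

#!/usr/bin/perl

use strict;
use warnings;
use Spreadsheet::ParseExcel;

# Parse the workbook, bailing out with the parser's error if it is unreadable.
my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('tickets.xls')
    or die "Could not parse spreadsheet: " . $parser->error() . "\n";

# Walk every worksheet and every populated cell.
for my $sheet ($workbook->worksheets())
{
    my ($row_min, $row_max) = $sheet->row_range();
    my ($col_min, $col_max) = $sheet->col_range();

    for my $row ($row_min .. $row_max)
    {
        for my $col ($col_min .. $col_max)
        {
            my $cell = $sheet->get_cell($row, $col);
            next unless $cell;

            # This is where the duplicate check and the INSERT would go.
            print $sheet->get_name(), " [$row,$col]: ", $cell->value(), "\n";
        }
    }
}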

Ok, the link I promised.  I stumbled across it earlier tonight.  It's an interesting page where you can look up functions for PHP and JavaScript.  It is quite the language reference.  It even has a reference for CSS and MySQL.  So far, I am really glad to have found it.

Thursday, January 21, 2010

New Project in the works

I have been at my new (old) job for a little over 3 months now, but that is more than enough time for me to see that there is A LOT that goes on every day. The platform that we primarily support constantly has to be worked on due to bugs or issues that we find. It's like every week we are finding something else that is not right or is just plain wrong, or we are adding instances of an existing bug, providing more proof of the bug itself.

Unfortunately, other than the notes that my team keeps in their spiral bound notebooks, we do not currently have any way to track the reports that we make and the bug tickets that get opened.

I have been researching the many (and I mean MANY) issue / bug tracking systems that are out there as a possible way for us to make entries and track each one, and also have the ability for our manager to run a report against it and get his own excel/csv spreadsheet for his weekly meetings.

Unfortunately, I have not been able to find any system that meets my exact needs. One of the most important requirements, being that I do not own the box where my site is hosted, is that it be wicked easy to install. It can require Perl modules or such, but I do not have root access and I have seen a few systems that require it.

The reporting piece is also an extremely important piece. Most of the systems that have reporting produce things like graphs and charts, but do not produce spreadsheets or csv files. The one that I did find is a Windows based system and I am working on Linux systems.

So, I am at the moment working on a full list of requirements and features that need to be implemented into this new system. This will certainly be one of the biggest projects I have ever worked on if it takes off as I expect. Of course, it will also be a lot of fun to boot.

Tuesday, November 13, 2007

No More Perl Monopoly for Active State

As anyone who codes Perl on the Windows platform knows, the choice that you have (had) for Perl on Windows was Active State. It worked great for your scripting, but when it came to installing modules, it wasn't always the easiest thing in the world. Granted, the graphical installer was a godsend for those who don't like the command line, but what about when you couldn't find the module you needed to use?

In the past few months I have found a few modules that I was wanting/needing to use in a project and they could not be found anywhere in the PPM (Perl Package Manager - this is what Active State Perl uses). If you investigate further, you will find that PPM does not use CPAN directly (as I think it should) but instead uses the Active State PPM repository by default. There are a couple of others out there as well, but they only add a few hundred packages (modules) to the mix.

Well, I did some inquiring to see if anyone knew of a complete repository to use for PPM, but instead was thrown a link to something completely new. Strawberry Perl! My curiosity piqued and being completely intrigued, I decided to download this new offering and see if it actually worked.

From what I was told, Strawberry Perl uses the CPAN interface that most of us are already familiar (or quite intimate) with from the Unix world. That said, for those of you who remember (no matter how vaguely) the initial CPAN setup, the configuration manager looks for the existence and paths of specific utilities in order to be able to do its job.

Well, I made a list of the utilities that it requires and have provided those below for your reference:

The following utilities are located at the gnuwin32 website (or use the gnuwin32 listing page):

  • bzip2
  • gzip
  • tar
  • unzip
  • wget
  • less (my favorite pager)
When you download the above utilities, be sure to grab the binary packages. When you unpack the first one, it should go into c:\Program Files\gnuwin32.
That is the base path that you should then use when extracting the other 5 utilities, as they have the same directory structure internally (ie: bin directory, etc).

Here are other utilities that you will need (each is a link to its URL):

  • curl
  • lynx
  • ncftpget (and ncftp) - Both are installed with the same single download. Be sure and grab the client software (version 3.2.1 as of this blog entry).
  • gpg
It is up to you whether or not you put each utility's path into your system path, but if you don't, you will have to enter the path to where each utility is located manually during the CPAN setup, just as I did. Once you get all of these installed, and also have Strawberry Perl installed, you can then run the CPAN configuration by entering the following command:

perl -MCPAN -e shell

That will kick it off. I simply took all of the defaults, with the exception of the utility paths that were not detected automatically, which I entered manually.
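To give you an idea, the prompts for the utilities it cannot find look something like this (the exact wording varies between CPAN versions; the path is wherever you extracted the gnuwin32 binaries):

    Where is your gzip program? [] C:\Program Files\gnuwin32\bin\gzip.exe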

Once I had completed the software installations and gone through the configuration, I entered the following command from within the CPAN shell environment:

install Bundle::CPAN

This confirmed to me that everything had worked, as it installed beautifully. I then installed a few modules that I needed and ran a few scripts that were around from when I had Active State Perl on the machine (yes, Active State was uninstalled before installing Strawberry Perl; sorry for not mentioning that earlier).

Anywho, I hope that those of you that code Perl on Windows will do your due diligence and go against what used to be a Perl / Windows Monopoly and switch from the limitations of Active State over to the freedom and (IMHO) superiority of Strawberry Perl.

Tuesday, August 21, 2007

Opinions, opinions

We have all seen the postings on almost every single forum that is out there. You know what I am speaking of, those postings that read, "What is the best IDE for BLAH language?". If you have read any of the tirades.....er.....responses to those questions, then you are fully aware that it is a "my opinion is better than your opinion" atmosphere.

You may get a couple of responses in the beginning, where people are telling you what IDE they use, the pros, the cons, and why they think it is so wonderful, but then it starts. You get this multitude of fascist dictator types that absolutely insist that "there is no better IDE than (input editor here) and that all other editors are crap in comparison!". You even have the old school folks, some of whom can remember creating punch cards, who believe that command line editors or vi are the best editors.

If you are one of those that is getting ready to ask that time-(de)tested question of "Which IDE is better for ...?", then just DON'T!

Here is what I believe, and no, I am not going to go and follow the masses, preaching what I think is the best editor. Instead, I am going to sum it up with this..... try them all. Download and install a number of editors. Play with them, write code with them, debug with them, get to know them. While you are doing this, take notes on what you like and dislike about each one. Then, when you are done, compare all of your notes. You have to not only look at the notes, but ask yourself, "will I still like this editor in 6 months? a year?" The answer may very well be: I don't know.

I am old-school unix. I believe that the command line rules and vi is the best day-to-day editor. All of the coding that I have learned has been by hand. I prefer not to learn with a fancy, shmancy do-it-all-for-you editor as I won't learn anything. I like learning a new language in an editor like vi because I get to debug my code by hand and not rely on a program to tell me what is wrong. This allows me to assess the errors and get my coding (by hand) down to a science. After I am more than comfortable, then I migrate to a more comfortable editor that will save me time.

While vi will always have a place in my heart and my editing world (being the first editor I used on Unix), I must say that I have leaned toward Active State's Komodo for my day to day coding in Perl ( and other misc languages, including HTML). Yes, some will tell you it is a beast and clunky slow. Personally, it takes a minute to start up, but after that, I don't have any issues. I don't have this insatiable need to have my editor at my fingertips within a nanosecond of clicking on the link to launch it. I am patient enough that I can wait the 30 or so seconds that it takes to launch. I use it because I like its syntax highlighting, code sense (hints, kind of like Micro$oft's Intellisense), and overall comfortable feel.

That my friends, is what I think the key is..... comfort! You have to pick an editor that you like and not listen to the skewed views of the mass critics out there.

In a posting to the Boston Linux User's group, Uri Guttman wrote, "so my main point is that coders need to be smarter about their analysis, architecture and design and less caught up in tools like IDE's and syntax highlighting. you can have the greatest IDE in the world and still come up with crappy code and solutions. whereas a good software design can be written and debugged with any set of tools."

That is one of the best statements that I have read on the subject and it is something that I have believed in for some time. If you aren't able to write good code and be able to debug it thoughtfully, then no editor in the world is really going to help you!

Happy coding!

Wednesday, August 01, 2007

Checking for duplicates

If there is one thing that I love about Perl, it is that there is always something new to learn. In my case, I like it to be a few things every day, but that is just me.

In my last post, I mentioned one liners and that I was working with some code that was rather puzzling to figure out. Well, I figured it out, with the help of Learning Perl, 3rd Edition. I have said it many times before and I will say it again: as much as the Camel book is famed as the "Bible of Perl", I tend to keep the Learning Perl book much closer to my keyboard.

The one liner that I was working on figuring out was as follows:

perl -e '$count=0; while (<>) {if (! ($var{$_}++)) {print $_; $count++;}} warn "\n\nRead $. lines.\nTook union and removed duplicates, yielding $count lines.\n"' ./file1 ./file2.txt > ./combined.txt

This code is supposed to take in the two files (file1 and file2) and combine them into one file (combined.txt), all the while removing any duplicate entries. What puzzled me was HOW IS IT DOING IT? Yes, if you are wondering, it does work. Any Perl gurus out there are already nodding their heads, as they probably already know how.

The magic of this code is in the "$var{$_}++". The expression uses a post-increment, which returns the value the hash entry had before being incremented. The first time a given line is read, $var{$_} is undef, so the test "! ($var{$_}++)" is true: the line gets printed, the count gets bumped, and the increment leaves a 1 behind in the hash. The next time the same line comes in, the entry already holds a true value, so the negated test is false and the duplicate is not added to the output file. It's a little confusing, I know, but it works and it is how it was designed to work. Personally, it's a great, short idiom for removing duplicates.
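If it helps, here is the same idiom written out as a regular script (a sketch, logically equivalent to the one liner):

#!/usr/bin/perl

use strict;
use warnings;

my %seen;
my $count = 0;

while (my $line = <>)
{
    # The post-increment hands back the old value, so the first time a
    # line is seen the test is false and we keep it; after that we skip it.
    next if $seen{$line}++;

    print $line;
    $count++;
}

warn "\n\nRead $. lines.\nTook union and removed duplicates, yielding $count lines.\n";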

If you still have questions, I recommend you look at the example on page 153 of Learning Perl, 3rd Edition. Yes, I know they are up to 4th Edition, but I have my 3rd edition copy with me at the moment.

Happy Coding!!

Tuesday, July 31, 2007

Bring on the one liners!

I have been playing with Perl for a couple of years, off and on, not really that intensely. About 6 months ago I really picked up the pace when my last job released its online training courses. I ended up taking the 3 main Perl courses and learned A LOT, all at the company's expense. Almost 2 months ago I landed the job I am in now, as a Perl Developer, and watched as my learning curve took a sharp incline towards the ceiling. I love it when it does that, and I am really into what I have to learn.

Since getting this job, I have written a couple of scripts for work here that are a few hundred lines long and others that are only a couple hundred lines in length. Not too shabby and they work great.

Recently though (starting last week), I have been working on a Production issue. I did a fix for the issue but have since then been learning how the scripts work. The project is mainly shell scripts, but one of the scripts that is called is a ".sh" file that only contains... yup, you guessed it... a Perl one liner. This one liner is supposed to compare two files and return the lines that aren't duplicates.

Well, in my examination, I will be darned if I can find anywhere that the one liner is actually doing a comparison. To me, it looks like the script just simply outputs the first file and then appends the second file.

Anywho, that issue aside, during the whole research that I have been doing, I have had the opportunity to really delve into how to do one liners in Perl. It is really rather interesting. Granted, it's not my favorite way to code Perl, but for the quick, dirty job that takes seconds, it is definitely the way to go.

Friday, July 06, 2007

Watch that punctuation

I'll tell you, there is nothing like coding all day and having a really good time doing so, and then, right around mid-afternoon you hit an error that just stops you. You can't figure it out for the life of you.

I am working on a script to automatically download ALL of CPAN. (I know.... WHY? Because I want to, that's why. Plus, my laptop isn't always able to get online, and when I need a module, if I have it readily available, I can manually install it.)

Basically, the script fetches a copy of the CPAN modules list, and I am then parsing the paths to the modules, including the module names, out of the HTML that was returned. Here is the section of code that was bugging me:


if ($line =~ m/\.gz/)
{
    @elements = split(/"/, $line);
    print PATHS ("$elements[1]\n");
}


I really wanted this to work (and the code above does), but when I IM'd Merlyn (Randal Schwartz), he noticed that I was missing a double quote in the print path. Man, after coding all day, I don't blame myself for missing that one. Thanks, Merlyn, for the second set of eyes on that one!

Once I have the code completed I will post it here.

Tuesday, July 03, 2007

[TAOREs] File or Directory from a listing

Welcome!! This is the first article in my new column on regex's, where I will cover regex's that I have written or found (obviously giving credit where credit is due) and also lessons on writing regex's. I know that I am starting this column without even a lesson on regex's, so if you are new to regex's and really don't know much or anything about them, then this will seem quite foreign, and I apologize for that. Please know that the lessons will follow shortly. Without further ado, let's jump in feet first.

I was working on a script a couple of weeks ago where the script had to FTP into a server, grab a directory listing from a specific directory, and then send an email to a distribution list if a file was found in the directory.

Seems pretty straightforward, but I quickly realized that this was an opportunity for me to get a little practice writing regex's. The remote server was Windows, so when the FTP connected and ran the "dir" command, the output had the usual FTP banter, but also a line similar to the following when a file was found:

drwxr-xr-x 7 Administrators group 0 Jun 04 22:07 myfile.txt

This is the typical output format for FTP on a Windows machine. Yes, it is the same format as the long listing provided by an "ls -l" on a unix machine. So, knowing that, I quickly set out to write a regular expression to match just the "myfile.txt" file name in that line.

Here is the regular expression that I came up with to match the file name in that line:


m/^.+\s+\d\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+\d{1,2}:\d+\s+(\w+\.\w+)$/


This is pretty straightforward (if you know something about regex's). Let me break it down, with the regex formatted slightly differently:


m/ # start the match
^ # start from the beginning of the line
. # match any single character
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
\d # match a digit
\s # match a space
+ # match preceding element one or more times
\w # match a word character
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
\w # match a word character
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
\d # match a digit
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
\w # match a word character
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
\d # match a digit
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
\d{1,2} # match a digit, at least once, but up to twice
: # match a colon
\d # match a digit
+ # match preceding element one or more times
\s # match a space
+ # match preceding element one or more times
( # Start subexpression group for capturing
\w # match a word character
+ # match preceding element one or more times
\. # match a period, yes, this needs to be escaped as the period is a regex character too
\w # match a word character
+ # match preceding element one or more times
) # End the subexpression group for capturing
$ # match the end of line
/ # end the regex


I know, it seems a bit messier, but this is one way of writing a regular expression so that you can annotate what is happening. Personally, I like the first version better.

Now, with this regex, if your file name (or directory name) is in a different format than something like "myfile.txt", then you will have to edit the regex to reflect that difference, or risk the code not working.

In the above, though, the part of the regex that is enclosed in ( ) will be placed in the special variable $1 so that its value can be referenced elsewhere in the script.
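To make that concrete, here is a tiny sketch using the sample listing line from above:

#!/usr/bin/perl

use strict;
use warnings;

my $listing = 'drwxr-xr-x 7 Administrators group 0 Jun 04 22:07 myfile.txt';

# The parenthesized group captures the file name into $1.
if ($listing =~ m/^.+\s+\d\s+\w+\s+\w+\s+\d+\s+\w+\s+\d+\s+\d{1,2}:\d+\s+(\w+\.\w+)$/)
{
    print "Matched file name: $1\n";    # prints: Matched file name: myfile.txt
}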

The Art Of Regular Expressions

I have been playing with Perl now for the last couple of years, somewhat actively. My playing turned into a job when I accepted the position that I am presently in. Before I found this job though, my last job was kind enough to offer online, self-paced CBT type training that you could do in your own time. So, I took a gander at it and they had a number of Perl courses.

Now, I am ALWAYS learning and can usually be found with a computer book somewhere in my vicinity pretty much most of the day. I never miss a chance to learn something new if I can help it. That said though, I still will admit that there is nothing like some of the training you can find that gets right down to the nitty gritty without beating around the bush. This is especially true when you are trying to get a solid base in the basics of whatever you are learning.

The CBT's that my last job provided me access to did just that. Not only did they give me a really good understanding and grounding in Perl basics, but they also gave a really good lesson on Regular Expressions.

Regex's (as they are also affectionately referred to) can be some of the most elusive topics to get a grasp on in Perl. I am no expert by any stretch of the imagination, but the course that I took gave me an incredible head start into the world of regex's.

Before that course, I knew a fair amount about using pattern matching in Unix. I was quite adept at shell scripting and could write an awk script or two to do what I needed. But when it came to Perl's regex capabilities, I was lacking. ***BAM*** Then came that course. It truly opened my eyes to the overwhelming beauty and seductive power that Perl regex's have.

With a little "to-the-point" training and guidance, I think anyone can easily get a grounding in Perl regex's. In fact, I think I am going to add on to this blog and start a column regarding regex's. I will not only post some mini lessons on regex's, designed to further your knowledge in how to assemble regex's, but also examples of regex's that I have written as well, that work for the purpose they were created. Hopefully all of that will provide you a new route for reference when trying to use regex's.

I think I will call the new column "TAOREs" (for The Art Of Regular Expressions). I hope that you enjoy and also contribute any suggestions that you may have as well.

Monday, June 11, 2007

The Art of the Automated Download

How many of us have found a site that ended up having a whole bunch of files on it, for download, that you really wanted? Well, I found a site just like that. Too many sites use PHP or something else, with IDs and other codes referring to the documents instead of just putting links into the web pages. Once in a while, though, you do come across a site that DOES put the actual links to the files on the page.

In the case of the site that I am referring to, there are a little over 2000 files that I wanted to grab. The thing is, I didn't want to have to sit there and "right-click-save as" for each and every one, as that would have taken days to complete. So, noticing that all of the files had actual URLs that led right to them, I looked at the page source. There it hit me. Perl!

So, I copied the source and quickly drummed up a regular expression which grabs all the URLs of all of the pdf files on the page. After I grabbed the URLs, I put together some code which quickly went out and grabbed each and every one in turn and saved it to my hard drive.

This sounds all simple and stuff, but for someone like me who still considers himself a novice in the Perl world, it did take a couple of hours of research. First I tried to use the "WWW::Mechanize" module and was able to retrieve a complete list of the pdf files and their paths, but not the actual files. I tried other packages and such, delving into LWP itself, but I could not for the life of me get the code to actually download the files.

I found "lwp-download" and gave it a shot. Wow! It looked like it was working, up until the 904th document, where it died. I couldn't figure it out nor understand what was going on. Why was this dying at the same point every time. Well, I did eventually figure it out and was able download ALL of the over 2000 files to my hard drive. I couldn't believe it, but the routine I used only took a few minutes to download all of the files (granted, I have 15 Mb FiOS for my internet, so please bear that in mind as well).

Just as an FYI (and you can get this by looking at the code), I used the LWP::Simple::getstore() routine to download the files. It was a lot easier than going through the process of figuring out why my WWW::Mechanize code wasn't working, believe me. I will figure that module out later, but for now, this did exactly what I wanted.

This is probably a bit much and others would more than likely have a better way to do it, but here it is. Here is the code I used to parse HTML code for its links and download them.


#!/usr/bin/perl

use strict;
use warnings;
use File::Basename;
use LWP::Simple;


#############################################
# Read the entire HTML file into an array, line by line so that we can
# parse out the information we need, one line at a time.
#############################################
open(my $html, "<", "/home/jlk/development/perl/stampAlbums/code.txt")
    or die "Cannot open code.txt: $!";
my @code = <$html>;
close($html);

# Start with a fresh files.txt on each run.
if (-e "/home/jlk/development/perl/stampAlbums/files.txt")
{
    unlink("/home/jlk/development/perl/stampAlbums/files.txt");
}


foreach my $line (@code)
{
    ############################################
    # The following code takes the site's html file (in this case, it
    # is the stampalbums.com download site) and parses out all of the
    # download URL's.  (The original regex was eaten by the blog's HTML
    # filter; this reconstruction just matches the anchor tags that
    # carry the download links.)
    ############################################
    if ($line =~ m/^\s+<a href=/)
    {
        my @splitLine = split(/"/, $line);

        ##############################################
        # Now, open the file in which you will store all the URL's to
        # the files and write each URL on a separate line.
        ##############################################
        open(FILES, ">>", "/home/jlk/development/perl/stampAlbums/files.txt")
            or die "Cannot open files.txt for append: $!";
        print FILES ("$splitLine[1]\n");
        close(FILES);
    }
}

###########################################
# Open the file containing all of the URLs
###########################################
open(FILES, "<", "/home/jlk/development/perl/stampAlbums/files.txt")
    or die "Cannot open files.txt: $!";

###########################################
# Do the download of the files
###########################################
foreach my $url (<FILES>)
{
    chomp($url);

    my $localdir = "/home/jlk/development/perl/stampAlbums/albumPages/";
    my ($filename, $directories, $suffix) = fileparse($url);

    # Save each file under the local album pages directory.
    LWP::Simple::getstore($url, $localdir . $filename);
}

close(FILES);

 
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.