Thursday, May 15, 2014

How To Recursively Download A Website / Web Directory

On the web you can find plenty of open directories to browse, or directories where a company keeps its files for download.  Using Google's built-in advanced search operators, you can tell it exactly what you want to find.  For instance, to find open directories that have jpg and gif images, you could use:

     -inurl:(htm|html|php) intitle:"index of" +"last modified" +"parent directory" +description +size +(jpg|gif)

If you are interested in what you can do with your Google search, check out Google's Search Operators page.  It is certainly enlightening for those wanting to go beyond the basic search.

The problem with viewing such a directory in the browser is that you need to click on each file to download it.  You could write a script to go to the directory and download all files of a specific type.  Or, if you have the wget application on your computer, you can download the whole directory listing without having to click on every file.  wget has a number of different options to do its job, and you should spend a little time reading its manpage in order to familiarize yourself with them.

For our purposes, we are going to use wget with the following options:

     wget -r --no-parent <url>

The '-r' tells wget to retrieve files recursively.  The '-l' (or '--level') option sets the maximum recursion depth.  The default is 5 levels, and we are sticking with that default.

The --no-parent option tells wget that it should not follow the parent directory link ( '..' ) in the directory.  That way it limits the recursion to the directory you specified in the url and everything below it, making that directory the top level.  The <url> is the url of the directory you want to download from.
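
As a rough sketch of the command above, here is a small wrapper that previews the wget call before running it.  The function name mirror_dir, the DRY_RUN variable and the example.com url are made up for illustration; only the wget options themselves come from the manpage.

```shell
#!/bin/sh
# Hypothetical helper: recursively fetch a directory without climbing to its parent.
# $1 = url of the directory, $2 = recursion depth (defaults to wget's own default of 5).
mirror_dir() {
    url="$1"
    depth="${2:-5}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Preview only: print the command instead of running it.
        echo "wget -r --level=$depth --no-parent $url"
    else
        wget -r --level="$depth" --no-parent "$url"
    fi
}

# Preview a two-level download of a placeholder directory.
DRY_RUN=1
mirror_dir "http://example.com/pub/files/" 2
```

Previewing first is just a habit of mine for anything recursive; set DRY_RUN=0 when you are happy with what it prints.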

Now, wget, by default, obeys robots.txt, which means website owners can control where you can and cannot go, among other things.  Of course, you can turn that off by using the '-e robots=off' option.   Here is an example of the above command, disobeying robots.txt:

     wget -r --no-parent -e robots=off <url>

Please know that using this option is rude and could get you (and/or your IP) banned from accessing a site should the site owner find out, especially if you overload their site with your downloading.  Regardless of your intentions, keep in mind that those who can't see you or don't know you have no idea if you're a good guy or a bad guy.

wget is quite flexible and useful.  Read the manpage and it will become one of the many awesome tools in your arsenal.


Update:  One more thing.....

If you are attempting to download the content from an open directory structure and your download fails, you are probably going to want to continue the download from where you left off, without re-downloading.

You will want two other options for this, the '-nc' option and also '-t 0'.  The '-nc' means no-clobber.  Seeing this, wget will not try to download a newer copy of the file if a copy already exists locally.  No, this is not the option for updating, just for not re-downloading if you don't care about updates.

The '-t 0' option sets the number of retries to infinite (the default is 20), so wget keeps re-requesting a file when a download fails.  You may find that a site shuts you out for a while or gets bogged down; this will keep the requests going for the files.  Fatal errors such as "connection refused" or "not found" (404) are not retried, and if wget hangs, you'll have to kill the process and restart it.
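
If you would rather not babysit the restart, one possible approach is a small retry wrapper around the whole wget run.  This is only a sketch: the retry function and the RETRY_DELAY variable are my own invention, the url is a placeholder, and '--timeout' is an extra wget option (not discussed above) that gives up on a stalled connection so the retry logic can kick in.

```shell
#!/bin/sh
# Hypothetical wrapper: re-run a command until it succeeds or the attempts run out.
# $1 = maximum number of attempts, remaining arguments = the command to run.
retry() {
    attempts="$1"
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Example (placeholder url, left commented out so nothing is downloaded by accident):
# -nc skips files already on disk, -t 0 retries failed downloads without limit.
# retry 3 wget -r --no-parent -nc -t 0 --timeout=30 "http://example.com/pub/files/"
```

Because of '-nc', each new attempt skips what is already on disk and only fetches what is missing.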

Monday, May 12, 2014

A Couple Of virtualenv Notes

A little while ago I wrote about starting a python project, and linked to a bitbucket project of mine that can be used to create a virtual environment (and have the environment auto activate upon entering the project directory).  I wanted to share just a couple of things that I learned, one of which bit me a touch when working on a recent python script.

I was working on said script and kept getting puzzling errors.  The errors were not making sense, telling me that the module I was working with was not installed.  I called 'bull' as I knew I had installed it in the environment using pip.  Well, I thought I had.  It bit me in that I did not remember that when you are working in a python virtual environment, you NEED to use the pip and python executables installed under the environment (in the environment's bin directory).  If the environment is not activated and you just call pip or python, you get the system version by default.

This was brought even more to my attention when I ran 'pip freeze' to see what modules I had installed, and the list was looooooong.  I quickly realized that I was not using what I thought I was.  So, always make sure you reference the correct application.   Of course, one gotcha to remember is that once you develop your application, if you move it out of the virtual environment, you need to fix any paths that point to the python interpreter.
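
Here is a quick sanity check I could have used, written as a sketch.  It relies on the VIRTUAL_ENV variable that virtualenv's activate script sets; the in_venv function name is just something I made up for this example.

```shell
#!/bin/sh
# Hypothetical check: does a command resolve to a copy inside the active virtualenv?
# VIRTUAL_ENV is exported by virtualenv's activate script.
in_venv() {
    tool_path=$(command -v "$1") || return 1   # command not found at all
    [ -n "${VIRTUAL_ENV:-}" ] || return 1      # no environment is active
    case "$tool_path" in
        "$VIRTUAL_ENV"/*) return 0 ;;          # lives under the environment
        *) return 1 ;;                         # system (or other) copy
    esac
}

for tool in python pip; do
    if in_venv "$tool"; then
        echo "$tool: virtualenv copy"
    else
        echo "$tool: NOT from a virtualenv"
    fi
done
```

Run it before 'pip install' or 'pip freeze' and you will know right away which installation you are talking to.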

Consider that a lesson learned for me.  Hopefully it will allow you to forego learning it too painfully (or just annoyingly).

Wednesday, April 16, 2014

Add Services In Linux Using chkconfig

As a SysAdmin, you are responsible for everything to do with the systems under your control.  These responsibilities range from system installations to all kinds of maintenance and updates.  You may find, now and again, that you want to have something start up when the machine boots, and the best way to do that is to have an /etc/init.d script to do that for you.  In these scripts, you can define start, stop, restart options, with the commands you want to happen for each option.
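
To give an idea of what such a script can look like, here is a bare-bones skeleton.  Treat it as a sketch: the service name and the echo commands are placeholders, and the runlevels and priorities in the header are example values.  The two header comments are what chkconfig (covered next) reads when you add the service.

```shell
#!/bin/sh
# chkconfig: 345 85 15
# description: Hypothetical example service, used here only for illustration.
#
# The "chkconfig:" line lists the default runlevels (3, 4 and 5) followed by
# the start and stop priorities.

SERVICE="myservice"   # placeholder name

start() { echo "Starting $SERVICE"; }   # replace with the real start command
stop()  { echo "Stopping $SERVICE"; }   # replace with the real stop command

dispatch() {
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; start ;;
        *)       echo "Usage: $0 {start|stop|restart}" >&2; return 2 ;;
    esac
}

# Only dispatch when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    dispatch "$1"
fi
```

Saved as /etc/init.d/myservice and made executable, it would be called as '/etc/init.d/myservice start', '/etc/init.d/myservice stop' and so on.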

To begin, a really handy command that you should get to know is chkconfig.  This command will show you whether a service is set to be on or off in each runlevel.  If you simply type the command with no options, you will see output like the following:

     httpd         0:off   1:off   2:on    3:on    4:on    5:on    6:off

As you can see, the runlevels are shown, with the corresponding on/off option shown for each.  In order to add a service, you need to first create the script to execute and put it into the /etc/init.d directory.  If you have software that you have unpacked that has its own script already written, you can simply create a link to that script in the /etc/init.d directory.  Either way, once you have done that you can add it as an active service by issuing:

     chkconfig --add <name>

The <name> is the script or link name that you just put/created in /etc/init.d.  After adding it, check that it was successfully added by doing:

     chkconfig | grep <name>

You should see output like the above if the service was successfully added.  If you need to change a service to be on or off in a specific runlevel, then the format of the command is:

     chkconfig --level 345 httpd off

The numbers after the '--level' are the runlevels to modify.  As you can see, you can list whichever levels you want, but no spaces, commas, or anything else. After that is the service to modify, followed by whether it is off or on in the listed runlevels.

To remove a service, simply use:

     chkconfig --del <name>

Again, check that it was removed per the above command.

The chkconfig command does have other options available to it, but this should give you a basic overview of how to use it.  If you wish to read further, please feel free to read the man page.

It is important to note that chkconfig does not exist on all systems; it is typically found on Red Hat based systems.  If you are on, say, Debian based machines (such as Ubuntu), then you will need to use 'update-rc.d'.

We will save that for another post.....

Removing Files Older Than So Many Days In Linux

On our own home systems, we tend not to run into the issue of files from this product or that product building up and eating our disk space.  But when you are dealing with servers and the software that people run on them, preserving space by deleting unnecessary logs and other files is a necessary skill.

Just as an example, we use Puppet where I work to manage system configurations.  While the puppet logs tend to take up a bit of space on our puppet server after a while, it's the puppet reports that end up eating the most space.

The puppet reports are located in /var/lib/puppet/reports.  Under that directory is/are (potentially) a whole slew of directories, one for each machine that puppetizes off of that master.  In each of those directories are *.yaml files.  A yaml file is created each time puppet runs on a machine and connects to the puppet master.  

So what is the first step in purging the files?  Well, let's start by seeing how many files we are actually talking about.  To do this, you can use the find command:

   find /var/lib/puppet/reports -name '*.yaml' | wc -l

What that command does is search the reports directory for all yaml files and report the total count of files found.  Next, let's see how many files we are looking at getting rid of.  Let's say that we are going to keep the last 14 days of files.  For that we would simply modify the above command to be:

   find /var/lib/puppet/reports -name '*.yaml' -mtime +14 | wc -l

Again, it will report the total.  You should notice that the number is smaller than the previously reported number.  Now, if you are ready to remove those files, a simple modification will do that for you:

   find /var/lib/puppet/reports -name '*.yaml' -mtime +14 -exec rm {} \;

You have to just love the power of the command line in unix.  With just a few keystrokes, you can purge the unneeded files with a single command and a few options.  
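
Before pointing a command like that at real data, it can be worth rehearsing it somewhere harmless.  The sketch below does the same three steps in a scratch directory; the host1/host2 directories and file names are made up, and it assumes GNU touch (whose '-d' option understands relative dates such as '20 days ago').

```shell
#!/bin/sh
# Rehearse the purge in a throwaway directory (all names here are placeholders).
reports=$(mktemp -d)
mkdir -p "$reports/host1" "$reports/host2"

# Two stale reports and one fresh one.  Assumes GNU touch for the relative dates.
touch -d '20 days ago' "$reports/host1/old.yaml"
touch -d '30 days ago' "$reports/host2/older.yaml"
touch "$reports/host1/new.yaml"

# Step 1: count every yaml report.
total=$(find "$reports" -type f -name '*.yaml' | wc -l)
# Step 2: count only the reports last modified more than 14 days ago.
stale=$(find "$reports" -type f -name '*.yaml' -mtime +14 | wc -l)
echo "total: $total, older than 14 days: $stale"

# Step 3: remove the stale reports, then count what is left.
find "$reports" -type f -name '*.yaml' -mtime +14 -exec rm {} \;
remaining=$(find "$reports" -type f -name '*.yaml' | wc -l)
echo "remaining: $remaining"

rm -rf "$reports"
```

The '-type f' is an extra safety net I added so that only regular files are ever matched.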

Sunday, April 13, 2014

Free Programming Books and Links

There is no question, Free is just one of those words in the English language that, when seen, makes people take notice.

I stumbled across a link to some Free programming books and links that I feel just needs sharing.  Now before you get too excited, the links on that page take you to either an online version of a book or a page with a tutorial or article.

To be honest, I have found a bunch of good information in here since finding it and hope that you do as well.

Tuesday, April 08, 2014

NEW OpenSSL Vulnerability

For those who haven't heard, there is a new OpenSSL vulnerability that was found, dubbed Heartbleed.
If you haven't done any patching yet, you'll want to if you have an affected version of OpenSSL installed on your system(s).

You can test your sites with this software, released today.

To check which version of openssl is installed on your systems, simply run 'openssl version' and check what it reports.  Versions 1.0.0 and 0.9.8 are NOT affected, but if you are at version 1.0.1 through 1.0.1f, you will need to patch to version 1.0.1g (the newest released version, which fixes the issue).
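
If you have more than a couple of machines to look at, the version check is easy to script.  The following is a sketch only: the function name is made up, it only knows about the 1.0.1 release line, and it trusts the version string.  Distribution packages sometimes backport a fix without changing that string, so treat the result as a first pass and not the final word.

```shell
#!/bin/sh
# Hypothetical helper: classify an OpenSSL version string (e.g. "1.0.1e").
# Only the 1.0.1 release line is handled: 1.0.1 through 1.0.1f are affected.
is_heartbleed_affected() {
    case "$1" in
        1.0.1|1.0.1[a-f]) return 0 ;;
        *) return 1 ;;
    esac
}

# "openssl version" prints something like "OpenSSL 1.0.1e 11 Feb 2013";
# the second field is the version itself.
installed=$(openssl version 2>/dev/null | awk '{print $2}') || true

if is_heartbleed_affected "$installed"; then
    echo "OpenSSL $installed: affected, patch to 1.0.1g or later"
else
    echo "OpenSSL ${installed:-not found}: not in the affected 1.0.1 to 1.0.1f range"
fi
```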

If you are using Amazon AWS, here is how you can update your instances.  Also, Amazon has launched a new AMI that contains the fix as well.

Just a note about the Amazon instructions, you'll need to use the following command to unpack the tarball:

     tar -xvf <tarball>

The article incorrectly gives a command that simply hangs; the above will extract correctly.

NOTE:  Since the writing of this post, the article has been updated to include the 'f' option.

If you are using openvpn, then you may find the application was pre-compiled with openssl 1.0.1e or another affected version, making it a static build.  I heard that OpenVPN is supposed to be releasing an update that uses 1.0.1g.

UPDATE:  Here is a link to a reddit post that provides further information on the bug.

Friday, April 04, 2014

Getting Back To Some Linux Basics

Yogi Berra is famous for many '-isms' which make people laugh.  One of those '-isms' that I like is:

       "Life is a learning experience, only if you learn"

One of the things I love about my job is that I learn something new every day.  While I have learned a lot over the years (thus far), I also am humble enough to know that there are so many things that I don't know.  

In the spirit of learning, I wanted to share with everyone a link to some Linux basic commands.  While Linux has a friendly GUI front end to it which makes it easy for someone new to the operating system to get around and get used to it, the true power of the operating system lies in the command line.  Some would say it's a dying art, but I refuse to believe that.  There is so much you can do on the command line, faster than you can through a gui, that it will never truly go out of style.

Here is a link to a nice set of Linux commands that not only give a quick overview, but they also give examples of the commands being used as well.  What was even nicer about the folks over on that site, they even created a pdf version of the page.  Please keep in mind though that the pdf version does not contain all of the examples that are linked to in the main page.  

That page (and accompanying pdf file) are only a small subset of the commands available in Linux.  If you are looking for a more complete reference, here is one that goes over the commands that are part of the Bash shell (one of the more popular shell environments in Linux).

And, last but certainly not least, for those of you who are anal enough (like me) to want a complete reference, here is a link to an online version of the Linux Complete Command Reference (pdf version).

Even if you don't make Linux part of your career, it's beneficial for you to check it out.  Not only is it far more secure than Windows, and none of the viruses affecting Windows based systems can affect Linux, but it's also FREEEE!!!!!   I know, as my Dad asked me before I switched him over, "Where will I get support?"  My answer to him was "Me!".  To you, my answer is "Google!".  It's the best support reference I can give other than knowing someone who knows it.

Enjoy!

Saturday, March 29, 2014

Starting A Python Project And Enabling Git For Source Control

Sorry for the long title, but it encompassed what the article is about.

I have been doing a lot with Python in order to better learn the language.  As part of my learning, I read about the virtues of virtualenv.  For those who aren't aware, virtualenv is this sweet toy that makes a project directory an environment, and as such, puts an installation of Python into that environment.  You can install modules and such using the commands under that environment and they won't affect the python installation that is on your computer itself.  It's even better because if you screw up and want to start all over, it's a matter of simply deleting the directory and starting over.

But, to do so, you need to:

- create the directory
- run the virtualenv command in the directory
- activate the environment each time you go into the directory and want to work
- deactivate the environment when you want to leave the directory and stop working on the project.

For the last two pieces, there is a nice bit of automation that exists called autoenv.  It's easy to install and cake to set up.  What makes it so excellent is that when set up properly, cd'ing into the project directory activates the environment and cd'ing out of the project directory deactivates it.  It's flippin' sweet!

Even though that level of automation exists for the {de}activation, I thought it would be nice to have a bit of automation for simply starting a project.  Something that sets up the project directory and even creates the necessary bits in the directory for autoenv to work.  So, I sat down and did it with a little bit of bash.  

You can view the project, called Python Virtualenv Setup, and even download the code.  I have been doing some refining on the script, so if you decide to play with it, please check back now and again for any updates.  I opened an enhancement ticket under issues as the script currently assumes a specific project directory structure already exists.  I am going to make it a touch more dynamic and have it check for that directory and ask for one if it doesn't exist.

If you decide to play with the script and find any issues/errors or know of some enhancements, then please feel free to open a ticket under the issues tab.  If it's an enhancement request, I will definitely take it under advisement, but please know that the script, as it is written, does what it was designed to.  I haven't put any thought into further expansion of its duties, but it's also not out of the realm of possibility.

As the title suggested, I have been using git for my source control for my projects.  As you can also tell from the project link above, I am using bitbucket as well for remote hosting of the code.  I know, you're probably asking "Why not Github?".  Well, I did a bit of homework on this, as you can well imagine.  I do have an account on Github and follow a number of projects.  I even have a couple of things I have posted up there.  But in my review of both Github and Bitbucket (among others), Bitbucket was the only one that allowed not only unlimited storage, but also unlimited public AND private repositories, all while under the free account.  That was quite attractive from a money conscious mind, I have to say.

Ok, back to what I was actually getting at, and that was that I wanted to cover a bit of the quick basics of how to start a project in git, especially for those who are just starting out with it.

Like starting any project, you want to make sure you have your project directory set up with at least a "README.md" file. Both Github and bitbucket will read this file and use it as your project's page.  I would make sure that you put in there all about the project, including things like what it's about, how to use it and even any examples and insight.   The people who download your code will be relying on it for answers.  Don't forget to put installation instructions, no matter how rudimentary you think that might be.  (Thinking about that, I should do that for the above project.)

After you have the directory ready, you are going to go into the directory and issue this command:

    git init

That will initialize the project with git and enable source control.   The next thing to do is to add all the files (or file) that you have created.  You can do that with the following:

    git add .

Once you are ready, you will want to do your initial commit and get your files under source control.  Usually when committing files, you list everything you changed or added.  That way you can look through the log and find the revision you need.  For the initial commit, you can comment just that:

    git commit -m "Initial commit message"

The -m allows you to add your commit message in double (or single) quotes.  Listing no files after the message will commit everything you have staged with 'git add'.   If you are at all unsure about what you have touched and want a quick recap, use:

    git status

That will print out what is pending and other useful information.
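
If you want to try the whole local sequence without touching a real project, here is a sketch that runs it in a throwaway directory.  The README contents and the user name/email passed with '-c' are placeholders, only there so the commit works on a machine where git has not been configured yet.

```shell
#!/bin/sh
# Walk through init / add / commit / status in a scratch directory.
project=$(mktemp -d)

(
    cd "$project" || exit 1
    echo "# Example project" > README.md

    git init -q                 # enable source control in this directory
    git add .                   # stage everything created so far
    # The -c settings are placeholder identity values for this one commit.
    git -c user.name="Example User" -c user.email="user@example.com" \
        commit -q -m "Initial commit"

    git status --short          # prints nothing when the tree is clean
    git log --oneline           # shows the single commit just made
)
```

Everything here stays on the local machine; nothing is pushed anywhere.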

Ok, at this point, you have initialized your repo, added files and checked them into source control.  What you may not realize though, is that all of this has taken place on your local machine and is not yet pushed to any remote servers.  Why?  That's how git works.  To get the files to a remote server, you have to tell it where to go and then you will need to push it.

As said before, I am using bitbucket.  So, this example is for that site.  This is how you would tell git that you want this specific project to be pushed to bitbucket:

    git remote add origin git@bitbucket.org:<username>/<project>.git

Note:  In order to get this information, you will have to have already had the project created on bitbucket.  Believe it or not, bitbucket will give you the actual above command so you know what to specify.  They are nice like that.

Once you have the project defined and the origin added, you should then be able to do the following to push the project up to bitbucket:

    git push -u origin --all

If everything was already set up correctly on bitbucket, then that should work just fine for an initial push and all pushes thereafter for the project would simply be a matter of issuing a "git push" in the project directory.  If this is the first time you are using your keys, or if there are problems, then you might see something like this:

    Permission denied (publickey).
    fatal: Could not read from remote repository.

This simply means that there is an issue with your connecting to bitbucket.  It could be any number of things.  First thing is to check which identity you're passing.  You can check that with:

    ssh-add -l

If that returns nothing, then you aren't passing anything and that isn't good.  Since we are using ssh to push the files up to bitbucket, you are definitely going to have to make sure that you have already added your ssh public key to your account in bitbucket.  If you haven't, then do so.

After that, you'll have to do the following in the project directory:

    ssh -i ~/.ssh/ssh_keyname -T git@bitbucket.org

If you have done everything you need to, then you should see output that looks like this:

    Identity added: /Users/xxxxxx/.ssh/ssh_keyname (/Users/xxxxxx/.ssh/ssh_keyname)
    logged in as xxxxxxxx.

Obviously identities have been changed to protect the innocent, but you get the idea.  You can now check if you have an identity added to ssh with which to connect:

    ssh-add -l

That should output your key if it was added correctly.  You should now be able to go ahead and issue the initial push command above to add all your files to bitbucket.org.  If you are still having issues, I would certainly suggest you take any errors you are getting and plug them into Google and see what comes up for results.  

Friday, March 21, 2014

Enable vi mode editing in python, irb (ruby) and others

Everyone who uses Unix/Linux, will at one point or another, choose their editor of choice.  It is either out of necessity or, as in my case, all of the people using the system(s) around you used the same editor. 

There are several different editors, but the two most popular are emacs and vi.  I have recently started learning emacs, as I am in a group where most already use emacs.  No, I am not being pressured, I am just actually getting to experience some of the cool aspects of the editor other than someone standing around ranting that "emacs is cool!" over and over. 

Overall though, I am still very much a vi person.... no question.  If I do make the switch to emacs at some point, it will be wholly.  But for the meantime, until my comfort level is up, I am sticking with what I know best.

That said, when I am in things like the python shell, I think having the ability to have a 'vi mode' is something beneficial to me.  I stumbled on a quick setup that will allow you to have the 'vi mode' in most, if not all, of your different shells (python, irb, etc).

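What you need to do is create a file called .inputrc in your home directory.  In the file, put the following line:

    set editing-mode vi

That covers anything built on readline, such as irb or the python shell.  Note that 'set -o vi' is a shell command and not a readline setting, so it does not belong in .inputrc; if you want it, put it in your shell startup file (for example ~/.bashrc).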

You don't need to source it, just simply start up something like irb or the python shell and then, when in there, hit the ESC key and you'll be in vi mode.  Nice, huh?  Enjoy!
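
If you set up new machines often, adding that line can be scripted.  This is a small sketch; the enable_vi_mode function name is made up, and it only appends the readline setting when it is not already in the file.

```shell
#!/bin/sh
# Hypothetical helper: add the readline vi setting to an inputrc file once.
# $1 = file to update (defaults to ~/.inputrc).
enable_vi_mode() {
    rcfile="${1:-$HOME/.inputrc}"
    # -x matches the whole line, -q keeps grep quiet; append only if missing.
    grep -qx 'set editing-mode vi' "$rcfile" 2>/dev/null ||
        echo 'set editing-mode vi' >> "$rcfile"
}

# Usage (commented out so the example does not touch your real file):
# enable_vi_mode
```

Running it a second time changes nothing, so it is safe to drop into a setup script.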

Thursday, March 20, 2014

iOS 7.1 Problems & Fixes

It's that time again, time to update your iPhone to the newest version of the iOS software (Per Apple's announcement). 


As with any iOS update, there are certainly problems that people are having.  Thankfully, for some of the most common problems, people like ZDNet have compiled a list of issues with possible solutions.

Looking over the list, I would like to mention a couple of issues that a colleague experienced that are NOT on the list.

1.  90% of my colleague's contacts simply disappeared.

This issue SUCKS!!  You take the time to add the contacts you need on your phone, only to have them simply go away with an update.  Well, I believe I have the solution to this after my wife's iPhone decided to remove hers for no reason whatsoever.  Sync them with iCloud.  If you sync all of your contacts with iCloud, you should be able to easily re-sync with the cloud and have your contacts back on your phone rather quickly.

2.  His corporate Good email app stopped working.

The company we work for uses Good for secure, corporate email on mobile devices.  I am always hearing about the app stopping working after an update, so it came as no surprise to me.  If this happens, you'll have to re-install and get a new code from your company's Administrator. 

Hopefully this list of problems and solutions will help to lighten the blow if you experience them. 
 
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.