
Wednesday, July 09, 2014

Getting started with Ansible

There are a ton of buzzwords thrown around every single day, so many that it's sometimes hard to tell what is legit and what isn't.  One of the names I have paid attention to and come to really love is Ansible.

To quickly sum up what Ansible is: it's an automation tool that allows you to run a command or a set of commands across multiple machines.  This is extremely handy if, for instance, you have a bunch of machines acting as web servers and you need to shut down the web server portion for maintenance.
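
For a taste of what that looks like, Ansible can run a one-off "ad-hoc" command across a group of machines without a playbook at all.  This is just a sketch, assuming an inventory group called webservers and an httpd service on those hosts:

ansible webservers -m service -a "name=httpd state=stopped" --sudo -K

Here -m picks one of Ansible's built-in modules (service, in this case) and -a passes its arguments.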

Previously, you would have had to log in to each machine and issue the necessary commands.  With Ansible, you can simply put the commands in one file (called a playbook) and then run that playbook against the machines in question.

I would cover how to install Ansible, but considering people work on different systems, I will just say that Ansible has a pretty good set of docs on this already.  After you have installed the software, you will need a directory structure that looks something like the following:

ansible
    |
    |___ playbooks/
    |             |_____ <playbook>.yml
    |
    |___ .ansible_hosts
    |
    |___ ansible.cfg

You don't have to call the top directory ansible, but it helps you remember what is in there.  The playbooks directory is required, and that is where you store your playbooks, which are in YAML format (thus the .yml extension).

The ansible.cfg file has a lot of options, and you are going to want to read up on how to configure it.  As for the .ansible_hosts file, this is where you list the hosts that are under your control and that you want to act upon.  In there you can list a single host or a group of hosts.  You can read about how to specify your hosts in the Inventory documentation on the Ansible site.
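
For illustration, here is a minimal ansible.cfg sketch that matches the layout above.  The option names are from the Ansible 1.x configuration docs (hostfile points at the inventory file, and ask_sudo_pass makes it prompt for the sudo password), but treat this as a starting point and check the docs for your version:

[defaults]
hostfile      = ./.ansible_hosts
ask_sudo_pass = True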

I have one of mine configured so that it prompts me for the sudo password, allowing it to run all the commands it needs to with sudo.  As an example, I have a playbook called df.yml that will do a df on the set of specified hosts.  The df.yml file looks like this:

---
- hosts: "{{ group }}"
  gather_facts: false
  tasks:
    - name: df
      sudo: yes
      command: 'df -h'

Please keep in mind that this is a YAML file, and the format above is specific to YAML.  If you look at the hosts line, there are no hosts specified.  Instead, it simply says {{ group }}.  This is a variable that will be expected from the command line when I run the playbook.  In my .ansible_hosts file, I have a section that looks like this:

[hostgroupname]
host1
host2
host3

To run the df.yml playbook on that group, I run it as follows:
ansible-playbook playbooks/df.yml -v -i ./.ansible_hosts -k --extra-vars "group=hostgroupname"
That is run from the ansible directory, with everything referenced relative to it.  Notice the '--extra-vars' option and the 'group=hostgroupname' at the end.  That is where the {{ group }} is pulled from.  Ansible will take that and run the commands in the file on each host in that group.
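
Since the group is just a variable, the same playbook can be pointed at anything the inventory knows about.  For example, Ansible's built-in all group covers every host in the inventory, and a single hostname from the inventory works too:

ansible-playbook playbooks/df.yml -v -i ./.ansible_hosts -k --extra-vars "group=all"
ansible-playbook playbooks/df.yml -v -i ./.ansible_hosts -k --extra-vars "group=host1"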

There is a lot more configuration that can (and will) be done for Ansible.  In particular, I will be setting up my ssh key on all the servers so that the playbook just runs, prompting only for the sudo password.  (Yes, it currently prompts me for my password to connect via ssh, but that is my current configuration and destined to change when I find the two minutes to fix it.)
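
For reference, the usual way to set up that key is with ssh-keygen and ssh-copy-id; the user and host below are placeholders:

ssh-keygen -t rsa            # generate a key pair, if you don't already have one
ssh-copy-id user@host1       # append your public key to host1's authorized_keys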

So, that is a brief intro to Ansible.  Remember to bookmark the Ansible documentation.  Hopefully you are able to get it working quickly and enjoy using it as much as I do.

Monday, June 11, 2007

The Art of the Automated Download

How many of us have found a site with a whole bunch of downloadable files that you really wanted? Well, I found a site just like that. Too many sites use PHP or something similar, with IDs and other codes referring to the documents instead of plain links in the page. Once in a while, though, you come across a site that DOES put actual links to the files on the page.

In the case of the site I am referring to, there were a little over 2000 files that I wanted to grab. The thing is, I didn't want to sit there and "right-click, Save As" for each and every one, as that would have taken days to complete. So, noticing that all of the files had actual URLs that led right to them, I looked at the page source. There it hit me. Perl!

So, I copied the source and quickly drummed up a regular expression that grabs the URLs of all the PDF files on the page. Once I had the URLs, I put together some code that went out, grabbed each one in turn, and saved it to my hard drive.

This all sounds simple, but for someone like me who still considers himself a novice in the Perl world, it did take a couple of hours of research. First I tried the WWW::Mechanize module and was able to retrieve a complete list of the PDF files and their paths, but not the actual files. I tried other packages too, delving into LWP itself, but I could not for the life of me get the code to actually download the files.
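
(For what it's worth, WWW::Mechanize should be able to do the whole job by passing :content_file through to LWP. A minimal sketch of that approach, with the site URL as a placeholder:)

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/downloads');   # placeholder URL

# find_all_links() returns link objects whose URLs match the regex
for my $link ($mech->find_all_links(url_regex => qr/\.pdf$/i)) {
    my ($file) = $link->url =~ m{([^/]+)$};   # last path component
    $mech->get($link->url, ':content_file' => $file);
}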

I found lwp-download and gave it a shot. Wow! It looked like it was working, up until the 904th document, where it died. I couldn't figure out what was going on. Why was it dying at the same point every time? Well, I did eventually figure it out and was able to download ALL of the 2000-plus files to my hard drive. I couldn't believe it, but the routine I used took only a few minutes to download everything (granted, I have 15 Mbit/s FiOS for my internet, so please bear that in mind).

Just as an FYI (and you can see this in the code), I used the LWP::Simple::getstore() routine to download the files. It was a lot easier than figuring out why my WWW::Mechanize attempt wasn't working, believe me. I will figure that module out later, but for now, this did exactly what I wanted.
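
As a side note, getstore() returns the HTTP status code of the fetch, and LWP::Simple also exports is_success(), so each download can be checked. A tiny sketch with placeholder values:

use LWP::Simple;

# is_success() is exported by LWP::Simple along with getstore()
my $status = getstore('http://example.com/file.pdf', '/tmp/file.pdf');
warn "Download failed with status $status\n" unless is_success($status);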

This is probably a bit much, and others would more than likely have a better way to do it, but here it is: the code I used to parse the HTML for its links and download them.


#!/usr/bin/perl

use strict;
use warnings;
use File::Basename;
use LWP::Simple;

#############################################
# Read the entire HTML file into an array, line by line, so that we
# can parse out the information we need one line at a time.
#############################################
open(my $html, '<', '/home/jlk/development/perl/stampAlbums/code.txt')
    or die "Cannot open HTML source: $!";
my @code = <$html>;
close($html);

# Start each run with a fresh (empty) URL list.
my $urlfile = '/home/jlk/development/perl/stampAlbums/files.txt';
unlink($urlfile) if -e $urlfile;

############################################
# The following code takes the site's HTML file (in this case, it
# is the stampalbums.com download site) and parses out all of the
# download URLs.
############################################
open(my $list, '>>', $urlfile) or die "Cannot open $urlfile: $!";
foreach my $line (@code)
{
    # Keep only the lines whose href points at a PDF download.
    if ($line =~ m/href="([^"]+\.pdf)"/i)
    {
        ##############################################
        # Write each URL to the list file on a separate line.
        ##############################################
        print $list "$1\n";
    }
}
close($list);

###########################################
# Open the file containing all of the URLs
###########################################
open(my $urls, '<', $urlfile) or die "Cannot open $urlfile: $!";

###########################################
# Do the download of the files
###########################################
my $localdir = "/home/jlk/development/perl/stampAlbums/albumPages/";
while (my $url = <$urls>)
{
    chomp $url;

    # fileparse() splits the URL into its name and directory parts.
    my ($filename, $directories, $suffix) = fileparse($url);

    # Fetch the URL and save it under the local directory.
    LWP::Simple::getstore($url, $localdir . $filename);
}
close($urls);

This work is licensed under a Creative Commons Attribution 3.0 License.