{ ParsedContent };: Project Update

Sunday, April 18, 2010

Project Update

Well, I have been working on the project actively for a few days now and unfortunately, I am not as far as I would have loved to have been. My experience working with Perl DBI is not extensive, and is limited to the job that I got laid off from last year. What makes that experience a bit lessened is the fact that the DBI code was already written and all I had to do was extend what was already existing.

So, I have these spreadsheets, one in particular which is rather large, containing almost 2000 lines. Each line is the information that we have for a customer/client. I created a database, created a table in the database, and had written some code to parse out the data from its csv file format (which I exported from the Excel format it was originally in.

The problem that I ran into was that the prepare() statement kept throwing an error. So, I posted my question up on to the Bytes.com in my Perl forum in hopes of getting some help, and I got it. One of the experts in the forum was able to throw a couple of ideas my way. RonB was able to throw a couple of ideas my way to get the data initially loaded.

I took a look at what he provided for ideas and decided to do the initial load via the MySQL interface. Doing the load the following way, I specified the file to use to get the data from, the field delimiter (which is a : when I exported the Excel data to csv) and the fields that I needed populated, in the order that the data was presented. I needed to specify the fields because the 1st column of data is a unique ID that is auto incremented. Here is the command that I used:

mysql> LOAD DATA INFILE "/path/to/file.csv" INTO TABLE client_info FIELDS TERMINATED BY ':' (PartnerName,CorporationName,ContactGivenName,ContactPhone,ContactEmailID,CommunicationProtocol,SecurityType,ECID,Source,GEID,GEMatchType,CrossoverWithGTI,ECIDv1,CustomerName,UltimateECID,UltimateName,CostCenter,DomicileRegion,SalesRegion,SalesHead,SalesManager,TMO1SID,TMO1Name,TMO2SID,TMO2Name,BankerSID,BankerName,CSOMgrSID,CSOMgrName,CSOSID,CSOName,CSPSID,CSPName,LOB,SubLOB,Segment,Market,Industry,ClientType,ClientStatus);

This actually seemed to work quite well. See, the script I was working with had a big issue with special characters. Some names contained an apostrophe, names were entered into the original spreadsheet in the format of "last, first middle", which caused an issue when I originally used a comma as the delimeter. The LOAD DATA command in mysql is quite useful, I must say and offers other options for specifying the format of the data. I recommend checking it out.

Now, I have to worry about the interface that we are going to use for accessing the data. I plan on designing it first, so that it is built and operational, then I will worry about things like a login interface and sessions.

I also still need to work on a Perl script to load data from a spreadsheet in the case of updates. Our client base is about to grow astronomically over this year and this needs to be in place soon. The update script will need to compare against the db for duplicates and only load the new entries. So, that should be interesting.

I sooooooo have a lot of work ahead of me, but must say that its nice to have a project to work on that I know is going to be actively used by others.