
DCWeek

In this project you’ll work with the attendee data for a conference supplied in a CSV file. This data comes from an actual conference, though identifying information has been masked.

The techniques practiced in this lab include:

You’ll need these two source files:

Bootstrap

If you haven’t already set up Ruby, visit http://jumpstartlab.com/resources/general/environment/ for instructions.

Go to your command line and enter:

Next open RubyMine and…

Start with this code framework:

require "rubygems"
require "fastercsv"

class JSAttend
  attr_accessor :file
  attr_accessor :headers
  def initialize
    puts "JSAttend Initialized."
  end

end

jsa = JSAttend.new

In RubyMine, click the RUN menu at the top, then RUN again. Your output should look something like…

JSAttend Initialized.

Process finished with exit code 0

Iteration 0: Basics of a CSV File

CSV files are great for storing and transporting large data sets. They’re most commonly created from Excel spreadsheets, but since a CSV is really just a plain text file, they’re pretty easy to interact with from ANY program. First thing to do is to get an object created for the file. Add this method to your class:

def open_file
  filename = "event_attendees.csv"
  puts "Trying to open the file with FasterCSV"
  @file = FasterCSV.open(filename, {:headers => true, :return_headers => true, :header_converters => :symbol})
  @headers = @file.readline
end

Then go into your initialize method and add open_file as the first line. The whole file should now look like this:

require "rubygems"
require "fastercsv"

class JSAttend
  attr_accessor :file
  attr_accessor :headers

  def initialize
    open_file
    puts "JSAttend Initialized."
  end

  def open_file
    filename = "event_attendees.csv"
    puts "Trying to open the file with FasterCSV"
    @file = FasterCSV.open(filename, {:headers => true, :return_headers => true, :header_converters => :symbol})
    @headers = @file.readline
  end
end

jsa = JSAttend.new

Run the program and you should get an error that starts like this:

Trying to open the file with FasterCSV
/Library/Ruby/Gems/1.8/gems/fastercsv-1.5.0/lib/faster_csv.rb:1193:in `initialize': No such file or directory - event_attendees.csv (Errno::ENOENT)

This is what we call a stack trace. When your program generates an error, the stack trace is the best way to figure out what went wrong. Sometimes they’re hard to read, but this one is pretty easy. When it says No such file or directory - event_attendees.csv you can tell that it can’t find the event_attendees.csv file that it’s trying to open. You could write the entire filename path (like C:\JumpstartLab\RubyJumpstart\Projects\JSAttend\event_attendees.csv), but that’s junky for a few reasons. It’s better to copy the file you want, event_attendees.csv, into the same directory as your js_attend.rb file. Do that now, then re-run your program. You should see something like this:

Trying to open the file with FasterCSV
JSAttend Initialized.

Now that our file is getting loaded properly we have a name for that variable – @file. We can now talk to that object named @file and ask it all kinds of information or tell it to do things. Let’s create a method that will just go through the file and print out the first names of all the people:

  def print_names
    @file.each do |line|
      puts line[:first_name]
    end
  end

Then to actually execute that instruction, you need to call it. At the bottom of your project file add the line jsa.print_names underneath jsa = JSAttend.new. Now RUN your program and you should see the first names from the CSV file fly by.
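If you're wondering why line[:first_name] works, here is a minimal sketch you can run on its own. It assumes Ruby 1.9 or later, where FasterCSV became the standard csv library, and it uses a tiny made-up data set in place of event_attendees.csv. The header spellings in the sample are an assumption, not copied from the real file.

```ruby
require "csv"

# Made-up sample data standing in for event_attendees.csv.
data = <<CSV_TEXT
first_Name,last_Name,Zipcode
Allison,Nguyen,20010
Sarah,Hankins,20009
CSV_TEXT

# :headers => true treats the first line as column names.
# :header_converters => :symbol turns "first_Name" into :first_name.
rows = CSV.parse(data, :headers => true, :header_converters => :symbol)

# Each row can now be asked for a column by its symbol name.
first_names = rows.map { |row| row[:first_name] }
puts first_names
```

The :symbol converter is what lets you write line[:first_name] instead of having to match the exact capitalization used in the file's header row.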

Once that’s working we can beef it up a little by printing out their first name AND last name. Modify the puts line of your print_names method so it reads like this:

puts line[:first_name] + " " + line[:last_name]

What you’re doing here is adding together three strings: the first name, a space, and the last name. Once those are added together they go to the puts instruction to print out. Run your program again and you should see the attendee first and last names scroll by. Now, rewrite that puts line using string interpolation (HINT: remember using the #{}?).
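As a quick refresher on the two styles, here is a small sketch that runs outside the project. The line hash is a hypothetical stand-in for one row of the CSV file.

```ruby
# Hypothetical sample row; in the lab this comes from the CSV file.
line = { :first_name => "Allison", :last_name => "Nguyen" }

# Concatenation: three strings added together with +.
concatenated = line[:first_name] + " " + line[:last_name]

# Interpolation: each #{} is replaced by the value of the expression inside it.
interpolated = "#{line[:first_name]} #{line[:last_name]}"

puts concatenated
puts interpolated
```

Both lines print the same thing. One practical difference: interpolation converts whatever is inside #{} to a string, so a nil name becomes an empty string, while adding nil to a string with + raises an error.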

Iteration 1: Cleaning up the Zip Codes

When we got this data the zipcode data was a little surprising.

Step 0: Print out What’s There & Diagnosis

Why were so many of the zipcodes entered incorrectly? Look at a few example addresses and zipcodes…

1 Old Ferry Road, Box # 6348	Bristol	RI	2809
90 University Heights , 401h1	Burlington	VT	5405
123 Garfield Ave	Blackwood	NJ	8012
239 S Prospect St	Burlington	VT	5401
50 Ledgewood Dr	York	ME	3909

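See the pattern? All the short zipcodes are in the Northeast, where zipcodes begin with 0. The leading zeros were dropped somewhere along the way, most likely when the column was treated as a number instead of text. Now that we know the problem, we can fix it.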

Let’s write a method to print out the current zipcodes from the CSV. We’ll call it print_zipcodes like this:

  def print_zipcodes
    @file.each do |line|
      zipcode = line[:zipcode]
      puts zipcode
    end
  end

Run that and you should see the list of uncleaned zipcodes scroll by.

Step 1: Zero-Padding Existing Zipcodes

Let’s write a little pseudo-code:

Turning that into code we should create a clean_zipcode method. Model it after the structure of your clean_number method. Name the parameter original like we did in clean_number. Assuming you have that structure we can start to map out the method’s code…

  def clean_zipcode(original)
    if original.length < 5
      # Add zeros on the front
      result = original # This is just TEMPORARY!
    elsif original.length > 5
      result = "00000"  # If it is greater than 5 digits, it's junk
    else
      result = original # If it wasn't less than 5, and wasn't more than 5, it was 5
    end
    
    return result
  end

So I’ve filled in the easy parts (for zips with length greater than or equal to 5). Now that the method exists you can go into your print_zipcodes method and change the zipcode = line[:zipcode] to zipcode = clean_zipcode(line[:zipcode]). Also change your instruction at the very bottom to jsa.print_zipcodes. You can then RUN this code and see if it fixes the missing and too-long zipcodes.

Uh-oh. Did you get an error? I did. The program got through a bunch of the zipcodes then spat this out:

`clean_zipcode': undefined method `length' for nil:NilClass (NoMethodError)

What the heck is nil:NilClass? Refer to the Ruby Tutorial if you’ve forgotten about nil.

In this case, FasterCSV is giving us a nil if the CSV file doesn’t have any information in the zipcode. I was really expecting the empty string, "", but I can see nil making sense. Dealing with nil values in your code can be a huge pain in the neck. The situation we have now is very typical: you write code that works great for most of the cases, then it hits one nil and blows up. Most frequently these problems are generated by trying to call methods or access attributes of nil. If I asked you “What is the length of nothingness?” you’d probably give me a confused look. That’s what Ruby is saying in our error message here. “I don’t know how to find the length for nil”. The solution, then, is to check and see if our zipcode is nil before doing other things to it.
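Here is a small standalone sketch of that idea. The describe_zipcode method is a hypothetical helper made up for this illustration, not part of the project.

```ruby
# Hypothetical helper: reports on a zipcode without blowing up on nil.
def describe_zipcode(zipcode)
  if zipcode.nil?
    "no zipcode given"
  else
    "#{zipcode.length} characters"
  end
end

# Calling a method on nil directly raises the error we saw above.
begin
  nil.length
rescue NoMethodError => error
  puts "Ruby complained with a #{error.class}"
end

# Checking for nil first keeps the program running.
puts describe_zipcode(nil)
puts describe_zipcode("20010")
```

The important part is the order: the nil? check has to come before any instruction that calls a method on the value.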

Check out how I’ve reworked the if/elsif/else instructions to check for a nil first:

  def clean_zipcode(original)
    if original.nil?
      result = "00000"  # If it is nil, it's junk  
    elsif original.length < 5
      # Add zeros on the front
      result = original # This is just TEMPORARY!
    elsif original.length > 5
      result = "00000"  # If it is greater than 5 digits, it's junk
    else
      result = original # If it wasn't less than 5, and wasn't more than 5, it was 5
    end

    return result
  end

Now the trickier part: adding on the zeros.

The zipcodes that are missing their leading zeros are mostly four digits long, so just adding one zero to the front would probably fix it. But 00601, for instance, is a valid zipcode. In our data there are a few of these two-leading-zero zipcodes. The easiest solution here is to use a while loop.

A while loop repeats one or more instructions as long as a condition is true. When the condition becomes false, the while loop ends. Here’s an example:

counter = 0
while counter < 5
  puts counter
  counter = counter + 1
end

If you were to execute this code, you’d see output like this:

0
1
2
3
4

Simple enough? We’ll use that same idea to implement our zero-padding. Go to your code and take out the line that says result = original # This is just TEMPORARY! and replace it with this:

      result = original
      while result.length < 5
        result = "0" + result
      end

If you were to read that out loud, it would sound like this: “While the length of result is less than five, keep putting a "0" on the front of it and saving it back into result.” Now RUN the code and make sure all the zipcodes look clean.
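If you want to check the padding logic by itself, the sketch below puts the pieces from this step into a standalone method (outside the JSAttend class) and feeds it a few sample values. The sample zipcodes are picked for illustration.

```ruby
def clean_zipcode(original)
  if original.nil?
    result = "00000"  # If it is nil, it's junk
  elsif original.length < 5
    # Add zeros on the front until there are five characters
    result = original
    while result.length < 5
      result = "0" + result
    end
  elsif original.length > 5
    result = "00000"  # If it is greater than 5 digits, it's junk
  else
    result = original # Exactly 5 characters, leave it alone
  end

  return result
end

puts clean_zipcode("2809")    # prints 02809
puts clean_zipcode("601")     # prints 00601
puts clean_zipcode("20010")   # prints 20010
puts clean_zipcode("200101")  # prints 00000
puts clean_zipcode(nil)       # prints 00000
```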

Step 2: Test & Refactor

This time our code is pretty clean and the results look good. There’s only one thing that stands out for refactoring — duplication. The Ruby community has a saying “Keep it DRY” where DRY = Don’t Repeat Yourself. Whenever you do the same thing twice you introduce the possibility for mistakes down the road.

Let’s imagine that, tomorrow, we want junk zipcodes to be marked “99999” instead of “00000”. How many places would we have to change the code? Two is one too many for something this simple. We can compress our code down a little bit to avoid this repetition by using the || (OR) operator in our condition like this:

  def clean_zipcode(original)
    if original.nil? || original.length > 5
      result = "00000"  # If it is nil OR longer than 5 digits, it's junk
    elsif original.length < 5
      # Add zeros on the front
      result = original
      while result.length < 5
        result = "0" + result
      end
    else
      result = original # If it wasn't less than 5, and wasn't more than 5, it was 5
    end

    return result
  end

So you got rid of one elsif statement and removed duplication in the code. RUN it and make sure there are no errors and the data looks good, then we’re ready for the next iteration!
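If you want to push the DRY idea one step further, here is one possible sketch. It is not the version this lab builds: it swaps the while loop for Ruby's built-in String#rjust, and JUNK_ZIPCODE is a hypothetical constant name made up for this example.

```ruby
# Hypothetical constant: the junk marker now lives in exactly one place.
JUNK_ZIPCODE = "00000"

def clean_zipcode(original)
  if original.nil? || original.length > 5
    JUNK_ZIPCODE
  else
    # rjust pads on the left with "0" until the string is 5 characters long.
    # A string that is already 5 characters comes back unchanged.
    original.rjust(5, "0")
  end
end

puts clean_zipcode("2809")
puts clean_zipcode("20010")
puts clean_zipcode(nil)
```

Changing the junk marker to "99999" would now be a one-line edit. The while loop version teaches more about how loops work, so treat this as an alternative to compare against rather than a replacement.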

Iteration 2: Congressional Lookup

This conference was about a political issue, and we wanted the participants to interact with their congresspeople. Since we already have their zipcodes we can take advantage of an API from the Sunlight Foundation to look up the appropriate congresspeople.

Step 0: Framework

We’ll need an additional library to help us connect to and read the data from the API. Add this line below the existing require lines at the beginning of your program file:

require 'sunlight'

Then go down and create a method that looks like this:

  def rep_lookup
    lines = []
    @file.each do |line|
      lines << line
    end
    lines[0..20].each do |line|
      representative = "unknown"
      # API Lookup Goes Here
      puts "#{line[:last_name]}, #{line[:first_name]}, #{line[:zipcode]}, #{representative}"
    end
  end

This is a little different than the other methods we’ve started with. The first part that will stand out is this:

    lines = []
    @file.each do |line|
      lines << line
    end

When we use the FasterCSV library to read a CSV file, it reads one line at a time. Now that I’m accessing a public API, it’s unkind to generate the traffic of looking up thousands and thousands of attendees each time I run my program — not to mention that’ll take a long time. FasterCSV doesn’t give us a way to just grab a certain number of lines from the CSV file, so what I’ve done here is created a list named lines then gone through the CSV and just put each line into the list of lines.

Now that I have the lines as a normal list, instead of going through every single line I can say lines[0..20], which means “grab only the elements at positions 0 through 20 of the lines list” (that is the first twenty-one lines, since the range includes both ends). Then, for each of them, name it line and do these instructions.

    lines[0..20].each do |line|
      representative = "unknown"
      # API Lookup Goes Here
      puts "#{line[:last_name]}, #{line[:first_name]}, #{line[:zipcode]}, #{representative}"
    end
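Ranges are easy to get off by one, so here is a small sketch comparing a few ways to take a slice from the front of a list. The list of numbers is just a stand-in for the lines read from the CSV.

```ruby
# Stand-in for the list of CSV lines.
lines = (1..100).to_a

two_dots   = lines[0..20]    # includes position 20
three_dots = lines[0...20]   # stops before position 20
first_n    = lines.first(20) # asks for a count directly

puts two_dots.length    # prints 21
puts three_dots.length  # prints 20
puts first_n.length     # prints 20
```

For limiting API traffic the difference between twenty and twenty-one lookups hardly matters, but it is worth knowing which one you are asking for.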

RUN this code and you should see output like this:

JSAttend Initialized.
Nguyen, Allison, 20010, unknown
Hankins, SArah, 20009, unknown
Xx, Sarah, 33703, unknown
Cope, Jennifer, 37216, unknown

Step 1: Experimenting with the Sunlight API

Most APIs work by requesting a very complicated web address. In the case of the Sunlight API, the API we’ll be accessing can be found here:

http://services.sunlightlabs.com/api/legislators.allForZip.xml?apikey=e179a6973728c4dd3fb1204283aaccb5&zip=22182

Take a close look at that address. We’re accessing the legislators.allForZip method of their API, we send in an apikey which is the string that identifies JumpstartLab as the accessor of the API, then at the very end we have a zip. Try modifying the address with your own zipcode and load the page. Using 22182 as a sample I see this mess:

10RepP0-001-000016674-91VAN00002073R27120http://www.house.gov/wolf/202-225-0437400435FrankRudolphWolf241 Cannon House Office Building202-225-5136http://www.house.gov/formwolf/contact_email/emailzip.shtmlhttp://www.youtube.com/RepFrankWolfW000672H6VA10050MfakeopenID429http://www.opencongress.org/wiki/Frank_Wolf 11Rep1VAN00029891D95078http://connolly.house.gov/202-225-3071412272GeraldE.Connolly327 Cannon House Office Building202-0225-1492https://forms.house.gov/connolly/contact-form.shtmlGerryC001078Mhttp://www.opencongress.org/wiki/Gerald_Connolly Senior SeatSenP0-001-000016754-41VAN00028058D60043http://webb.senate.gov/202-228-6363412249JamesH.Webb144 Russell Senate Office Building202-224-4024http://webb.senate.gov/contact/http://www.youtube.com/SenatorWebbJimW000803S6VA00127MIJr.fakeopenID533http://www.opencongress.org/wiki/James_Webb Junior SeatSen1VAN00002097D535http://warner.senate.gov202-224-6295412321MarkR.WarnerB40c Dirksen Senate Office Building202-224-2023http://warner.senate.gov/public/?p=EmailSenatorWarnerW000805MIImarkwarnerhttp://www.opencongress.org/wiki/Mark_Warner 8RepP0-001-000016517-51VAN00002083http://www.house.gov/apps/list/press/va08_moran/RSS.xmlD27118http://www.moran.house.gov202-225-0017400283JamesP.Moran2239 Rayburn House Office Building202-225-4376http://moran.house.gov/zipauth.shtmlJimM000933H0VA08040MJr.fakeopenID283http://www.opencongress.org/wiki/James_Moran

This data is, in reality, very structured. My browser (Safari) just doesn’t interpret it very well — Firefox does a much better job of displaying XML. If you have it, open Firefox and load that address. If you’re familiar with writing HTML then this XML document probably makes some sense to you. You can see there is a response object that has a list of legislators. That list contains five legislator objects which each contain a ton of data about each legislator. Cool!

Step 2: Dealing with XML and JSON

All I want for this API lookup is a comma separated list of the first initial and the last name like this: F.Wolf, G.Connolly, J.Webb. We could use the XML that we retrieved via the url. In Ruby, we could make use of the open-uri and hpricot libraries to fetch and parse this XML.

But the Sunlight Foundation has made it easy on us. Luigi Montanez, a developer at Sunlight Labs, created the sunlight gem. We call this a wrapper library because its job is to hide complexity from us. We can interact with it as a simple Ruby object, then the library takes care of fetching and parsing data from the server.

Add these lines to your rep_lookup method right under the line that says #API Lookup Goes Here:

      results = []
      zipcode = clean_zipcode(line[:zipcode])
      Sunlight::Base.api_key = "e179a6973728c4dd3fb1204283aaccb5"

First, we create an empty list named results. Each zipcode can have multiple congresspeople and senators, so we’ll store all the results in this list. Second we grab the zipcode from the CSV file. Then we tell the Sunlight library our api_key that I got from the Sunlight Foundation.

Now what can we actually DO with the Sunlight library? Check out the readme on the project homepage: http://github.com/sunlightlabs/ruby-sunlightapi

We’re interested in the Legislator object. Looking at the examples in the ReadMe you’ll see this:

congresspeople = Sunlight::Legislator.all_for(:address => "123 Fifth Ave New York, NY 10003")

So that’s how to fetch information for a specific address, but our task is to find them via Zipcode. Look back at the URL we used to view the XML. See how it has legislators.allForZip? The wrapper library should have a similar method. I don’t see it in the ReadMe, so I’m going to look at the source code. Near the top of the project page I click lib, then the sunlight folder, then legislator.rb. I search the page for zipcode and find a method that starts like this:

def self.all_in_zipcode(zipcode)

Perfect! It takes in a zipcode and returns a list of legislators. Let’s try it in our program. Add these lines below the api_key line:

legislators = Sunlight::Legislator.all_in_zipcode(zipcode)
puts legislators

Run your program and check out the results. Did it work? Kinda? Maybe? You’re probably seeing lines like this:

#<Sunlight::Legislator:0x102525280>

That’s Ruby’s way of printing out a Legislator object. Not very informative, but it shows us that legislators are being found, which is good! Next we should access the name of the legislator within that object — but how do I know what it’s called? Return to the Legislator source code and, near the top of the page, you’ll see this:

attr_accessor :title, :firstname, :middlename, :lastname, 
  :name_suffix, :nickname,:party, :state, :district,
  :gender, :phone, :fax, :website, :webform, :email,
  :congress_office, :bioguide_id, :votesmart_id, :fec_id,
  :govtrack_id, :crp_id, :event_id, :congresspedia_url,
  :youtube_url, :twitter_id, :fuzzy_score, :in_office,
  :senate_class, :birthdate

This is a list of all the attributes (attr_accessor means “attribute accessor”) that a Legislator has, all the information it knows. If we want their url we ask for .website, or fax number with .fax. Here we’re interested in their first and last name. We’ll need to loop through each of the legislators — replace the puts legislators line with this code:

legislators.each do |leg|
  puts leg.firstname
end

Run your program and check out the results. More impressive, right?

Step 3: Pulling Names and Formatting the Output

You can see that it’s finding one or more representatives and spewing their first names out. But we’re looking for first initial and last name. Let’s take out that puts leg.firstname line. In its place add these three lines:

lastname = leg.lastname
first_initial = leg.firstname[0..0]
results << "#{first_initial}.#{lastname}"

The first instruction says “find the lastname attribute of leg and store it into a variable named lastname.” Then the second line is “find the firstname attribute of leg, grab all the letters from position 0 to position 0 (giving you only the first letter), and store it into the variable named first_initial.” Next, “make a string with the data in first_initial, a period, and the data inside lastname then add it into the list named results.”
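To try those three instructions without calling the API, here is a sketch that uses a hypothetical FakeLegislator struct as a stand-in for Sunlight::Legislator. Only the firstname and lastname attributes are imitated, and the names are taken from the 22182 sample earlier in this lab.

```ruby
# Hypothetical stand-in for Sunlight::Legislator, so no API call is needed.
FakeLegislator = Struct.new(:firstname, :lastname)

legislators = [
  FakeLegislator.new("Frank", "Wolf"),
  FakeLegislator.new("Gerald", "Connolly"),
  FakeLegislator.new("James", "Webb")
]

results = []
legislators.each do |leg|
  lastname = leg.lastname
  first_initial = leg.firstname[0..0]  # characters from position 0 to 0
  results << "#{first_initial}.#{lastname}"
end

puts results.inspect
```

Writing firstname[0..0] rather than firstname[0] matters on Ruby 1.8, where indexing a string with a single number returns a character code instead of a one-letter string.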

Lastly, change the final puts line in the method so it looks like this:

puts "#{line[:last_name]}, #{line[:first_name]}, #{line[:zipcode]}, #{results.join(", ")}"

The significant change being the last part that says results.join(", ") which means “take each thing in the list results and join them together with a comma and space between each one.” Run your program and you should see output like this:

JSAttend Initialized.
Nguyen, Allison, 20010, E.Norton
Hankins, SArah, 20009, E.Norton
Xx, Sarah, 33703, M.Martinez, B.Nelson, C.Young
Cope, Jennifer, 37216, J.Cooper, B.Corker, L.Alexander
Zimmerman, Douglas, 50309, T.Harkin, C.Grassley, L.Boswell

Pretty cool? If you’d like, go back to the Legislator attribute list and see what other interesting data you could include. What would it look like to have the names read like “E.Norton (D)” for their party? Or how about “Rep E.Norton (D)”?
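If you get stuck on that last challenge, here is one possible sketch. It uses a hypothetical SampleLegislator struct in place of the real Sunlight::Legislator object, and format_legislator is a made-up helper name. The title, firstname, lastname, and party attributes do appear in the attribute list above, and the sample values come from the 22182 lookup.

```ruby
# Hypothetical stand-in for Sunlight::Legislator with just four attributes.
SampleLegislator = Struct.new(:title, :firstname, :lastname, :party)

# Hypothetical helper that builds a string like "Rep F.Wolf (R)".
def format_legislator(leg)
  "#{leg.title} #{leg.firstname[0..0]}.#{leg.lastname} (#{leg.party})"
end

legislators = [
  SampleLegislator.new("Rep", "Frank", "Wolf", "R"),
  SampleLegislator.new("Rep", "Gerald", "Connolly", "D"),
  SampleLegislator.new("Sen", "James", "Webb", "D")
]

formatted = legislators.map { |leg| format_legislator(leg) }.join(", ")
puts formatted
```

In your own program you would build the same string inside the legislators.each loop and push it onto results.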

And with that, you’re done!