In this project you’ll work with the attendee data for a conference supplied in a CSV file. This data comes from an actual conference, though identifying information has been masked.
The techniques practiced in this lab include:
You’ll need these two source files:
If you haven’t already set up Ruby, visit http://jumpstartlab.com/resources/general/environment/ for instructions.
Go to your command line and enter:
sudo gem install fastercsv sunlight
or, on systems without sudo:
gem install fastercsv sunlight
Next open RubyMine and…
Start with this code framework:
require "rubygems"
require "fastercsv"

class JSAttend
  attr_accessor :file
  attr_accessor :headers

  def initialize
    puts "JSAttend Initialized."
  end
end

jsa = JSAttend.new
In RubyMine, click the RUN menu at the top, then RUN again. Your output should look something like…
JSAttend Initialized.
Process finished with exit code 0
CSV files are great for storing and transporting large data sets. They’re most commonly created from Excel spreadsheets, but since a CSV is really just a plain text file, they’re pretty easy to interact with from ANY program. First thing to do is to get an object created for the file. Add this method to your class:
def open_file
  filename = "event_attendees.csv"
  puts "Trying to open the file with FasterCSV"
  @file = FasterCSV.open(filename, {:headers => true, :return_headers => true, :header_converters => :symbol})
  @headers = @file.readline
end
Then go into your initialize method and add open_file as the first line. The whole file should now look like this:
require "rubygems"
require "fastercsv"

class JSAttend
  attr_accessor :file
  attr_accessor :headers

  def initialize
    open_file
    puts "JSAttend Initialized."
  end

  def open_file
    filename = "event_attendees.csv"
    puts "Trying to open the file with FasterCSV"
    @file = FasterCSV.open(filename, {:headers => true, :return_headers => true, :header_converters => :symbol})
    @headers = @file.readline
  end
end

jsa = JSAttend.new
Run the program and you should get an error that starts like this:
Trying to open the file with FasterCSV
/Library/Ruby/Gems/1.8/gems/fastercsv-1.5.0/lib/faster_csv.rb:1193:in `initialize': No such file or directory - event_attendees.csv (Errno::ENOENT)
This is what we call a stack trace. When your program generates an error, the stack trace is the best way to figure out what went wrong. Sometimes they’re hard to read, but this one is pretty easy. When it says No such file or directory - event_attendees.csv you can tell that it can’t find the event_attendees.csv file that it’s trying to open. You could write the entire filename path (like C:\JumpstartLab\RubyJumpstart\Projects\JSAttend\event_attendees.csv), but that’s junky for a few reasons. It’s better to copy the file you want, event_attendees.csv, into the same directory where your js_attend.rb file is. Do that now, then re-run your program. You should see something like this:
Trying to open the file with FasterCSV
JSAttend Initialized.
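If you ever run the program from a different directory, the relative filename will stop working again. One possible way around that, shown here only as a sketch, is to build the path from the directory that holds the script itself. The messages printed below are made up for illustration; only the event_attendees.csv name comes from the project.

```ruby
# Build the path from the directory holding this script, so the CSV is
# found no matter which directory the program is started from.
filename = File.expand_path("event_attendees.csv", File.dirname(__FILE__))

# Check before opening so a missing file gives a friendly message
# instead of a stack trace.
if File.exist?(filename)
  puts "Found #{filename}"
else
  puts "Could not find #{filename}"
end
```

You could then hand filename to the open call in open_file instead of the bare "event_attendees.csv" string.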
Now that our file is getting loaded properly we have a name for that variable – @file. We can now talk to that object named @file and ask it all kinds of information or tell it to do things. Let’s create a method that will just go through the file and print out the first names of all the people:
def print_names
  @file.each do |line|
    puts line[:first_name]
  end
end
Then to actually execute that instruction, you need to call it. At the bottom of your project file add the line jsa.print_names underneath jsa = JSAttend.new. Now RUN your program and you should see the first names from the CSV file fly by.
Once that’s working we can beef it up a little by printing out their first name AND last name. Modify the puts line of your print_names method so it reads like this:
puts line[:first_name] + " " + line[:last_name]
What you’re doing here is adding together three strings: the first name, a space, and the last name. Once those are added together they go to the puts instruction to print out. Run your program again and you should see the attendee first and last names scroll by. Now, rewrite that puts line using string interpolation (HINT: remember using the #{}?).
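If you want to check your answer, here is one way the two styles compare. This is a small sketch that uses a plain Hash as a stand-in for a CSV line (that stand-in is my assumption, the real line object comes from FasterCSV).

```ruby
# A plain Hash stands in for one CSV line here (an assumption for illustration).
line = { :first_name => "Allison", :last_name => "Nguyen" }

# Concatenation: add three strings together.
concatenated = line[:first_name] + " " + line[:last_name]

# Interpolation: Ruby evaluates each #{} and drops the result into the string.
interpolated = "#{line[:first_name]} #{line[:last_name]}"

puts concatenated
puts interpolated

# When a value is missing, interpolation turns nil into an empty string,
# while adding nil to a string raises a TypeError.
partial = { :first_name => "Sarah", :last_name => nil }
with_gap = "#{partial[:first_name]} #{partial[:last_name]}"
```

Both styles print the same name. The difference shows up with missing data, which is one reason interpolation is often the more forgiving choice.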
When we got this data the zipcode data was a little surprising.
Why were so many of the zipcodes entered incorrectly? Look at a few example addresses and zipcode…
1 Old Ferry Road, Box # 6348 Bristol RI 2809
90 University Heights , 401h1 Burlington VT 5405
123 Garfield Ave Blackwood NJ 8012
239 S Prospect St Burlington VT 5401
50 Ledgewood Dr York ME 3909
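See the pattern? All the short zipcodes come from the Northeast (New England and New Jersey), where zipcodes begin with a zero. Somewhere along the way the leading zero was dropped, most likely because the zipcodes were stored as numbers rather than text. Now that we know the problem, we can fix it.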
Let’s write a method to print out the current zipcodes from the CSV. We’ll call it print_zipcodes like this:
def print_zipcodes
  @file.each do |line|
    zipcode = line[:zipcode]
    puts zipcode
  end
end
Run that and you should see the list of uncleaned zipcodes scroll by.
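Let’s write a little pseudo-code:
If the zipcode is less than 5 digits, add zeros on the front until it is 5 digits
If the zipcode is more than 5 digits, it’s junk, so replace it with 00000
If the zipcode is exactly 5 digits, assume it’s ok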
Turning that into code we should create a clean_zipcode method. Model it after the structure of your clean_number method. Name the parameter original like we did in clean_number. Assuming you have that structure we can start to map out the method’s code…
def clean_zipcode(original)
  if original.length < 5
    # Add zeros on the front
    result = original # This is just TEMPORARY!
  elsif original.length > 5
    result = "00000" # If it is greater than 5 digits, it's junk
  else
    result = original # If it wasn't less than 5, and wasn't more than 5, it was 5
  end
  return result
end
So I’ve filled in the easy parts (for zips with length greater than or equal to 5). Now that the method exists you can go into your print_zipcodes method and change the zipcode = line[:zipcode] to zipcode = clean_zipcode(line[:zipcode]). Also change your instruction at the very bottom to jsa.print_zipcodes. You can then RUN this code and see if it fixes the missing and too-long zipcodes.
Uh-oh. Did you get an error? I did. The program got through a bunch of the zipcodes then spat this out:
`clean_zipcode': undefined method `length' for nil:NilClass (NoMethodError)
What the heck is nil:NilClass? Refer to the Ruby Tutorial if you’ve forgotten about nil.
In this case, FasterCSV is giving us a nil if the CSV file doesn’t have any information in the zipcode. I was really expecting the empty string, "", but I can see nil making sense. Dealing with nil values in your code can be a huge pain in the neck. The situation we have now is very typical: you write code that works great for most of the cases, then it hits one nil and blows up. Most frequently these problems are generated by trying to call methods or access attributes of nil. If I asked you “What is the length of nothingness?” you’d probably give me a confused look. That’s what Ruby is saying in our error message here. “I don’t know how to find the length for nil”. The solution, then, is to check and see if our zipcode is nil before doing other things to it.
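To see the problem in isolation, here is a tiny sketch you could paste into a scratch file. The safe_length helper is hypothetical, something I made up just to show the check-for-nil-first pattern.

```ruby
zipcode = nil # what we get back for an empty zipcode field

# Calling a String method on nil raises NoMethodError.
begin
  zipcode.length
rescue NoMethodError => e
  error_message = e.message
end
puts error_message

# Hypothetical helper: check for nil before asking for the length.
def safe_length(value)
  if value.nil?
    0
  else
    value.length
  end
end

puts safe_length(nil)
puts safe_length("20010")
```

The rescue is only there so the demo keeps running after the error. In the real program the goal is to never call length on nil in the first place.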
Check out how I’ve reworked the if/elsif/else instructions to check for a nil first:
def clean_zipcode(original)
  if original.nil?
    result = "00000" # If it is nil, it's junk
  elsif original.length < 5
    # Add zeros on the front
    result = original # This is just TEMPORARY!
  elsif original.length > 5
    result = "00000" # If it is greater than 5 digits, it's junk
  else
    result = original # If it wasn't less than 5, and wasn't more than 5, it was 5
  end
  return result
end
Now the trickier part: adding on the zeros.
The zipcodes that are missing their leading zeros are mostly four digits long, so just adding one zero to the front would probably fix it. But 00601, for instance, is a valid zipcode. In our data there are a few of these two-leading-zero zipcodes. The easiest solution here is to use a while loop.
A while loop repeats one or more instructions as long as a condition is true. When the condition becomes false, the while loop ends. Here’s an example:
counter = 0
while counter < 5
  puts counter
  counter = counter + 1
end
If you were to execute this code, you’d see output like this:
0
1
2
3
4
Simple enough? We’ll use that same idea to implement our zero-padding. Go to your code and take out the line that says result = original # This is just TEMPORARY! and replace it with this:
result = original
while result.length < 5
  result = "0" + result
end
If you were to read that out loud, it would sound like this: “While the length of result is less than five, keep putting a "0" on the front of it and saving it back into result.” Now RUN the code and make sure all the zipcodes look clean.
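The while loop is great practice, but Ruby’s String class also has a built-in rjust method that pads a string on the left up to a given width. As an alternative sketch (not what this lab uses), the padding could be written like this. The pad_zipcode name is just something I picked for the example.

```ruby
# Hypothetical helper: pad a short zipcode with zeros on the left
# until it is five characters wide.
def pad_zipcode(original)
  original.rjust(5, "0")
end

puts pad_zipcode("2809")
puts pad_zipcode("601")
puts pad_zipcode("20010")
```

Note that rjust leaves a string alone when it is already five characters or longer, so the too-long zipcodes would still need their own check.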
This time our code is pretty clean and the results look good. There’s only one thing that stands out for refactoring — duplication. The Ruby community has a saying “Keep it DRY” where DRY = Don’t Repeat Yourself. Whenever you do the same thing twice you introduce the possibility for mistakes down the road.
Let’s imagine that, tomorrow, we want junk zipcodes to be marked “99999” instead of “00000”. How many places would we have to change the code? Two is one too many for something this simple. We can compress our code down a little bit to avoid this repetition by using an || (OR) operator in our condition like this:
def clean_zipcode(original)
  if original.nil? || original.length > 5
    result = "00000" # If it is nil OR longer than 5 digits, it's junk
  elsif original.length < 5
    # Add zeros on the front
    result = original
    while result.length < 5
      result = "0" + result
    end
  else
    result = original # If it wasn't less than 5, and wasn't more than 5, it was 5
  end
  return result
end
So you got rid of one elsif statement and removed duplication in the code. RUN it and make sure there are no errors and the data looks good, then we’re ready for the next iteration!
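If you wanted to take the DRY idea one step further, one option is to give the junk value a name so it is spelled out in exactly one place. This is only a sketch of that idea. The JUNK_ZIPCODE constant is my own invention, not part of the lab.

```ruby
# Hypothetical constant: the one place to change if junk zipcodes
# should be marked differently tomorrow.
JUNK_ZIPCODE = "00000"

def clean_zipcode(original)
  if original.nil? || original.length > 5
    result = JUNK_ZIPCODE # nil OR longer than 5 digits is junk
  elsif original.length < 5
    # Add zeros on the front until we reach 5 digits
    result = original
    while result.length < 5
      result = "0" + result
    end
  else
    result = original # exactly 5 digits, leave it alone
  end
  return result
end

puts clean_zipcode(nil)
puts clean_zipcode("2809")
puts clean_zipcode("20010")
```

Switching to “99999” would then be a one-word change at the top of the file.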
This conference was about a political issue, and we wanted the participants to interact with their congresspeople. Since we already have their zipcodes we can take advantage of an API from the Sunlight Foundation to look up the appropriate congresspeople.
We’ll need an additional library to help us connect to and read the data from the API. Add this line below the existing require lines at the beginning of your program file:
require 'sunlight'
Then go down and create a method that looks like this:
def rep_lookup
  lines = []
  @file.each do |line|
    lines << line
  end

  lines[0..20].each do |line|
    representative = "unknown"
    # API Lookup Goes Here
    puts "#{line[:last_name]}, #{line[:first_name]}, #{line[:zipcode]}, #{representative}"
  end
end
This is a little different than the other methods we’ve started with. The first part that will stand out is this:
lines = []
@file.each do |line|
  lines << line
end
When we use the FasterCSV library to read a CSV file, it reads one line at a time. Now that I’m accessing a public API, it’s unkind to generate the traffic of looking up thousands and thousands of attendees each time I run my program — not to mention that’ll take a long time. FasterCSV doesn’t give us a way to just grab a certain number of lines from the CSV file, so what I’ve done here is created a list named lines then gone through the CSV and just put each line into the list of lines.
Now that I have the lines as a normal list, instead of going through every single line I can say lines[0..20], which means “grab the elements at positions 0 through 20 of the lines list” (that’s the first twenty-one, since the range includes both ends), then for each of them, name it line and do these instructions.
lines[0..20].each do |line|
  representative = "unknown"
  # API Lookup Goes Here
  puts "#{line[:last_name]}, #{line[:first_name]}, #{line[:zipcode]}, #{representative}"
end
RUN this code and you should see output like this:
JSAttend Initialized.
Nguyen, Allison, 20010, unknown
Hankins, SArah, 20009, unknown
Xx, Sarah, 33703, unknown
Cope, Jennifer, 37216, unknown
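A quick aside on that lines[0..20] range, since it’s an easy place to be off by one. A range written with two dots includes both ends, while three dots leaves off the last position. Here is a small sketch using a list of numbers as a stand-in for the CSV lines (the stand-in list is my assumption):

```ruby
lines = (1..50).to_a # stand-in for the list of CSV lines

inclusive    = lines[0..20]    # positions 0 through 20
exclusive    = lines[0...20]   # positions 0 through 19
first_twenty = lines.first(20) # asks for a count instead of positions

puts inclusive.length
puts exclusive.length
puts first_twenty.length
```

If you want exactly twenty lookups per run, either of the last two forms would do it.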
Most APIs work by requesting a very complicated web address. In the case of the Sunlight API, the API we’ll be accessing can be found here:
Take a close look at that address. We’re accessing the legislators.allForZip method of their API, we send in an apikey which is the string that identifies JumpstartLab as the accessor of the API, then at the very end we have a zip. Try modifying the address with your own zipcode and load the page. Using 22182 as a sample I see this mess:
10RepP0-001-000016674-91VAN00002073R27120http://www.house.gov/wolf/202-225-0437400435FrankRudolphWolf241 Cannon House Office Building202-225-5136http://www.house.gov/formwolf/contact_email/emailzip.shtmlhttp://www.youtube.com/RepFrankWolfW000672H6VA10050MfakeopenID429http://www.opencongress.org/wiki/Frank_Wolf 11Rep1VAN00029891D95078http://connolly.house.gov/202-225-3071412272GeraldE.Connolly327 Cannon House Office Building202-0225-1492https://forms.house.gov/connolly/contact-form.shtmlGerryC001078Mhttp://www.opencongress.org/wiki/Gerald_Connolly Senior SeatSenP0-001-000016754-41VAN00028058D60043http://webb.senate.gov/202-228-6363412249JamesH.Webb144 Russell Senate Office Building202-224-4024http://webb.senate.gov/contact/http://www.youtube.com/SenatorWebbJimW000803S6VA00127MIJr.fakeopenID533http://www.opencongress.org/wiki/James_Webb Junior SeatSen1VAN00002097D535http://warner.senate.gov202-224-6295412321MarkR.WarnerB40c Dirksen Senate Office Building202-224-2023http://warner.senate.gov/public/?p=EmailSenatorWarnerW000805MIImarkwarnerhttp://www.opencongress.org/wiki/Mark_Warner 8RepP0-001-000016517-51VAN00002083http://www.house.gov/apps/list/press/va08_moran/RSS.xmlD27118http://www.moran.house.gov202-225-0017400283JamesP.Moran2239 Rayburn House Office Building202-225-4376http://moran.house.gov/zipauth.shtmlJimM000933H0VA08040MJr.fakeopenID283http://www.opencongress.org/wiki/James_Moran
This data is, in reality, very structured. My browser (Safari) just doesn’t interpret it very well — Firefox does a much better job of displaying XML. If you have it, open Firefox and load that address. If you’re familiar with writing HTML then this XML document probably makes some sense to you. You can see there is a response object that has a list of legislators. That list contains five legislator objects which each contain a ton of data about each legislator. Cool!
All I want for this API lookup is a comma separated list of the first initial and the last name like this: F.Wolf, G.Connolly, J.Webb. We could use the XML that we retrieved via the url. In Ruby, we could make use of the open-uri and hpricot libraries to fetch and parse this XML.
But the Sunlight Foundation has made it easy on us. Luigi Montanez, a developer at Sunlight Labs, created the sunlight gem. We call this a wrapper library because its job is to hide complexity from us. We can interact with it as a simple Ruby object, then the library takes care of fetching and parsing data from the server.
Add these lines to your rep_lookup method right under the line that says # API Lookup Goes Here:
results = []
zipcode = clean_zipcode(line[:zipcode])
Sunlight::Base.api_key = "e179a6973728c4dd3fb1204283aaccb5"
First, we create an empty list named results. Each zipcode can have multiple congresspeople and senators, so we’ll store all the results in this list. Second, we grab the zipcode from the current line and run it through clean_zipcode. Then we tell the Sunlight library our api_key that I got from the Sunlight Foundation.
Now what can we actually DO with the Sunlight library? Check out the readme on the project homepage: http://github.com/sunlightlabs/ruby-sunlightapi
We’re interested in the Legislator object. Looking at the examples in the ReadMe you’ll see this:
congresspeople = Sunlight::Legislator.all_for(:address => "123 Fifth Ave New York, NY 10003")
So that’s how to fetch information for a specific address, but our task is to find them via Zipcode. Look back at the URL we used to view the XML. See how it has legislators.allForZip? The wrapper library should have a similar method. I don’t see it in the ReadMe, so I’m going to look at the source code. Near the top of the project page I click lib, then the sunlight folder, then legislator.rb. I search the page for zipcode and find a method that starts like this:
def self.all_in_zipcode(zipcode)
Perfect! It takes in a zipcode and returns a list of legislators. Let’s try it in our program. Add these lines below the api_key line:
legislators = Sunlight::Legislator.all_in_zipcode(zipcode)
puts legislators
Run your program and check out the results. Did it work? Kinda? Maybe? You’re probably seeing lines like this:
#<Sunlight::Legislator:0x102525280>
That’s Ruby’s way of printing out a Legislator object. Not very informative, but it shows us that legislators are being found which is good! Next we should access the name of the legislator within that object — but how do I know what it’s called? Return to the Legislator source code and, near the top of the page, you’ll see this:
attr_accessor :title, :firstname, :middlename, :lastname, :name_suffix, :nickname,:party, :state, :district, :gender, :phone, :fax, :website, :webform, :email, :congress_office, :bioguide_id, :votesmart_id, :fec_id, :govtrack_id, :crp_id, :event_id, :congresspedia_url, :youtube_url, :twitter_id, :fuzzy_score, :in_office, :senate_class, :birthdate
This is a list of all the attributes (attr_accessor means “attribute accessor”) that a Legislator has, all the information it knows. If we want their url we ask for .website, or fax number with .fax. Here we’re interested in their first and last name. We’ll need to loop through each of the legislators — replace the puts legislators line with this code:
legislators.each do |leg|
  puts leg.firstname
end
Run your program and check out the results. More impressive, right?
You can see that it’s finding one or more representatives and spewing their first names out. But we’re looking for first initial and last name. Let’s take out that puts leg.firstname line. In its place add these three lines:
lastname = leg.lastname
first_initial = leg.firstname[0..0]
results << "#{first_initial}.#{lastname}"
The first instruction says “find the lastname attribute of leg and store it into a variable named lastname.” Then the second line is “find the firstname attribute of leg, grab all the letters from position 0 to position 0 (giving you only the first letter), and store it into the variable named first_initial.” Next, “make a string with the data in first_initial, a period, and the data inside lastname then add it into the list named results.”
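If you’d like to try those three lines without hitting the API, here is a sketch that fakes the legislators with a Struct. SampleLegislator is a hypothetical stand-in I made up. It only has the two attributes we use, and the names are the ones from the XML we looked at earlier.

```ruby
# Hypothetical stand-in for Sunlight::Legislator with just two attributes.
SampleLegislator = Struct.new(:firstname, :lastname)

legislators = [
  SampleLegislator.new("Frank", "Wolf"),
  SampleLegislator.new("Gerald", "Connolly"),
  SampleLegislator.new("James", "Webb")
]

results = []
legislators.each do |leg|
  lastname = leg.lastname
  first_initial = leg.firstname[0..0] # position 0 to position 0: one letter
  results << "#{first_initial}.#{lastname}"
end

puts results
```

Running it should show one initial-and-lastname string per fake legislator, with no network traffic involved.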
Lastly, change the final puts line in the method so it looks like this:
puts "#{line[:last_name]}, #{line[:first_name]}, #{line[:zipcode]}, #{results.join(", ")}"
The significant change being the last part that says results.join(", ") which means “take each thing in the list results and join them together with a comma and space between each one.” Run your program and you should see output like this:
JSAttend Initialized.
Nguyen, Allison, 20010, E.Norton
Hankins, SArah, 20009, E.Norton
Xx, Sarah, 33703, M.Martinez, B.Nelson, C.Young
Cope, Jennifer, 37216, J.Cooper, B.Corker, L.Alexander
Zimmerman, Douglas, 50309, T.Harkin, C.Grassley, L.Boswell
Pretty cool? If you’d like, go back to the Legislator attribute list and see what other interesting data you could include. What would it look like to have the names read like “E.Norton (D)” for their party? Or how about “Rep E.Norton (D)”?
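If you get stuck on that last one, here is one possible approach, again sketched with a fake legislator so it runs without the API. LegislatorStub and format_legislator are hypothetical names of my own. The real Legislator object has the same title, firstname, lastname, and party attributes in its attr_accessor list.

```ruby
# Hypothetical stand-in carrying the four attributes used below.
LegislatorStub = Struct.new(:title, :firstname, :lastname, :party)

# Hypothetical helper: builds a string like "Rep F.Wolf (R)".
def format_legislator(leg)
  first_initial = leg.firstname[0..0]
  "#{leg.title} #{first_initial}.#{leg.lastname} (#{leg.party})"
end

legislators = [
  LegislatorStub.new("Rep", "Frank", "Wolf", "R"),
  LegislatorStub.new("Sen", "James", "Webb", "D")
]

results = []
legislators.each do |leg|
  results << format_legislator(leg)
end

puts results.join(", ")
```

Pulling the formatting into its own method keeps the loop short and gives you one place to experiment with other attributes.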