Dice

Background

Last week I had the pleasure of working with @j_scag_, @jamesjtong, @ivanbrennan, and @ahimmelstoss on this amazing gametable application. This Rails-based gaming platform lets users play chess, checkers, backgammon, and Chinese checkers online with friends far and wide.

The gameplay is made possible by the Sync gem and Pusher. Together they allow partials on a given game page to be refreshed and updated in real time, so that every time a user moves a game piece or enters text into the chat, those moves and chats can be seen by the other players of the game on their own computers.

For the backgammon game (and future games that require dice), we needed to implement a way for players to roll dice. We implemented this on top of the chat functionality that had already been built.

Chat

The chat function routed messages to the following action on the MessagesController:

def create
  @game = Game.find_by(secure_room_code: params[:secure_room_code])
  @message = @game.messages.create(:content => params[:message][:content],
                                 :name => params[:message][:name],
                                 :source => "user")
  sync_new @message, scope: @game
  redirect_to game_path(@message.game.secure_room_code)
end

This action creates a new message with the desired contents and then syncs those contents across the different users of the game. When the message box partial is refreshed, the new message that was created by the action is now included among the messages for that game that get displayed in the message box to the right of the game board.

  <div class="messages-box">
    <ul id="chat">
      <% @game.messages.each do |message| %>
        <%= sync partial: 'message', resource: message %>
      <% end %>
      <%= sync_new partial: 'message', resource: Message.new, scope: @game %>
    </ul>
  </div>

Dice

Similarly, the dice functionality routed to the dice action on the MessagesController:

def dice
  @game = Game.find_by(secure_room_code: params[:secure_room_code])
  @message = @game.messages.create(:content => "#{rand(6)+1} #{rand(6)+1}",
                                 :name => "#{params[:name]} rolled",
                                 :source => "computer")
  sync_new @message, scope: @game
  redirect_to game_path(@message.game.secure_room_code)
end

This action is almost identical to the create action, save for two important differences. First, the content is set to two randomly generated numbers between 1 and 6 ("#{rand(6)+1} #{rand(6)+1}"). Second, the source is tagged as computer instead of user (:source => "computer").
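To make the roll itself concrete, here is a minimal sketch outside of Rails. The method names roll_dice and dice_content are hypothetical (they are not in the app); the only piece taken from the action above is rand(6) + 1, which yields a whole number from 1 to 6.

```ruby
# Hypothetical helpers, not part of the gametable code base.
# rand(sides) returns 0...(sides - 1), so adding 1 shifts the range to 1..sides.
def roll_dice(count = 2, sides = 6)
  Array.new(count) { rand(sides) + 1 }
end

# Formats the rolls the same way the dice action builds its message content.
def dice_content(rolls)
  rolls.join(" ")
end

puts dice_content(roll_dice)
```

Pulling the roll into its own method would also make it easier to support games that need a different number of dice.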

backgammon_board

Authentication

Given that our platform had no user authentication, we needed a way to make sure that users were actually rolling the dice instead of just changing their chat name and entering the dice rolls by hand. To accomplish this, we put some logic in the chat message partial that colored the text green when the source was the computer (i.e., dice roll) and left the text uncolored when the source was one of the users on the system. The logic in the partial for this is as follows:

 <li <% if message.source=="computer" %>
  style="color: #2C6043;"
  <% end %>
  ><strong><%= message.name %></strong>: <%= message.content %></li>

Something you may notice here is the strange line breaks within the opening <li> tag. These were necessary because when the ERB logic that checked for the source was all on one line, it did not render properly.
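One possible way to avoid the awkward line breaks is to move the condition into a small helper so the tag stays on one line. The sketch below uses plain ERB from the Ruby standard library rather than a Rails view, and the names style_attribute and render_message are hypothetical. Plain ERB does not escape HTML the way Rails views do, so treat this as an illustration of the idea only.

```ruby
require "erb"

# Hypothetical helper: returns the style attribute for dice rolls, nothing otherwise.
def style_attribute(source)
  source == "computer" ? ' style="color: #2C6043;"' : ""
end

# The whole <li> now fits on a single line of template.
TEMPLATE = ERB.new('<li<%= style_attribute(source) %>><strong><%= name %></strong>: <%= content %></li>')

def render_message(name, content, source)
  TEMPLATE.result(binding)
end

puts render_message("Sam rolled", "3 5", "computer")
puts render_message("Sam", "nice roll", "user")
```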

Final thoughts

I really like how this dice functionality fits in with the existing messaging framework built into the application. It fits in with other simple but elegant solutions in this app, like using secure game room codes (http://gametable.co/games/56b3f2d83c82304ce036cec5c97435d7) instead of users. The next step is to add some new games that need dice.

Shards

citibike rider

What we wanted to do

Create a bike rack availability prediction model for Citibike in NYC based on historical station data.

What we had

Snapshots of every station over a two-and-a-half-month period (81 days). This amounted to about 18K snapshots of each of the 331 stations, stored as JSON in text files.

def parse_filename(file_name)
  DateTime.parse(file_name.gsub("citi_", "").gsub("_","")).to_s 
end

To shard or not to shard

With 331 stations being queried every 5 minutes (228 times a day), this meant we needed to create over 75K records for each day that we had data. With 81 days of data we ended up with approximately 6.1MM records. We had two options for how to store this data. First, we could make a single table with 6.1MM rows. The second option (which we chose given our use case) was to ‘shard’ the database into 331 separate station tables that would each contain about 18K records.
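The arithmetic behind those numbers, as a quick sketch. The constant names are mine; the values are the station count, snapshots per day, and number of days quoted above.

```ruby
STATIONS = 331
SNAPSHOTS_PER_DAY = 228
DAYS = 81

records_per_day  = STATIONS * SNAPSHOTS_PER_DAY   # rows added across all stations each day
total_records    = records_per_day * DAYS         # rows in one big table
rows_per_station = SNAPSHOTS_PER_DAY * DAYS       # rows in each sharded station table

puts records_per_day   # 75468
puts total_records     # 6112908
puts rows_per_station  # 18468
```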

Our model

Our model takes the user’s current location and their desired destination and provides the nearest Citibike stations to both the origin and destination. It then goes back in time and checks the bike and rack availability at those stations in the past, and uses that information to predict the availability of bikes or rack space today.

We needed to be able to quickly look up a history for each station. Instead of rooting through 6MM rows in a single table, our model gets the necessary station ids and then queries the related tables, which are considerably smaller.

def origin_history(min)
  origin_stations.collect do |station| 
    cmd= "SELECT * FROM station_#{station.station_id} 
    WHERE station_time = \'#{rollback(56, min).to_s[0..18]}\'"
    connection.execute(cmd).field_values("bikes").join
  end
end

As you can see, our station tables are not ActiveRecord models. Instead we had to write custom migrations to create the tables

def up
  Station.import
  # Create one table per station, named after the station id.
  Station.all.each do |station|
    create_table "station_#{station.station_id}" do |t|
      t.integer :bikes
      t.integer :free
      t.datetime :station_time
    end
  end
end

and custom SQL to write and then read from these tables.

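def build_row(number, bikes, free, station_time)
  # Quote each value before interpolating it into the statement.
  values = [bikes, free, station_time].map { |value| connection.quote(value) }.join(", ")
  cmd = "INSERT INTO station_#{number} (bikes, free, station_time) VALUES (#{values})"
  connection.execute(cmd)
rescue StandardError
  # Skip rows that fail to insert.
end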

This was faster than dealing with hundreds of individual station models and then creating an instance for each station every time we needed to write to or read from the database. It was also considerably more cumbersome. For example, when we changed the database from SQLite to Postgres, we had to change all of our SQL statements so that they would work with the new database.
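One way to soften that pain is to build the table names and statements in a single place, so a change of database only touches a couple of methods. This is a sketch under assumptions, not what we had in the project: station_table and history_query are hypothetical names, and the timestamp format is a guess at what the station_time column expects.

```ruby
# Hypothetical helper: maps a station id to its shard (table) name.
# Integer() raises on anything that is not a whole number, which keeps
# stray input out of the table name.
def station_table(station_id)
  "station_#{Integer(station_id)}"
end

# Hypothetical helper: builds the lookup for one station at one point in time.
def history_query(station_id, time)
  stamp = time.strftime("%Y-%m-%d %H:%M:%S")
  "SELECT bikes, free FROM #{station_table(station_id)} WHERE station_time = '#{stamp}'"
end

puts history_query(72, Time.utc(2013, 10, 29, 9, 45, 0))
```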

ToDo

Next steps are to fix the prediction model. Currently we are predicting current availability based solely on one date in the past. We also need to set up a recurring task to seed our database with current data.

Seeds

ADK 46R Logo

Background: ADK 46-R

The ADK 46ers are a set of 46 mountains in upstate NY higher than 4000 feet. If you climb all of these mountains, keep records of your climbs, and then submit those records here, you can become a member of the 46-R club.

ADK 46-R CLI

This project started as a Command Line Interface (CLI) that scraped the following two sites for data on the mountains in the 46-R group and the associated hikes.

http://www.adk46er.org/peaks/index.html

http://www.everytrail.com/guide/adirondack-46er-list

The scrape loaded all the necessary data into a database, and the CLI reads from that database when responding to user commands. The CLI outputs the data in the following format:

47 mountains loaded.
23 hikes loaded
Type command (browse, show, help, or exit):
$ show
Enter mountain rank or ANY PART of mountain name: 
$ Marcy
Mountain(s)
Name: Marcy
Rank: 1
Elevation: 5344
Hike Name: Mt. Skylight, Mt. Marcy, and Gray Peak
URL: http://www.everytrail.com/guide/mt-skylight-mt-marcy-and-gray-peak
Difficulty: Difficult:
Miles: 18
Time: Full-day
Description: Two enormous bald summit 46ers and a trailess Gray Peak in a day loop hike form Heart Lake.  A state highpoint as well.
Type command (browse, show, help, or exit):

forty-sixer-on-rails

The next iteration of this project was using this data as the basis for a Rails application that would allow users to view the mountains in the group, view the hikes on which those mountains could be climbed, and then keep track of their progress towards the goal of becoming an ADK 46-R. Before setting up the routes, controllers, views or any of that other good Rails stuff, I needed to figure out how to get this data from my CLI application into my new Rails application.

I didn’t want to just add the table to the rails database because I wanted the database to be integrated into ActiveRecord and the models I had designed for the application. I also didn’t want to hand write a massive seed file to get all this data into the database in my rails application.

Export to CSV

What I ended up doing was exporting the tables from the database that contained the scraped data as csv files and importing those csv files into the mountains and hikes tables in the Rails database. The export used the following commands:

sqlite3 forty_sixer.db 
sqlite> .mode csv
sqlite> .output mountains.csv
sqlite> SELECT * FROM mountains;

I did this step for both the mountains and hikes tables, which resulted in the following two files: mountains.csv and hikes.csv. I then went to the database in the forty-sixer Rails project and did the following to import these two tables:

sqlite3 development.sqlite3
sqlite> .separator ","
sqlite> .import mountains.csv mountains
sqlite> .import hikes.csv hikes

The .separator command was an important step in getting sqlite3 to understand the format of the data I was importing. Another important step was not using MS Excel to open and manipulate the csv file before I tried importing it into the database. Initially I had opened the csv file in Excel and added two columns for created_at and updated_at so that the columns in the csv file would line up with the columns in the Rails database. Long story short, this did not work. What I ended up doing was removing the timestamps from the Rails database and remigrating before I imported the csv file, so that the columns in the Rails database and the CLI database were identical.
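For comparison, here is a rough sketch of reading the same kind of file from Ruby with the standard csv library instead of going through sqlite3. It is not what I did above. It assumes the exported file has no header row and that the columns come out in the order id, rank, name, height; the sample rows are inlined so the snippet runs on its own.

```ruby
require "csv"

# Stand-in for the contents of mountains.csv (assumed column order: id, rank, name, height).
csv_text = <<~CSV
  1,1,Marcy,5344
  2,2,Algonquin,5114
CSV

COLUMNS = %i[id rank name height].freeze

mountains = CSV.parse(csv_text).map do |row|
  record = COLUMNS.zip(row).to_h
  # CSV hands back strings, so convert the numeric columns explicitly.
  record.merge(
    id: Integer(record[:id]),
    rank: Integer(record[:rank]),
    height: Integer(record[:height])
  )
end

mountains.each { |mountain| puts mountain.inspect }
```

Each hash could then be passed to something like Mountain.create! inside the Rails app, which would also take care of the timestamp columns.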

seed_dump

The Rails application database constantly gets changed during development, often getting dropped or reset. I wanted a way to make a seed file out of the data I had just worked so hard to get into my database. Here is where the Ruby gem seed_dump saved the day.

Once the gem is added to your Gemfile and installed, the following command:

rake db:seed:dump

will turn all the tables in your Rails database into seed code. In other words, it took 47 rows (MacNaughton mountain is not on the official list, but it’s over 4000 ft) of this from the table

id rank name height
1 1 Marcy 5344
1 2 Algonquin 5114

and turned it into 47 lines of ruby code in my seed file that looked like this

Mountain.create!([
  { :rank => 1, :name => "Marcy", :height => 5344, :trek_id => 14 },
  { :rank => 2, :name => "Algonquin", :height => 5114, :trek_id => 7 },

There are more elegant solutions to this problem. For example, I could have added the scrape logic from the CLI application into the Rails application and done the scrape directly from the Rails application. On second thought, that would have been a lot easier. That’s what I get for working on this on the LIRR with no internet connection. At least now I have a backup plan for the next time I lose access to the internet or the data I need is only available through an existing database.

Mta_status Gem

Should I run for the train?

parkour

Whenever I need to catch the LIRR, I usually end up running. And if I’m dodging and weaving around the fine people of NYC, I hope there’s a train warmed up and ready to go when I get to the station. Given that I spend so much time in the terminal (…on my computer) these days, I decided to build a gem that makes sure everything is ok with my train before the race begins by pulling the latest train service status data off the MTA website.

The gem can be installed by typing the following into your terminal:

gem install mta_status

Type the following command into your terminal and you will get the subway service status by default.

mta_status 

If you type lirr, metro-north, or a host of other related terms after the mta_status command you should get the service status for that particular branch of the MTA. For example, the following will get you the service status for the LIRR.

mta_status lirr 
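As a rough illustration of how a command line argument could be mapped to a branch, here is a small sketch. The alias lists and the names BRANCH_ALIASES and branch_for are hypothetical and are not taken from the gem; the only behavior borrowed from above is that no argument falls back to the subway.

```ruby
# Hypothetical alias table: each branch lists the terms a user might type.
BRANCH_ALIASES = {
  "subway"      => %w[subway subways],
  "lirr"        => %w[lirr long-island],
  "metro-north" => %w[metro-north metronorth mnr]
}.freeze

# Returns the branch name for a typed term, "subway" when nothing was typed,
# or nil when the term is not recognized.
def branch_for(argument)
  return "subway" if argument.nil? || argument.strip.empty?

  term = argument.strip.downcase
  match = BRANCH_ALIASES.find { |_branch, aliases| aliases.include?(term) }
  match ? match.first : nil
end

puts branch_for(ARGV.first)
```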

Data source

The link to the train service status page can be found on the MTA developer downloads page.

In addition to service status, the developer site also has links to schedules and route information, all meant to follow the General Transit Feed Specification (GTFS). In other words, if you build something for one transit system, it will likely be transferable to other transit systems in the US and throughout the rest of the world.

Implementation

The MTA service status page is a text file with no CSS. In my previous (and limited) experience with Nokogiri I had searched for nodes using CSS selectors, so with this page I had to search for nodes using XPath instead.

# xpath
@doc.xpath("//name").collect do |name|
  name.children.text
end

# css
student_page.css('div.social-icons a').collect do |link|
  link.attr('href')
end
The page was laid out well, so it didn’t take too long to figure out how to get XPath to find what I was looking for. Let me rephrase that: the page was pretty well laid out except for this:

<text>
                &lt;span class="TitlePlannedWork" &gt;Planned Work&lt;/span&gt;
                 &lt;span class="DateStyle"&gt;
                &amp;nbsp;Posted:&amp;nbsp;10/29/2013&amp;nbsp; 9:43AM
                &lt;/span&gt;&lt;br/&gt;&lt;br/&gt;
              &lt;a class="plannedWorkDetailLink" onclick=ShowHide(50159);&gt;
&lt;b&gt;Busing at Melrose and Tremont Stations
&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;div id= 50159 class="plannedWorkDetail" &gt;&lt;/b&gt;Beginning Aug 19 until further notice
&lt;br&gt;
&lt;br&gt;As part of the Bronx Right-of-Way Improvements Project, substitute bus service
&lt;br&gt;will be provided at Melrose and Tremont Stations in both directions.  Buses will
&lt;br&gt;connect with trains at Fordham Station. Please visit &lt;a href=http://new.mta.info/mnr target=_blank&gt;&lt;font color=#0000FF&gt;&lt;b&gt;&lt;u&gt;mta.info&lt;/u&gt;&lt;/b&gt;&lt;/font&gt;&lt;/a&gt; for further details.
&lt;br&gt;
&lt;br&gt;Weekday Bus Schedule  |  &lt;a href=http://advisory.mtanyct.info/pdf_files/HARLEM_08-19-2013%20FINAL%20Buses%20Only%20Mon-Fri.pdf target=_blank&gt;&lt;font color=#0000FF&gt;&lt;b&gt;&lt;u&gt;pdf&lt;/u&gt;&lt;/b&gt;&lt;/font&gt;&lt;/a&gt;
&lt;br&gt;Weekend Bus Schedule  |  &lt;a href=http://advisory.mtanyct.info/pdf_files/HARLEM_08-19-2013%20FINAL%20Buses%20Only%20Sat-Sun.pdf target=_blank&gt;&lt;font color=#0000FF&gt;&lt;b&gt;&lt;u&gt;pdf&lt;/u&gt;&lt;/b&gt;&lt;/font&gt;&lt;/a&gt;
&lt;br&gt;&lt;b&gt;
&lt;br&gt;&lt;/div&gt;&lt;/b&gt;&lt;br/&gt;
            &lt;br/&gt;&lt;br/&gt;
          </text>

This is the “text” that explains the service disruption. My next task is to make sense of this mess and then potentially work it into a new version of the gem. Only issue there is the more features I add, the less useful the gem could become, especially since the service change information is often confusing.
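A first pass at taming that text might look something like the sketch below. It is only a starting point under assumptions: plain_text is a hypothetical name, the sample string is a shortened piece of the text above, and stripping tags with a regular expression is crude compared with parsing the unescaped HTML properly.

```ruby
require "cgi"

# Hypothetical helper: turns the escaped markup into a single line of readable text.
def plain_text(escaped)
  html = CGI.unescapeHTML(escaped)   # &lt;b&gt; becomes <b>, &amp;nbsp; becomes &nbsp;
  html = html.gsub("&nbsp;", " ")    # the non-breaking spaces were escaped twice
  text = html.gsub(/<[^>]*>/, " ")   # drop the tags themselves
  text.gsub(/\s+/, " ").strip        # collapse the leftover whitespace
end

sample = '&lt;span class="TitlePlannedWork" &gt;Planned Work&lt;/span&gt; ' \
         '&lt;br&gt;Buses will connect with trains at Fordham Station.'

puts plain_text(sample)
```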

TODO

I get the following gem-related warning every time the gem runs. It doesn’t seem to affect how the gem runs, but it is ugly.

WARN: Unresolved specs during Gem::Specification.reset:
    mini_portile (~> 0.5.0)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.

Also, there’s a ton of information on the bus system that I didn’t incorporate and also some information on the bridges and tunnels operated by the MTA.

Anal v Methodical

POODR and Everybody Poops

POODR Everybody Poops

Everybody Poops

For those of you who don’t know, Everybody Poops is a popular potty training book for toddlers that catalogues the pooping habits of man and beast.

I start here because when I first got interested in programming, I thought I might be good at it because I was anal. Or more accurately, anal retentive, a character trait that is meant to be acquired during the ‘anal’ stage of development. From Wikipedia:

The term anal retentive (also anally retentive), commonly abbreviated to anal is used to describe a person who pays such attention to detail that the obsession becomes an annoyance to others, potentially to the detriment of the anal-retentive person.

I can definitely be obsessive. I believe there is a right and wrong way to do things. Heck, I can even tend towards the obsessive compulsive at times. Did I turn the stove off?

Anal retentive v methodical

The more I meditate on being anal, and the more I learn to program, the more I think that being anal could be a liability and not an asset. From what I can tell at this point, being methodical is much more important than being obsessive, and yes, they are very different things.

Asserting control over things like cleaning or safety can appear a virtue. You know, being fastidious and organized. Failing to prepare is preparing to fail. But obsessing over the details of knowable, controllable things like cleaning and safety is relatively easy. This is why obsessive behavior is annoying to others. To the unobsessed, this type of obsession feels like overkill: an unwarranted and intrusive allocation of energy to relatively small problems.

Methodology, on the other hand, is about applying a systematic or established procedure (aka method) to solve a problem. You do not have to wholly understand the thing you are dealing with to be methodical about finding a solution. Take an example from vertebrate taxonomy:

Coelacanth

coelacanth

Thought to have been extinct for millions of years, coelacanths (aka the “living fossil”) were rediscovered swimming around the sea in the late 1930s. More closely related to landlubbing creatures than typical fish, they are difficult to classify. Unable to quickly tidy them away into a preexisting understanding, the anal biologist might become overwhelmed when confronted with such a challenge.

Alternatively, a methodical biologist could apply a series of procedures or methods to find a home for this old-timer. Does it have feet? Fins? Lungs? Gills? Scales? Skin? These individual questions are a set of methods for establishing the taxonomy of an animal. We can apply these methods to animals both known and unknown, and develop an understanding of their characteristics and habitats.

POODR

The complexity of technology and programming languages makes them hard to obsessively control. And even if you manage to develop an encyclopedic understanding of today’s technology, the constant change would eventually overwhelm even the deepest capacity for memorization. By developing a systematic, methodical means of approaching code, you don’t have to worry about knowing everything or being in control. You can simply chip away at a big problem until it eventually becomes manageable.

I am coming to terms with the fact that the answers to code questions don’t have to be knowable or neat at first. What I seek to find now is not the answer, but a methodical means to find those answers. The more I understand the art of programming, the more comfortable I become with the unknown.

Tdd

An introduction to TDD

We have been doing a lot of Test Driven Development (TDD) at Flatiron the last few days. This blog post from Jason Arhart at the Las Vegas Ruby Group outlines some rules of TDD that will be good to keep in mind, especially as we get involved with more complicated projects.

Given that we have been writing code primarily to answer tests at this point (as opposed to writing the tests ourselves) this first rule hadn’t totally sunk in:

  • Never implement functionality until you have a failing test for it.

In other words, the first step after conceiving a feature is to write a test for that feature, and only then to code the actual feature itself.

  • Only write enough of your test to make it fail.

This one was more intuitive for me. Keep the test as simple as possible so that the test functions as an aid to good development instead of a hindrance.

  • Only write enough production code to make your test pass.

As a corollary to the previous rule, this also makes sense. In both the test and the production code, the code should be as simple and efficient as possible.

  • Never refactor unless your tests are passing.

This final rule is my favorite. As I struggle to write elegant code, it’s comforting to know that at the start the only thing that matters is that the code works.

I also really appreciate the philosophy behind this. Make it work first, then make it beautiful. This is also a good way to think about writing. Instead of perseverating over each sentence, just make sure to get all the ideas out. Then and only then, go back and make it eloquent.
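To make the order of operations concrete, here is a tiny sketch of the cycle in plain Ruby with no test framework. The example (a die roll) and the method names are mine, not from the blog post on the rules: the test is written first and would fail while roll_die does not exist, then just enough code is added to make it pass.

```ruby
# Step 1 (red): the test comes first. Run on its own, it fails because
# roll_die has not been written yet.
def test_roll_is_between_one_and_six
  100.times do
    roll = roll_die
    raise "expected 1..6, got #{roll}" unless (1..6).cover?(roll)
  end
  true
end

# Step 2 (green): only enough production code to make the test pass.
def roll_die
  rand(6) + 1
end

# Step 3: with the test passing, refactoring is now fair game.
puts "passing" if test_roll_is_between_one_and_six
```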

Hello World