ChairNerd

Code, Design & Growth at

Keeping Track of Build Artifacts

easily track build artifacts from your continuous integration setup

In the process of building our Android application, we felt a need to send out test builds. Because the Android ecosystem is much larger in terms of the number of devices and operating system versions in use, we needed an easy way to distribute these builds. Internal Android users and external testers needed the latest version of the Android app to report bugs, suggest features, and suss out minor issues with the app.

What we did for the first few weeks was the following:

  • Have a developer make a build
  • Send out the build via email to interested users
  • Have them further disseminate that build to other testers

This didn’t work: by the time the last people retrieved a build, the issues in it had already been fixed in subsequent builds. It became the testing version of the telephone game.

We already had continuous integration for the Android app on TravisCI - if you haven’t used Travis and don’t yet have testing infrastructure, we wholeheartedly recommend it - and we wanted to hook into this process. Our testers should be able to view any build, filter by branch, and tell us exactly the build number they tested. This should all be automatic, with no other intervention from the developers.

In our case, we run the unit tests for the SeatGeek Android App on Travis. Travis notifies our HipChat account and uploads the latest APK to S3, at which point anyone in our dev room can download the app to test.

For non-developers - and testers not in our company - this doesn’t work as well. The dev room is a private room, and while we could rebroadcast the message, that doesn’t solve the issue of simplifying tests of multiple versions.

Thus, we built build-artifacts. build-artifacts is an app built with the Slim PHP framework (similar to Sinatra in Ruby or Flask in Python) that can be deployed to Heroku or on your own infrastructure. The build-artifacts admin receives a post-deploy webhook from Travis, signaling that a build has completed, successfully or otherwise. We then store this data in ElasticSearch and show a small admin panel with the latest 10 builds, regardless of branch. We also showcase the latest stable release, so that our internal users can see the latest and greatest without mucking around with their systems.
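The flow above can be sketched in a few lines. This is not the build-artifacts code (which is PHP backed by ElasticSearch); it is a minimal in-memory Python sketch. The payload field names (`number`, `branch`, `state`) and the `master` stable branch are assumptions made for illustration, not a description of the real Travis payload.

```python
import json


def record_build(builds, raw_payload):
    """Parse a webhook body (JSON text) and append the build to `builds`."""
    payload = json.loads(raw_payload)
    build = {
        "number": int(payload["number"]),         # hypothetical field name
        "branch": payload["branch"],              # hypothetical field name
        "passed": payload["state"] == "passed",   # hypothetical field name
    }
    builds.append(build)
    return build


def latest_builds(builds, branch=None, limit=10):
    """Return the newest builds first, optionally filtered by branch."""
    selected = [b for b in builds if branch is None or b["branch"] == branch]
    return sorted(selected, key=lambda b: b["number"], reverse=True)[:limit]


def latest_stable(builds, stable_branch="master"):
    """Return the most recent passing build on the stable branch, or None."""
    for build in latest_builds(builds, branch=stable_branch, limit=len(builds)):
        if build["passed"]:
            return build
    return None
```

With something like this in place, a tester can quote the build number they tested, and the panel can show both the latest 10 builds and the latest stable one.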

Here’s a screenshot from our own instance of Android app build-artifacts running on Heroku:

SeatGeek Android app build-artifacts on Heroku

The app has worked well in our internal testing, and has definitely simplified our application testing. Download build-artifacts and give us some feedback. Pull requests welcome!

SeatGeek for Android

For the past seven months we’ve been building SeatGeek for Android. Today, we’re pumped to be able to share our work outside the walls of our office. The app just went live in the Play Store.

In this first release of the app we focused on the functionality most core to SeatGeek:

  • Event exploration
  • Ticket search across 100s of sellers
  • Interactive maps
  • Deal Score
  • Photo views from your seat

When we started working on the app, our #1 priority was to make sure it didn’t feel like a half-baked port of the existing SeatGeek iOS app. Everything was designed and built from the ground up to look and feel perfectly at home on Android. We wanted to make an app that felt like it might belong in the “Google” folder on your home screen.

App Tour

So what can it do?

Explore

Perhaps you’re feeling restless for a little live entertainment. Just open up SeatGeek for Android and tap the home screen’s “Explore Events” button. SeatGeek will provide a list of more upcoming live events than you can handle.

SeatGeek Android explore view screenshot

Maybe you’re out at a bar. “Party in the USA” is on and the TVs are showing an away game for the local team. You and your friends feel compelled to find out when Miley Cyrus’ tour is in town, and check ticket prices for the next several home games. That’s easy. Just do a quick search on SeatGeek. Voilà.

SeatGeek Android team view screenshot

Aggregation

Once you’ve found an event that’s up your alley, the last thing you want to do is overpay. Fortunately for you, SeatGeek for Android will save you some legwork and pull in tickets from all the major sites – for example, Ticketsnow, eBay, TicketNetwork et al. – plus hundreds more.

On top of that, Deal Score is built right into the app, so you can instantly zero in on the best deals in the house.

SeatGeek Android event view screenshot

Seat Views

For hundreds of the most popular venues in the US, SeatGeek for Android also has photographs taken from every section. So you can eliminate the guesswork and avoid arriving at your seat only to find yourself locked in a staring contest with a cement column obstructing half your view.

SeatGeek Android ticket view screenshot

Bonus facts

In the last month, seatgeek.com saw almost 300k visits from Android devices, a figure almost exactly 100% higher than in the same period last year. With Android users constituting such a large and fast-growing segment, we’re sure that the new app is bound to make a lot of SG users happy.

This Android release is the latest in a series of mobile-related milestones at SeatGeek from the past 12 months:

  • December ‘12: iPhone app release
  • July: iPad app release
  • September: iPhone app redesign for iOS 7
  • November: Android app release
  • Coming soon: iOS app 2.0 featuring user authentication and event/artist tracking features

Not an Android or iOS user? Don’t despair. We’ve also built a mobile-optimized version of our site that’ll work great on any device. Just head to seatgeek.com in your phone’s normal web browser.

SeatGeek for iPad

You have it on your iPhone, now get it on your iPad: today we’re launching the SeatGeek iPad app.

The app is filled with SeatGeek features you already know and love. You can use your iPad to search among 100+ of the biggest ticket sellers on the web. And you can also use Deal Score to find the best bargains. We also added some special features, just for iPad:

  • Huge, pannable and zoomable interactive venue maps for iPad with full Retina Display support
  • High-resolution seat view images for hundreds of venues
  • “Similar artist” recommendations
  • Team and artist biographies
  • Playable music samples for artists

Interactive Maps

Our interactive venue maps are a first for any iPad app. They allow you to easily explore ticket deals across all sections of a venue, with the ability to zoom in and see listings within sections down to the row level.

http://seatgeek.com/blog/wp-content/uploads/2013/07/full.jpg

Seat Views

For most major venues, we’ve included high-res seat view images for each section. That means you’ll know exactly how the field looks from your seats before you get to the game.

http://seatgeek.com/blog/wp-content/uploads/2013/07/s.jpg

Artist Bios & Similar Artists

For those looking to use the SeatGeek app to discover new shows, let our “similar artist” recommendations act as your guide. Each artist’s page features other artists that tend to have like-minded fans. Artist bios will give you the low-down on whatever obscure banjo-playing band you’re going to see in Brooklyn that night.

http://seatgeek.com/blog/wp-content/uploads/2013/07/grouplove.jpg

Playable Music Tracks

Want to double-check that you still love the band before you pull the trigger on tickets? You can play their music from directly within the app by tapping on tracks in the sidebar of their page.

http://seatgeek.com/blog/wp-content/uploads/2013/07/cwk.jpg

Download the App

Want to give it a try? The SeatGeek iPad app is available here in the App Store. It is, of course, 100% free.

Introducing Sixpack: A New A/B Testing Framework


Today we’re publicly launching Sixpack, a language-agnostic A/B testing framework with an easy to use API and built-in dashboard.

Sixpack has two main components: the sixpack server, which collects experiment data and makes decisions about which alternatives to show to which users, and sixpack-web, the web-based dashboard. Within sixpack-web you can update experiment descriptions, set variations as winners, archive experiments, and view graphs of an experiment’s success across multiple KPIs.

http://cl.ly/image/3d2F100f3y42/sixpack-web-large.png

Why did we do this?

We try to A/B test as much as possible, and have found that the key to running frequent, high-quality tests is to make it trivial to set up the scaffolding for a new test. After some discussion about how to make the creation of tests as simple as possible, we settled on the idea of porting Andrew Nesbitt’s fantastic Ruby Gem ‘Split’ to PHP, as the templating layer of the SeatGeek application is written in PHP. This worked for a bit, but we soon realized that only being able to start and finish tests in PHP was a big limitation.

SeatGeek is a service-oriented web application with PHP only in the templating/routing layer. We’ve also got a variety of Python and Ruby services, and plenty of complex JavaScript in the browser. In addition, we have a WordPress blog that doesn’t play nicely with Symfony (our PHP MVC) sessions and cookies. A/B testing across these platforms with our PHP port of Split was a hassle that involved manually passing around user tokens and alternative names.

If, for example, we wanted to figure out which variation of content in a modal window on our blog (implemented in JavaScript) led to the highest rate of clicks on tickets in our app (implemented in PHP), we’d need to create a one-off ajax endpoint to register participation and pass along a user token of some sort into the Symfony world. This kind of complexity was stopping us from running frequent, high-quality tests; they just took too long to set up.

Ideally we wanted to be able to start a test with a single line of JavaScript, and then to finish it with a single line of PHP. Since there was no tool that enabled us to do this, we wrote Sixpack.

How does it work?

Once you install the service, make a request to participate in an experiment like so:

$ curl "http://localhost:5000/participate?experiment=bold-header&alternatives=yes&alternatives=no&client_id=867905675c2e8d54b6497ea5635ea94dca9fb415"

You’ll get a response like this:

{
    status: "ok",
    alternative: {
        name: "no"
    },
    experiment: {
        version: 0,
        name: "bold-header"
    },
    client_id: "867905675c2e8d54b6497ea5635ea94dca9fb415"
}

The alternative is first chosen at random, but subsequent requests choose the alternative based on the client_id query parameter. The client library is responsible for generating and storing unique client ids. All of the official SeatGeek client libraries use some version of UUID. Client ids are unique to each user and can be stored in MySQL, Redis, cookies, sessions, or anything else you prefer.
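To make the client's responsibilities concrete, here is a minimal Python sketch of generating a client id and building the participate URL from the curl example. It is not the official client; `new_client_id` and `participate_url` are hypothetical helper names, and using `uuid4` is just one assumed way of getting "some version of UUID".

```python
import uuid
from urllib.parse import urlencode


def new_client_id():
    """Generate a random client id; store it (e.g. in a cookie) and reuse it."""
    return uuid.uuid4().hex


def participate_url(base_url, experiment, alternatives, client_id):
    """Build the participate URL; `alternatives` is repeated once per option."""
    params = [("experiment", experiment)]
    params += [("alternatives", alt) for alt in alternatives]
    params.append(("client_id", client_id))
    return base_url + "/participate?" + urlencode(params)


client_id = new_client_id()
print(participate_url("http://localhost:5000", "bold-header", ["yes", "no"], client_id))
```

Sending the same client id on every request is what lets the server hand the same user the same alternative each time.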

Converting a user is just as simple. The request looks like:

$ curl "http://localhost:5000/convert?experiment=bold-header&client_id=867905675c2e8d54b6497ea5635ea94dca9fb415&kpi=goal-1"

You don’t need to pass along the alternative that converted, as this is handled by the sixpack server. The relevant response looks like this:

{
    status: "ok",
    alternative: {
        name: "no"
    },
    experiment: {
        version: 0,
        name: "bold-header"
    },
    conversion: {
        value: null,
        kpi: "goal-1"
    },
    client_id: "867905675c2e8d54b6497ea5635ea94dca9fb415"
}

As a company we aren’t only interested in absolute conversions; we’re interested in revenue too. Thus, the next Sixpack release will allow you to pass a revenue value with each conversion which sixpack-web will use to determine a revenue-optimized winner of the experiment.

Clients

We’ve written clients for Sixpack in PHP, Ruby, Python and JavaScript which make it easy to integrate your application with Sixpack. Here’s an example using our Ruby client:

require 'sixpack'
session = Sixpack::Session.new

# Participate in a test (creates the test if necessary)
session.participate("new-test", ["alternative-1", "alternative-2"])
set_cookie_in_your_web_framework("sixpack-id", session.client_id)

# Convert
session.convert("new-test")

Note that while we must wait for a response from the participate endpoint to get our alternative necessary to render the page, we do not have to wait for the conversion action. By backgrounding the call to convert we can save a blocking web request.
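Backgrounding the convert call might look something like the following Python sketch. It is an illustration rather than code from any Sixpack client: `convert_url`, `convert_in_background`, and the injectable `send` argument are all hypothetical names, and a plain daemon thread is only one of several ways to avoid blocking the request.

```python
import threading
from urllib.parse import urlencode
from urllib.request import urlopen


def convert_url(base_url, experiment, client_id, kpi=None):
    """Build the convert URL shown in the curl example above."""
    params = [("experiment", experiment), ("client_id", client_id)]
    if kpi is not None:
        params.append(("kpi", kpi))
    return base_url + "/convert?" + urlencode(params)


def _fetch(url):
    """Default sender: fire the request and swallow any failure."""
    try:
        urlopen(url, timeout=2).read()
    except Exception:
        pass  # a lost conversion should never break the page


def convert_in_background(url, send=_fetch):
    """Start the convert request on a daemon thread and return immediately."""
    thread = threading.Thread(target=send, args=(url,), daemon=True)
    thread.start()
    return thread
```

The page render never waits on the returned thread; the trade-off is that a failed conversion request is silently dropped.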

What did we use to build this thing?

Sixpack is built with Python, Redis, and Lua.

The core

At the heart of Sixpack is a set of libraries that are shared between sixpack-server and sixpack-web. To keep things fast and efficient, Sixpack uses Redis as its only datastore. Redis’s built-in Lua scripting also gives us the ability to do some pretty cool things. For example, we borrowed ‘monotonic_zadd’ from Crashlytics for generating internal sequential user ids from UUIDs provided from the client libraries.
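The idea of turning client-supplied UUIDs into internal sequential ids can be illustrated with a small in-memory Python stand-in. This is not the Lua script itself; in Sixpack the equivalent step runs inside Redis, and the `SequentialIds` class below is a hypothetical name used only for this sketch.

```python
class SequentialIds:
    """Hand out 1, 2, 3, ... to client ids in the order they are first seen."""

    def __init__(self):
        self._ids = {}    # client_id -> sequential integer id
        self._next = 1

    def id_for(self, client_id):
        # A known client keeps its id; a new client gets the next integer
        if client_id not in self._ids:
            self._ids[client_id] = self._next
            self._next += 1
        return self._ids[client_id]


ids = SequentialIds()
print(ids.id_for("867905675c2e8d54b6497ea5635ea94dca9fb415"))
```

Doing the lookup and the increment in a single server-side script is what keeps two concurrent requests from being given the same id, which this single-threaded sketch does not attempt to show.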

The Sixpack Server

We wanted to keep the server as lightweight as possible since making additional web requests for each experiment on each page load could quickly become expensive. We had originally thought to write Sixpack as a pure WSGI application, but decided that the benefits of using Werkzeug outweighed the cost of an additional dependency. In addition, Werkzeug plays very nicely with gunicorn, which we had already planned to use with Sixpack in our production environment.

Sixpack-web

Sixpack-web is slightly heavier, and uses Flask because of its ease of use and templating. The UI is built with Twitter Bootstrap, and the charts are drawn with d3.js.

How to get it and contribute

You can check out Sixpack here.

We’ve been using Sixpack internally at SeatGeek for over six months with great success. But Sixpack is young, and as such is still under active development. If you notice any bugs, please open a GitHub issue (http://github.com/seatgeek/sixpack), or fork the repo and make a pull request.

OpenCV Face Detection for Cropping Faces

Part of what makes SeatGeek amazing is the performer images, which help turn it from something I would make as a developer (a database frontend) into a beautiful site. These previously required a lot of manpower and time to collect and resize, so we’ve recently created a new process that uses OpenCV face detection to automatically crop our images.

We use these images in our iPhone App, Explore pages as well as our performer pages throughout the site:

http://seatgeek.com/blog/wp-content/uploads/2013/06/performer_page.png

Old and Busted Way

Collection of images would take up to a month any time we wanted to add a new size. Outsourcers would be commissioned to collect 5000 images of a certain size. A simple step but expensive, inflexible, and time consuming.

New Hotness OpenCV!

OpenCV does the hard work of finding shapes that resemble a human face and returning the coordinates. It includes a Python version, but there are other libraries that wrap it up and make it a bit easier to work with.

Since we don’t have to do the hard work of finding faces, why don’t we just get really big images and automate the process of making smaller ones?

faces_from_pil_images.py
import cv  # legacy OpenCV Python bindings (OpenCV 2.x and earlier)


def faces_from_pil_image(pil_image):
    "Return a list of (x, y, w, h) tuples for faces detected in the PIL image"
    storage = cv.CreateMemStorage(0)
    # Load the pre-trained frontal-face Haar cascade
    facial_features = cv.Load('haarcascade_frontalface_alt.xml', storage=storage)
    # Wrap the PIL pixel data in an 8-bit, 3-channel OpenCV image
    cv_im = cv.CreateImageHeader(pil_image.size, cv.IPL_DEPTH_8U, 3)
    cv.SetData(cv_im, pil_image.tostring())
    faces = cv.HaarDetectObjects(cv_im, facial_features, storage)
    # faces includes a `neighbors` field that we aren't going to use here
    return [f[0] for f in faces]

The haarcascade_frontalface_alt.xml file contains the results of training the classifier, which you can find as part of the OpenCV library or with a quick search online.

Starting with this picture of Eisley:

http://seatgeek.com/blog/wp-content/uploads/2013/06/example.jpg

we can use PIL to draw rectangles around the faces that OpenCV found:

draw_faces.py
from PIL import ImageDraw


def draw_faces(image_, faces):
    "Draw a rectangle around each face discovered"
    image = image_.copy()  # leave the caller's image untouched
    drawable = ImageDraw.Draw(image)

    for x, y, w, h in faces:
        # PIL expects (left, top, right, bottom) rather than (x, y, w, h)
        absolute_coords = (x, y, x + w, y + h)
        drawable.rectangle(absolute_coords)

    return image

http://seatgeek.com/blog/wp-content/uploads/2013/06/faces.jpg

How to resize once we can find faces

Once we have a face, we need to resize the image. In our case, the images we collected are in landscape format, and we use landscape images for our larger sizes. Staying in the same format makes resizing a bit easier. We mostly make thinner images (a higher aspect ratio), so we can just resize to the correct width and then crop to a rectangle with the height we want.

resize_to_face.py
face_buffer = 0.5 * (target_height - all_faces_height)
top_of_crop = top_of_faces - face_buffer
coords = (0, top_of_crop, target_width, top_of_crop + target_height)

The face_buffer is the amount of space we want to leave above the top-most face. We compute it after measuring the height from the top of the top-most face to the bottom of the bottom-most face, which makes sure we aren’t cropping anyone out of the photo.
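Putting the three lines above together with the face list gives something like the following Python sketch. The function name `crop_box_for_faces`, the `image_height` parameter, and the clamping step that keeps the crop inside the image are assumptions added for illustration; the faces are assumed to be (x, y, w, h) tuples measured after the image has been resized to the target width.

```python
def crop_box_for_faces(faces, target_width, target_height, image_height):
    """Return a (left, top, right, bottom) crop box that keeps all faces."""
    top_of_faces = min(y for x, y, w, h in faces)
    bottom_of_faces = max(y + h for x, y, w, h in faces)
    all_faces_height = bottom_of_faces - top_of_faces

    # Split the leftover vertical space evenly above and below the faces
    face_buffer = 0.5 * (target_height - all_faces_height)
    top_of_crop = top_of_faces - face_buffer

    # Assumption: clamp so the crop never runs off the top or bottom edge
    top_of_crop = max(0, min(top_of_crop, image_height - target_height))
    top_of_crop = int(top_of_crop)
    return (0, top_of_crop, target_width, top_of_crop + target_height)


faces = [(100, 200, 50, 60), (300, 220, 40, 80)]
print(crop_box_for_faces(faces, 800, 300, 600))  # (0, 100, 800, 400)
```

The returned tuple is in the same (left, top, right, bottom) order that PIL's `Image.crop` expects.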

Generally we want to include as much of the image as possible without cropping out anyone’s face, so this works reasonably well for images where the final size has a higher aspect ratio than the original. Now that you have the faces, though, you can use any sort of cropping that you need.

http://seatgeek.com/blog/wp-content/uploads/2013/06/final_banner.jpg

Installing into a Virtualenv

If you are installing this on an Ubuntu machine, the system packages should include everything you need. If you use virtualenvs, you are going to run into some issues. With virtualenvs, I’ve found the steps for installing SimpleCV to be incredibly helpful as a starting point.

Learnings

This was originally set up for some small images used in a few places around the site, and it would resize on the fly if the CDN didn’t have the image cached. Live resizing works reasonably well for smaller images; sometimes it would take a couple of seconds to load an image, which is not ideal but not horrible. As the images grew in size, the face detection and resizing would take up to 20 seconds, which was definitely a deal-breaker.

Resizing the width first and only cropping the height is an easy first step if the final aspect ratio will be greater. That will likely become an issue when other people find out they can get all of the images in whatever size they want. If you have to make images small enough that you can’t fit all of the faces into the image then you will really need to make something more intelligent.

We actually use ImageMagick instead of PIL since the service this is part of was already using it. ImageMagick is rather quirky and can sometimes ignore your commands without any mention of why.

Third-party services exist that can do this for you as well. Even with a little development work to integrate the service, it is still cheaper and significantly faster than hiring someone to resize all of the images. If you don’t want to pay for external hosting, you can easily store the images on your own servers or S3.

A full example can be seen as a gist. If you want to use these images and more to code some great interfaces, we’re hiring frontend developers and more!

Introducing the SeatGeek Event Recommendations API

We’ve made it our mission to become America’s gateway to live entertainment. We even put it on our wall, right by the front entrance of our office.

So one area we’ve focused on is live event recommendations. After all, you can’t go see your favorite band if you don’t know it’s in town.

For the past year, we’ve been improving the recommendation service that powers our recommendations calendar on seatgeek.com…

SeatGeek recommendations calendar

…as well as our concert recommendation app on Spotify:

SeatGeek app on Spotify

After much work, we’ve finally advanced it to the point where we’re comfortable integrating it with our public events API and releasing it to the world.

We believe our recommendations are far more advanced than anything you can find on the Web right now, and we’re excited to see developers start to use it.

How Most Music Recommendation APIs Work

Recommendation engines operate on a pretty simple principle. You take a whole bunch of users and find out what they like. Then you build a whole bunch of correlations of the form, “People who like X like Y.” From that, you assume that X is similar to Y. When your next user comes along who likes X, you say to him or her, “We think you might like Y as well.”

There are quite a few publicly available APIs that support “similar artist” type queries. Last.fm has a good one. It’s a simple model to implement. The results are easy to cache. The problem is you get some pretty mediocre results when you try to do anything interesting.

A Motivating Example

Let’s say we have a user, Bob. Bob lists his two favorite musicians as Taylor Swift and Kenny Chesney. If you were to hit the SeatGeek API and ask for similar artists, you might get something that looks like this:

Artists Similar to Taylor Swift
1. Carrie Underwood
2. Justin Bieber
3. One Direction
4. Katy Perry
5. Ed Sheeran

Artists Similar to Kenny Chesney
1. Tim McGraw
2. Brad Paisley
3. Zac Brown Band
4. Jason Aldean
5. Keith Urban

Taylor Swift is a pop star with country influences and teen appeal. Unsurprisingly, she is most similar (on a 1:1 basis) with Carrie Underwood, another pop star with country influences and teen appeal. But she is also similar to some teen pop sensations (Justin Bieber) and some ordinary pop stars (Katy Perry).

Kenny Chesney, on the other hand, is pretty much just a country singer.

You can probably guess where I’m going with this. If Bob likes Taylor Swift and Kenny Chesney because he’s a country music fan and we start encouraging him to go see One Direction shows, he’s gonna be none too pleased.

And yet, unless you want to go through the trouble of building out your own recommendation system from scratch, that’s about the best you can do in terms of public APIs on the Net.

How the SeatGeek Recommendation API Works

The proper way to recommend music for Bob is to find other users like Bob and figure out what they like. In other words, to say, “People who like X and Y like __.” If Bob were to give us a third preference, the question becomes, “People who like X, Y and Z like __.” If he gives us a fourth preference, we use that as well.

Because the space of possible combinations grows exponentially, we can’t just compute all of these similarities and cache them. Instead, we use some clever math and compute affinity scores in real time. That allows us to support extremely flexible recommendation queries internally that we can use to build interesting experiences for our users.

Let’s go back to Taylor and Kenny. What happens if we try combining their preferences?

Artists Similar to Taylor Swift + Kenny Chesney (Jointly)
1. Tim McGraw
2. Jason Aldean
3. Carrie Underwood
4. Brad Paisley
5. Zac Brown Band

...

16. Katy Perry
23. Justin Bieber
31. One Direction

As you can see, the country music rises to the surface, and the teen-pop sensations fall out of the way.

Now let’s see what happens if we find a second user, Alice, who identifies her favorite bands as Taylor Swift and Katy Perry. Well, we might suspect she’s a fan of female pop stars, and our recommendations bear that out:

Artists Similar to Taylor Swift + Katy Perry
1. Ke$ha
2. Justin Bieber
3. Pink
4. Carrie Underwood
5. Kelly Clarkson

...

44. Zac Brown Band
49. Kenny Chesney

As we go deeper into the rabbit hole with more preferences, the recommendations become more and more advanced.

Pictures!

What follows is an example, simplified preference space. Green bands are ‘similar’ to X. Red bands are ‘similar’ to Y. Blue bands are ‘similar’ to Z. A user likes X and Z. What should we recommend?

Example Picture 1

Most recommenders combine preference through what is essentially a union operation. If a user likes X and Z, he will be shown events which are similar to X and events which are similar to Z.

Example Picture 2

SeatGeek’s recommendation engine (code-named Santamaria) computes the joint recommendation set of X and Z. In effect, it extracts the similar characteristics of X and Z and recommends other performers that share those specific traits. This leads to a much more accurate set of recommendations for the user.

Example Picture 3

As the number of seeds grows, the composition of preferences becomes more and more specific, and we can accurately recommend shows to people with fairly idiosyncratic tastes.

Example Picture 4

Using Our API

We’re very excited to be finally opening up our recommendations API to the public. The full documentation for our API can be found here:

http://platform.seatgeek.com/

You’ll need a SeatGeek account and an API key to get started:

Step 1: Request an API key here: http://seatgeek.com/account/develop

Step 2: Find some SeatGeek performers

http://api.seatgeek.com/2/performers?q=taylor%20swift
http://api.seatgeek.com/2/performers?q=kenny%20chesney

Step 3: Make a request using SG performer IDs

http://api.seatgeek.com/2/recommendations?performers.id=35&performers.id=87&postal_code=10014&per_page=10&client_id=API_KEY

The API takes a geolocation parameter, an arbitrary list of performers, and a wide array of filtering parameters.
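As a small convenience, the request from Step 3 can be assembled in Python. This sketch only builds the URL shown above; `recommendations_url` is a hypothetical helper name, and the only parameters it uses are the ones that appear in the example request.

```python
from urllib.parse import urlencode


def recommendations_url(performer_ids, postal_code, api_key, per_page=10):
    """Build a recommendations request with one performers.id per seed."""
    params = [("performers.id", pid) for pid in performer_ids]
    params += [
        ("postal_code", postal_code),
        ("per_page", per_page),
        ("client_id", api_key),
    ]
    return "http://api.seatgeek.com/2/recommendations?" + urlencode(params)


print(recommendations_url([35, 87], "10014", "API_KEY"))
```

Passing more performer IDs simply repeats the `performers.id` parameter, which matches the “arbitrary list of performers” the API accepts.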

Check it out and let us know what you think. You can email us at hi@seatgeek.com or post a message in our support forum.

Yak Shaving: Adding OAuth Support to Nginx via Lua

**TL;DR:** We built an OAuth2 authentication and authorization layer as Nginx middleware using Lua. If you intend to do the same, read the docs, automate what you can, and carry rations.

As SeatGeek has grown over the years, we’ve amassed quite a few different administrative interfaces for various tasks. We regularly build new modules to export data for news outlets, our own blog posts, infographics, etc. We also regularly build internal dev tools to handle things such as deployment, operations visualization, event curation etc. In the course of doing that, we’ve also used and created a few different interfaces for authentication:

  • GitHub/Google OAuth
  • Our internal SeatGeek User System
  • Basic Auth
  • Hardcoded logins

Obviously, this is subpar. The myriad of authentication systems makes it difficult to abstract features such as access levels and general permissioning for various datastores.

One System to auth them all

We did a bit of research about what sort of setup would solve our problems. This turned up Odin, which works well for authenticating users against Google Apps. Unfortunately, it would require us to use Apache, and we are pretty married to Nginx as a frontend for our backend applications.

As luck would have it, I came across a post by mixlr referencing their usage of Lua at the Nginx level for:

  • Modifying response headers
  • Rewriting requests internally
  • Selectively denying access to hosts based on IP

The last one in that set seemed interesting. Thus began the journey into package management hell.

Building Nginx with Lua Support

Lua support for Nginx is not distributed with the core Nginx source, so any testing would require us to build packages for both OS X (for testing purposes) and Linux (for deployment).

Custom Nginx for OS X

For Mac OS X, we promote the use of Homebrew for package management. Nginx does not come with many modules enabled in the base formula for one very good reason:

The problem is that NGINX has so many options that adding them all to a formula would be batshit insane and adding some of them to a formula opens the door to adding all of them and associated insanity. - Charlie Sharpsteen, @sharpie

So we needed to build our own. Preferably in a manner that would allow further customization in case we need more features in the future. Fortunately, modifying homebrew packages is quite straightforward.

We want to have a workspace for working on the recipe:

cd ~
mkdir -p src
cd src

Next we need the formula itself. You can do one of the following to retrieve it:

  • Go spelunking in your HOMEBREW_PREFIX directory - usually /usr/local - for the nginx.rb
  • Have the github url memorized as an api and wget https://raw.github.com/mxcl/homebrew/master/Library/Formula/nginx.rb
  • Simply output your formula using brew cat nginx > nginx.rb

If we brew install ./nginx.rb, that will install the recipe contained within that file. Since this is a completely custom nginx installation, we’ll want to rename the formula so that future brew upgrade calls do not nix our customizations.

sed 's/class Nginx/class NginxCustom/' nginx.rb > nginx-custom.rb
rm nginx.rb

We’re now ready to add new modules to our compilation step. Thankfully this is easy: we just need to collect all the custom modules from the arguments passed to the brew install command. The following bit of Ruby takes care of this:

# Collects arguments from ARGV
def collect_modules regex=nil
    ARGV.select { |arg| arg.match(regex) != nil }.collect { |arg| arg.gsub(regex, '') }
end

# Get nginx modules that are not compiled in by default specified in ARGV
def nginx_modules; collect_modules(/^--include-module-/); end

# Get nginx modules that are available on github specified in ARGV
def add_from_github; collect_modules(/^--add-github-module=/); end

# Get nginx modules from mdounin's hg repository specified in ARGV
def add_from_mdounin; collect_modules(/^--add-mdounin-module=/); end

# Retrieve a repository from github
def fetch_from_github name
    name, repository = name.split('/')
    raise "You must specify a repository name for github modules" if repository.nil?

    puts "- adding #{repository} from github..."
    `git clone -q git://github.com/#{name}/#{repository} modules/#{name}/#{repository}`
    path = Dir.pwd + '/modules/' + name + '/' + repository
end

# Retrieve a tar of a package from mdounin
def fetch_from_mdounin name
    name, hash = name.split('#')
    raise "You must specify a commit sha for mdounin modules" if hash.nil?

    puts "- adding #{name} from mdounin..."
    `mkdir -p modules/mdounin && cd $_ ; curl -s -O http://mdounin.ru/hg/#{name}/archive/#{hash}.tar.gz; tar -zxf #{hash}.tar.gz`
    path = Dir.pwd + '/modules/mdounin/' + name + '-' + hash
end

The above helper methods allow us to specify new modules to include on the command line and retrieve the modules from their respective locations. At this point, we’ll need to modify the nginx-custom.rb recipe to include the flags and retrieve the packages, around line 58:

nginx_modules.each { |name| args << "--with-#{name}"; puts "- adding #{name} module" }
add_from_github.each { |name| args <<  "--add-module=#{fetch_from_github(name)}" }
add_from_mdounin.each { |name| args <<  "--add-module=#{fetch_from_mdounin(name)}" }

At this point, we can compile a custom version of nginx with our own modules.

brew install ./nginx-custom.rb \
    --add-github-module=agentzh/chunkin-nginx-module \
    --include-module-http_gzip_static_module \
    --add-mdounin-module=ngx_http_auth_request_module#a29d74804ff1

We’ve provided this formula as a tap for your convenience at seatgeek/homebrew-formulae.

Custom Nginx for Debian

We typically deploy to some flavor of Debian–usually Ubuntu–for our production servers. As such, it would be nice to simply run dpkg -i nginx-custom to have our customized package installed. The steps to doing so are relatively simple once you’ve gone through them.

Some notes for those researching custom debian/ubuntu packaging:

  • It is possible to get the debian package source using apt-get source PACKAGE_NAME
  • Debian package building is generally governed by a rules file, which you’ll need some sed-fu to manipulate
  • You can update deb dependencies by modifying the control file. Note that there are some meta-dependencies specified herein that you’ll not want to remove, but these are easy to identify.
  • New releases must always have a section in the changelog; otherwise the package may not be upgraded, because the package manager may treat that version as already installed. You should use tags in the form +tag_name to identify changes from the baseline package with your own additions. I also personally append a number - starting from 0 - signifying the release number of the package.
  • Most of these changes can be automated in some fashion, but it appears as though there are no simple command line tools for creating custom releases of packages. That’s definitely something we’re interested in, so feel free to link to tooling to do so if you know of anything.

While running this process is great, I have built a small bash script that should automate the majority of the process. It is available as a gist on github.

It only took 90 nginx package builds before I realized the process was scriptable.

OAuth ALL the things

Now that it is possible to test and deploy a Lua script embedded within Nginx, we can move on to actually writing some Lua.

The nginx-lua module provides quite a few helper functions and variables for accessing most of Nginx’s abilities, so it is quite possible to force OAuth authentication via the access_by_lua directive provided by the module.

When using the *_by_lua_file directives, nginx must be reloaded for code changes to take effect.

I built a simple OAuth2 provider for SeatGeek in NodeJS. This part is simple, and you can likely find something off the shelf in your language of choice.

Next, our OAuth API uses JSON for handling token, access level, and re-authentication responses, so we needed to install the lua-cjson module.

# install lua-cjson
if [ ! -d lua-cjson-2.1.0 ]; then
    tar zxf lua-cjson-2.1.0.tar.gz
fi
cd lua-cjson-2.1.0
sed 's/i686/x86_64/' /usr/share/lua/5.1/luarocks/config.lua > /usr/share/lua/5.1/luarocks/config.lua-tmp
rm /usr/share/lua/5.1/luarocks/config.lua
mv /usr/share/lua/5.1/luarocks/config.lua-tmp /usr/share/lua/5.1/luarocks/config.lua
luarocks make

My OAuth provider uses the query-string for sending error messages on authentication, so I needed to support that in my Lua script:

local args = ngx.req.get_uri_args()
if args.error and args.error == "access_denied" then
    ngx.status = ngx.HTTP_UNAUTHORIZED
    ngx.say("{\"status\": 401, \"message\": \""..args.error_description.."\"}")
    return ngx.exit(ngx.HTTP_OK)
end

Now that we’ve handled our base error case, we’ll set a cookie for the access token. In my case, the cookie expires before the access token actually expires so that I can use the cookie to renew my access token.

local access_token = ngx.var.cookie_SGAccessToken
if access_token then
    ngx.header["Set-Cookie"] = "SGAccessToken="..access_token.."; path=/;Max-Age=3000"
end

At this point, we’ve handled error responses from the api, and stored the access_token away for later retrieval. We now need to ensure the oauth process actually kicks off. In this block, we’ll want to:

  • Start the oauth process if there is no access_token stored and we are not in the middle of it
  • Retrieve the user access_token from the oauth api if the oauth access code is present in the query string arguments
  • Deny users with invalid access codes
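
The three cases above can be sketched as one small decision function. This is a language-neutral illustration written in Python, not part of the nginx configuration; `next_step` and the `exchange_code` callable (which stands in for the call to the OAuth API and returns a status and a token) are hypothetical names.

```python
def next_step(access_token, code, exchange_code):
    """Decide what the access check should do for one request.

    access_token: token read from the cookie, or None
    code:         OAuth access code from the query string, or None
    exchange_code: hypothetical callable, code -> (http_status, token)
    """
    if code:
        # An access code is present: trade it for a token.
        status, token = exchange_code(code)
        if status != 200:
            # Deny users with invalid access codes.
            return ("deny", status)
        access_token = token

    if not access_token:
        # No stored token and not in the middle of the flow: start OAuth.
        return ("start_oauth", None)

    # We hold a token; carry on and validate it against the user endpoint.
    return ("continue", access_token)


# Illustrative stand-in for the OAuth API: only the code "good" is valid.
def fake_exchange(code):
    return (200, "token-123") if code == "good" else (401, None)
```

Keeping the decision separate from the I/O like this is mostly a testing convenience; the real script has to make the same choices inline.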

Reading the docs on available nginx-lua functions and variables can clear up some issues, and perhaps show you various ways in which you can access certain request/response information

At this point we need to call our api to retrieve an access token. Nginx-lua provides the ngx.location.capture method, which can be used to retrieve the response from any internal endpoint within nginx. This means we cannot call something like http://seatgeek.com/ncaa-football-tickets directly, but would need to use proxy_pass in order to wrap the external url in an internal endpoint.

My convention for these endpoints is to prefix them with an _ (underscore); they are normally blocked against direct access.

-- first lets check for a code where we retrieve
-- credentials from the api
if not access_token or args.code then
    if args.code then
        -- internal-oauth:1337/access_token
        local res = ngx.location.capture("/_access_token?client_id="..app_id.."&client_secret="..app_secret.."&code="..args.code)

        -- kill all invalid responses immediately
        if res.status ~= 200 then
            ngx.status = res.status
            ngx.say(res.body)
            ngx.exit(ngx.HTTP_OK)
        end

        -- decode the token
        local text = res.body
        local json = cjson.decode(text)
        access_token = json.access_token
    end

    -- both the cookie and proxy_pass token retrieval failed
    if not access_token then
        -- Track the endpoint they wanted access to so we can transparently redirect them back
        ngx.header["Set-Cookie"] = "SGRedirectBack="..nginx_uri.."; path=/;Max-Age=120"

        -- Redirect to the /oauth endpoint, request access to ALL scopes
        return ngx.redirect("internal-oauth:1337/oauth?client_id="..app_id.."&scope=all")
    end
end

At this point in the Lua script, you should have a - hopefully! - valid access_token. We can use this against whatever endpoint you have set up to provide user information. In my endpoint, I respond with a 401 status code if the user has zero access, 403 if their token is expired, and access_level information via a simple integer in the json response.

-- ensure we have a user with the proper access app-level
-- internal-oauth:1337/accessible
local res = ngx.location.capture("/_user", {args = { access_token = access_token } } )
if res.status ~= 200 then
    -- delete their bad token
    ngx.header["Set-Cookie"] = "SGAccessToken=deleted; path=/; Expires=Thu, 01-Jan-1970 00:00:01 GMT"

    -- Redirect 403 forbidden back to the oauth endpoint, as their stored token was somehow bad
    if res.status == 403 then
        return ngx.redirect("https://seatgeek.com/oauth?client_id="..app_id.."&scope=all")
    end

    -- Disallow access
    ngx.status = res.status
    ngx.say("{\"status\": 503, \"message\": \"Error accessing api/me for credentials\"}")
    return ngx.exit(ngx.HTTP_OK)
end

Now that we’ve verified that the user is indeed authenticated and has some level of access, we can check their access level against whatever we define as the access level for the current endpoint. I personally delete the SGAccessToken at this step so that the user has the ability to log in as a different user, but that is up to you.

local json = cjson.decode(res.body)
-- Ensure we have the minimum for access_level to this resource
if json.access_level < 255 then
    -- Expire their stored token
    ngx.header["Set-Cookie"] = "SGAccessToken=deleted; path=/; Expires=Thu, 01-Jan-1970 00:00:01 GMT"

    -- Disallow access
    ngx.status = ngx.HTTP_UNAUTHORIZED
    ngx.say("{\"status\": 403, \"message\": \"USER_ID"..json.user_id.." has no access to this resource\"}")
    return ngx.exit(ngx.HTTP_OK)
end

-- Store the access_token within a cookie
ngx.header["Set-Cookie"] = "SGAccessToken="..access_token.."; path=/;Max-Age=3000"

-- Support redirection back to your request if necessary
local redirect_back = ngx.var.cookie_SGRedirectBack
if redirect_back then
    ngx.header["Set-Cookie"] = "SGRedirectBack=deleted; path=/; Expires=Thu, 01-Jan-1970 00:00:01 GMT"
    return ngx.redirect(redirect_back)
end

Now we just need to tell our current app who is logged in via some headers. You can reuse REMOTE_USER if you have some requirement that this replace basic auth, but otherwise anything is fair game.

-- Set some headers for use within the protected endpoint
ngx.req.set_header("X-USER-ACCESS-LEVEL", json.access_level)
ngx.req.set_header("X-USER-EMAIL", json.email)

I can now access these http headers like any others within my applications, replacing hundreds of lines of code and hours of work reimplementing authentication yet again.

Nginx and Lua, sitting in a tree

At this point, we should have a working lua script that we can use to block/deny access. We can place this into a file on disk and then use access_by_lua_file to use it within our nginx site. At SeatGeek, we use Chef to template out config files, though you can use Puppet, Fabric, or whatever else you’d like to do so.

Below is the simplest nginx site you can use to get this entire thing running. You’ll also want to check out the access.lua - available here - which is the compilation of the above lua script.

# The app we are proxying to
upstream production-app {
  server localhost:8080;
}

# The internal oauth provider
upstream internal-oauth {
  server localhost:1337;
}

server {
  listen       80;
  server_name  private.example.com;
  root         /apps;
  charset      utf-8;

  # This will run for everything but subrequests
  access_by_lua_file "/etc/nginx/access.lua";

  # Used in a subrequest
  location /_access_token { proxy_pass http://internal-oauth/oauth/access_token; }
  location /_user { proxy_pass http://internal-oauth/user; }

  location / {
    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header  Host $http_host;
    proxy_redirect    off;
    proxy_max_temp_file_size 0;

    if (!-f $request_filename) {
      proxy_pass http://production-app;
      break;
    }
  }

}

Further Considerations

While this setup has worked really well for us, I’d like to point out some shortcomings:

  • The above code is a simplification of our access_by_lua script. We also handle POST request saving, inject JS into pages to renew the session automatically, handle token renewal etc. You may not need these features, and in fact, I didn’t think I’d need them until we started testing this system on our internal systems.
  • We had some endpoints which were available via basic auth for certain background tasks. These had to be reworked so that the data was retrieved from an external store, such as S3. Be aware that this may not always be possible, so oauth may not be the answer in your case.
  • Oauth2 was simply the standard I chose. In theory, you could use Facebook Auth to achieve similar results. You may also combine this approach with rate-limiting, or storing various access levels in a datastore such as redis for easy manipulation and retrieval within your Lua script. If you were really bored, you could reimplement Basic Auth within Lua, it’s just up to you.
  • There are no test harnesses for systems such as these. Test-junkies will cringe when they realize it’s going to be integration testing for a while. You can likely rerun the above by injecting variable mocks into the global scope and then executing scripts, but it’s not the ideal setup.
  • You still need to modify apps to recognize your new access headers. Internal tools will be easiest, but you may need to make certain concessions for vendor software.

The above blog post combined nginx-lua and our internal oauth provider to enable using OAuth for access control to our infrastructure.

Also

SeatGeek is hiring UI Developers and Web Engineers. Your first tasks will be to make my OAuth application pretty and write some tests for a bit of Lua code… (kidding)

If you have any questions about the project just let us know in the comments!

Putting Venue Maps in a Terminal: Introducing SGCLI

At SeatGeek we have regularly scheduled Hackathons–opportunities for all of us to drop what we’re doing for two days and work on whatever interesting, creative, or experimental projects we dream up. We had a Hackathon last week, and I decided to write a command-line client for SeatGeek, which I called SGCLI.

Back in 2005 I spent a couple of months using a box running FreeBSD without X as my main computer. I’m not sure exactly what the thought process was that led to that setup, but using it taught me that in some instances command-line applications can be much faster to work with than their graphical equivalents. With that in mind, I set out to try and replicate (and if possible, improve) the experience of searching for and buying tickets on SeatGeek.

The first step to building the project was to learn some curses. Luckily, there’s a great tutorial on curses programming with Python on python.org. It didn’t take long to have a basic application running (complete with an ASCII-art version of the SeatGeek logo):

SGCLI Welcome Screen

The rest of the first day was spent building out the main parts of the SeatGeek experience: the search page, selecting an event, browsing ticket listings and viewing an individual listing. There were some intense moments, but by the time I quit for the night (probably around 11:30 or so - and I was early!) it seemed like at least the basics would be working in time for Friday’s demo.

On Friday I focused mostly on the features that I thought would give some added spark to the demo, namely rendering SeatGeek’s beautiful maps into slightly-less-beautiful ASCII art, and meme integration. Rendering the maps proved to be a bit tricky, but in the end it worked out pretty well:

Yankee Stadium in SGCLI

Map rendering works using the map infrastructure we built for the SeatGeek mobile website. That allows SGCLI to get a single .png for an event that represents the map. Then, SGCLI uses PIL to scale the map down such that the size in pixels is the same as the size in characters of the final map we want to render (that final size depends on the size of the terminal window). An important note here is that the aspect ratio has to change during this operation: almost all fonts are much taller than they are wide and we need to correct for that. So, if we decide that our final map should be 20 characters high and 36 characters wide, we use PIL to scale the 320x320 .png down to 36x20.
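
As a rough sketch of that arithmetic (this is not the actual SGCLI code), the target size can be computed from the terminal size and a character-cell aspect ratio. The function name `map_size_in_chars` and the `cell_aspect` value of 1.8 are assumptions, chosen so that a square image gives the 36x20 example above.

```python
def map_size_in_chars(img_w, img_h, max_cols, max_rows, cell_aspect=1.8):
    """Return (cols, rows) for the ASCII map.

    cell_aspect is how many times taller than wide one character cell is
    (an assumed value; it depends on the terminal font).
    """
    # A map that is `rows` cells tall needs this many cells of width to
    # keep the on-screen shape of the original image.
    rows = max_rows
    cols = round(rows * cell_aspect * img_w / img_h)

    # If that is too wide for the terminal, fit to the width instead.
    if cols > max_cols:
        cols = max_cols
        rows = round(cols * img_h / (img_w * cell_aspect))

    return max(cols, 1), max(rows, 1)


# The example from the text: a 320x320 image in a 20-row map area.
size = map_size_in_chars(320, 320, max_cols=80, max_rows=20)
```

The resulting pair is what you would then hand to the image library's resize call.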

After scaling down we use PIL to create two copies of the same image. One of the copies gets converted to grayscale, and the other copy gets “posterized” down to a single bit per color channel (that matches the color options I have available in my terminal). The final step is to actually output the ASCII art. For each pixel we use the grayscale image to select a character to display based on luminosity (darker pixels get characters like #, while brighter pixels become characters like . or -). We use the posterized image to figure out what color to draw the character in (out of the 7 options available to us). After rendering the map using curses, we draw a marker over the top to indicate where the selected tickets are located.
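
Here is a minimal Python sketch of that per-pixel step. It works on plain (r, g, b) tuples rather than PIL images so that it stands alone. The character ramp, the colour names and the function names are assumptions for illustration, not SGCLI's actual choices; the grayscale weights are the ITU-R 601 luma weights that PIL documents for its "L" conversion.

```python
RAMP = "#%*+=-."  # assumed ramp, darkest character first
# Index = red bit + 2 * green bit + 4 * blue bit, the usual curses ordering.
COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]


def luminance(r, g, b):
    """Integer grayscale value in 0..255 using ITU-R 601 luma weights."""
    return (299 * r + 587 * g + 114 * b) // 1000


def posterize_1bit(r, g, b):
    """Keep only the top bit of each 8-bit channel."""
    return r >> 7, g >> 7, b >> 7


def cell_for_pixel(pixel):
    """Map one RGB pixel to (character, colour name)."""
    r, g, b = pixel
    # Darker pixels land near the start of the ramp, brighter ones near the end.
    char = RAMP[luminance(r, g, b) * len(RAMP) // 256]
    r_bit, g_bit, b_bit = posterize_1bit(r, g, b)
    return char, COLORS[r_bit + 2 * g_bit + 4 * b_bit]


def render_row(pixels):
    """Convert one row of pixels into a list of (character, colour) cells."""
    return [cell_for_pixel(p) for p in pixels]
```

One bit per channel gives eight combinations; the all-zero one is black, which leaves seven visible colours on a dark terminal.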

This method certainly could be smarter (for example, it doesn’t take into account the ‘shape’ of a character when deciding what character to use), but it worked pretty well and was easy to implement. Check it out in the source code for SGCLI to see it in all of its glory.

Check out the final project–it’s open source and easy to install and run. Just do:

pip install sgcli # or easy_install sgcli
sgcli

There are plenty of neat things that I didn’t mention here. Be sure to check out the support for autocompletion while searching, and make sure to hit “b” on a ticket listings page (even if you don’t follow through and buy the tickets) - you’ll get a nice surprise when you return to the app. There are some notes on keyboard shortcuts in the README.

If you have any questions about the project just let us know in the comments!

Introducing Absolute Deal Score

Deal Score

Among all of SeatGeek’s features, we have always been proudest of Deal Score, our metric that enables users to seamlessly pick out the best deal for an event from within thousands of ticket listings.  But today that feature is getting a whole lot cooler. We’re launching Absolute Deal Score, an upgrade that has been under development for many months here at SeatGeek.

For the uninitiated, the premise behind Deal Score is simple: Deal Score is a rating of whether a ticket is a bargain or a rip-off, which facilitates apples-to-apples comparisons among ticket listings.  A metric like Deal Score is particularly useful for live event tickets because every seat in a venue is different.  If you’re shopping online for batteries, you can sort your options by price and expect the cheapest options will include the good bargains.  But if you’re shopping for Yankees tickets and you sort by price, the cheapest options tend to be a bunch of nosebleed seats.  Deal Score offers a better way to identify good buys.

As initially designed, Deal Score compared the relative value of a ticket listing to that of all others listed in the venue for that one given event. The worst deal for every event was given a Deal Score of 0, the best deal was given a Score of 100, and everything else was filled in between those two numbers.  But as we thought about how best to surface values in the ticket marketplace, we came to the realization that anchoring value against just a single game wasn’t going far enough.

Today, we’re excited to announce a major overhaul of our Deal Score algorithm–one that not only identifies the best ticket deals within a given event, but also how those individual deals comparatively stack up against all tickets for similar events (for example, how a listing for a single Yankees ticket stacks up against those for all other Yankees games this season).   Listings are no longer anchored at 0 and 100 for every event but, rather, 0 is anchored to the absolute worst deal for all events on SeatGeek, and 100 is anchored to the best deal among all events.
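
To make the difference in anchoring concrete, here is a toy Python sketch. It assumes each listing already has some raw "deal value" (higher is better) and that scores are interpolated linearly between the anchors; both are simplifying assumptions for illustration, and the function names are made up rather than taken from our codebase.

```python
def anchored_scores(values, worst, best):
    """Linearly map raw deal values onto 0-100 between two anchors."""
    span = best - worst
    if span == 0:
        return [100.0 for _ in values]
    return [100.0 * (v - worst) / span for v in values]


def relative_scores(event_values):
    """Old behaviour: anchor 0 and 100 to the worst and best deal in one event."""
    return anchored_scores(event_values, min(event_values), max(event_values))


def absolute_scores(events):
    """New behaviour: anchor 0 and 100 to the worst and best deal across all events."""
    all_values = [v for values in events.values() for v in values]
    worst, best = min(all_values), max(all_values)
    return {name: anchored_scores(values, worst, best)
            for name, values in events.items()}


# Made-up raw deal values for two events.
events = {"game_1": [1.0, 2.0, 3.0], "game_2": [2.0, 2.5]}
relative = {name: relative_scores(values) for name, values in events.items()}
absolute = absolute_scores(events)
```

With relative anchoring the best listing for game_2 scores 100 even though it is a worse deal than the best listing for game_1; with absolute anchoring it scores 75, so the two events become directly comparable.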

Deal Score 2

SeatGeek’s head of R&D, Steve Ritter, spoke in great detail about the math and approach behind Deal Score in a two-part blogpost series back in May but, in brief, the algorithm assesses the listed price of a ticket against our estimated market value of that ticket (based on historical prices, row/section position and other factors). As we thought about ways to build on the solid foundation of Deal Score, we realized that our Deal Score methodology could be applied across a broad series of similar events, such as a full season of NBA games or a multi-month run of a Broadway show. We’ve been testing this live on SeatGeek for the past few weeks, so you may have already noticed some changes. As a user, this update has some meaningful benefits:

  • Deal Scores are now comparable across all events, not just against ticket listings within a single game or concert, as was the case previously. For example, if you compare a 93 deal score for a November 2012 Knicks game against an 85 deal score for a different game at MSG 3 months later, you’ll know that the November ticket is without question a better value–a distinction that couldn’t previously be made.
  • Absolute Deal Score still surfaces the best ticket deals within a single event, just as the previous iteration of Deal Score did. The only difference is that scores aren’t anchored to a relative distribution as before. The top 10 deals you see on any event page are still the best ticket deals available that evening, just as was the case previously.
  • You may see fewer “100” deal scores on event pages, but the highest Deal Scores now truly represent exceptional values for that event type. This doesn’t mean that a ticket labeled with even a 50 or 60 Deal Score is a poor value–indeed, any ticket with a DS above 50 ranks in the upper 20% of all deals on site. But we felt it important to define far better gradations between above average deals and purely outstanding deals, and to do so across the widest body of comparable ticket listings.

We’re excited about this update and how it will change the experience of shopping on SeatGeek. We’d love to hear what you think! As always, drop us a line at hi@seatgeek.com.

Using a Kalman Filter to Predict Ticket Prices

Welcome back! In case you missed part one of this series, we’re opening up the hood on Deal Score, one of SeatGeek’s most popular features.

In part one we gave a brief overview of why we sort ticket listings by Deal Score rather than by price. We gave you our two main assumptions:

  • Seat quality, within a given venue, has a consistent ordering.
  • The relationship of seat price to seat quality follows a similar pattern across all events at a given venue.

Court

In the last post we discussed how we take advantage of the first assumption. Today, we’ll explain the second assumption and how it leads to the accurate valuations of thousands of tickets on a daily basis.

This second assumption means that all we need to do in order to predict prices is find a function, unique to each event, that maps our seat score vector to a vector of expected prices. Since we’re working with limited data here and surely do not have enough data to induce the structure of this curve for each event, we make a further simplifying assumption: for each venue, the curve will look similar for every event at that venue, whether that curve is a straight line, polynomial, or otherwise.

This does not mean that we can assume that a premium seat will carry the same relative premium over a nosebleed for each game. For a game that will not sell out, for example, the value of a nosebleed should be negative–this ticket will not sell, and given the additional expenses involved in attending an event (gas, parking, food, etc.), someone would have to be paid in order to sit in that seat. In such a situation, box seats are infinitely more valuable than bleachers. For a premium game, on the other hand, such as a playoff game, opening day, or any game at Fenway Park, all seats will sell out, as there is significant value just getting through the gates to see the event. Perhaps the box seat is only worth two or three times as much as the nosebleed in these cases.

At this point, we break out a terrific tool for processing small amounts of noisy data, the Kalman filter. Heavily used in the guidance and control of spacecraft and aircraft as well as with time-series data in economic and financial spheres, the Kalman filter is an algorithm that uses state estimates of model parameters combined with estimates of their variance to make predictions about the output of a linear dynamic system. I’ll spare you the obligatory rocket science joke and jump straight into a tutorial on how to use a Kalman filter to make predictions in the face of noisy and limited ticket data.

Since every observed price is going to be the output of a noisy system at a point in time, we are most interested in the likely state of the system as of the last observation, making a recursive estimator such as the Kalman filter an excellent choice.

Step one: model the system

In SeatGeek’s case, our assumption is that the underlying structure of price dynamics is similar across events, and we can therefore generate a curve mapping seat quality to expected prices using the same parameters on the curve each time. As an example, here is a plot with seat scores for Knicks games at Madison Square Garden on the x-axis and historical average sale price for those seats on the y-axis.

Graph

I’ve added a best-fit line, which shows that sale prices tend to grow exponentially with seat quality at this particular venue. As such, we will model our price predictions as log-linear with respect to seat quality. We’re about to do a lot of math here, so feel free to skip ahead.

The Kalman filter maintains the state of the filter at step k with two variables:

  • $\hat{x}_{k|k}$: the parameters of the model given observations up to and including step k

  • $P_{k|k}$: the covariance matrix of parameter errors, a measure of the confidence the model has in its parameters

In our simple case, $\hat{x}_{k|k}$ represents the intercept and slope of our line. $P_{k|k}$ represents the covariance of parameter errors. This covariance matrix will be used down the line to determine which parameters must change when we make a new observation. It will also determine the magnitude of the adjustment.

A general Kalman filter uses a state-transition matrix $F_k$ in order to advance from one observation to predicting the value of the next:

$$x_k = F_k x_{k-1} + w_k$$

Where: $\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1}$ is our best estimate of $x_k$ given observations up to and including $k-1$. $w_k$ is white noise of the same dimension as the model, drawn from a multivariate normal distribution having covariance $Q_k$, representing the process error. Many applications of the filter model physical systems that take velocity and acceleration into account (and behind the scenes, so does SeatGeek). In these cases, $F_k$ can include time-varying parameters, but in our simple example, we set:

$$F_k = I$$

Assuming that in between observations the underlying model dynamics do not change according to any known physical system, we use the identity matrix.[1]

Step two: model the output

Sharp eyes may have noticed that the preceding equation does not use our lovely seat scores quite yet. The reason is that our observations do not come in the form of linear models, but rather in observed fair values for seats, i.e. when users express an intent to buy. We have to model our output as $z_k = H_k x_k$, where $H_k$ is a 1x2 matrix $[1 \;\; \theta]$, theta being the rating of the seat in question. Keeping with Kalman filter assumptions we model our residuals, the difference between our observations and predictions, as Gaussian white noise.[2] In the single-output case, the observation noise can be thought of as the square of our standard estimation error, or how far we allow our predictions to be off before the model updates itself. This variance, $R_k$, will be used later on when we update the model.

Step three: predict

Now that we have a model of our system, we can start making predictions. Using historical data, we can generate $\hat{x}_{0|0}$, our default parameters, and start predicting prices. Our prediction, of course, is that our observations will lie on the line defined by $\hat{x}_{0|0}$, shown in the image above. In this case, .

We add those parameters to our listing feed, determine seat quality from the data provided to us by the market in question, and predict a price for each listing. We compare the predicted price to the listed price, assign a Deal Score to each listing, and sort your search results accordingly. We live to fight another day.
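
A toy version of that predict-and-sort step might look like the sketch below. The parameter values, the listings and the use of a plain predicted/listed price ratio as a stand-in for Deal Score are all assumptions made up for illustration; the real scoring is more involved.

```python
import math


def predict_price(params, seat_score):
    """Predicted price under a log-linear model.

    params is (intercept, slope) and the observation row is H = [1, theta],
    so log(price) = intercept + slope * seat_score.
    """
    intercept, slope = params
    h = (1.0, seat_score)
    log_price = h[0] * intercept + h[1] * slope
    return math.exp(log_price)


def rank_listings(listings, params):
    """Sort listings best deal first by predicted price / listed price."""
    def value_ratio(listing):
        return predict_price(params, listing["seat_score"]) / listing["price"]
    return sorted(listings, key=value_ratio, reverse=True)


# Hypothetical parameters and listings.
params = (3.0, 0.02)
listings = [
    {"id": "A", "seat_score": 90, "price": 150.0},
    {"id": "B", "seat_score": 40, "price": 40.0},
    {"id": "C", "seat_score": 70, "price": 60.0},
]
ranked = rank_listings(listings, params)
```

Here listing C comes out on top: it is not the cheapest ticket, but it is priced furthest below what the model expects for a seat of that quality.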

Step four: observe

Market dynamics, however, are not so kind as to stay constant, and our models, alas, are unable to perfectly predict every price from the outset. The Kalman filter is thus useful for responding to changing tides. Since observations of changing tides can be few and far between and must inform our predictions on all other tickets, it behooves us to have a degree of certainty about our model, which we represent by $P_{k|k}$, the 2x2 covariance matrix of our state estimate errors. A good design decision is to start off with large numbers on the diagonals and zeroes elsewhere, assuming low certainty of model parameters and independence.[2] The filtering process should be able to give you excellent color on their true relationship.

Many of the signals discussed in part one can be interpreted, directly or indirectly, as a fair ticket price, and when we observe a new price, $z_k$, we model the residual as $y_k = z_k - H_k \hat{x}_{k|k-1}$, where $H_k \hat{x}_{k|k-1}$ represents our predicted price for that seat. In the Kalman filter, the residual variance (variance of $y_k$) is modeled as $S_k = H_k P_{k|k-1} H_k^T + R_k$. In the general case, these are covariance matrices. Since our model outputs only one value, a predicted price, $S_k$ and $R_k$ are variances. $R_k$ we have seen before; this is the general model of our error variance. $H_k P_{k|k-1} H_k^T$ is the variance of this particular observation, which varies depending on the seat score of this particular observation (see dotted lines in the slideshow below). Variance is higher on expensive tickets than it is on cheaper tickets. We will use the residual and its variance in the next step in order to update our parameters.
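
For the two-parameter case, the residual and its variance can be written out by hand without a matrix library. The sketch below assumes observations are compared in log-price space (since the model is log-linear), stores the covariance as a nested 2x2 list, and uses made-up numbers; the function name is hypothetical.

```python
import math


def residual_and_variance(params, P, seat_score, observed_price, R):
    """Return (y, S) for one observed price.

    params: (intercept, slope), the a priori estimate
    P:      2x2 a priori covariance as nested lists
    R:      observation noise variance
    """
    h = (1.0, seat_score)                        # H = [1, theta]
    predicted = h[0] * params[0] + h[1] * params[1]
    y = math.log(observed_price) - predicted     # y = z - H x
    # H P H^T for a 2x2 P, written out term by term.
    hph = (h[0] * (P[0][0] * h[0] + P[0][1] * h[1])
           + h[1] * (P[1][0] * h[0] + P[1][1] * h[1]))
    return y, hph + R                            # S = H P H^T + R


# Made-up starting point: uncertain parameters, no assumed correlation.
params = (3.0, 0.02)
P = [[1.0, 0.0], [0.0, 0.01]]
y_mid, S_mid = residual_and_variance(params, P, 50, math.exp(4.0), R=0.25)
y_top, S_top = residual_and_variance(params, P, 90, math.exp(4.8), R=0.25)
```

Both observations sit exactly on the line, so both residuals are zero, but the variance for the seat with score 90 is larger than for the seat with score 50, which is the widening of the dotted lines towards the expensive end.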

Step five: update

We now come to the key element of the Kalman filter, the gain. The gain takes our a priori estimate covariance $P_{k|k-1}$, our observation model $H_k$, and our residual variance $S_k$ in order to decide how much we should change our model parameters before the next prediction. Our optimal gain is $K_k = P_{k|k-1} H_k^T S_k^{-1}$. Kalman gain is a bit of a tricky nut to crack. If you think of the new parameters as a weighted average of the old estimate and the new observation, $K_k$ provides the optimal weight for the observed residual in the new average.

If you’ve been following along, you can see that the larger our a priori uncertainty, the higher this gain factor gets. Now that we have a gain factor, we can start making some updates.

For the model parameters, this is easy. We simply scale the Kalman gain by the measurement residual and add it to the prior estimate, yielding us a new estimate. You can see here that if we guessed the price exactly, the slope and intercept do not change (a good sanity check) and that if we were fairly sure about the estimate beforehand, it requires a major miss before we update it substantially:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k y_k$$

Similarly, we also update our error covariance matrix:

$$P_{k|k} = (I - K_k H_k) P_{k|k-1}$$

The error covariance update is a bit of a headache as well; the easiest way to think about it is to remember that $P$ represents the covariance of our parameter estimation errors and to play around with what makes the changes in $P$ large or small. For example, if the observation noise, $R_k$, is very large, our gain will be small and our certainty will remain mostly unchanged.
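
If you would rather play around with numbers than symbols, here is a small pure-Python sketch of one update for the two-parameter, single-output case. It uses the standard Kalman gain and update formulas; the function name, the nested-list covariance and the example values are assumptions for illustration, and the process-noise step between observations is left out.

```python
def kalman_update(params, P, seat_score, y, R):
    """One Kalman update for a scalar observation.

    params: (intercept, slope) a priori estimate
    P:      2x2 a priori covariance as nested lists
    y:      residual (observed minus predicted)
    R:      observation noise variance
    Returns (new_params, new_P).
    """
    h = (1.0, seat_score)                          # H = [1, theta]
    # P H^T is a 2-vector.
    pht = (P[0][0] * h[0] + P[0][1] * h[1],
           P[1][0] * h[0] + P[1][1] * h[1])
    S = h[0] * pht[0] + h[1] * pht[1] + R          # residual variance
    K = (pht[0] / S, pht[1] / S)                   # gain K = P H^T / S
    new_params = (params[0] + K[0] * y, params[1] + K[1] * y)
    # (I - K H) P, using the row vector H P.
    hp = (h[0] * P[0][0] + h[1] * P[1][0],
          h[0] * P[0][1] + h[1] * P[1][1])
    new_P = [[P[i][j] - K[i] * hp[j] for j in range(2)] for i in range(2)]
    return new_params, new_P


params = (3.0, 0.02)
P = [[1.0, 0.0], [0.0, 0.01]]

# Exact guess: residual of zero leaves the parameters untouched.
same_params, tighter_P = kalman_update(params, P, 50, y=0.0, R=0.25)
# A miss on the high side nudges the parameters upward.
moved_params, _ = kalman_update(params, P, 50, y=0.5, R=0.25)
# Very noisy observation: tiny gain, covariance barely changes.
_, noisy_P = kalman_update(params, P, 50, y=0.5, R=1e9)
```

Comparing `tighter_P` with `noisy_P` shows the point made above: a trustworthy observation shrinks the covariance, while a very noisy one leaves it almost where it started.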

Now that we understand how the filter works, let’s rejoin our original programming and see it in action!

Putting it all together

The slideshow below takes you on a visual tour through several steps of the dynamic linear model. In all slides, the dark red line represents our estimate of the mapping from seat quality to expected price, and the dashed lines represent our 95% confidence interval for the price.

Before concluding, I’d like to note that a major motivation behind this series was the lack of real-world Kalman filter examples out here on the internet, which is disappointing given its usefulness as an estimator, especially for low-dimensional time-variant systems with small data. Here are some of the better articles I’ve found:

I gladly welcome thoughts on our usage of the filter or critiques of my explanation from those who have a better handle on things. Leave a comment or find me on twitter @steve_rit.

Notes

  • 1: If we wanted to add time-variance to our parameters, we could use something like:
  • 2: To cut down on the amount of notation, I’ve removed some symbols representing noise that aren’t directly used in the predict-update process.