May 20th, 2013

SeatGeek and Timbre Move Into the Limelight

SG and Timbre

What happens when you take the best products for ordering music tickets on the web and combine it with the best apps for discovering live music on your mobile device? Ladies and gentlemen, we’d like to introduce you to Limelight: a platform developed to allow any artist/venue to submit their show for display on both the Timbre app and SeatGeek.com.

Add a performer

Limelight is a self service product for artists, managers, and whoever else has a concert they want to publicize, to add their event information to SeatGeek’s event catalog. Inclusion in our catalog means that these events will not only be listed in SeatGeek’s apps, but also on the sites and apps of hundreds of partners who use our easy-to-integrate API. (An API which not only gives developers access to every live event in North America, but is currently used by Yahoo, AOL Music and Rolling Stone.)

The process is easy. Simply click here and login with your SeatGeek account. From there you’ll be prompted to add your band or venue into our system. After entering in all the necessary details, just click the “add” button – your show will take 24-48 hours to appear on the SeatGeek/Timbre catalog. Yes, it’s really that easy, and you never even had to leave your computer chair.

Head over to Limelight and see what all the fuss is about.

May 13th, 2013

SeatGeek for iOS V1.1 Brings Upgrades in Event Discovery and Ticket Search Experience

At SeatGeek, we’ve noticed that early May marks the beginning of the busiest season of the year in terms of major events. Summer concerts and festivals are happening in every corner of the country, the MLB season is heating up and the NBA and NHL are in the midst of exciting playoff stretch runs.

So right in time for this season, we’ve pushed out an exciting new update to SeatGeek’s app for iPhone free download here that will make it even easier for you to discover events you’ll love – and provide the best possible way to save money on tickets for those events.

To complement the immersive interactive maps, Deal Score and exhaustive search functionality that sits at the core of our original iPhone app, we’ve added three exciting new features in SeatGeek for iPhone v1.1 that will help you discover and enjoy live events:

A new “Explore” feature that surfaces upcoming events in your area New view from seat images for most major venues Direct links on event pages to primary box office pages

Explore feature helps you discover new shows in your area

The addition of our new “Explore” feature was the most significant update of v1.1 and provides iPhone users with a way to discover new events in their area – a functionality we already provide on the SeatGeek web application in the form of our Columbus event recommendation calendar.

There are scores of apps out there that aspire to provide event recommendations. Most require users to provide copious amounts of personal data before any recommendations are generated.

We wanted to take a much less intrusive approach with our Explore feature. When you hit the “Explore Events” button on the main page, you’ll be presented with an elegant, scrollable list of popular events in your area, assorted across different genres, artists and teams. Once you find an event you like, you’ll see the specific dates that artists is playing, as well as a new scrollable list of additional shows by similar artists. As you “explore” through these recommendations, you can easily drill down to the perfect show that might match your interest with a few flicks of your thumb – with no log-in or scan and match required.

explore

View-from-seat images for 100+ venues

One of our most popular features on the SeatGeek web app are the real view-from-seat images we provide when a user mouses over a given ticket listing on our interactive stadium maps. We wanted to bring that same functionality over to the iPhone, so this update included a new Ticket Detail page that incorporates a high-quality image of the view from your seat in most major sports venues.

The image can be panned around so that you can see all aspects of the stadium from that particular location. We’ve got support for over 100 venues already and will add even more in the near future.

view from seat

Direct link to primary box office page

At SeatGeek, we’ve worked hard to pull in as many sources of event tickets online as possible — both from major secondary market sites like StubHub, TicketsNow and Ebay as well as smaller vendors across North America. But as our goal is to provide consumers with ALL ticketing options for an event, we want to make sure you can check in on the primary box office as well.

Our updated iPhone app will allow you to do just that, as we’ve now included a link to the primary box office. We know that there are many great concerts out there that don’t have an active secondary market (e.g., the show is in a small venue, or it’s an up-and-coming band), so this new feature should allow for users to land tickets to virtually any event in our system.

primary

How to download the SeatGeek iOS app

If you haven’t yet downloaded the SeatGeek app, you can do so by clicking the link below. It’s free and it will help you save serious cash on event tickets this summer (and beyond).

download

And if you have already downloaded the app, we’d love to hear your thoughts on it! Feel free to email us at hi@seatgeek.com or tweet us at @SeatGeek anytime!

Feb 5th, 2013

Introducing the SeatGeek Event Recommendations API

We’ve made it our mission to become America’s gateway to live entertainment. We even put it on our wall, right by the front entrance of our office.

So one area we’ve focused on is live event recommendations. After all, you can’t go see your favorite band if you don’t know it’s in town.

For the past year, we’ve been improving the recommendation service that powers our recommendations calendar on seatgeek.com…

…as well as our concert recommendation app on Spotify:

SeatGeek app on Spotify

After much work, we’ve finally advanced it to the point where we’re comfortable integrating it with our public events API and releasing it to the world.

We believe our recommendations are far more advanced than anything you can find on the Web right now, and we’re excited to see developers start to use it.

How Most Music Recommendation APIs Work

Recommendation engines operate on a pretty simple principle. You take a whole bunch of users and find out what they like. Then you build a whole bunch of correlations of the form, “People who like X like Y.” From that, you assume that X is similar to Y. When your next user comes along who likes X, you say to him or her, “We think you might like Y as well.”

There are quite a few publicly available APIs that support “similar artist” type queries. Last.fm has a good one. It’s a simple model to implement. The results are easy to cache. The problem is you get some pretty mediocre results when you try to do anything interesting.

A Motivating Example

Let’s say we have a user, Bob. Bob lists his two favorite musicians as Taylor Swift and Kenny Chesney. If you were to hit the SeatGeek API and ask for similar artists, you might get something that looks like this:

Artists Similar to Taylor Swift
1. Carrie Underwood
2. Justin Bieber
3. One Direction
4. Katy Perry
5. Ed Sheeran

Artists Similar to Kenny Chesney
1. Tim McGraw
2. Brad Paisley
3. Zac Brown Band
4. Jason Aldean
5. Keith Urban

Taylor Swift is a pop star with country influences and teen appeal. Unsurprisingly, she is most similar (on a 1:1 basis) with Carrie Underwood, another pop star with country influences and teen appeal. But she is also similar to some teen pop sensations (Justin Bieber) and some ordinary pop stars (Katy Perry).

Kenny Chesney, on the other hand, is pretty much just a country singer.

You can probably guess where I’m going with this. If Bob likes Taylor Swift and Kenny Chesney because he’s a country music fan and we start encouraging him to go see One Direction shows, he’s gonna be none too pleased.

And yet, unless you want to go through the trouble of building out your own recommendation system from scratch, that’s about the best you can do in terms of public APIs on the Net.

How the SeatGeek Recommendation API Works

The proper way to recommend music for Bob is to find other users like Bob and figure out what they like. In other words, to say, “People who like X and Y like __.” If Bob were to give us a third preference, the question becomes, “People who like X, Y and Z like __.” If he gives us a fourth preference, we use that as well.

Because the space of possible combinations grows exponentially, we can’t just compute all of these similarities and cache them. Instead, we use some clever math and compute affinity scores in real time. That allows us to support extremely flexible recommendation queries internally that we can use to build interesting experiences for our users.

Let’s go back to Taylor and Kenny. What happens if we try combining their preferences?

Artists Similar to Taylor Swift + Kenny Chesney (Jointly)
1. Tim McGraw
2. Jason Aldean
3. Carrie Underwood
4. Brad Paisley
5. Zac Brown Band

...

16. Katy Perry
23. Justin Bieber
31. One Direction

As you can see, the country music rises to the surface, and the teen-pop sensations fall out of the way.

Now let’s see what happens if we find a second user, Alice, who identifies her favorite bands as Taylor Swift and Katy Perry. Well, we might suspect she’s a fan of female pop stars, and our recommendations bear that out:

Artists Similar to Taylor Swift + Katy Perry
1. Ke$ha
2. Justin Bieber
3. Pink
4. Carrie Underwood
5. Kelly Clarkson

...

44. Zac Brown Band
49. Kenny Chesney

As we go deeper into the rabbit hole with more preferences, the recommendations become more and more advanced.

Pictures!

What follows is an example, simplified preference space. Green bands are ‘similar’ to X. Red bands are ‘similar’ to Y. Blue bands are ‘similar’ to Z. A user likes X and Z. What should we recommend?

Example Picture 1

Most recommenders combine preference through what is essentially a union operation. If a user likes X and Z, he will be shown events which are similar to X and events which are similar to Z.

Example Picture 2

SeatGeek’s recommendation engine (code-named Santamaria) computes the joint recommendation set of X and Z. In effect, it extracts the similar characteristics of X and Z and recommends other performers that share those specific traits. This leads to a much more accurate set of recommendations for the user.

Example Picture 3

As the number of seeds grows, the composition of preferences becomes more and more specific, and we can accurately recommend shows to people with fairly idiosyncratic tastes.

Example Picture 4

Using Our API

We’re very excited to be finally opening up our recommendations API to the public. The full documentation for our API can be found here:

http://platform.seatgeek.com/

You’ll need a SeatGeek account and an API key to get started:

Step 1: Request an API key here: https://seatgeek.com/account/develop

Step 2: Find some SeatGeek performers

https://api.seatgeek.com/2/performers?q=taylor%20swift
https://api.seatgeek.com/2/performers?q=kenny%20chesney

Step 3: Make a request using SG performer IDs

http://api.seatgeek.com/2/recommendations?performers.id=35&performers.id=87&postal_code=10014&per_page=10&client_id=API_KEY

The API takes a geolocation parameter, an arbitrary list of performers, and a wide array of filtering parameters.

Check it out and let us know what you think. You can email us at hi@seatgeek.com or post a message in our support forum.

Dec 3rd, 2012

Yak Shaving: Adding OAuth Support to Nginx via Lua

**TL;DR:** We built OAuth2 authentication and authorization layer via nginx middleware using lua. If you intend on performing this, read the docs, automate what you can, and carry rations.

As SeatGeek has grown over the years, we’ve amassed quite a few different administrative interfaces for various tasks. We regularly build new modules to export data for news outlets, our own blog posts, infographics, etc. We also regularly build internal dev tools to handle things such as deployment, operations visualization, event curation etc. In the course of doing that, we’ve also used and created a few different interfaces for authentication:

Github/Google Oauth
Our internal SeatGeek User System
Basic Auth
Hardcoded logins

Obviously, this is subpar. The myriad of authentication systems makes it difficult to abstract features such as access levels and general permissioning for various datastores.

One System to auth them all

We did a bit of research about what sort of setup would solve our problems. This turned up Odin, which works well for authenticating users against Google Apps. Unfortunately, it would require us to use Apache, and we are pretty married to Nginx as a frontend for our backend applications.

As luck would have it, I came across a post by mixlr referencing their usage of Lua at the Nginx level for:

Modifying response headers
Rewriting requests internally
Selectively denying access to hosts based on IP

The last one in that set seemed interesting. Thus began the journey in package management hell.

Building Nginx with Lua Support

Lua support for Nginx is not distributed with the core Nginx source, and as such any testing would require us to build pacakges for both OS X–for testing purposes–and Linux - for deployment.

Custom Nginx for OS X

For Mac OS X, we promote the usage of the Homebrew for package management. Nginx does not come with many modules enabled in the base formula for one very good reason:

The problem is that NGINX has so many options that adding them all to a formula would be batshit insane and adding some of them to a formula opens the door to adding all of them and associated insanity. - Charlie Sharpsteen, @sharpie

So we needed to build our own. Preferably in a manner that would allow further customization in case we need more features in the future. Fortunately, modifying homebrew packages is quite straightforward.

We want to have a workspace for working on the recipe:

cd ~
mkdir -p src
cd src

Next we need the formula itself. You can do one of the following to retrieve it:

Go spelunking in your HOMEBREW_PREFIX directory - usually /usr/local - for the nginx.rb
Have the github url memorized as an api and wget https://raw.github.com/mxcl/homebrew/master/Library/Formula/nginx.rb
Simply output your formula using brew cat nginx > nginx.rb

If we brew install ./nginx.rb, that will install the recipe contained within that file. Since this is a completely custom nginx installation, we’ll want to rename the formula so that future brew upgrade calls do not nix our customizations.

mv nginx.rb nginx-custom.rb
cat nginx-custom.rb | sed 's/class Nginx/class NginxCustom/' >> tmp
rm nginx-custom.rb
mv tmp nginx-custom.rb

We’re now ready to add new modules to our compilation step. Thankfully this is easy, we just need to collect all the custom modules from passed arguments to the brew install command. The following bit of ruby takes care of this:

# Collects arguments from ARGV
def collect_modules regex=nil
    ARGV.select { |arg| arg.match(regex) != nil }.collect { |arg| arg.gsub(regex, '') }
end

# Get nginx modules that are not compiled in by default specified in ARGV
def nginx_modules; collect_modules(/^--include-module-/); end

# Get nginx modules that are available on github specified in ARGV
def add_from_github; collect_modules(/^--add-github-module=/); end

# Get nginx modules from mdounin's hg repository specified in ARGV
def add_from_mdounin; collect_modules(/^--add-mdounin-module=/); end

# Retrieve a repository from github
def fetch_from_github name
    name, repository = name.split('/')
    raise "You must specify a repository name for github modules" if repository.nil?

    puts "- adding #{repository} from github..."
    `git clone -q git://github.com/#{name}/#{repository} modules/#{name}/#{repository}`
    path = Dir.pwd + '/modules/' + name + '/' + repository
end

# Retrieve a tar of a package from mdounin
def fetch_from_mdounin name
    name, hash = name.split('#')
    raise "You must specify a commit sha for mdounin modules" if hash.nil?

    puts "- adding #{name} from mdounin..."
    `mkdir -p modules/mdounin && cd $_ ; curl -s -O http://mdounin.ru/hg/#{name}/archive/#{hash}.tar.gz; tar -zxf #{hash}.tar.gz`
    path = Dir.pwd + '/modules/mdounin/' + name + '-' + hash
end

The above helper methods allow us to specify new modules to include on the command line and retrieve the modules from their respective locations. At this point, we’ll need to modify the nginx-custom.rb recipe to include the flags and retrieve the packages, around line 58:

nginx_modules.each { |name| args << "--with-#{name}"; puts "- adding #{name} module" }
add_from_github.each { |name| args <<  "--add-module=#{fetch_from_github(name)}" }
add_from_mdounin.each { |name| args <<  "--add-module=#{fetch_from_mdounin(name)}" }

At this point, we can compile a custom version of nginx with our own modules.

brew install ./nginx-custom.rb \
    --add-github-module=agentzh/chunkin-nginx-module \
    --include-module-http_gzip_static_module \
    --add-mdounin-module=ngx_http_auth_request_module#a29d74804ff1

We’ve provided this formula as a tap for you convenience at seatgeek/homebrew-formulae.

Custom Nginx for Debian

We typically deploy to some flavor of Debian–usually Ubuntu–for our production servers. As such, it would be nice to simply run dpkg -i nginx-custom to have our customized package installed. The steps to doing so are relatively simple once you’ve gone through them.

Some notes for those researching custom debian/ubuntu packaging:

It is possible to get the debian package source using apt-get source PACKAGE_NAME
Debian package building is generally governed by a rules file, which you’ll need some sed-fu to manipulate
You can update deb dependencies by modifying the control file. Note that there are some meta-dependencies specified herein that you’ll not want to remove, but these are easy to identify.
New releases must always have a section in the changelog, otherwise the package may not be upgraded to because it may have already been installed. You should use tags in the form +tag_name to idenfity changes from the baseline package with your own additions. I also personally append a number - starting from 0 - signifying the release number of the package.
Most of these changes can be automated in some fashion, but it appears as though there are no simple command line tools for creating custom releases of packages. That’s definitely something we’re interested in, so feel free to link to tooling to do so if you know of anything.

While running this process is great, I have built a small bash script that should automate the majority of the process. It is available as a gist on github.

It only took 90 nginx package builds before I realized the process was scriptable.

OAuth ALL the things

Now that it is possible to test and deploy a Lua script embedded within Nginx, we can move on to actually writing some Lua.

The nginx-lua module provides quite a few helper functions and variables for accessing most of Nginx’s abilities, so it is quite possible to force OAuth authentication via the access_by_lua directive provided by the module.

When using the *_by_lua_file directives, nginx must be reloaded for code changes to take effect.

I built a simple OAuth2 provider for SeatGeek in NodeJS. This part is simple, and you can likely find something off the box in your language of choice.

Next, our OAuth API uses JSON for handling token, access level, and re-authentication responses, so we needed to install the lua-cjson module.

# install lua-cjson
if [ ! -d lua-cjson-2.1.0 ]; then
    tar zxf lua-cjson-2.1.0.tar.gz
fi
cd lua-cjson-2.1.0
sed 's/i686/x86_64/' /usr/share/lua/5.1/luarocks/config.lua > /usr/share/lua/5.1/luarocks/config.lua-tmp
rm /usr/share/lua/5.1/luarocks/config.lua
mv /usr/share/lua/5.1/luarocks/config.lua-tmp /usr/share/lua/5.1/luarocks/config.lua
luarocks make

My OAuth provider uses the query-string for sending error messages on authentication, so I needed to support that in my Lua script:

local args = ngx.req.get_uri_args()
if args.error and args.error == "access_denied" then
    ngx.status = ngx.HTTP_UNAUTHORIZED
    ngx.say("{\"status\": 401, \"message\": \""..args.error_description.."\"}")
    return ngx.exit(ngx.HTTP_OK)
end

Now that we’ve handled our base error case, we’ll set a cookie for the access token. In my case, the cookie expires before the access token actually expires so that I can use the cookie to renew my access token.

local access_token = ngx.var.cookie_SGAccessToken
if access_token then
    ngx.header["Set-Cookie"] = "SGAccessToken="..access_token.."; path=/;Max-Age=3000"
end

At this point, we’ve handled error responses from the api, and stored the access_token away for later retrieval. We now need to ensure the oauth process actually kicks off. In this block, we’ll want to:

Start the oauth process if there is no access_token stored and we are not in the middle of it
Retrieve the user access_token from the oauth api if the oauth access code is present in the query string arguments
Deny users with invalid access codes

Reading the docs on available nginx-lua functions and variables can clear up some issues, and perhaps show you various ways in which you can access certain request/response information

At this point we need to retrieve data from our api to retrieve an access token. Nginx-lua provides the ngx.location.capture method, which can be used to retrieve the response from any internal endpoint within redis. This means we cannot call something like [https://seatgeek.com/ncaa-football-tickets?oq=ncaa+football+tickets(https://seatgeek.com/ncaa-football-tickets) directly, but would need to use proxy_pass in order to wrap the external url in an internal endpoint.

My convention for these endpoints is to prefix them with an _ (underscore), and normally blocked against direct access.

-- first lets check for a code where we retrieve
-- credentials from the api
if not access_token or args.code then
    if args.code then
        -- internal-oauth:1337/access_token
        local res = ngx.location.capture("/_access_token?client_id="..app_id.."&client_secret="..app_secret.."&code="..args.code)

        -- kill all invalid responses immediately
        if res.status ~= 200 then
            ngx.status = res.status
            ngx.say(res.body)
            ngx.exit(ngx.HTTP_OK)
        end

        -- decode the token
        local text = res.body
        local json = cjson.decode(text)
        access_token = json.access_token
    end

    -- both the cookie and proxy_pass token retrieval failed
    if not access_token then
        -- Track the endpoint they wanted access to so we can transparently redirect them back
        ngx.header["Set-Cookie"] = "SGRedirectBack="..nginx_uri.."; path=/;Max-Age=120"

        -- Redirect to the /oauth endpoint, request access to ALL scopes
        return ngx.redirect("internal-oauth:1337/oauth?client_id="..app_id.."&scope=all")
    end
end

At this point in the Lua script, you should have a - hopefully! - valid access_token. We can use this against your whatever endpoint you have setup to provide user information. In my endpoint, I respond with a 401 status code if the user has zero access, 403 if their token is expired, and access_level information via a simple integer in the json response.

-- ensure we have a user with the proper access app-level
-- internal-oauth:1337/accessible
local res = ngx.location.capture("/_user", {args = { access_token = access_token } } )
if res.status ~= 200 then
    -- delete their bad token
    ngx.header["Set-Cookie"] = "SGAccessToken=deleted; path=/; Expires=Thu, 01-Jan-1970 00:00:01 GMT"

    -- Redirect 403 forbidden back to the oauth endpoint, as their stored token was somehow bad
    if res.status == 403 then
        return ngx.redirect("https://seatgeek.com/oauth?client_id="..app_id.."&scope=all")
    end

    -- Disallow access
    ngx.status = res.status
    ngx.say("{"status": 503, "message": "Error accessing api/me for credentials"}")
    return ngx.exit(ngx.HTTP_OK)
end

Now that we’ve verified that the user is indeed authenticated and has some level of access, we can check their access level against whatever we define is the access level for the current endpoint. I personally delete the SGAccessToken at this step so that the user has the ability to log into a different user, but that is up to you.

local json = cjson.decode(res.body)
-- Ensure we have the minimum for access_level to this resource
if json.access_level < 255 then
    -- Expire their stored token
    ngx.header["Set-Cookie"] = "SGAccessToken=deleted; path=/; Expires=Thu, 01-Jan-1970 00:00:01 GMT"

    -- Disallow access
    ngx.status = ngx.HTTP_UNAUTHORIZED
    ngx.say("{\"status\": 403, \"message\": \"USER_ID"..json.user_id.." has no access to this resource\"}")
    return ngx.exit(ngx.HTTP_OK)
end

-- Store the access_token within a cookie
ngx.header["Set-Cookie"] = "SGAccessToken="..access_token.."; path=/;Max-Age=3000"

-- Support redirection back to your request if necessary
local redirect_back = ngx.var.cookie_SGRedirectBack
if redirect_back then
    ngx.header["Set-Cookie"] = "SGRedirectBack=deleted; path=/; Expires=Thu, 01-Jan-1970 00:00:01 GMT"
    return ngx.redirect(redirect_back)
end

Now we just need to tell our current app who is logged in via some headers. You can reuse REMOTE_USER if you have some requirement that this replace basic auth, but otherwise anything is fair game.

-- Set some headers for use within the protected endpoint
ngx.req.set_header("X-USER-ACCESS-LEVEL", json.access_level)
ngx.req.set_header("X-USER-EMAIL", json.email)

I can now access these http headers like any others within my applications, replacing hundreds of lines of code and hours of work reimplementing authentication yet again.

Nginx and Lua, sitting in a tree

At this point, we should have a working lua script that we can use to block/deny access. We can place this into a file on disk and then use access_by_lua_file to use it within our nginx site. At SeatGeek, we use Chef to template out config files, though you can use Puppet, Fabric, or whatever else you’d like to do so.

Below is the simplest nginx site you can use to get this entire thing running. You’ll also want to check out the access.lua - available here - which is the compilation of the above lua script.

# The app we are proxying to
upstream production-app {
  server localhost:8080;
}

# The internal oauth provider
upstream internal-oauth {
  server localhost:1337;
}

server {
  listen       80;
  server_name  private.example.com;
  root         /apps;
  charset      utf-8;

  # This will run for everything but subrequests
  access_by_lua_file "/etc/nginx/access.lua";

  # Used in a subrequest
  location /_access_token { proxy_pass http://internal-oauth/oauth/access_token; }
  location /_user { proxy_pass http://internal-oauth/user; }

  location / {
    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header  Host $http_host;
    proxy_redirect    off;
    proxy_max_temp_file_size 0;

    if (!-f $request_filename) {
      proxy_pass http://production-app;
      break;
    }
  }

}

Further Considerations

While this setup has worked really well for us, I’d like to point out some shortcomings:

The above code is a simplification of our access_by_lua script. We also handle POST request saving, inject JS into pages to renew the session automatically, handle token renewal etc. You may not need these features, and in fact, I didn’t think I’d need them until we started testing this system on our internal systems.
We had some endpoints which were available via basic auth for certain background tasks. These had to be reworked so that the data was retrieved from an external store, such as S3. Be aware that this may not always be possible, so oauth may not be the answer in your case.
Oauth2 was simply the standard I chose. In theory, you could use Facebook Auth to achieve similar results. You may also combine this approach with rate-limiting, or storing various access levels in a datastore such as redis for easy manipulation and retrieval within your Lua script. If you were really bored, you could reimplement Basic Auth within Lua, it’s just up to you.
There are no test harnesses for systems such as these. Test-junkies will cringe when they realize it’s going to be integration testing for a while. You can likely rerun the above by injecting variable mocks into the global scope and then executing scripts, but it’s not the ideal setup.
You still need to modify apps to recognize your new access headers. Internal tools will be easiest, but you may need to make certain concessions for vendor software.

tl;dr with links

The above blog post combined nginx-lua and our internal oauth provider to enable using OAuth for access control to our infrastructure.

Also

SeatGeek is hiring UI Developers and Web Engineers. Your first tasks will be to make my OAuth application pretty and write some tests for a bit of Lua code… (kidding)

If you have any questions about the project just let us know in the comments!

Oct 16th, 2012

Putting Venue Maps in a Terminal: Introducing SGCLI

seatgeek open sourced seatgeek/sgcli

A command-line interface for SeatGeek

At SeatGeek we have regularly scheduled Hackathons–opportunities for all of us to drop what we’re doing for two days and work on whatever interesting, creative, or experimental projects we dream up. We had a Hackathon last week, and I decided to write a command-line client for SeatGeek, which I called SGCLI.

Back in 2005 I spent a couple of months using a box running FreeBSD without X as my main computer. I’m not sure exactly what the thought process was that led to that setup, but using it taught me that in some instances command-line applications can be much faster to work with than their graphical equivalents. With that in mind, I set out to try and replicate (and if possible, improve) the experience of searching for and buying tickets on SeatGeek.

The first step to building the project was to learn some curses. Luckily, there’s a great tutorial on curses programming with Python on python.org. It didn’t take long to have a basic application running (complete with an ASCII-art version of the SeatGeek logo):

SGCLI Welcome Screen

The rest of the first day was spent building out the main parts of the SeatGeek experience: the search page, selecting an event, browsing ticket listings and viewing an individual listing. There were some intense moments, but by the time I quit for the night (probably around 11:30 or so - and I was early!) it seemed like at least the basics would be working in time for Friday’s demo.

On Friday I focused mostly on the features that I thought would give some added spark to the demo, namely rendering SeatGeek’s beautiful maps into slightly-less-beautiful ASCII art, and meme integration. Rendering the maps proved to be a bit tricky, but in the end it worked out pretty well:

Yankee Stadium in SGCLI

Map rendering works using the map infrastructure we built for the SeatGeek mobile website. That allows SGCLI to get a single .png for an event that represents the map. Then, SGCLI uses PIL to scale the map down such that the size in pixels is the same as the size in characters of the final map we want to render (that final size depends on the size of the terminal window). An important note here is that the aspect ratio has to change during this operation: almost all fonts are much taller than they are wide and we need to correct for that. So, if we decide that our final map should be 20 characters high and 36 characters wide, we use PIL to scale the 320x320 .png down to 36x20.

After scaling down we use PIL to create two copies of the same image. One of the copies gets converted to grayscale, and the other copy gets “posterized” down to a single bit per color channel (that matches the color options I have available in my terminal). The final step is to actually output the ASCII art. For each pixel we use the grayscale image to select a character to display based on luminosity (darker pixels get characters like #, while brighter pixels become characters like . or -). We use the posterized image to figure out what color to draw the character in (out of the 7 options available to us). After rendering the map using curses, we draw a marker over the top to indicate where the selected tickets are located.

This method certainly could be smarter (for example, it doesn’t take into account the ‘shape’ of a character when deciding what character to use), but it worked pretty well and was easy to implement. Check it out in the source code for SGCLI to see it in all of its glory.

Check out the final project–it’s open source and easy to install and run. Just do:

pip install sgcli # or easy_install sgcli
sgcli

There are plenty of neat things that I didn’t mention here. Be sure to check out the support for autocompletion while searching, and make sure to hit “b” on a ticket listings page (even if you don’t follow through and buy the tickets) - you’ll get a nice surprise when you return to the app. There are some notes on keyboard shortcuts in the README.

If you have any questions about the project just let us know in the comments!

Aug 10th, 2012

Introducing Absolute Deal Score

Deal Score

Among all of SeatGeek’s features, we have always been proudest of Deal Score, our metric that enables users to seamlessly pick out the best deal for an event from within thousands of ticket listings. But today that feature is getting a whole lot cooler. We’re launching Absolute Deal Score, an upgrade that has been under development for many months here at SeatGeek.

For the uninitiated, the premise behind Deal Score is simple: Deal Score is a rating of whether a ticket is a bargain or a rip-off, which facilitates apples-to-apples comparisons among ticket listings. A metric like Deal Score is particularly useful for live event tickets because every seat in a venue is different. If you’re shopping online for batteries, you can sort your options by price and expect the cheapest options will include the good bargains. But if you’re shopping for Yankees tickets and you sort by price, the cheapest options tend to be a bunch of nosebleed seats. Deal Score offers a better way to identify good buys.

As initially designed, Deal Score compared the relative value of a ticket listing to that of all others listed in the venue for that one given event. The worst deal for every event was given a Deal Score of 0, the best deal was given a Score of 100, and everything else was filled in between those two numbers. But as we thought about how best to surface values in the ticket marketplace, we came to the realization that anchoring value against just a single game wasn’t going far enough.

Today, we’re excited to announce a major overhaul of our Deal Score algorithm–one that not only identifies the best ticket deals within a given event, but also how those individual deals comparatively stack up against all tickets for similar events (for example, how a listing for a single Yankees ticket stacks up against those for all other Yankees games this season). Listings are no longer anchored at 0 and 100 for every event but, rather, 0 is anchored to the absolute worst deal for all events on SeatGeek, and 100 is anchored to the best deal among all events.

Deal Score 2

SeatGeek’s head of R&D, Steve Ritter, spoke in great detail about the math and approach behind Deal Score in a two-part blogpost series back in May but, in brief, the algorithm assesses the listed price of a ticket against our estimated market value of that ticket (based on historical prices, row/section position and other factors). As we thought about ways to build on the solid foundation of Deal Score, we realized that our Deal Score methodology could be applied across a broad series of similar events, such as a full season of NBA games or a multi-month run of a Broadway show. We’ve been testing this live on SeatGeek for the past few weeks, so you may have already noticed some changes. As a user, this update has a some meaningful benefits:

Deal Scores are now comparable across all events, not just against ticket listings within a single game or concert, as was the case previously. For example, if you compare a 93 deal score for a November 2012 Knicks game against a 85 deal score for a different game at MSG 3 months later, you’ll know that the November ticket is without question a better value–a distinction that couldn’t previously be made.
Absolute Deal Score still surfaces the best ticket deals within a single event, just as the previous iteration of Deal Score. The only difference is that scores aren’t anchored to a relative distribution as before. The top 10 deals you see on any event page are still the best ticket deals available that evening, just as was the case previously.
You may see fewer “100” deal scores on event pages, but the highest Deal Scores now truly represent exceptional values for that event type. This doesn’t mean that ticket labeled with even a 50 or 60 Deal Score is a poor value–indeed, any ticket with a DS above 50 ranks in the upper 20% of all deals on site. But we felt it important to define far better gradations between above average deals and purely outstanding deals, and to do so across the widest body of comparable ticket listings.

We’re excited about this update and how it will change the experience of shopping on SeatGeek. We’d love to hear what you think! As always, drop us a line at hi@seatgeek.com.

May 9th, 2012

Using a Kalman Filter to Predict Ticket Prices

Welcome back! In case you missed part one of this series, we’re opening up the hood on Deal Score, one of SeatGeek’s most popular features.

In part one we gave a brief overview of why we sort ticket listings by Deal Score rather than by price. We gave you our two main assumptions:

Seat quality, within a given venue, has a consistent ordering.
The relationship of seat price to seat quality follows a similar pattern across all events at a given venue.

Court

In the last post we discussed how we take advantage of the first assumption. Today, we’ll explain the second assumption and how it leads to the accurate valuations of thousands of tickets on a daily basis.

This second assumption means that all we need to do in order to predict prices is find a function, unique to each event, that maps our seat score vector to a vector of expected prices. Since we’re working with limited data here and surely do not have enough data to induce the structure of this curve for each event, we make a further simplifying assumption: for each venue, the curve will look similar for every event at that venue, whether that curve is a straight line, polynomial, or otherwise.

This does not mean that we can assume that a premium seat will carry the same relative premium over a nosebleed for each game. For a game that will not sell out, for example, the value of a nosebleed should be negative–this ticket will not sell, and given the additional expenses involved in attending an event (gas, parking, food, etc.), someone would have to be paid in order to sit in that seat. In such a situation, box seats are infinitely more valuable than bleachers. For a premium game, on the other hand, such as a playoff game, opening day, or any game at Fenway Park, all seats will sell out, as there is significant value just getting through the gates to see the event. Perhaps the box seat is only worth two or three times as much as the nosebleed in these cases.

At this point, we break out a terrific tool for processing small amounts of noisy data, the Kalman filter. Heavily used in the guidance and control of spacecraft and aircraft as well as with time-series data in economic and financial spheres, the Kalman filter is an algorithm that uses state estimates of model parameters combined with estimates of their variance to make predictions about the output of a linear dynamic system. I’ll spare you the obligatory rocket science joke and jump straight into a tutorial on how to use a Kalman filter to make predictions in the face of noisy and limited ticket data.

Since every observed price is going to be the output of a noisy system at a point in time, we are most interested in the likely state of the system as of the last observation, making a recursive estimator such as the Kalman filter an excellent choice.

Step one: model the system

In SeatGeek’s case, our assumption is that the underlying structure of price dynamics is similar across events, and we can therefore generate a curve mapping seat quality to expected prices using the same parameters on the curve each time. As an example, here is a plot with seat scores for Knicks games at Madison Square Garden on the x-axis and historical average sale price for those seats on the y-axis.

Graph

I’ve added a best-fit line, which shows that sale prices tend to grow exponentially with seat quality at this particular venue. As such, we will model our price predictions as log-linear with respect to seat quality. We’re about to do a lot of math here, so feel free to skip ahead.

The Kalman filter maintains the state of the filter at step k with two variables:

$\mathbf{\hat{x}_{k}}$
: the parameters of the model given observations up to and including step k
$\mathbf{P}_{k}$
: the covariance matrix of parameter errors, a measure of the confidence the model has in its parameters

In our simple case, $\mathbf{\hat{x}}$ represents the intercept $\hat{x_1}$ and slope $\hat{x_2}$ of our line. $\mathbf{P}$ represents the covariance of parameter errors. This covariance matrix will be used down the line to determine which parameters must change when we make a new observation. It will also determine the magnitude of the adjustment.

A general Kalman filter uses a state-transition matrix $\mathbf{F}$ in order to advance from one observation to predicting the value of the next.

$\mathbf{x}_{k|k-1} = \mathbf{Fx}_{k-1|k-1} + \mathbf{w}_k$

Where: $\mathbf{x}_{k\|k-1}$ is our best estimate of $\mathbf{x}_k$ given observations up to and including $k-1$ . $\mathbf{w}_{k}$ is white noise of the same dimension as the model, drawn from a multivariate normal distribution having covariance $\mathbf{Q}$ , representing the process error. Many applications of the filter model physical systems that take velocity and acceleration into account (and behind the scenes, so does SeatGeek). In these cases, $\mathbf{F}$ can include time-varying parameters, but in our simple example, we set:

$\mathbf{F} = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix}$

Assuming that in between observations the underlying model dynamics do not change according to any known physical system, we use the identity matrix.¹

Step two: model the output

Sharp eyes may have noticed that the preceding equation does not use our lovely seat scores quite yet. The reason is our observations do not come in the form of linear models, but rather in observed fair values for seats, i.e. when users express an intent to buy. We have to model our output as $\mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k$ , where $\mathbf{H}_k$ is a 1x2 matrix $\begin{bmatrix} 1 & \theta_k \end{bmatrix}$ , theta being the rating of the seat in question. Keeping with Kalman filter assumptions we model our residuals, the difference between our observations and predictions, as Gaussian white noise.² In the single-output case, the observation noise can be thought of as the square of our standard estimation error, or how far we allow our predictions to be off before the model updates itself. This variance, $\mathbf{R}$ , will be used later on when we update the model.

Step three: predict

Now that we have a model of our system, we can start making predictions. Using historical data, we can generate $\mathbf{x}_0$ , our default parameters, and start predicting prices. Our prediction, of course, is that our observations will lie on the line defined by $\mathbf{x}_0$ , shown in the image above. In this case, $\mathbf{x}_0 = \begin{bmatrix} 7.2356 \ , \ .1428 \end{bmatrix}$ .

We add those parameters to our listing feed, determine seat quality from the data provided to us by the market in question, and predict a price for each listing. We compare the predicted price to the listed price, assign a Deal Score to each listing, and sort your search results accordingly. We live to fight another day.

Step four: observe

Market dynamics, however, are not so kind as to stay constant, and our models, alas, are unable to perfectly predict every price from the outset. The Kalman filter is thus useful for responding to changing tides. Since observations of changing tides can be few and far between and must inform our predictions on all other tickets, it behooves us to have a degree of certainty about our model, which we represent by $\mathbf{P}$ , the 2x2 covariance matrix of our state estimate errors. A good design decision is to start off $\mathbf{P}_0$ with large numbers on the diagonals and zeroes elsewhere, assuming low certainty of model parameters and independence.² The filtering process should be able to give you excellent color on their true relationship.

Many of the signals discussed in part one can be interpreted, directly or indirectly, as a fair ticket price, and when we observe a new price, $\mathbf{z}_k$ , we model the residual as $y_k = \mathbf{z}_k - \mathbf{H}_k \mathbf{\hat{x}}_{k\|k-1}$ where $\mathbf{H}_k \mathbf{\hat{x}}_{k\|k-1}$ represents our predicted price for that seat. In the Kalman filter, the residual variance (variance of $y_k$ ) is modeled as $\mathbf{S}_k = \mathbf{H}_k \mathbf{P}_{k\|k-1} \mathbf{H}_{k}^{\mathbf{T}} + \mathbf{R}$ . In the general case, these are covariance matrix. Since our model outputs only one value, a predicted price, $\mathbf{S}$ and $\mathbf{R}$ are variances. $\mathbf{R}$ we have seen before, this is the general model of our error variance. $\mathbf{S}_k$ is the variance of this particular observation, which varies depending on the seat score of this particular observation (see dotted lines in the slideshow below). Variance is higher on expensive tickets than it is on cheaper tickets. We will use the residual and its variance in the next step to order to update our parameters.

Step five: update

We now come to the key element of the Kalman filter, the gain. The gain takes our a priori estimate covariance $\mathbf{P}_{k\|k-1}$ , our observation model $\mathbf{H}_k$ , and our residual variance $\mathbf{S}$ in order to decide how much we should change our model parameters before the next prediction. Our optimal gain is $\mathbf{K}_k = \mathbf{P}_{k\|k-1} \mathbf{H}_{k}^{\mathbf{T}} \mathbf{S}_{k}^{-1}$ . Kalman gain is a bit of a tricky nut to crack. If you think of the new parameters $\mathbf{x}_{k+1}$ as a weighted average of the old estimate $\mathbf{x}_k$ and the new observation, $\mathbf{K}$ provides the optimal weight for the observed residual in the new average.

If you’ve been following along, you can see that the larger our a priori uncertainty, the higher this gain factor gets. Now that we have a gain factor, we can start making some updates.

For the model parameters, this is easy. We simply scale the Kalman gain by the measurement residual, yielding us a new estimate. You can see here that if we guessed the price exactly, the slope and intercept do not change (a good sanity check) and that if we were fairly sure about the estimate beforehand, it requires a major miss before we update it substantially:

$\mathbf{\hat{x}}_{k|k} = \mathbf{\hat{x}}_{k|k-1} + \mathbf{K}_{k} y_k$

Similarly, we also update our error covariance matrix,

$\mathbf{P}_{k|k} = ( I - \mathbf{K}_k \mathbf{H}_k ) \mathbf{P}_{k|k-1}$

The error covariance update is a bit of a headache as well; the easiest way to think about it is to remember that $\mathbf{P}$ represents the covariance of our parameter estimation errors $\mathbf{x}_k - \mathbf{\hat{x}_{k\|k-1} }$ and to play around with what makes the changes in $\mathbf{P}$ large or small. For example, if the observation noise, $\mathbf{S}_k$ is very large, our gain will be small and our certainty will remain mostly unchanged.

Now that we understand how the filter works, let’s rejoin our original programming and see it in action!

Putting it all together

The slideshow below takes you on a visual tour through several steps of the dynamic linear model. In all slides, the dark red line represents our estimate of the mapping from seat quality to expected price and the and the dashed lines represent our 95% confidence interval for the price.

Before concluding, I’d like to note that a major motivation behind this series was the lack of real-world Kalman filter examples out here on the internet, which is disappointing given its usefulness as an estimator, especially for low-dimensional time-variant systems with small data. Here are some of the better articles I’ve found:

Giovanni Petris’ monograph on dynamic linear models in R
his excellent description of dlm, his R package for dynamic linear modeling
a wonderful cheat sheet on the dimensionality of higher order Kalman filters
an introduction to Kalman filters for predicting basketball game scores, written by the incomparable Dean Oliver

I gladly welcome thoughts on our usage of the filter or critiques of my explanation from those who have a better handle on things. Leave a comment or find me on twitter @steve_rit.

Notes

1: If we wanted to add time-variance to our parameters, we could use something like: $\mathbf{F} = \begin{bmatrix} 1 & 0 & \Delta t \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
2: To cut down on the amount of notation, I’ve removed some symbols representing noise that aren’t directly used in the predict-update process.

Apr 26th, 2012

The Math Behind Ticket Bargains

Greetings from SeatGeek Research & Development!

I’m here today to take you behind the curtain of one of SeatGeek’s major features, Deal Score. For the uninitiated, Deal Score is a 0-to-100 rating that reveals whether a ticket is a great bargain or a major rip-off. We humbly believe it’s the best way to find tickets. I’d like to quickly tell you why and then spend most of this post discussing some of the math behind Deal Score’s calculation. This is the first in a series of two blog posts, the second coming soon.

Sorting vs. Searching

Why have Deal Score? The standard across ticket sites is, of course, sorting by price. On most ticket sites, a prospective buyer can select sections they want to sit in, filter tickets by price range, and spend a solid chunk of their day trying to figure out the best seats for the money. On most aggregators, listings from several ticketing websites are lumped together… and then sorted by price, whereupon the experience repeats itself with the added pleasure of more noisy data.

SeatGeek, however, is more than an aggregator, we’re a search engine. Using Deal Score, we sort tickets by value rather than price. As a quick example, let’s try to find some tickets for the Red Sox-Indians game May 12th at Fenway Park. If I sort the tickets by price, I need to wade through dozens of cheap listings for standing room only tickets and obstructed view seats. Cheap for sure, but anybody who’s been to Fenway Park can tell you there are some places you just don’t want to sit. I need to be vigilant in order to notice a listing for two tickets in the grandstand behind home plate for $53, the same price level as a listing in the back of the bleachers and in two neck-straining outfield grandstand seats.

SeatGeek Event Page

How good of a deal is this? Sorting by price these three listings look the same, but behind the scenes SeatGeek’s proprietary price prediction has pegged these bleacher seats as being worth $29, the outfield grandstand seats at $34, and the infield seats at $69. Deal Score compares every ticket’s expected price to its listed price and takes the mental leg work out of ticket shopping.

The basic principle behind Deal Score is simple and intuitive: by searching rather than sorting, we can intelligently filter secondary market ticket listings, saving consumers large amounts of time and money.

How does it work?

The most important element of our Deal Score algorithm is to accurately estimate the current market value of a ticket listed on the secondary market. Most marketplaces have large amounts of transactional data on their products, often with supply and demand-side pricing signals. SeatGeek is in the undesirable position of trying to predict, on a daily basis, the price of millions of event tickets that have, by definition, never sold. Each seat at every event is a unique product; while its eventual price is informed by many other signals, the secondary market is both opaque and noisy.

Given our data constraints and the precision necessary, we made two assumptions about seats:

Seat quality, within a given venue, has a consistent ordering. This means that for any given Red Sox game, we expect that Infield Grandstand 18, Row 12 is a better place to sit than Center Field Bleachers 37, Row 37.
The relationship of seat price to seat quality follows a similar pattern across all events at a given venue. This means that a curve plotting sale price against seat quality for a weekend Red Sox-Yankees game at Fenway Park should look similar to a curve for a midweek Red Sox-Royals game, even though the market dynamics would be quite different.¹

The first assumption allows us to use signals from many contexts to inform our predictions. The second assumption allows us to make confident predictions about prices after seeing as few as five or ten prices for each event.  In today’s installment, I’m going to show you the math we use to derive a key metric called “Seat Rank,” the ordinal quality rank of all seats within a venue.

Seat Rank

In order to make the most of our first assumption, we determine the intrinsic “seat quality” of each seat relative to all others. Teams and promoters deal with this every day; they have to set face values for tens of thousands of seats in a stadium, but they have the advantage of only needing to compute a few dozen price levels, at most. In contrast, secondary markets have row-level pricing granularity, and thus require us to understand how much each row is going to sell for on the open market. Fenway Park, for example, has 4,022 distinct section/row pairs, and we must understand how they all rank on a relative basis. Using a little bit of cleverness along with vector coordinate data from SeatGeek’s venue maps, we reduce the problem slightly: we divide each venue into clusters of seats (we call them “seat groups”) whose physical locations and sale prices tend to be close enough to each other that they can be modeled together. These seat groups allow us to make use of less data to predict more prices.   Some venues have as few as twenty groups; others, well into the thousands. Fenway Park has 993.

To understand Seat Scores, consider a simple example where the set of listings $\mathcal{S}$ consists of three seats indexed by $i$ :

$s_i \in \mathcal{S} = \{s_1,s_2,s_3\}$

Suppose these seats are equally priced, despite the fact that their quality $\theta_{i}$ varies. In fact, $s_1$ is twice as good as $s_2$ , which is twice as good as $s_3$ . Without loss of generality, we arbitrarily set $\theta_{1} = 1$ and can define a vector $\Theta$ of relative seat qualities:

$\Theta \qquad = \qquad \left[ \begin{array}{c} \theta_1 \\ \theta_2 \\ \theta_3 \end{array} \right] \qquad = \qquad \left[ \begin{array}{c} 1 \\ \tfrac{1}{2} \\ \tfrac{1}{4} \end{array} \right]$

Unfortunately, while SeatGeek has a lot of data, we cannot directly observe the relative true quality $\Theta$ of these seats. However, we use a group of different signals, including clicks on “buy” buttons and the physical location of a seat within a venue, to arrive at an estimated quality $\hat{\Theta}$ . One of these signals is pairwise comparison. Shoppers constantly make pairwise comparisons among seats. We use this tendency to our advantage. In particular, we obtain our estimate of $\Theta$ by assuming that users’ historical choices are proportional to the true relative quality of seats, revealing information about the true $\Theta$ . For simplicity’s sake, assume that:

$\Pr(\text{user chooses $s_i$ over $s_j$}) \propto \frac{\theta_i}{\theta_i + \theta_j} \forall s_i \neq s_j \in \mathcal{S}$

For example, when faced with a choice between $s_1$ and $s_3$ , users will pick $s_1$ with probability $= \tfrac{1}{1+ 1/4} = 80\%$ . In reality, the data will be much noisier. Each data point is a random realization of their perception of relative seat values. Some pick the first listing they see, others have disparate opinions about what makes for a quality seat, etc.²

Continuing with the Fenway Park example, after processing our input signals, we have a square matrix $\mathbf{R}$ where each cell represents the processed results of pairwise comparisons between seat groups. In this matrix, $\mathbf{R}$ , we define each cell $r_{i,j}$ as the observed relative quality $s_i$ as compared to $s_j$ .

The rough values for $\mathbf{R}$ are fairly noisy, as shown in the matrix below. The matrix below is sorted left-to-right, top-to-bottom by the raw “winning percentage” of each seat in pairwise comparisons. Each cell represents, roughly, the fraction of the time that a user clicked on the seat in the row (y-axis) when the seat in the column (x-axis) was available at an equal or lesser price. A row with mostly red is a seat that “wins” many comparisons, a row with mostly green tends to lose.

maths

The initial $\Theta$ ’s implied by these raw winning percentages are a good start, but these data are far too noisy to be used as reliable estimates. This is a visual representation of what Fenway Park looks like with these raw seat scores:

more maths

To estimate $\hat{\Theta}$ in the presence of noisy data, we use a method called maximum likelihood estimation, which iterates over candidate values for $\hat{\Theta}$ to maximize the probability of observing the real data. We start with rough parameter values, $\hat{\Theta}$ and follow the steps: (1) calculate the probability of observing the data conditional on these values³:

$L \ ( \hat{\Theta} \ | \ \mathbf{R}) = \prod_{i, j} \left ( \frac{\hat{\theta}_{i}}{\hat{\theta}_{i} + \hat{\theta}_{j} } \right )^{r_{i,j}}$

(2) adjusting the parameter values to increase this likelihood

Watch below as the seat scores converge from our initial values to the maximum likelihood (use the controls below to navigate):

Presto! Once we’re finished, we end up with something that looks very similar Fenway’s actual seating chart, only with much more granular distinctions on price levels. With these seatscores, we would expect $\mathbf{R}$ to look like this filled-in matrix instead of the noisy, sparse mess from above.

even moar maths

With these powerful seat scores in hand, we’re halfway to our goal of predicting accurate prices for live events at any venue in the country. Come back for our next post to see how we go from our seat scores to market value predictions for thousands of events every day. UPDATE: View part 2: Using a Kalman Filter to Predict Ticket Prices

Credits

In case you’re wondering what technology we use for these projects, here’s a sampling:

pandas: a python data analysis library, for signal processing
R: for statistical analysis and postprocessing
ggplot2: to make the heatmaps seen above

Notes

1: If you read this far and wondered whether we were ever going to get around to this, then you’ll want to come back for part 2, when we explain how price predictions are derived from these seat scores.
2: Fenway park is actually a good example of this phenomenon, Green Monster seats in particular are heavily disagreed upon by our signals.
3: $r_{ij} = 0$ whenever $i = j$ , so we need not exclude these cases.

Mar 13th, 2012

The SeatGeek Platform

Over the past two and a half years, we’ve poured countless time into building a canonical database of live events in the US. Not only have we cataloged when and where each event is happening, but we also built a system that attaches copious metadata to each event–e.g., the latitude/longitude of the venue, the number of tickets currently listed, etc. Thus far, that database has been used exclusively to power the pages on SeatGeek.com. But we musn’t be selfish! Thus, we recently announced The SeatGeek Platform. Developers can use the Platform to add live event info to existing apps or as a foundation for entirely new apps that deal with live events.

The SeatGeek Platform is composed of our event, performer, and venue data, a REST API, our Partner Program, and a developer support community. The API exposes a mother lode of live event info—nearly all of the data you see available on SeatGeek.com, plus a lot more. Full documentation is here. The Partner Program gives Platform users an easy way to monetize. Anyone who signs up earns a 50/50 rev share whenever a user buys tickets using one of their links. For current partners, that has worked out to about $11 every time one of your users buys a ticket. A few of us on the SeatGeek dev team are closely tracking posts on the support forum, so if you have any questions about the API, just post there and you will get a prompt, thorough response.

How might someone use this thing? A quick example: Let’s say that Sarah runs a site for her indie record label, SBeats, which gets a lot of traffic from fans. Since the record business isn’t massively lucrative these days (shocking, I know!) Sarah is looking for new ways to monetize. She’d also like to add a bit more content to her label’s site. She uses the SeatGeek API to pull in data about which of her artists are touring. She displays that info in a module on each artist’s page. To give users a bit of context, she pulls the “low price” field from the API to show the cost of the cheapest ticket for each show. Whenever a user clicks on a link for a show and buys a ticket, Sarah earns $11, on average.

We’re pumped about this launch. For the first time, we’re exposing our data to developers everywhere. I can’t wait to see what people build.

Dec 25th, 2011

Removing Price Forecasts

Screenshot of our initial homepage

When we launched SeatGeek back in the fall of 2009, we positioned ourselves as a site that forecasts how ticket prices move on the secondary market. That was our “one thing”: forecasts. Russ had spent months building scrapers to collect ticket data. I’d spent months messing with that data in STATA, building models that could accurately forecast prices. We figured we could get traction by helping consumers time their ticket purchases optimally.

Much has changed. The past two years have been all about expanding the vision (trite but true) of what SeatGeek can be. Four months after we launched, we moved from being a price forecaster to a price forecaster that also had pretty good ticket search. The ticket search was getting a better response from users than the forecasts, so we continued to focus on that. Five months after that, we launched our own interactive mapping platform. That really opened the doors on how we could approach ticket search. We created something called Deal Score, which allowed us to use a lot of the data and analytical tools we’d built for forecasts, and that got a great response. In 2011, we added dozens more ticket sellers to our search results. Near the end of the year, we’ve begun to make the leap into being a full-fledged one-stop site for live entertainment.

Through all of this, the forecasting feature has been lost in the shuffle. We’ve continued to support it, but it became a hassle rather than a cause for excitement. Most users stopped paying attention to it; the site had other things that were more compelling. We stopped mentioning it when we described what SeatGeek does. Forecasts ceased to be relevant to our core mission–making ticket buying elegant and simple. Optimally timing a ticket purchase is a complicated, tricky value prop and holds us back in our pursuit of removing complication from ticket buying.

Thus, within the next week or two, we’ll be removing price forecasts from our site. We want to avoid feature bloat. We need to maintain clarity of purpose, for both our users and ourselves. Drop us a line at hi@seatgeek.com if you think this is a terrible idea and if we get enough emails we’ll reconsider. Otherwise, we will begin 2012 without forecasts, which will help us focus on the things that matter most as we try to upend the way people attend live events.

← Older Blog Archives Newer →

Jobs at SeatGeek

Explore feature helps you discover new shows in your area

View-from-seat images for 100+ venues

Direct link to primary box office page

How to download the SeatGeek iOS app

How Most Music Recommendation APIs Work

A Motivating Example

How the SeatGeek Recommendation API Works

Pictures!

Using Our API

One System to auth them all

Building Nginx with Lua Support

Custom Nginx for OS X

Custom Nginx for Debian

OAuth ALL the things

Nginx and Lua, sitting in a tree

Further Considerations

tl;dr with links

Links

Further Reading

Also

Step one: model the system

Step two: model the output

Step three: predict

Step four: observe

Step five: update

Putting it all together

Notes

Sorting vs. Searching

How does it work?

Seat Rank

Credits

Notes