Greetings from SeatGeek Research & Development!
I’m here today to take you behind the curtain of one of SeatGeek’s major features, Deal Score. For the uninitiated, Deal Score is a 0-to-100 rating that reveals whether a ticket is a great bargain or a major rip-off. We humbly believe it’s the best way to find tickets. I’d like to quickly tell you why and then spend most of this post discussing some of the math behind Deal Score’s calculation. This is the first in a series of two blog posts, the second coming soon.
Sorting vs. Searching
Why have Deal Score? The standard across ticket sites is, of course, sorting by price. On most ticket sites, a prospective buyer can select sections they want to sit in, filter tickets by price range, and spend a solid chunk of their day trying to figure out the best seats for the money. On most aggregators, listings from several ticketing websites are lumped together… and then sorted by price, whereupon the experience repeats itself with the added pleasure of noisier data.
SeatGeek, however, is more than an aggregator: we’re a search engine. Using Deal Score, we sort tickets by value rather than price. As a quick example, let’s try to find some tickets for the Red Sox-Indians game May 12th at Fenway Park. If I sort the tickets by price, I need to wade through dozens of cheap listings for standing-room-only tickets and obstructed-view seats. Cheap for sure, but anybody who’s been to Fenway Park can tell you there are some places you just don’t want to sit. I need to be vigilant in order to notice a listing for two tickets in the grandstand behind home plate for $53, the same price as a listing in the back of the bleachers and another for two neck-straining outfield grandstand seats.
How good of a deal is this? Sorted by price, these three listings look the same, but behind the scenes SeatGeek’s proprietary price prediction has pegged the bleacher seats as being worth $29, the outfield grandstand seats at $34, and the infield seats at $69. Deal Score compares every ticket’s expected price to its listed price and takes the mental legwork out of ticket shopping.
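To make the comparison concrete, here is a minimal sketch that ranks the three $53 listings by how their expected price compares to their listed price. The expected prices are the ones quoted above; the `value_ratio` function is an assumption made purely for illustration and is not the actual Deal Score formula, which maps to a 0-to-100 scale in a way this post does not spell out.

```python
# Expected prices come from the example above. The scoring rule is a
# hypothetical stand-in, NOT the real Deal Score calculation.
listings = [
    {"seats": "bleachers", "listed": 53.0, "expected": 29.0},
    {"seats": "outfield grandstand", "listed": 53.0, "expected": 34.0},
    {"seats": "infield grandstand", "listed": 53.0, "expected": 69.0},
]


def value_ratio(listing):
    """Expected price divided by listed price; above 1.0 suggests a bargain."""
    return listing["expected"] / listing["listed"]


# Sort by value (best first) instead of by price.
ranked = sorted(listings, key=value_ratio, reverse=True)

for listing in ranked:
    print(f'{listing["seats"]}: {value_ratio(listing):.2f}')
```

Under this toy rule, only the infield listing has a ratio above 1.0, which matches the intuition that it is the one bargain hiding among three identically priced listings.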
The basic principle behind Deal Score is simple and intuitive: by searching rather than sorting, we can intelligently filter secondary market ticket listings, saving consumers large amounts of time and money.
How does it work?
The most important element of our Deal Score algorithm is to accurately estimate the current market value of a ticket listed on the secondary market. Most marketplaces have large amounts of transactional data on their products, often with supply and demand-side pricing signals. SeatGeek is in the undesirable position of trying to predict, on a daily basis, the price of millions of event tickets that have, by definition, never sold. Each seat at every event is a unique product; while its eventual price is informed by many other signals, the secondary market is both opaque and noisy.
Given our data constraints and the precision necessary, we made two assumptions about seats:
- Seat quality, within a given venue, has a consistent ordering. This means that for any given Red Sox game, we expect that Infield Grandstand 18, Row 12 is a better place to sit than Center Field Bleachers 37, Row 37.
- The relationship of seat price to seat quality follows a similar pattern across all events at a given venue. This means that a curve plotting sale price against seat quality for a weekend Red Sox-Yankees game at Fenway Park should look similar to a curve for a midweek Red Sox-Royals game, even though the market dynamics would be quite different (see footnote 1).
The first assumption allows us to use signals from many contexts to inform our predictions. The second assumption allows us to make confident predictions about prices after seeing as few as five or ten prices for each event. In today’s installment, I’m going to show you the math we use to derive a key metric called “Seat Rank,” the ordinal quality rank of all seats within a venue.
In order to make the most of our first assumption, we determine the intrinsic “seat quality” of each seat relative to all others. Teams and promoters deal with this every day; they have to set face values for tens of thousands of seats in a stadium, but they have the advantage of only needing to compute a few dozen price levels, at most. In contrast, secondary markets have row-level pricing granularity, and thus require us to understand how much each row is going to sell for on the open market. Fenway Park, for example, has 4,022 distinct section/row pairs, and we must understand how they all rank on a relative basis. Using a little bit of cleverness along with vector coordinate data from SeatGeek’s venue maps, we reduce the problem slightly: we divide each venue into clusters of seats (we call them “seat groups”) whose physical locations and sale prices tend to be close enough to each other that they can be modeled together. These seat groups allow us to make use of less data to predict more prices. Some venues have as few as twenty groups; others, well into the thousands. Fenway Park has 993.
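As a rough illustration of the seat group idea, the sketch below merges section/row pairs whose map coordinates and typical prices are close to one another. Everything in it is an assumption: the coordinates and prices are made up, the thresholds in `close_enough` are arbitrary, and the single-linkage merging via union-find is just one simple way to cluster, not necessarily the method used on SeatGeek’s venue maps.

```python
import math

# Hypothetical rows: (label, x, y, typical_price). The coordinates and
# prices are invented for illustration only.
rows = [
    ("Infield Grandstand 18, Row 11", 10.0, 5.0, 68.0),
    ("Infield Grandstand 18, Row 12", 10.0, 6.0, 69.0),
    ("Center Field Bleachers 37, Row 36", 80.0, 90.0, 30.0),
    ("Center Field Bleachers 37, Row 37", 80.0, 91.0, 29.0),
]


def close_enough(a, b, max_distance=3.0, max_price_gap=5.0):
    """True when two rows are near each other on the map and in price."""
    distance = math.hypot(a[1] - b[1], a[2] - b[2])
    return distance <= max_distance and abs(a[3] - b[3]) <= max_price_gap


def seat_groups(rows):
    """Merge rows into groups using union-find (single-linkage clustering)."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if close_enough(rows[i], rows[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row[0])
    return list(groups.values())


groups = seat_groups(rows)
for group in groups:
    print(group)
```

With these toy inputs, the two infield rows end up in one group and the two bleacher rows in another, so four rows collapse into two units to model.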
To understand Seat Scores, consider a simple example where the set of listings
Suppose these seats are equally priced, despite the fact that their quality
Unfortunately, while SeatGeek has a lot of data, we cannot directly observe the relative true quality
For example, when faced with a choice between
Continuing with the Fenway Park example, after processing our input signals, we have a square matrix
The rough values for
(2) adjusting the parameter values to increase this likelihood
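The loop described here (score the current parameters by their likelihood, then nudge them to increase it) can be sketched in a few lines. This post does not show the exact model, so the example swaps in a Bradley-Terry model, a common choice for pairwise preference data, fitted by plain gradient ascent. The `wins` matrix, the step count, and the learning rate are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(wins, scores):
    """Bradley-Terry log-likelihood: P(i preferred over j) = sigmoid(s_i - s_j)."""
    total = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if i != j and wins[i][j] > 0:
                total += wins[i][j] * math.log(sigmoid(scores[i] - scores[j]))
    return total


def fit_scores(wins, steps=500, learning_rate=0.05):
    """Gradient ascent on the log-likelihood, starting from all-zero scores."""
    n = len(wins)
    scores = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                p = sigmoid(scores[i] - scores[j])
                # Derivative of the log-likelihood with respect to scores[i].
                grad[i] += wins[i][j] * (1.0 - p) - wins[j][i] * p
        # Step uphill: move each score in the direction of its gradient.
        scores = [s + learning_rate * g for s, g in zip(scores, grad)]
    return scores


# Hypothetical signal counts: wins[i][j] is how often seat group i was
# judged better than seat group j (0 = infield, 1 = outfield, 2 = bleachers).
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]

initial = [0.0, 0.0, 0.0]
scores = fit_scores(wins)
print(scores)
print(log_likelihood(wins, initial), "->", log_likelihood(wins, scores))
```

Only differences between scores matter in this model, so the fitted values are meaningful relative to each other rather than in absolute terms. A production fit would likely use a faster optimizer and some regularization for sparsely compared seat groups.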
Watch below as the seat scores converge from our initial values to the maximum likelihood:
Presto! Once we’re finished, we end up with something that looks very similar to Fenway’s actual seating chart, only with much more granular distinctions on price levels. With these seat scores, we would expect
With these powerful seat scores in hand, we’re halfway to our goal of predicting accurate prices for live events at any venue in the country. Come back for our next post to see how we go from our seat scores to market value predictions for thousands of events every day. UPDATE: View part 2: Using a Kalman Filter to Predict Ticket Prices
In case you’re wondering what technology we use for these projects, here’s a sampling:
- pandas: a Python data analysis library, for signal processing
- R: for statistical analysis and postprocessing
- ggplot2: to make the heatmaps seen above
- 1: If you read this far and wondered whether we were ever going to get around to this, then you’ll want to come back for part 2, when we explain how price predictions are derived from these seat scores.
- 2: Fenway Park is actually a good example of this phenomenon; Green Monster seats in particular are heavily disagreed upon by our signals.
whenever , so we need not exclude these cases.