Refactoring Python with LibCST

Much of SeatGeek’s core is powered by Python. We rely on several Python services — built on top of the Tornado async web server framework — to deliver tickets to fans, manage our inventory, process payments and payouts, and more.

A few months ago, engineers at Instagram published Static Analysis at Scale — a detailed explanation of how Instagram automates large-scale refactors of their Python codebase using their newly open-sourced tool, LibCST. We were immediately excited to see how we could use LibCST to automate improvement of our Python codebase and eliminate hours of tedious dev work needed to perform large-scale refactors by hand.

This article details our experience building our first major LibCST codemod and using it to automate thousands of lines of code refactors on a large internal commerce service. If you’re interested in the codemod itself, you can explore the source code at https://github.com/seatgeek/tornado-async-transformer.

Choosing a Refactor

As we explored potential refactor targets for a LibCST codemod, we looked specifically for a refactor that:

- Delivers clear value to our services.
- Is currently performed manually by our developers.
- Involves a level of complexity that requires LibCST (i.e., not something that can be done with a simple find-and-replace).

We landed on upgrading our coroutines from Tornado’s legacy decorated coroutines (an ugly but essential hack created to provide async coroutines to a pre-async Python) to native async/await coroutines (introduced to the language in Python 3.5). The Tornado documentation recommends using native async/await coroutines but continues to support the decorator syntax, which much of our legacy code uses. Here are two blocks of code that do the same thing; the first is written as a decorated coroutine and the second is written as a native coroutine.

# legacy decorated coroutine
from tornado import gen
import async_http_client

@gen.coroutine
def fetch_example():
    response = yield async_http_client.fetch("http://example.com")
    raise gen.Return(response.text)

# native async/await coroutine
import async_http_client

async def fetch_example():
    response = await async_http_client.fetch("http://example.com")
    return response.text

The decorated coroutine:

- requires importing the tornado library to run asynchronous code
- repurposes the yield keyword to mean “await a coroutine”
- requires values to be returned using raise gen.Return

Benefits of Using Native `async/await` Coroutines

Migrating from decorated to native coroutines provides several benefits, both to the operation of our services and to dev experience.

Code readability
- Context-switching between meanings of the yield and raise keywords confuses developers and creates awkward code.
- Native coroutines look like those of other languages used at SeatGeek, like C# and JavaScript, creating a more open internal codebase.
- No onboarding engineer likes to hear that they have to learn a new syntax for returning a value from a function.
Debugging/Monitoring
- In pdb, stepping into a decorated coroutine lands you deep in the weeds of the Tornado event loop rather than in the body of your coroutine.
- Exceptions raised from decorated coroutines produce bloated stack traces that clutter logs and exception monitoring services.
- Some monitoring services, like New Relic, only provide event-loop-aware diagnostics when using native coroutines.
Performance
- Native coroutines, according to the Tornado documentation, are generally faster than decorated coroutines.
- Python core developers frequently release performance upgrades to Python’s native async functionality.

Using TDD to Build the Codemod

Test-driven development seemed the obvious choice for building the codemod, for the following reasons:

- Codemods are inherently testable. All you need to write a test is the original code, the expected refactored code, and a few lines of helper logic to run the test.
- We had already done this upgrade by hand on a smaller Python service and had collected a set of refactors from that PR which we wanted our codemod to support. Each of these refactors could be made into a test case.
- Syntax trees require a lot of mental overhead; validating incrementally with tests allows for quick tinkering-until-it's-right development while guarding against regressions in existing functionality.

We built a simple helper function that visits a test_cases/ directory and iterates over its subdirectories. Each subdirectory represents a refactor the codemod supports and contains a before.py and an after.py file holding the intended pre- and post-refactor code. We feed these test cases into a parameterized pytest function that runs our codemod on before.py and compares the output to after.py. Voilà, we have a test suite! Adding a new test case is as easy as writing a Python file, manually refactoring it, and dropping the pair in test_cases/.
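As a rough illustration of that helper, the collector and the comparison step might look like the sketch below. The function names `collect_test_cases` and `check_case` are hypothetical, as is `run_codemod` in the comment; only the directory layout (test_cases/, before.py, after.py) comes from the description above.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def collect_test_cases(root: Path) -> List[Tuple[str, str, str]]:
    """Return (case name, before source, after source) for each subdirectory."""
    cases = []
    for case_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        before = (case_dir / "before.py").read_text()
        after = (case_dir / "after.py").read_text()
        cases.append((case_dir.name, before, after))
    return cases


def check_case(transform: Callable[[str], str], before: str, after: str) -> None:
    """Run the codemod on the 'before' source and compare with 'after'."""
    assert transform(before) == after


# With pytest installed, the collected cases can drive a parameterized test:
#
#   @pytest.mark.parametrize(
#       "name,before,after", collect_test_cases(Path("test_cases"))
#   )
#   def test_codemod(name, before, after):
#       check_case(run_codemod, before, after)  # run_codemod is hypothetical
```

Keeping the collector separate from the assertion means the same cases can be reused elsewhere, for example to seed a demo page with sample inputs.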

Unsupported Refactors

We realized early on that there are some features supported by decorated coroutines that aren’t available with native coroutines, like yielding a dictionary of coroutines. Upon encountering one of these cases we cancel the refactor, display an error message to the developer, and only allow the codemod to run after the developer has manually refactored that piece of code.

To test these exception cases, we created a second collector that visits an exception_cases/ directory and iterates over its Python files. Each file represents a known unsupported refactor and contains a module-level docstring with the exact exception message we expect the developer to see when this code is encountered. These examples are fed into another parameterized pytest function which asserts that the codemod raises the expected exception message when run on the provided code.

Here is an example exception case test file, yield_dict_literal.py:

"""
Yielding a dict of futures
(https://www.tornadoweb.org/en/branch3.2/releases/v3.2.0.html#tornado-gen)
added in tornado 3.2 is unsupported by the codemod. This file has not been
modified. Manually update to supported syntax before running again.
"""
from tornado import gen


@gen.coroutine
def get_two_users_by_id(user_id_1, user_id_2):
    users = yield {user_id_1: fetch(user_id_1), user_id_2: fetch(user_id_2)}
    raise gen.Return(users)
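A collector for files like this one can read the expected message straight from the module docstring using the standard library's `ast` module. The following is a minimal sketch under a few assumptions: the names `collect_exception_cases` and `check_exception_case` are hypothetical, and collapsing whitespace before comparing messages is our choice here so that line wrapping in the docstring does not matter.

```python
import ast
from pathlib import Path
from typing import Callable, List, Tuple


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so wrapped docstrings compare cleanly."""
    return " ".join(text.split())


def collect_exception_cases(root: Path) -> List[Tuple[str, str, str]]:
    """Return (file name, source, expected message) for each Python file."""
    cases = []
    for path in sorted(root.glob("*.py")):
        source = path.read_text()
        docstring = ast.get_docstring(ast.parse(source))
        if docstring is None:
            raise ValueError(f"{path.name} has no module-level docstring")
        cases.append((path.name, source, _normalize(docstring)))
    return cases


def check_exception_case(
    transform: Callable[[str], str], source: str, expected_message: str
) -> None:
    """Assert that the codemod refuses the source with the expected message."""
    try:
        transform(source)
    except Exception as error:
        assert _normalize(str(error)) == expected_message
    else:
        raise AssertionError("codemod did not raise on unsupported code")
```

As with the supported cases, the collected tuples can be handed to a parameterized pytest function, one test per file in exception_cases/.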

Demo Site

Inspired by https://black.now.sh/ — a website where developers can try out the Python Black formatter — we wanted to have a simple website where developers could use our codemod without installing anything to their local environment. With a few lines of HTML, JS, and a POST endpoint, we built a demo website (linked in our repo) where developers can try out the tool and run one-off refactors with easy diff visualization.

Rolling Out to Production

One paradox of codemods is that in their early phases, especially when applied to critical codebases, one ends up spending nearly as much time verifying that the automated refactors are correct as one would doing the refactor by hand. To mitigate this, we started by automating a few smaller (~20-200 line) refactors, which we closely reviewed before shipping. Once we felt confident that these changes didn’t introduce any regressions, we rolled out the codemod to our entire service, refactoring over 2,000 lines of code in an excitingly seamless deployment.

Conclusion

We had a ton of fun building our first LibCST codemod and are already looking for new applications of the library. Running a codemod you’ve built on a large codebase is pretty exciting, especially as you look through a multi-thousand-line diff that could have been hours of tedious, error-prone dev work. Becoming comfortable writing codemods expands your imagination of the level of refactors that are possible, and potentially quite easy to do, within your organization, no matter their scale.

We’d like to thank the team behind LibCST for being friendly, welcoming people and for frequently pushing out new, awesome features to the library.

If you’re interested in helping us build codemods for our codebase at scale, check out our Jobs page at https://seatgeek.com/jobs. We’re hiring!
