2.5 Years, 5 Developers, 1 Django Upgrade

For the past two and a half years, our core Texas Tribune website has been running on Django 1.5. In the meantime, the Django project has made rapid strides; it’s now all the way up to version 1.9, bringing scores of new features along the way, like native migrations, better testing tools, and exciting APIs, plus plenty of performance and security improvements and an ecosystem of enticing new libraries.

Meanwhile, we at the Tribune had been stuck on 1.5. We wanted to upgrade, of course, but with each passing release and change in personnel, the prospect became increasingly daunting. Our core product was a mix of our site, CMS, data apps, low-level customizations, and (of course) many third-party libraries. Some of these, including the Tribune’s own Armstrong project, had fallen behind the Django update cycle. Others had updated versions, but we were afraid to touch them for fear of incompatibilities. And our developer time was split by demands from our editorial, news apps, and business teams alike.

So by the time we got to developing a plan for upgrading, it had been over a year, half of our team had turned over (now consisting of Amanda, Chris, Kathryn, and Daniel), and Django was already on version 1.7. If we were ever to upgrade again, we needed a game plan.

Phase one: Diagnoses, Deletion, and Triage

We’d talked about it before, but the first written evidence of attempting to upgrade was in early December 2014. Chris sketched the beginnings of our upgrade workflow. The initial plan was to just try the upgrade and see what broke.

Chris’ first list included just three things:

This seemed manageable, but of course, demons lurked below. Chris passed the task to Daniel, and Daniel soon discovered that two of our Armstrong apps were likewise not compatible with 1.6:

  • Donations, our app for handling member donations
  • Hatband, a tool that brings JavaScript UI enhancements to the Django admin

These projects, like many in Armstrong, had not been actively supported for a while. We chose to put the upgrade on the back burner while we figured out what to do with them.

In the meantime, we deleted, deleted, deleted. We deleted so much that we wrote a whole separate blog post about it. Our site has been up since 2009, and we had piles of obsolete and superseded code. What’s more, many of our old data apps and applications still lived in the core repository (although we now start new projects in separate repos). Kathryn and Amanda led the charge in upgrading and deleting scores of apps and tens of thousands of lines of code (rough estimate: 25,000). In the midst of deleting, Liam joined the team to replace Chris, who had left for greener pastures. Liam was grateful to have less code to wrangle off the bat.

Phase two: Armstrong

The Armstrong donations app was old and unsupported; moreover, our spring membership drive was around the corner. The timing was perfect to create a new donations app, which we knew we wanted to separate from our core Django site. So Kathryn built a shiny new app using Middleman, now living at support.texastribune.org (feel free to, you know, go there and become a member or donate!).

Armstrong’s Hatband, meanwhile, was used in a small but crucial corner of our CMS, and it depended on a Frankenstein’s monster of old crusty JavaScript libraries. Liam first attempted to hack around it, but the interface was maligned by reporters and our performance was taking a hit from the weighty old tech; he ultimately decided to rip it out, swapping it with a lighter customization of the Django admin.

Phase three: Bigger problems

It seemed like we had cleared the way. We started deploying smaller changes that were backwards-compatible. But once we pushed the upgrade button on our staging site, more tragedy struck:

  • our version of raven (a client for Sentry, our log monitoring tool) was out-of-date…and an upgrade would require an upgrade to our Sentry server;
  • an upgrade was going to break our outdated version of django-compressor; and
  • our core MySQL database was giving some complaint about rolling back transactions…

We hadn’t discovered these earlier because of differences between our local and production environments. The first two were annoying, but the last one turned out to be the worst: we were using a five-year-old MySQL database on the MyISAM engine, which does not support transactions and rollbacks. There was no way it would play well with Django 1.6’s new transactions API. It was time to upgrade our database.
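If you’re wondering whether the same problem is lurking in your own stack, MySQL will tell you which tables are still on MyISAM. A minimal sketch of that check (how you obtain the cursor — Django connection, raw DB-API client, etc. — is up to your setup):

```python
# Sketch: list tables still on the MyISAM engine, which can't honor the
# rollbacks that Django 1.6's transaction handling relies on.
# The query reads from MySQL's information_schema.
MYISAM_QUERY = (
    "SELECT table_name "
    "FROM information_schema.tables "
    "WHERE table_schema = DATABASE() "
    "AND engine = 'MyISAM'"
)

def find_myisam_tables(cursor):
    """Run the check through any DB-API cursor and return the table names."""
    cursor.execute(MYISAM_QUERY)
    return [row[0] for row in cursor.fetchall()]
```

If this returns anything, those tables need converting (or, as in our case, the whole database may be due for replacement).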

Daniel treated this roadblock as an opportunity, and we ultimately switched our database from MySQL to PostgreSQL. Our reasons and process for this could take up multiple blog posts on their own (we just might write them). For now, suffice it to say that after many weeks of work, test runs, and false starts, we were ready to swap. The first maintenance window didn’t go so well; all of our timestamps were five hours behind, wreaking havoc on our sessions, and we had to roll back. But on Halloween morning, the day after an epic flood in Austin that left Liam without electricity (he watched from a coffee shop), we swapped out our ancient database with a fresh copy of Spooky Postgres.

While Daniel focused on the Postgres transition, Liam spent one sprint upgrading our Sentry server and improving our log monitoring, and another sprint (or two) dropping django-compressor from our codebase, replacing it with WhiteNoise as the manager of our static assets.
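For reference, wiring in WhiteNoise was a pleasantly small change compared to ripping out django-compressor. A config sketch of the WSGI-wrapping approach from the WhiteNoise 2.x era (the settings module name is hypothetical; newer WhiteNoise releases use a middleware class instead):

```python
# wsgi.py sketch: WhiteNoise serves static assets directly from the app
# process, taking over the role django-compressor played in our pipeline.
import os

from django.core.wsgi import get_wsgi_application
from whitenoise.django import DjangoWhiteNoise  # WhiteNoise 2.x-era API

# 'project.settings' is a placeholder for your settings module
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')

application = DjangoWhiteNoise(get_wsgi_application())
```

One nice side effect: static files get served with proper cache headers without a separate web-server configuration to keep in sync.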

Phase four: Finally…

On November 12th, after nearly a year of planning, thousands of hours of typing, and several barrels worth of coffee, we pulled the trigger. It was a disarmingly small code commit, a simple line change in our requirements file. Two or three small bugs emerged on old corners of the site that we hadn’t thought to test, but no big or systemic problems; we spent the next day squashing these bugs, then getting celebratory drinks.

Lessons learned

It took 30 months to get to 1.6, but less than 2 months to upgrade further to 1.7. It was a much less daunting task, not least because we had some upgrade experience under our belts. As we continue to modernize, we hope to carry some lessons forward from our last upgrade process.

Three choices: update, swap, kill

For every old feature or library that was about to break, we were generally faced with the same few choices for how to deal with it: update, swap, or kill. Our rough algorithm for dealing with this, in pseudocode form:

    if there is an easy update or fix:
        update
    else if there is an alternate library we can swap in:
        swap
    else:
        kill it and roll our own replacement

Update (rock), swap (paper), kill (scissors)

In the case of Armstrong components, we chose to kill, and roll our own lightweight replacements. For django-compressor and MySQL, we swapped to WhiteNoise and Postgres respectively. And for Sentry and Reversion (along with many other smaller Python libraries), we updated. These decisions aren’t always easy, but we erred on the side of using the most recently updated libraries, and we looked for overlaps with our other goals. Then again, we’re still using the old test runner, so sometimes our solution was “put it off.”

Look for overlaps

Many of these old versions and libraries were, unsurprisingly, in neglected corners of our codebase, where we were previously afraid to touch anything. The drive to upgrade gave developers an incentive to go in and clean out old cobwebs. Moreover, many projects served more purposes than merely upgrading; we had wanted to switch to Postgres and drop old Armstrong projects anyway. While they were side-upgrades in our path to our Django mega-upgrade, they improved the performance and usability of our site in crucial ways. Our fresh new Sentry and WhiteNoise implementations likewise improved our development workflow, allowing for benefits like integrating error logs with Slack and streamlining deployment of static assets. Finally, they were also important benchmarks to show to editorial and business staff; progress was being made as a byproduct of these upgrades.

In short, keeping Django up-to-date is not just helpful for its new features, its security, and its performance: it’s also an effective way to audit our codebase for old smells, and provide incentive to developers and managers alike to tackle them.

Upgrade in stages

The majority of our roadblocks to upgrading were due to code deprecations; given the way Django’s release cycle works, this meant that most of the stuff we needed to change for 1.6 would also work with 1.5. So we made 95% of the necessary changes before actually upgrading; this let us identify problems piecemeal. By the time we upgraded, we had tackled most of the potential problems already, and the upgrade itself was mercifully anticlimactic.
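The mechanics of staying compatible with both versions mostly come down to the standard import-shim pattern: write against the new API, with a fallback to the old one, so the same code runs before and after the upgrade. On Django that meant things like importing `transaction.atomic` with a fallback to `commit_on_success` under 1.5. A generic, self-contained sketch of the idea (the helper name is ours, not Django’s):

```python
import importlib

def import_with_fallback(new_name, old_name):
    """Resolve a dotted attribute path, preferring the new location.

    The Django 1.5 -> 1.6 equivalent would be roughly:
        import_with_fallback('django.db.transaction.atomic',
                             'django.db.transaction.commit_on_success')
    """
    for dotted in (new_name, old_name):
        module_path, _, attr = dotted.rpartition('.')
        try:
            return getattr(importlib.import_module(module_path), attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError('neither %s nor %s is available' % (new_name, old_name))
```

Shims like this can be shipped and tested on the old version, then deleted at leisure once the new version is the only one in play.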

Some people advised us to rip the Band-Aid off and upgrade three versions of Django at once, rather than just going to 1.6. But the scale of even this single-version change made a multi-version upgrade seem nigh-impossible. At best, we would have been left with smells and deprecations we weren’t aware of. At worst, it would have been daunting to the point of demoralizing.

A quick note on testing: it’s probably impossible to test for every single thing that could go wrong. We have automated tests that helped us identify many early problems. We navigated around every conceivable corner of our site, and went through a checklist of major editorial actions that happen in our CMS. We even tried to run a sample of our access logs through our test site to check for errors. But these still weren’t enough, as we didn’t discover some of the biggest problems until they hit staging, and smaller ones even popped up in production. This might be unavoidable, but the earlier we poke at the upgrade, the sooner we learn what might go wrong.
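The access-log replay is worth a sketch, since it catches real-world URLs that no checklist would think of. The idea: pull request paths out of your logs and fire them at a staging copy. The regex below assumes the common Apache/Nginx combined log format; how you replay the paths (Django’s test client, curl, etc.) is up to you:

```python
import re

# Matches the request line inside a combined-format access-log entry,
# e.g. ... "GET /about/staff/ HTTP/1.1" 200 ...
# Only GET/HEAD are safe to replay blindly, so others are skipped.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/')

def paths_from_log(log_lines):
    """Extract replayable GET/HEAD paths from access-log lines."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            paths.append(match.group('path'))
    return paths

# Replaying them might then look like (Django test client, sketch only):
#     client = django.test.Client()
#     for path in paths_from_log(open('access.log')):
#         assert client.get(path).status_code < 500
```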

With these lessons and patterns in mind, we have already upgraded to 1.7 and have a more robust roadmap toward Django 1.8 and 1.9; we have set up a more rigorous testing plan; and we expect to get there much faster than the 2.5 years it took us to reach 1.6. For anyone sitting on an old stack of technical debt, be encouraged: it can be done!

Deletions Sprint

Deletion celebration tweet

Over 15,000 and still counting. That’s how many lines of code we removed from the Tribune codebase in June of 2015. That’s about 3% of the codebase, which remains over 455,000 lines strong, according to a Sublime Text regex search.
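A rough count like that is easy to reproduce outside an editor, too; here is a small sketch (the file patterns are our guess at what’s worth counting — tune to taste):

```python
from pathlib import Path

def count_lines(root, patterns=('*.py', '*.html', '*.js', '*.css')):
    """Rough total line count across a codebase, a la our editor-search estimate."""
    total = 0
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            try:
                total += len(path.read_text(errors='ignore').splitlines())
            except OSError:
                continue  # unreadable file; skip it
    return total
```

It’s a blunt instrument (it counts comments and blank lines alike), but for tracking a deletion sprint, blunt is plenty.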

How did we get here? Our team works in two-week sprints, and we sometimes give our sprints themes, which means all three of us work on tasks focused around a single goal, like performance. Deleting legacy code had been hanging around on our list for a while. The Tribune has been around for over five years now, so there was quite a bit of code that the project had evolved beyond; we had code slated for deletion detailed in GitHub issues and Basecamp lists. Some of it had been part of the site since as early as October 2009!

When you tell non-programmer colleagues that you want to spend a few weeks focusing on deleting code and lessening your technical debt, be prepared for a blank look. Then you’ll have to translate “code deletion” and “less technical debt” into an explanation of why it’s important to you (and to them, and the rest of the organization) that you delete that code.

I like to use the analogy of a house; if your rooms are cluttered with unused items, it’ll take you a lot longer to navigate through and get things done. If you have a nice clean house, however, you’ll be able to, for example, easily find your way to the kitchen and cook up a delicious dinner. Same with a codebase. If there’s unused code hanging around, it’ll take you longer to wade through it to see what it’s really doing and add a new feature or improve the existing ones. But when the codebase is clean and shiny, the new feature is easier to add in and can be built faster.

So recently, after explaining why cleaning up legacy code is beneficial to everyone within earshot of our corner of the newsroom, and reminding everyone multiple times that’s what we’d be focusing on for our sprint, we finally got to roll up our sleeves and get down to the business of deletion.

This summer was the perfect time for us to clean up the codebase, too, before our newest team member, Liam Andrew, joined us. This way, Liam wasn’t distracted by unused code, which eased his orientation to the codebase. We’re also gearing up for a few big projects on the horizon, so this helps us get ready to attack those. Our codebase feels fresh and clean now; although the cut was only about three percent, it feels like more. Immersing ourselves in code deletion has also made us better programmers: when you’ve been in the weeds deleting unused code, you approach adding new code with a more critical eye and refactor as you work.

This is how light it feels

Much of the code that we deleted was replaced with something else that provided a similar, but improved, functionality. For example, we removed the code powering our previous donation page when we went live with our new donations app, where people can become members and support our work. We also removed our homegrown Twitter app, which took up a large number of database tables to power Twitter widgets on our site, and replaced it with the widgets provided by Twitter, shifting the burden from our database and codebase onto theirs. In addition, we removed the paywall that we had been housing inside our code for our premium Texas Weekly product and replaced it with a third-party paywall, Tinypass.

Other projects we archived before deleting the code powering them. From this process of deletion, we’ve come to appreciate more deeply how important it is to think through the process for sunsetting a project once it’s no longer relevant. This is especially important for news organizations, where most stories are at least somewhat time-sensitive. Otherwise, projects will hang around forever and ever and ever, and related code will break. Sunset plans allow you to create a pristine archive of your organization’s work, and they’ll ensure old projects don’t keep you reliant on requirements and legacy code that could hold you back.

The beauty of having a sunset plan

When you’re removing code from a large project, especially code that was written by a previous developer, it can be scary to delete a significant number of lines. What if you forget to delete something related and leave behind orphaned code? Most frighteningly, what if you miss something dependent on the code you’re deleting and break parts of your site?

Our test suite gave us some comfort when it came to removing this code. So did deploying and testing out the code in our staging environment, which closely mimics our production site. We also followed our policy of always having another team member comb through our code changes. That got-your-back help of having another pair of eyes on your work can be integral to catching all the details.

We’ve also stepped up our documentation game. We’d already started fattening up our internal wiki prior to all these deletions, but the deletions definitely kept the momentum going. I think of looking at legacy code like an archaeological dig; the more of the bones and tools you have, the better you can understand what you’re looking at. And if you understand how code was used, you’ll know what you can safely remove when that code’s no longer needed.

Don’t let your codebase be as mysterious as Machu Picchu

Have you recently deleted some code from your organization’s codebase, or another project you’re working on? Do you have some code you’ve been meaning to remove but haven’t found the time and space to get around to it yet? We’d love to hear the story of your deletions, any challenges you came across, and how it feels now if you’ve already deleted the code and lessened your technical debt!