Squarespace / Engineering

Unfold's Modern Mobile Release Process and the Subtle Art of Making Them Boring

On the Unfold team at Squarespace, we build our mobile app for both iOS and Android, and our releases don't require much in the way of manual intervention or human oversight. In fact, we don’t have to give releases much thought at all.

This was not always true. There was a time, not all that long ago, when our mobile release process was often busy and cumbersome, slowing us down and taking our focus away from what we do best. To try to find a solution, we set out on a journey to refine our process and leverage tools that could support us in standing up seamless mobile release management. Our goal was set: get good at running pain-free mobile releases.

Over the course of this journey we went from shipping ad hoc releases that we spun up at random to one-week release trains for both Android and iOS that turned all our releases into boring, background events that just… happened. While “boring” might not typically be considered a good thing, in this case, it is. Boring means that the product we are building and releasing to our customers is high quality, which gives our engineers more time to focus on what they do best: building excellent products.

Let’s look at what these non-eventful mobile releases are like, dig into how much more eventful they used to be, and walk through the work it took us to get here.

Our delightfully boring mobile releases

On our team, everyone takes a turn as release captain. It doesn’t matter if you’re an individual contributor or a manager: if you’re part of the team, you’re in the rotation. This reinforces a sense of ownership over the product we deliver to our users and ensures no one is in the dark about how we get things done. Plus, having a variety of people involved means each person runs a release no more than a few times a year, so releases never become the sole responsibility of any one person.

Our mobile releases require very little manual intervention. Release captains mostly acknowledge Slack messages, which cover the few operations we have yet to automate, such as ensuring the latest translations have been merged ahead of a branch cut. Captains also keep an eye on steps they do not own; updating the release notes, for example, is owned by our product managers. If a step runs into problems, the captain is responsible for getting it unblocked. In most cases, though, they take no action to make things run. I really meant it when I said that our releases just happen. The process is standardized across iOS and Android, so a single release captain can run a release without much effort.

We have a one-week release train that each captain takes a turn running. Things kick off on a Friday evening, when everyone is peacefully asleep. We chose that late hour because we have teammates in Europe and the US, and we want to make sure engineers in time zones farther west are able to get their work in.

Figure 1 – Timeline of our release events from week to week

At the scheduled kickoff time, our mobile release management platform automatically cuts the release branch, triggers a first release candidate, and signals to engineering and QA when the new build is ready and waiting for testing to begin.

We don’t dive into testing blind. On the Wednesday of the prior week, the QA team prepares a test plan and shares that plan with the engineering team. We then let them know what’s been modified since the previous release, and together we decide what test cases to execute and ensure we have enough capacity within the team to complete all the testing.

QA gets going on Monday and starts reporting any bugs they find. For showstoppers, we implement fixes and cut a new release candidate, which is a really simple workflow for our engineers:

  1. The engineers push a fix.

  2. It’s automatically pulled over into the release via a cherry-pick automation.

  3. Out pops a new release candidate.

  4. QA gets notified, switches the build they’re testing against, and we continue on.
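As a rough illustration of what a cherry-pick automation like this does behind the scenes, here is a minimal sketch that only plans the git commands for a new release candidate. The function name, the `release/4.2.0` branch naming, and the `-rcN` tag scheme are assumptions made for the example; they are not taken from our actual tooling.

```python
def plan_release_candidate(fix_sha, release_branch, version, previous_rc):
    """Plan the git commands that bring a fix into the release branch.

    Hypothetical helper: it returns the commands instead of running them,
    so the plan can be reviewed or logged before anything is executed.
    """
    next_rc = previous_rc + 1
    tag = f"{version}-rc{next_rc}"  # assumed tag scheme, e.g. 4.2.0-rc2
    commands = [
        ["git", "switch", release_branch],
        # -x records the original commit hash in the new commit message
        ["git", "cherry-pick", "-x", fix_sha],
        ["git", "tag", tag],
        ["git", "push", "origin", release_branch, tag],
    ]
    return tag, commands


tag, commands = plan_release_candidate("abc123", "release/4.2.0", "4.2.0", 1)
for command in commands:
    print(" ".join(command))
```

In a real pipeline, pushing the tag would typically be what triggers the CI build that produces the new candidate and notifies QA.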

We aim to perform and complete all testing Monday through Wednesday to keep it from spilling over to the latter half of the week. This ensures that we do not have to deal with any issues right as we’re getting ready to push things out.

We submit for store review on Friday, then on Monday morning — assuming review was successful over the weekend — we roll the release out to the App Store and Play Store in stages.
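The weekly cadence described above can be written down as simple date arithmetic. This is a sketch only: the function and key names are made up for illustration, and it assumes store submission happens on the Friday one week after the branch cut, with the rollout starting the Monday after that.

```python
from datetime import date, timedelta


def release_train_schedule(kickoff: date) -> dict[str, date]:
    """Return the key dates of one train, given the Friday of the branch cut."""
    if kickoff.weekday() != 4:  # Monday is 0, so Friday is 4
        raise ValueError("kickoff must be a Friday")
    return {
        "branch_cut": kickoff,
        "qa_start": kickoff + timedelta(days=3),          # Monday
        "qa_target_end": kickoff + timedelta(days=5),     # Wednesday
        "store_submission": kickoff + timedelta(days=7),  # Friday (assumed)
        "rollout_start": kickoff + timedelta(days=10),    # Monday (assumed)
    }


for event, day in release_train_schedule(date(2024, 1, 5)).items():
    print(f"{event}: {day:%A %Y-%m-%d}")
```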

These staged rollouts go out over several days in both stores, letting us track crash rates and other potential problems. Another automation takes monitoring data from Crashlytics and Amplitude and can automatically halt the rollout if the new release falls short of any of the app health and stability expectations we have defined. When that happens, we’re notified and the team starts triaging so they can prepare and ship a hotfix if needed. And that’s that. Looking back at the previous year, we have reduced the number of hotfixes we ship from many each month to five a year. With the evolution of our process, our releases typically pass without incident and our engineering teams ship features with confidence.
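The decision such an automation makes at each step of a staged rollout can be sketched in a few lines. The stage percentages and names below are hypothetical, and the `healthy` flag stands in for whatever comparison of monitoring data against the defined expectations the real automation performs.

```python
# Hypothetical rollout stages, as percentages of users receiving the update.
ROLLOUT_STAGES = [1, 5, 20, 50, 100]


def next_rollout_action(current_percent: int, healthy: bool) -> tuple[str, int]:
    """Decide whether to halt, advance, or finish a staged rollout."""
    if not healthy:
        # Stay at the current percentage so the team can triage.
        return ("halt", current_percent)
    for stage in ROLLOUT_STAGES:
        if stage > current_percent:
            return ("advance", stage)
    # No larger stage is left, so the rollout is fully out.
    return ("complete", current_percent)


print(next_rollout_action(5, healthy=True))
print(next_rollout_action(20, healthy=False))
```

Halting keeps the release at its current percentage rather than pulling it back, which matches the reality that a mobile release cannot be rolled back once users have installed it.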

If this all sounds so simple and boring, it’s because it is. But, how did we get here?

Our much-less-boring mobile releases of the past

In those earlier days, we had no release captain rotations. It was the job of our engineering managers to corral the various stakeholders and try to keep the iOS and Android teams on a schedule, without the benefit of a release train. The engineering managers had to talk to multiple product managers to get release notes together, go back and forth with QA to get a testing plan in place and keep up with progress, and make sure engineers were working on bug fixes as needed. They were running the entire process from top to bottom, and it was unpredictable and overwhelming.

Without a release train, we were just shipping releases whenever new features were ready. This often led to misaligned releases and even stability issues. In some cases we’d actually end up releasing four or five times in a week simply because we'd run into issues, or we’d keep trying to squeeze more changes in. It was extremely time-consuming and inefficient. We estimated that we were losing nearly a third of our engineering time just to the work of coordinating and executing releases. Our engineers felt the extra cognitive load, and we knew we had to make a change. Back then no one enjoyed running a release; now they do.

This process was also challenging for QA because we asked them to test builds quickly, often, and on what could be an unpredictable schedule. This made it difficult for them to do their best work efficiently.

This went on until we finally encountered our “line in the sand” moment. It was during the holiday season several years ago. We ended up shipping four hotfix releases at the end of December, right when people were signing off for the holidays. It was challenging, and coming off the holidays into January we decided we’d had enough. This was the catalyst that convinced us we had to put in the work to make big changes to our release process.

You know what happened next: we waved a magic wand and our releases were fixed…

… I wish it were that easy. It took a little longer and a lot more effort. Here’s what we did.

Mobile release management: The journey from chaos to boredom

We told ourselves it was time to “get good” at releasing software, and that as we got good at it, we could further refine our process and shorten it. Our ultimate goal was to get to a smooth and hassle-free one-week release train, but we initially set out to establish a more achievable biweekly train.

To that end we created a working document, an RFC — Request For Comment — to lay out the plan. It was here that we set out the responsibilities of our release captain role and designed our release rotation so each member of the engineering team would have their turn. This plan belonged to the engineers, but we needed to get buy-in from all the stakeholders who would be impacted by it, particularly Product and QA, whose work directly fed into and was influenced by our release process. We sold them on our vision, and found there was a lot of excitement to make these much more consistent and boring releases happen.

We presented our working document to the various stakeholders (product management, QA, marketing). We got buy-in at every level and proceeded to start our journey. We ran three or four releases, then organized a retrospective to discuss how those releases went and what we could do to improve. Right away, the release captains noted that they weren’t sure when a problem was serious enough to halt a rollout. So, we established a set of heuristics (you could call them SLAs) around when they should take action, defining very specific stability and app health numbers that indicated when a problem was big enough that they should halt a release and ship a hotfix. One SLA we use, for example, is that crash-free sessions must stay above 99.5% on both platforms.
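A heuristic like this is easy to encode as a check. The sketch below uses the 99.5% crash-free sessions figure mentioned above; the data class, the function name, and the choice to treat a reading of exactly 99.5% as acceptable are assumptions for illustration, not a description of our actual monitoring setup.

```python
from dataclasses import dataclass

# Crash-free sessions threshold cited in the article, in percent.
CRASH_FREE_SESSIONS_MIN = 99.5


@dataclass
class PlatformHealth:
    platform: str                   # e.g. "ios" or "android"
    crash_free_sessions_pct: float  # hypothetical reading from monitoring


def platforms_breaching_sla(samples, minimum=CRASH_FREE_SESSIONS_MIN):
    """Return the platforms whose crash-free sessions fall below the minimum."""
    return [s.platform for s in samples if s.crash_free_sessions_pct < minimum]


readings = [
    PlatformHealth("ios", 99.7),
    PlatformHealth("android", 99.2),
]
breaches = platforms_breaching_sla(readings)
if breaches:
    print("Consider halting the rollout on:", ", ".join(breaches))
```

Writing the threshold down as a named constant means a release captain does not have to judge on the spot whether a problem is serious enough to act on.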

This was an enormous help and we began to settle into our routine of releasing every two weeks. We felt really good about it, but the Product team hadn’t forgotten our initial goal: they ambitiously pushed us further and kept reminding us, “You promised us one-week releases!” Going from two weeks to one week may not seem like a big deal. Two weeks sounds short enough, but that’s deceptive because you’ve got two weeks of development, a week of QA, and then a week of phased rollouts. You actually end up waiting four weeks until a new update or feature gets into users’ hands. This longer cycle also meant frequent pressure to fit late-arriving features into a release mid-QA via a hotfix or a new release candidate so we wouldn’t have to wait for the next cycle. From a business perspective, we still needed to pick up our pace.
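The arithmetic behind that four-week wait is worth spelling out. In this small sketch the function name is hypothetical, and the one-week figure assumes QA and the phased rollout each still take up to a week once the development window shrinks.

```python
def worst_case_lead_time_weeks(dev_weeks, qa_weeks=1, rollout_weeks=1):
    """Weeks from the start of a development cycle to a completed rollout."""
    return dev_weeks + qa_weeks + rollout_weeks


biweekly = worst_case_lead_time_weeks(dev_weeks=2)  # 2 + 1 + 1
weekly = worst_case_lead_time_weeks(dev_weeks=1)    # 1 + 1 + 1 (assumed)
print(f"two-week train: up to {biweekly} weeks")
print(f"one-week train: up to {weekly} weeks")
```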

We went back and looked at where we could improve, and saw that it would take a huge amount of human effort to make this happen. Even with our release train and with captains splitting responsibilities, individual engineers were still putting several hours of their time into every release, pulling their attention away from product work. If we were to release twice as frequently, time spent shepherding releases would double.

For QA, it was hard to plan test execution. We previously operated from a Google Sheet template that was cloned and updated for each test run. This required a lot of manual effort and was often error-prone. As a first step to remedy this, we adopted a service called Xray for test case management and execution, migrating the contents of our spreadsheet to the new tool. We also introduced the idea of a test plan to be executed with each release. This was a big leap in productivity and in insight: we could now report on metrics like the number of defects discovered by test execution, which were hard to arrive at using our old approach.

Our engineers and their supporting cast needed more tooling, infrastructure, and resources to support their release work. That’s when we discovered Runway. Out of the box it offered additional automation that we didn’t have to divert even more time and energy to build and maintain ourselves, and it provided us with a single source of truth every stakeholder could access to check in on the progress of an ongoing or upcoming release. It became the control center for our entire mobile release process.

Establishing our release train, making a deliberate effort to become good at releases, and bringing in tools like Runway and Xray allowed our release captains to spend minutes a day managing releases instead of hours while also shipping features much faster than ever before. Our boring release process paid off.

Mobile teams shouldn’t need to worry about releases

When I shared my idea for this blog post with some non-mobile peers, their responses were similar – some variation of, “What do you mean ‘make releases boring?’ Releases are already boring. And you can always just roll things back when they aren’t.”

But not being mobile engineers, many weren’t aware of one of the biggest problems with mobile releases: there is no such thing as a rollback. You have to go through an app store review and once it’s approved, you have to trust that customers will actually install your latest update. It’s like on-prem software — you can’t feel good about a release unless you have a high level of confidence that what you’ve created is stable enough to put out there and have it stay out there, because it may never get updated again.

Every mobile release has the potential to be exciting because mobile releases are by their nature more complex and difficult to run. We need various stakeholders to sign off on our work, robust testing to keep bugs to a minimum, and crucial safeguards along every step of the process. We don’t have the luxury of fixing things live.

When we ask team members to take turns running releases, we're putting them in a position in which they have to wear several critical hats that don’t directly relate to their feature work. But the Unfold team is made up of amazing product builders, not release builders. Releases are not their day-to-day specialty. The hat they wear best is their building-great-products-and-delivering-product-value hat. That’s why it’s important to us that all of the additional hats we ask our team to wear take as little extra energy and time as possible.

Process is great when it’s helpful, but to be genuinely helpful it needs to work, not just once, but repeatedly. No one should have to actively think about and manage all the steps of their process all of the time… and at Unfold at Squarespace, we’re thankful we don’t have to.