We build better systems when we work together. At Squarespace, we write a Request for Comments document—an RFC—when we’re creating a new system or making a major change.
RFCs, also called Design Documents, are a common industry practice and for good reason. They give us a mechanism for reviewing each other’s ideas, and describing our own.
RFCs turn ideas into words. They force clarity: it’s hard to write an RFC unless you’re sure about what problem you’re trying to solve, and why. They encourage the author to think through all aspects of their design, and to make and document decisions.
Having a written design allows other engineers to review those decisions. Reviewers may have opinions on whether the problem needs to be solved at all, or whether simpler approaches might work just as well. They may notice risks to the project, flaws in the design, or edge cases that won't work. Or maybe (rarely!) they'll think the design is perfect in every way and leave a comment to say so.
Around a year ago, we started iterating on the RFC process we use in our Infrastructure organization. We wanted to address the parts of our process that weren’t working well:
Designs didn’t always get deep review. Engineers complained about discovering systems that were implemented with subtle flaws, or that were more complicated than they needed to be.
Authors made decisions that were locally good, but that didn’t fit into the broader architectural picture. Two teams might solve related or overlapping problems, but not find out until they'd implemented systems that didn't work well together.
Reviewers weren’t sure what they should be looking for. They were wary of nitpicking, or asking "stupid questions" about issues that the author had probably considered but hadn’t written down.
Authors felt that reviewers were piling on criticism. When an author sent a new RFC for review, they sometimes got flooded with comments. It was difficult to distinguish between comments that were intended to be blockers and those that were just interesting asides.
Reviewers weren’t clear about whether they liked the overall design. If a reviewer had no objections to make, they didn’t call out that the design was good; they usually said nothing at all.
Authors weren’t sure when the review period was finished. They didn’t know when it was OK to start implementing.
We took three steps, each of which incrementally improved our process: we wrote an opinionated RFC template, created Infrastructure Council, and introduced Architecture Review.
Opinionated RFC template
As a first step, we rewrote our RFC template to give more opinionated guidance on what RFCs should include. Rather than keep separate best-practices documentation for writing and reviewing designs, we wanted the template itself to make it easy and intuitive for people to do the right thing.
The new template has an “approvers” section, with space for approvers to sign off on the design. Although we value comments from many people, by naming the approvers, we are clear about where the decision lies: if the approvers don’t say yes, we won’t start implementing.
If I wrote an RFC for this blog post, its header might look like this:
Primary author(s): Tanya Reilly
Other reviewer(s): Laura, John, Kevin, Polina, Guislain
I’ve added Eva and Alex as approvers, and I won’t continue working on this unless they think it’s a good idea. I’ve also added a few other engineers whose opinions I respect a lot. If they don’t have time to review, that’s OK. I won’t block on it.
Approvers say “yes” or “not yet.” We want to encourage constructive comments, so the template doesn’t suggest “no.” Instead of just vetoing an idea, it’s better to be clear on what it would take to approve it. A “yes” might come with caveats. For example, in reviewing the first draft of this post, Eva might write, “Yes, if you can show that each iteration was a usable milestone.” This sort of “yes, if” approval means that the design author isn’t blocked and doesn’t need to wait for the approver to read the document again later. The approver trusts them to do the right thing. It’s also more encouraging than a flat “no.” The approver is working with the author to get the best possible design; they’re not gatekeeping.
Note the status field here: it’s free text, and I’m using it to be clear about the kind of review I want. For a first draft, I might want reviewers to tell me whether this is even a good idea. If it’s duplicating effort or doesn’t seem useful, I’d prefer to know before I sink a lot of time into it. A different design might already have broad approval and just be looking for rollout risks. By being clear about the status, we make sure that the reviewer is leaving comments that are useful at this stage of the design.
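As a rough illustration of the sign-off rule, here is a minimal Python sketch of an RFC header. It is not part of our template or tooling. The class and field names (`RFCHeader`, `Approval`, `ready_to_implement`) and the example status text are hypothetical, and it assumes that every named approver has to say yes before work starts.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    YES = "yes"          # may carry a caveat ("yes, if ...")
    NOT_YET = "not yet"  # the template deliberately offers no flat "no"


@dataclass
class Approval:
    approver: str
    decision: Decision = Decision.PENDING
    note: str = ""  # caveat on a "yes, if", or what it would take to reach yes


@dataclass
class RFCHeader:
    authors: list[str]
    status: str  # free text: the kind of review the author is asking for
    approvals: list[Approval] = field(default_factory=list)
    other_reviewers: list[str] = field(default_factory=list)  # never block

    def ready_to_implement(self) -> bool:
        # Assumption: all named approvers must say yes. Other reviewers'
        # comments are welcome but do not gate the work.
        return bool(self.approvals) and all(
            a.decision is Decision.YES for a in self.approvals
        )

    def open_caveats(self) -> list[str]:
        # Caveats attached to "yes, if" approvals: the author is trusted to
        # address them without waiting for a second read.
        return [
            a.note for a in self.approvals
            if a.decision is Decision.YES and a.note
        ]


# Hypothetical header for this blog post; the status text is invented.
rfc = RFCHeader(
    authors=["Tanya Reilly"],
    status="First draft: is this worth writing at all?",
    approvals=[Approval("Eva"), Approval("Alex")],
    other_reviewers=["Laura", "John", "Kevin", "Polina", "Guislain"],
)
rfc.approvals[0].decision = Decision.YES
rfc.approvals[0].note = "show that each iteration was a usable milestone"
print(rfc.ready_to_implement())  # Alex has not signed off yet
```

The point of the sketch is that a “yes, if” counts as a yes: the caveat is recorded, but it does not send the author back for another round of review.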
To make reviewers’ lives easier, we added a few sections to help discover potential sources of conflict or wrong assumptions:
Alternatives Considered/Prior Art: List the other approaches you considered and why they didn’t work. Are there existing systems that almost worked?
Dependencies: What other systems do you depend on? Do they know you’re about to depend on them?
Operational work: What manual tasks are you adding and who do you expect to do that work? Will other engineering teams or our excellent Customer Operations folks need to do anything to make this design successful? Spell out what you expect.
These questions uncover some misunderstandings and assumptions. They also make reviewers feel more comfortable asking questions. If we have “Prior art” as a section, for example, we’re clearly interested in why we couldn’t just use those existing systems. It makes it easier for the reviewer to emphasize simple solutions.
The template’s boilerplate text suggests some categories of approvers: potential users of your system, teams that manage systems you depend on, maintainers of internal systems you evaluated but chose not to use, and anyone you expect to do extra work. However, we found that teams were still sometimes surprised by RFCs they hadn’t noticed going by, or by RFCs that authors had forgotten to include them on. That was the next problem to solve.
Infrastructure Council
We needed a way to make sure RFCs were seen by everyone who needed to know about them. There was precedent for this: our colleagues in Product Engineering had a regular meeting, Product Backend Council, where RFCs were discussed and reviewed by the whole organization. Rather than invent something new, we adopted their idea and created Infrastructure Council, a meeting every two weeks where we presented and discussed RFCs in front of most of Infrastructure. It proved popular and well attended, and we always had a full slate of RFCs to discuss.
Infrastructure Council didn’t solve all of our problems, though. We had three RFCs to get through in an hour, so there wasn’t much time for deep review. With only a few minutes to comment on an RFC, nobody said “yes, if…”; they just called out their objections.
It also still wasn’t clear how much weight to give to those objections. If someone didn't like the design, was that a veto? If that person hadn’t been an approver on the original document, did the author really need to block on their concerns?
It brought new problems too. It’s intimidating to present in front of your entire organization. Engineers weren’t willing to bring RFCs until they were very polished… by which time the author was invested in the idea and major changes weren’t welcome. It felt unkind to question a design decision in a forum this big, particularly when a more senior engineer criticized the work of someone more junior. Nobody enjoys feeling called out in front of a packed room.
Using Infrastructure Council for design review made RFC authors feel like they were running a gauntlet. It was intimidating. We needed to do better.
Architecture Review
We wanted RFCs to have in-depth review, but without making the author feel under attack. So we introduced Architecture Review, a twice-weekly meeting where a small group of our most senior engineers reviews an RFC in depth. Each major RFC gets a one-hour session and, since the same people review every RFC, the reviewers build up a picture of everything that’s happening in the organization and can notice overlaps or incompatible initiatives.
Initially, only the reviewers and the document author attended Architecture Review. It didn’t feel quite right, though. Discussing the RFC in front of the whole organization had felt too public; having only five reviewers had the opposite problem: it felt like decisions were being made secretly, “in a smoky room.” Although we tried to select a group whose knowledge covered the majority of our infrastructure, the review members couldn’t be experts in all of our systems. We found a balance by inviting a small number of other people to participate in each review. These additional reviewers sometimes include key potential users of the system, technology experts in the domain, or someone the author suggests will bring an interesting perspective. Big meetings get unwieldy, so we aim to have no more than ten people in the room.
Architecture Review doesn’t include a presentation. Attendees are expected to already understand the RFC, and they may have already had conversations or comment threads about it. The group spends 50 minutes on deep discussion of the proposal, taking detailed notes of objections, concerns and caveats that arise, use cases that should be included, risks that should be mitigated, and so on.
The final 10 minutes of the meeting are for making the decision. Each person in the room says whether they’re comfortable with the proposal, or what would have to change for them to say yes to it. The most common response is “Yes, if you make sure that…”.
Reviewers can and do say no, but, just like with our design template, these objections are more likely to be framed as “not yet.” Although some proposals really won’t work in any form, most ideas are worth exploring more, and we sometimes invite authors to iterate and come back for another review. We aim for rough consensus in our decision: we're looking for general agreement among the group, but it doesn't have to be unanimous.
After the review, we add Architecture Review as an approver on the RFC, with a link back to the notes from the meeting. Since the notes include the list of attendees, the design now has the documented support of some of the most senior engineers in the organization. The author can feel confident that they’re not going to be surprised by objections after they’ve implemented the system.
The full notes are open to all of engineering, and a summary document shows all of the Architecture Review meetings we’ve had.
Conclusion
Architecture Review has been a success. We've used the model to make big decisions like adopting Go as a first-class language for Infrastructure, and switching to gRPC for inter-microservice communication. We’ve used it to do deep review of several new systems that crossed multiple teams. RFC authors have told us it's been a good experience, and our colleagues in Product Engineering and Internal Engineering have begun to use the same model.
We still have Infrastructure Council, but we’ve turned it into a forum for building and maintaining community. Authors present RFCs there, but now it’s an opportunity to share knowledge, give visibility to good work, and practice presentation skills in front of a friendly audience. As well as presenting RFCs, we use Infrastructure Council to highlight excellent incident retrospectives and to deliver lightning talks. We build links across Engineering by inviting Product and Internal Engineering teams to present on how they use our infrastructure. We call out wins, we welcome new hires, and we celebrate people who have done great things.
Infrastructure Council Template
Infrastructure Council and Architecture Review are my favorite meetings. Reviewing each other’s work makes us write stronger designs and learn from each other, as well as contributing to a collaborative and open culture. We’ll probably always be iterating on how we do design review, but I like where we’ve gotten to so far, and I recommend this model for other organizations.
Find our RFC template here. Feel free to use it in your own organization.