“Tech debt” is a dirty word in the software engineering world. It’s often said with an air of regret, as if it names a past mistake that will eventually have to be atoned for with refactoring.
Financial debt isn’t universally reviled in the same way. When a friend takes out a mortgage to buy a house, what do you say? Congratulations! Bonds are a standard way to finance infrastructure and public works. Businesses use all kinds of debt, and Wall Street often signals its confidence in them with higher stock prices.
The difference is intention. What if tech debt weren’t always an accident caused by incorrect assumptions and unexpected circumstances? How would you spend a tech debt mortgage?
If we define tech debt as work that will have to be done in the future, we can measure what we borrow by the time that future work will take. We can also “invest” time in work done now so that it pays off later.
This mental model has helped me avoid spending foolishly and then paying maintenance on a system I couldn’t afford, and it’s given me a way to spot opportunities to take on tech debt intentionally. Here are a few cases where taking on tech debt helped my projects succeed.
Scaffolding
When I’m deciding how to prioritize the major parts of a big project, my goal is to validate the riskiest parts first. By risky, I mean the parts that are easiest to get wrong because they’re complicated, difficult to define, poorly understood, or have some other smell of “here be dragons.” Because these components are the hardest parts of the project to build, I build them first, while the codebase is still small and easy to evolve.
It’s not enough to just build the risky components, however. I need to know that I built these risky components correctly. You never know until you try it, so I try to get my code into a real-world situation as quickly as possible. This usually means finding a way to make something useful with the risky parts before moving on to build the rest of the application.
When our team was developing Squarespace Email Campaigns, this meant building the email editor first. Squarespace is known for top-tier design and powerful content-editing tools, and we wanted to build something that topped the competition. After a couple of months, we had an editor we were excited to show coworkers, but we hadn’t yet built the system to actually send the email. We weren’t going to get the real-world feedback we needed if the editor was just a toy app that couldn’t send email.
The finished product would need to send hundreds of millions of emails quickly and reliably, but we didn’t need that capability yet.
So we asked ourselves, “What’s the simplest thing that would make the editor useful to our coworkers?” Squarespace had only several hundred employees at the time, and internal testers could tolerate a little unreliability. Something that sends a few hundred emails unreliably is WAY easier to build than the robust system we’d been considering. We realized we could build something cheap that we wouldn’t mind throwing away later, a piece of scaffolding, so we could start getting user feedback sooner.
Several self-imposed guidelines helped the implementation of our scaffolding go smoothly:
- We committed to our estimate. If we couldn’t complete the scaffolding in the estimated timeframe, it was a sign that we were taking on too much complexity and needed to rethink our approach.
- It was designed to be thrown away, and communicated as such from the get-go. The code was well contained, and we didn’t waste time nitpicking over minor implementation details.
- We understood the scaffolding’s limitations, and avoided using it in a situation where a failure would cause harm. We’d be going deeper into debt than expected if we spent time cleaning up errors caused by the scaffolding, so we were very intentional about the testing situations the scaffolding was used in.
- Stakeholders knew we were deliberately taking on debt. We bragged to our stakeholders and users about these limitations as if they were features (“Look at all the time we saved!”), and in the process made sure they knew we would need to invest time later to build the real email delivery system.
Scaffolding can help you adjust the order in which you build parts of your application by filling in for dependencies, so that you’re not stuck building components before you have a way to validate them. Our throw-away email sender got us feedback on the email editor while the code was still fresh in our minds, helping us iterate faster and perfect our hardest problem. When it came time to build the real email sender, we did it with lessons learned from building the scaffolding.
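One way to keep scaffolding easy to throw away is to hide it behind the same narrow interface the real system will eventually implement. The Python sketch below is hypothetical throughout: `EmailSender`, `ScaffoldingEmailSender`, `editor_send`, the `transport` callable, and the `max_recipients` guard are illustrative names, not Squarespace’s code. The scaffolding sends messages one at a time with no retries or queueing and simply reports failures back, which is only tolerable for a few hundred internal test emails.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Email:
    to: str
    subject: str
    html_body: str


class EmailSender(Protocol):
    """Narrow interface the editor depends on (hypothetical)."""

    def send_campaign(self, emails: List[Email]) -> List[str]:
        """Send every email and return the addresses that failed."""
        ...


@dataclass
class ScaffoldingEmailSender:
    """Throwaway sender for internal testing only.

    Known limitations, taken on deliberately:
    - sends one email at a time, so it only suits a few hundred recipients
    - no retries, queueing, or rate limiting
    - failures are reported back to the caller, never recovered
    """

    # Hypothetical transport: any callable that delivers a single email,
    # such as a thin wrapper around an SMTP client.
    transport: Callable[[Email], None]
    # Hypothetical guard that keeps the scaffolding out of real campaigns.
    max_recipients: int = 500

    def send_campaign(self, emails: List[Email]) -> List[str]:
        if len(emails) > self.max_recipients:
            raise ValueError(
                f"scaffolding sender handles at most {self.max_recipients} emails"
            )
        failed: List[str] = []
        for email in emails:
            try:
                self.transport(email)
            except Exception:
                # No retry: record the failure and keep going.
                failed.append(email.to)
        return failed


def editor_send(sender: EmailSender, emails: List[Email]) -> List[str]:
    """The editor only knows about EmailSender, not which implementation it has."""
    return sender.send_campaign(emails)
```

Because the editor only depends on `EmailSender`, a production sender can later replace `ScaffoldingEmailSender` without touching editor code, and the size guard fails loudly instead of letting the scaffolding drift into situations it was never built for.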
Hardcoding
The dashboard of a product I worked on recently has an area for seasonal messaging, such as “Wish your subscribers a Happy Holidays.” Every month or two, our team would schedule new content to show there.
We needed Content Management System (CMS) functionality so that our product manager (PM) and designer could update the dashboard message themselves. Squarespace is a CMS company, after all! Also, our PM usually wanted the message to change at a specific time, and engineers didn’t want to time production releases by hand to meet those requests.
One approach would have been to build a database-backed CMS, allowing the various members of our team to update the dashboard content instantly and without engineer intervention.
Unfortunately, this approach comes with costs: additional frontends, validations, and data to manage. These hidden costs are the bad kind of tech debt, saddling future developers with more maintenance work than the feature is worth. Additionally, we expected our dashboard message content to go weeks without changing, so we wouldn’t benefit from a solution that provided instant updates.
So instead of taking on the overhead of a database-backed CMS, we stored our content in a YAML file and reused existing tools to provide CMS functionality. The user interface to change the dashboard messaging became a text editor and Git instead of a browser UI. Other team members would validate these changes via code review, so we didn’t need to spend time writing validations and error handlers. Content changes are automatically synced across environments by our existing CI tools.
Exposing new content at a specific time was accomplished with yet more hardcoding. To paraphrase:
    if now >= NEW_CONTENT_DATE:
        show new content
    else:
        show old content
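A runnable version of that switch might look like the Python sketch below. The constant values, the message strings, and the choice of UTC are assumptions for illustration. The useful property is that the cutover time lives in code and ships with a normal deploy, so the new message can be merged and deployed days ahead and still appear at the scheduled moment.

```python
from datetime import datetime, timezone
from typing import Optional

# Hardcoded content (hypothetical values). Changing these means editing
# this file, getting a code review, and deploying.
OLD_CONTENT = "Wish your subscribers a Happy Holidays."
NEW_CONTENT = "Kick off the new year with a fresh campaign."

# When the new content should start showing. An explicit UTC timestamp
# keeps the cutover independent of the server's local time zone.
NEW_CONTENT_DATE = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)


def dashboard_message(now: Optional[datetime] = None) -> str:
    """Return the seasonal message to show at time `now` (default: current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return NEW_CONTENT if now >= NEW_CONTENT_DATE else OLD_CONTENT
```

Accepting `now` as a parameter keeps the function easy to test on either side of the cutover; for example, `dashboard_message(datetime(2023, 12, 31, 23, 59, tzinfo=timezone.utc))` returns the old message.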
I consider a few factors when deciding whether to hardcode something:
- Is it okay for changes to take several hours to go live? Hardcoding comes with a clear limitation: every change requires deploying new code to production. Failing to acknowledge that limitation can turn a hardcoded value into a maintenance headache, creating bad tech debt.
- Is Git an acceptable update user interface? Updates will require interacting with your team’s version control tool, and maybe the CI system. To avoid introducing additional coordination or training overhead, we use the test: is there someone already involved in the process who knows Git?
- Are there existing UI, validation, and data patterns? Hardcoding brings the biggest benefits when it helps you avoid defining new patterns. If your application already has patterns that cleanly solve your problem, hardcoding might not be worth the drawbacks.
Allow-lists, form field options, and feature flags are other good candidates for hardcoding.
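For instance, a hardcoded allow-list, feature flag, and set of form field options can each be a module-level constant. Everything in this Python sketch (`BETA_ACCOUNT_IDS`, `NEW_EDITOR_ENABLED`, `SEND_TIME_OPTIONS`, `can_use_new_editor`) is hypothetical; the same trade-offs as above apply, since every change goes through code review and a deploy.

```python
from typing import Tuple

# Hardcoded allow-list and feature flag (hypothetical names and values).
# To change either one: edit this file, get a code review, and deploy.
BETA_ACCOUNT_IDS = frozenset({"acct_1001", "acct_1002", "acct_2040"})
NEW_EDITOR_ENABLED = True

# Hardcoded form field options: (value, label) pairs for a dropdown.
SEND_TIME_OPTIONS: Tuple[Tuple[str, str], ...] = (
    ("now", "Send now"),
    ("morning", "Tomorrow morning"),
    ("custom", "Pick a time"),
)


def can_use_new_editor(account_id: str) -> bool:
    """The flag turns the feature on; the allow-list limits who sees it."""
    return NEW_EDITOR_ENABLED and account_id in BETA_ACCOUNT_IDS
```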
Not Fixing All the Edge Cases
I once had to build a feature that let a user create up to 10 items, but no more. There’s a common race condition here: if two create-item requests arrive at nearly the same time, both requests count the existing items, and then both create a new item. If this happens when there are already nine items, both requests pass the check and we end up with 11 items.
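The race is easy to reproduce in a few lines. This Python sketch is a hypothetical stand-in for the real service: an in-memory list plays the database, each thread plays one create-item request, and a `threading.Barrier` forces both requests to finish counting before either one writes, which is exactly the unlucky interleaving described above.

```python
import threading
from typing import List

MAX_ITEMS = 10


class ItemStore:
    """In-memory stand-in for a database without transactions."""

    def __init__(self, initial: int) -> None:
        self.items: List[str] = [f"item-{i}" for i in range(initial)]
        # Makes each individual write atomic; it does NOT protect the limit check.
        self._write_lock = threading.Lock()

    def count(self) -> int:
        return len(self.items)

    def insert(self, name: str) -> None:
        with self._write_lock:
            self.items.append(name)


def create_item(store: ItemStore, name: str, barrier: threading.Barrier) -> None:
    """Check-then-act: count the items, then insert if under the limit."""
    under_limit = store.count() < MAX_ITEMS  # 1. read
    barrier.wait()  # both requests finish reading before either writes
    if under_limit:
        store.insert(name)  # 2. write, based on a now-stale count


store = ItemStore(initial=9)
barrier = threading.Barrier(2)
requests = [
    threading.Thread(target=create_item, args=(store, f"new-{i}", barrier))
    for i in range(2)
]
for t in requests:
    t.start()
for t in requests:
    t.join()

print(store.count())  # 11: both requests saw 9 items, so both inserted
```

Note that making each write atomic is not enough: the count and the insert are separate steps, and that gap is what both requests slip through.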
Most QA testers would consider this a bug, and bugs are just tech debt that users can see.
Database transactions could help, but we were using a NoSQL database that doesn’t support transactions. Read/write locks would work, but we were operating in a distributed system, so we’d need distributed locks. If we wrote a bunch of locking code to make the limit “perfect,” we might end up introducing new, unexpected bugs. We’d also have to spend time writing that locking code, and every developer who works on this after us would have to understand it.
This feels like a lot of effort to keep a user from making one extra item. What about a non-technical solution: what if we intentionally didn’t fix the race condition?
Now, I’m not saying you should quietly leave it there for the next developer to deal with! Being intentional here meant answering some questions:
- What happens when there are 11 items? Well, not much, actually. It was an arbitrary limit and the rest of the app doesn’t care about a few extra items.
- Can we find out if this race condition happens more often than expected? Yup: a simple database query could find accounts with more than 10 items, and because each item record had a timestamp, we could check whether the extra items were created at nearly the same moment. It turned out this race condition hadn’t occurred in production.
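The real query would depend on the NoSQL database in use, so this Python sketch only mirrors its logic over hypothetical `(account_id, created_at)` records: find accounts over the limit, then check whether any item past the limit was created within a short window of the item before it, which is the trace this race would leave. The two-second `RACE_WINDOW` is an assumed threshold, not a value from the original system.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

MAX_ITEMS = 10
# Assumed threshold: two creates this close together look like concurrent requests.
RACE_WINDOW = timedelta(seconds=2)

Record = Tuple[str, datetime]  # (account_id, created_at) for one item


def find_limit_violations(records: Iterable[Record]) -> Dict[str, bool]:
    """Map each account with more than MAX_ITEMS items to whether an item
    past the limit was created within RACE_WINDOW of the item before it."""
    by_account: Dict[str, List[datetime]] = defaultdict(list)
    for account_id, created_at in records:
        by_account[account_id].append(created_at)

    violations: Dict[str, bool] = {}
    for account_id, timestamps in by_account.items():
        if len(timestamps) <= MAX_ITEMS:
            continue
        timestamps.sort()
        # Index MAX_ITEMS (the 11th item) and beyond are past the limit.
        violations[account_id] = any(
            timestamps[i] - timestamps[i - 1] <= RACE_WINDOW
            for i in range(MAX_ITEMS, len(timestamps))
        )
    return violations
```

An empty result means no account has ever exceeded the limit, so the check can be rerun cheaply whenever someone wants to confirm the debt is still harmless.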
We were better off leaving this tech debt unpaid. Because the impact of creating an extra item was low and easy to monitor, we could spend time on higher-priority work instead of addressing a practical non-issue.
Good Tech Debt Is Intentional
A lot of bad tech debt comes from building too much and getting stuck spending more time on maintenance and bug fixes than expected. It’s like buying too much house and ending up in an underwater mortgage.
The key is to be intentional about what you invest time in and aware of the costs you’re taking on. Err on the side of building too little because you can always build more later. Build things to be easy to throw away and replace; it’ll make your code more modular. Good tech debt has clear, well-known limitations. Document these in code comments, READMEs, FAQs, and conversations with the people who’d care.
Used carefully, good tech debt will help you build software faster by focusing your time on the things that matter most.