Moving Fast in Software Development
Not that long ago, shipping software was a physical act. Software was put onto disks and shipped to customers and retail outlets across the globe. Upgrades were generally not possible. Customers would wait until it made financial sense to buy the next version. Bug fixes didn’t happen. The cost of a bad release was high: millions of dollars in labor and physical products were wasted. Headlines about the incident would damage your company’s reputation. Customers would be hesitant to buy from you again.
The internet changed this. Rather than pushing bytes to a disk, we could push bytes over a wire at the speed of light. This change in the deployment of software enabled Software as a Service companies. These companies changed customer expectations. Customers didn’t want to wait years for a new version of their software; they wanted it now. They came to expect this as the norm. Companies that didn’t adapt fell behind, and the ones that did became market leaders.
Software as a Service companies didn’t grow simply because they gave customers what they needed. They grew by giving customers what they needed faster than their competitors. Bugs were no longer a permanent part of a software version. A bug could be there one day and gone the next. Getting customer feedback on new features didn’t require bringing dozens of people to an office for a focus group. Now customers all over the globe could beta test new features before those features were finished.
We all know the stereotype of small companies. With less bureaucracy and more focus, they could come from behind, leaving bigger, slower companies in the dust. This stereotype no longer holds true the way it once did. Large companies now move as fast as, if not faster than, the small ones.
Software as a Service is no longer cutting edge. Software as a Service is the standard. Being able to deploy without your customer installing something on a server is no longer revolutionary; it’s expected. In this world, how do great companies succeed and differentiate themselves?
Giving customers what they need faster than your competitor is still a competitive advantage. Learning what your customers need faster than your competitor is a multiplier on that advantage. What has changed is the timescale. We are no longer competing on multi-year time frames; we are competing on months, weeks, and even days.
How Do We Move Faster?
It is imperative that we move fast. How do we do it? How do we improve our company processes, our team processes, even how we as individuals work to deliver more value to customers sooner? The best way that I’ve discovered is to view all of these as systems. A company looks at opportunities in markets and decides where to spend time and money to make more money. A team building a product looks at unmet customer needs and invests time and money in building a product that will meet those needs. An individual contributor looks at the work left to build a product and spends their time building a part of the product.
To move fast means that we are both moving in the right direction and moving in that direction at high velocity. If we are moving in the right direction but moving slowly, we may be working on the right problem but causing customers to wait too long for a solution to their problem. If we are moving in the wrong direction but at a high velocity, customers’ needs will not be met. The ideal is moving in the right direction at a high velocity.
At all levels from the company down to the individual we can ask ourselves: is what I’m doing having the effect I intend it to have and am I intending to have the right effect?
How does that translate to the team level? Are we building what we set out to build? Does the current state of what we are building meet the specified requirements? Does it have problems or bugs? Are we working on the right things? Is the backlog prioritized so that the highest-value items come first? Are we getting feedback from customers that what we are building is what they actually need?
How does that translate to an individual software engineer? Am I building what I set out to build? Does the code change I’m making do what I think it should do? Have I tested the change? Does this introduce new bugs or regressions? Am I working on the right thing? Am I working on the highest priority work items? Am I spending time on work that is important and not just urgent? Do I understand the product requirements? Do I think the product requirements meet the customer’s needs? Do I understand the customer’s needs?
In each of these examples we ask ourselves a question and use the answer to change our behavior. In most cases it will involve either changing what we are working on or how we are doing it. This process of continuously asking these questions and refining our own behavior based on the answers is an example of a feedback loop. Identifying and reducing the time between taking an action and collecting feedback on the result of that action is how we move faster. Another way of saying this: tighten your feedback loops.
Feedback Loops in Software Development
The qualities that separate good software from bad software are not code specific. High quality software enables developers to safely make changes at high velocity with minimal effort. It does this by minimizing the time between a developer making a change and seeing the impact of that change.
How is this possible? Most people are familiar with the adage “Good, fast, or cheap. Pick two.” I will make what may seem to be a heretical claim to those who believe this. You do not have to pick two. When it comes to software development, you can have all three. To understand, let’s break down what these three parts mean in relation to software.
Good implies that the software is of good quality. We believe that good code is clean, follows well established patterns, and contains tests.
Fast implies that the software will be delivered quickly. Software delivered fast is software that takes a very short time to develop and may even come in well under the estimate.
Cheap implies that the software was built either by low-paid developers or by a very small development team.
Why do I believe you can have all three? Because most people have a misguided understanding of what software quality means.
What Does Good Software Look Like?
To understand good software we can contrast it with bad software. Bad software has one or more common characteristics. The code may be difficult to read. Variables could be named in surprising ways. It may take a developer a long time to read through it and understand what it does. The code may lack test coverage. This makes understanding the code harder because you have no tests to validate what it is doing if you can’t understand it by reading. The combination of lack of tests and difficult-to-read code makes it brittle. Brittle code is code that, when you make a change in one place, changes the behavior in ways you did not expect. Brittle code is code that is very difficult to safely change; however, code does not have to be brittle to be hard to change. Other factors can impact the difficulty of changes.
Slow builds are one such factor. In most modern systems, compilation is not what takes a significant amount of time in the build. Instead, static analysis, linting, and test running take up the bulk of the time. These tools are meant to improve quality, but they add time to every change you make. Why can this be bad? Writing and making changes to software requires holding mental models of how the software works in your head. Most developers work by writing a small amount of code and then checking that the small change does what they think it does. The developer is looking for feedback that they are moving in the right direction. This iterative cycle continues until the unit of work the developer hopes to accomplish is complete. When the time to test every little change increases past a certain threshold, the developer has to spend more and more time holding those mental models in their head. Distractions from emails, Slack, or nearby coworkers risk interrupting the developer and causing them to lose the models. In extreme cases, a developer may wait hours to test out their change only to forget the details of what they wanted to test.
This iterative cycle of making a small change and collecting feedback is at the core of the agile philosophy. The cycle scales beyond a single developer working on a small change. It scales to the team, product, and even company levels. In the case of a larger feature requiring a whole team, their work must be integrated then deployed to a production environment where customers will begin to use it. Only then does the team get feedback on whether the feature did what they thought it would do, including whether it can be deployed successfully and without introducing bugs.
Now we can begin to more clearly define what good software looks like:
- Good software is easy to change. Changing one line does not have unexpected consequences. It does what you think it should do on the first try. Adding or modifying behavior does not require many iterations. It requires minimal lines of code to be added, modified, or deleted to implement the intended behavior.
- Good software is software that you have confidence in. It does not surprise you by only working in some environments but not others. It has automated tests in place to prevent you from breaking existing behavior.
- Good software allows you to gain quick feedback on the impact of your change. You do not need to wait minutes or hours. Feedback should be nearly instantaneous. Even waiting a few seconds can be detrimental. Having quick feedback allows you to update your mental model quickly and minimizes the risk of losing it. This quick feedback helps put you in, and keep you engaged in, a state psychologists refer to as Flow.
If good software requires minimal effort to change and has a low risk of introducing new bugs, we achieve two additional benefits. First, because the time to implement changes is low, software can be delivered fast. Second, because the change is so easy, it can be accomplished by more junior engineers who don’t need to spend a large amount of time ramping up on the code base. Alternatively, more experienced developers may make the change very quickly and with minimal effort. The time saved can then be spent on other efforts.
Good software, then, enables developers to have a tight feedback loop between the change they make and observing the impact of that change.
The story of how software development has evolved to improve quality in the last few decades can be thought of as a story of tightening feedback loops. Waterfall development and its long planning and implementation cycle gave way to the more incremental development espoused by Agile advocates. Customer feedback was brought in earlier, and prototypes and proofs of concept were favored over complex design documents. Instead of relying on quality assurance (QA) teams to validate and guarantee the quality of software builds, Test Driven Development (TDD) and continuous testing became standard practice. DevOps is fast becoming the default culture for engineering teams. Breaking down barriers between operations and development has enabled faster deployments and given developers real-time insight into customer behavior and experience. A more recent evolution has been GitOps and the move to continuous everything. In this model, code and infrastructure are defined in Git repositories, and continuous deployment systems constantly sync the production environment with the state defined in Git.
How Can We Tighten Our Feedback Loops?
If tightening feedback loops is the key to moving faster, how can we as engineers and engineering leaders tighten the feedback loops that are within our control?
Make Testing Easy
As engineers, we spend a large amount of our time writing code and implementing changes to build a product. The less time we spend implementing each change, the more productive we are. The single most important piece of feedback we need is whether our change will do what we intend it to do.
- Find ways to make testing changes, all changes, as easy as possible. Code changes should have unit tests. Writing those unit tests should be easy. Running those tests should be fast.
- Seeing our code execute at runtime should be fast and easy. Reduce build times and automate all setup necessary to run your project.
- Automate your local development environment setup. Any developer should be able to clone a git repository and run it without taking any other actions. Any setup specific to a project should be automated in the project.
- Your local, test, and production environments should be as similar as possible. Most engineers have experienced the dreaded bug report that only reproduces in production. Debugging these is slow and painful. Constantly work to reduce differences between how developers work on local changes and how those changes run in production.
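As a minimal sketch of what a fast, easy unit test can look like, the example below uses Python's standard unittest module. The function apply_discount and its rules are hypothetical, invented only for illustration. The point is that the test touches no network, database, or file system, so it gives feedback almost immediately.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, rounded down to whole cents.

    Hypothetical business rule, used only to have something to test.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    """Pure in-memory tests: no setup, no external services."""

    def test_typical_discount(self):
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_zero_discount_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(1999, 0), 1999)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)
```

Saved in a file such as test_discount.py (a hypothetical name), these tests can be run with `python -m unittest test_discount.py`. Because they depend on nothing outside the process, they can run on every save without breaking concentration.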
Deploy Early, Deploy Often
We can write unit tests until the sun goes down. QA teams could do exploratory testing for weeks. Our product managers could focus group our product roadmap until our competitors beat us to market. All this and we could still take down production with a feature our customers hate. The real test of every change and every feature we build is what happens when it gets used by real customers. Does it add a slow memory leak? Does it make it impossible for customers to proceed to the next page of a form? Will anyone choose to use it if it’s available? The best way to answer all these questions is to get your change in front of customers as early as possible.
- Use a continuous delivery system to ship changes.
- Relentlessly automate your deployment system. Keep human involvement to the absolute minimum.
- Make rollbacks easy. Most engineers think of automating a deployment system and think of automating moving bits from one server to another. If problems never occurred this would be the only automation necessary. Problems will occur and we need to find ways to detect and remediate problems as fast as possible. A deployment system that can detect a bad release and automatically roll back for you will always be faster than one that relies on humans for detection and remediation. Time spent monitoring and troubleshooting deployment systems is time not spent on improvement and feature development.
- Break up feature development into small changes that can be released individually. The bigger the change, the bigger the chance of a problem. If you can break down the changes necessary to implement a large feature each one could be released before the entire feature is ready. This gives you early feedback if one of those changes introduces a regression or causes some other problem. It also makes it much easier to identify the source of the problem if it does occur.
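The automatic rollback idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: deploy_with_rollback, activate, and is_healthy are hypothetical names, not part of any real deployment tool. A real system would space the checks out over time and base them on metrics such as error rates.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    previous_version: str,
    activate: Callable[[str], None],
    is_healthy: Callable[[], bool],
    checks: int = 3,
) -> str:
    """Activate new_version, then roll back if any health check fails.

    activate and is_healthy are hypothetical hooks supplied by the caller.
    Returns the version that is left running.
    """
    activate(new_version)
    for _ in range(checks):
        if not is_healthy():
            # Detection and remediation happen without a human in the loop.
            activate(previous_version)
            return previous_version
    return new_version


# Usage with in-memory stand-ins for a real deployment target.
activated = []
running = deploy_with_rollback("v2", "v1", activated.append, lambda: False)
print(running, activated)
```

With a health check that always fails, the sketch activates the new version, detects the problem on the first check, and reactivates the previous version. Passing activate and is_healthy as parameters also keeps the rollback logic itself easy to unit test.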
Invest in Observability
Every engineer should know exactly what their software is doing when it’s being used. Engineers should be able to see problems happening in real time without an end user complaining. Engineers should have the data available to investigate a problem and understand the root cause without needing to ship more changes. If the development team does not have the data to know what is happening internally when their software is in use, they’re missing critical feedback that they will not be able to act on.
- Measure the right things. Ensure what you are measuring is actionable and informative.
- Ensure every engineer has access to and knows how to read the logs and metrics that your system outputs.
- Encourage and empower every engineer to proactively find ways to improve logs and metrics.
- For distributed services, consider leveraging distributed tracing.
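One way to make logs actionable is to emit a structured record for each operation, including its outcome and duration. The sketch below uses only the Python standard library. The helper timed_operation, the logger name, and the field names are assumptions for illustration, and any extra fields must be JSON-serializable.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("checkout")  # hypothetical service name


@contextmanager
def timed_operation(name: str, **fields):
    """Log one JSON line per operation with its outcome and duration."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception:
        outcome = "error"
        raise  # never swallow the error; only record that it happened
    finally:
        duration_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info(json.dumps({
            "operation": name,
            "outcome": outcome,
            "duration_ms": duration_ms,
            **fields,
        }))


# Usage: wrap any unit of work whose latency and failures you want to see.
logging.basicConfig(level=logging.INFO)
with timed_operation("submit_order", order_id="hypothetical-123"):
    pass  # the real work would go here
```

Because every record is a single JSON object, log tooling can filter and aggregate by operation, outcome, or duration without parsing free-form text.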
Resist Bazooka Solutions When Things Go Wrong
Things will go wrong. Your service will take an outage in production. You will ship bugs. Making testing as easy as possible will help to reduce the probability of these events but will never eliminate them. Ever heard the phrase: the road to hell is paved with good intentions? Teams don’t usually start out with a lot of byzantine processes that slow them down. These get added over time in response to events. Most teams have the instinct when something goes wrong to figure out how to prevent that same problem from occurring in the future.
What separates teams that maintain a high velocity over time from those that slow down is the actions they take in response to these problems. Often you’ll find requirements that take the form of “this person needs to sign off on these types of changes before they occur” or “all changes of type X may only be performed by team Y.” These are well-intentioned because they attempt to ensure actions are double-checked for accuracy and to route certain actions to subject matter experts. In practice, they create gatekeepers and bottlenecks. This is an example of a second-order effect. I refer to these solutions as Bazooka solutions. A Bazooka solution is a solution to a problem that attempts to completely eliminate the entire class of problems without regard for the collateral damage.
How can you resist Bazooka solutions?
- Take time to consider the second-order effects of proposed process changes. Pay special attention to the potential effect of a process change on any feedback loops.
- Preventing a problem from ever occurring again may not be worth the effort. Consider if there are ways to identify a problem earlier. Sometimes the right solution is even tighter feedback loops.
- Treat any change that introduces new human judgement or human intervention with skepticism. We as people are flawed, imperfect, and we are constantly making mistakes. The original problem likely occurred due to a person’s mistake. Adding more people to a process might collect more input, but it also opens the door for more human error.
Moving fast isn’t a nice-to-have; it’s a business imperative. Making the changes necessary to allow you to move faster is not the responsibility of your CEO, your VP, or even your manager. It is everyone’s responsibility, from individual contributors to the top of the organization. Each of us has tools and processes within our control that we can improve to move faster. We can find improvements by analyzing these tools and processes to find the feedback loops within them. Make testing and deploying easier to give your team faster feedback on their changes. Invest in observability to bring information from your customers to the people best able to act on that information. Finally, carefully evaluate any proposed process changes through the lens of this framework to ensure that your feedback loops stay tight.