This book, I feel, has a great example of an effective subtitle: "The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations." As a freshly minted leader in a technology organization, I have a lot to learn, and learning to build and scale a high performing one certainly sounds appealing.
This blog post summarizes three themes and big takeaways I got from reading the book.
I’ve always seen the business need for measuring KPIs and performance - what you can’t measure you can’t improve, right? My problem with frequently suggested development KPIs is that they feel a bit too arbitrary: they are either easily gamed, or they become a priority over the bigger picture of delivering quality features.
Let’s look at velocity as a common metric. To make velocity useful, your team has to be good at estimating stories accurately. I think developers are ambitious and want to please, promising more than they can often deliver, so estimates end up all over the place. Plus, we’re an agile group, right? How much time do we spend figuring out the details before we estimate? How much time do we actually spend estimating? That’s time that could have been spent doing the actual work. There are of course other ways to measure velocity - make every story 1 point and keep stories to an average of about a day, etc. - but digging into those details is not the point of this post.
Accelerate helped me verbalize the flaws in common development productivity metrics:
- Velocity is a relative and team-dependent measure, not an absolute one. It's difficult to make meaningful comparisons between teams.
- When velocity is used as a productivity measure, teams inevitably work to game their velocity. They inflate their estimates and focus on completing as many stories as possible at the expense of collaboration with other teams and team members
- When utilization gets above a certain level, there is no spare capacity (or "slack") to absorb unplanned work, changes to the plan, or improvement work. This results in longer lead times to complete work.
What is the solution? "Our measure should focus on outcomes, not output." The authors suggest the following software delivery performance metrics:
Lead Time - Measuring the timeframe between the first accepted commit and the deployment to production.
Deployment Frequency - The number of times the team deploys during a specific time period.
Change Fail Percentage - The rate at which deployments lead to failures and outages.
Mean Time to Restore (MTTR) - The average time it takes to restore the service after a failure.
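To make the four metrics concrete, here is a minimal sketch of how they could be computed from deployment records. The data structure and function name are my own illustration, not anything from the book; a real pipeline would pull this from your CI/CD and incident tooling.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records (illustrative only): each has the first
# accepted commit time, the deploy time, whether the deploy caused a
# failure, and (if so) when service was restored.
deployments = [
    {"committed": datetime(2023, 5, 1, 9), "deployed": datetime(2023, 5, 1, 15),
     "failed": False, "restored": None},
    {"committed": datetime(2023, 5, 2, 10), "deployed": datetime(2023, 5, 3, 11),
     "failed": True, "restored": datetime(2023, 5, 3, 12, 30)},
    {"committed": datetime(2023, 5, 4, 8), "deployed": datetime(2023, 5, 4, 16),
     "failed": False, "restored": None},
]

def delivery_metrics(deployments, period_days):
    lead_times = [d["deployed"] - d["committed"] for d in deployments]
    failures = [d for d in deployments if d["failed"]]
    restore_times = [d["restored"] - d["deployed"] for d in failures]
    return {
        # Lead Time: average time from first commit to production deploy
        "avg_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        # Deployment Frequency: deploys per day over the period
        "deploys_per_day": len(deployments) / period_days,
        # Change Fail Percentage: share of deploys causing a failure
        "change_fail_pct": 100 * len(failures) / len(deployments),
        # MTTR: average time from failed deploy to restored service
        "mttr": (sum(restore_times, timedelta()) / len(restore_times)
                 if restore_times else None),
    }

metrics = delivery_metrics(deployments, period_days=7)
```

For the sample data above, this yields a 13-hour average lead time, 3 deploys over a 7-day window, a 33% change failure rate, and a 90-minute MTTR.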
I'm going to admit, when I picked up the book I didn't realize it would be primarily focused on DevOps. Yes, I read The Phoenix Project and The Unicorn Project; seeing Gene Kim as one of the authors should have been a dead giveaway, not to mention...
But like I said... that subtitle had me hooked.
At the end of the day, what matters is shipping software. According to the research the authors undertook, teams that focus on the KPIs above perform much better than teams that don't. The individuals on the team matter less than how the team members interact and structure their work.
Unlike velocity, the KPIs above are comparable between teams and companies. What are the differences between a low and a high performing team?
| | High Performers | Medium Performers | Low Performers |
|---|---|---|---|
| Deployment Frequency | On demand (multiple deploys per day) | Between once per week and once per month | Between once per week and once every six months |
| Lead Time for Changes | Less than one hour | Between one week and one month | Between one month and six months |
| Mean Time to Restore (MTTR) | Less than one hour | Less than one day | Between one day and one week |
| Change Failure Rate | 0-15% | 0-15% | 31-45% |
I'd consider these actionable KPIs - something I should start tracking and making progress on. There are two main challenges I face around KPIs that this book didn't quite help me answer.
...as a KPI. As a manager, my first priority is delivering on the goals of the organization, and for the engineering department that goal is to deliver on the roadmap. Many factors contribute to an organization missing a roadmap (delays, cut features, low quality). The KPIs above do include Change Failure Rate, which directly drives quality improvement, but none of them address the roadmap in general.
Accelerate stresses that the performance of a team is more important than that of individuals. Logically, if these KPIs are tracked and show our teams moving towards being high performing, perhaps that automatically addresses the roadmap concerns.
Individual performance isn't covered much in the book, and I strongly agree that velocity is not a great metric for individuals. However, I still need something to help developers progress in their careers. I was hoping the book would touch on that, but it didn't.
Maybe the argument can be made that if a team falls into the high performing category, it will hit roadmap milestones and deadlines, and the need for individual KPIs disappears.
One of the most interesting topics in Accelerate, to me, was measuring the performance of development teams. I've long been concerned about using velocity and similar metrics as performance indicators and was hoping to find an alternative that's actionable but also provides a good developer experience. This book helped me verbalize my concerns about velocity (mentioned above) and provided alternatives that may not fix all my issues, but will provide value for any dev team and software organization.
Other Interesting Highlights
- Regarding change management - "In short, approval by an external body (such as a manager or CAB) simply doesn't work to increase the stability of production systems, measured by the time to restore service and change fail rate. However, it certainly slows things down. It is, in fact, worse than having no change approval process at all." - pg 79
- "Our analysis showed that the ability of teams to try out new ideas and create and update specifications during the development process, without requiring the approval of people outside the team, is an important factor in predicting organizational performance as measured in terms of profitability, productivity, and market share." - pg 86
- "Implementing CD at Microsoft's Bing Team - Satisfaction on work/life balance jumped from 38 to 75% - Technical staff were able to manage their professional duties during work hours, they didn't have to do deployment process manually, and were able to keep the stresses of work at work" - pg 90
- "Establish a dedicated training budget and make sure people know about it. Also, give your staff the latitude to choose training that interests them. This training budget may include dedicated time during the day to make use of resources that already exist in the organizations" - pg 123
- Make monitoring a priority - "Refine your infrastructure and application monitoring systems, and make sure you're collecting information on the right services and putting that information to good use. The visibility and transparency yielded by effective monitoring are invaluable. Proactive monitoring was strongly related to performance and job satisfaction in our survey." - pg 127
I feel like if I put all my highlights here, it'll get too long. So let's quit while we're ahead!