Strategy Deployment and Developer Experience

Triangle depicting the three Developer Experience dimensions of Feedback Loops, Cognitive Load and Flow State. Each dimension is at a different corner of the triangle.

In a 2021 paper, Michaela Greiler, Margaret-Anne Storey and Abi Noda defined Developer Experience as “how developers think about, feel about, and value their work”. Subsequently, in 2023, along with Nicole Forsgren, they published a follow-up paper titled “DevEx: What Actually Drives Productivity”. In that paper, they describe three dimensions of DevEx: Feedback Loops, Cognitive Load and Flow State. This post, as part of the series that looks at Strategy Deployment and other approaches, explores how these DevEx dimensions can be applied.

Developer Experience Strategies

Essentially, the three dimensions can be interpreted as strategies themselves:

  • Shortening Feedback Loops
  • Reducing Cognitive Load
  • Increasing Flow State

Using them as guiding policies in this way means that they can inform more tactical investments and initiatives. For example, implementing tooling and automation (e.g. Continuous Delivery) might help shorten feedback loops. Reorganising around value streams and platforms (e.g. Team Topologies) might help reduce cognitive load. Aligning and limiting WIP to strategic portfolios (e.g. Flight Levels) might increase flow state. As an aside, I’m also curious how the current buzz around AI might help deliver on these strategies by providing faster feedback, being an external augmentation of memory, or reducing interruptions.

Developer Experience Evidence

The paper also discusses measuring DevEx, and suggests both qualitative and quantitative metrics for the three dimensions. See the table below. When I describe the outcomes which would demonstrate Evidence of a successful agile transformation, one of those outcomes is Sustainability. That is, being able to continue delivering successfully over the long term. DevEx can be highly correlated with Sustainability. Developers who have good experiences are likely to be happier, more engaged, and able to work in a way which produces results over an extended period. There are also strong arguments for correlations to many, if not all, of the other outcomes. As a reminder, they are Productivity, Responsiveness, Predictability, Quality and Value.

Table of metrics for the three dimensions of developer experience.

Some of these metrics are also associated with DORA and SPACE, which is not surprising given the involvement of Nicole Forsgren in all this research. Those bodies of work provide additional areas to look for DevEx evidence. More traditional Flow metrics (e.g. Cycle Time, Throughput, WIP etc) are also relevant.

Conclusion

Thus, Developer Experience, and specifically these three dimensions, provides a useful lens through which to look at Strategy Deployment for an agile transformation with the X-Matrix. The dimensions can be used as the Strategies themselves and the related measures as the Evidence. The exercise for organisations and teams is then to correlate those to their Aspirations and decide which Tactics are the right ones to start making improvements.

Yes, You Can Measure Software Development Productivity

Hands moving quickly over a keyboard representing productivity

This post is (yet another) response to the McKinsey article “Yes, you can measure software developer productivity”. Eagle-eyed readers might notice that while the title of this post is very similar to the original, it has one slight difference. I refer to “development” and not “developer” productivity. This is the key point for me, and while there have been some excellent responses from Kent Beck and Gergely Orosz, and more recently Dan Terhorst-North, it is the point I want to emphasise in this post.

I have previously written about metrics and am a strong proponent of having data to guide improvement. Evidence is one of the elements of my TASTE model and the X-Matrix. So in all the justified backlash against measuring software developers, I don’t think we should throw the baby out with the bath water.

Measuring Outcomes

One of the mistakes that the McKinsey article makes is that it focusses on the developer. Thus it confuses productivity with being about developer activity. Dan describes this in much better detail than I can in his article. However, to summarise, productivity has nothing to do with how much code a developer can write.

Measuring the System

The follow-on from that is that productivity is more pertinent to teams, or teams of teams. Rather than measuring the activity of an individual, we should be measuring the outcome of the system. It doesn’t matter if every developer is cranking out hundreds of thousands of lines of code if the organisation is unable to get any of that code into production, or if, when it does, the code is full of bugs and delivers no value.

Balancing Metrics

That leads us to the fact that we shouldn’t overly focus on productivity alone. As I described in my earlier post on the evidence to look for in an agile transformation, I recommend six dimensions, of which productivity is just one. The other five are responsiveness, predictability, quality, sustainability and value. Focussing solely on productivity risks falling foul of Goodhart’s Law, popularised by Marilyn Strathern as “when a measure becomes a target, it ceases to be a good measure”. Having a balance of competing metrics can help avoid this. To be fair, the article does mention this at the end, but it seems an afterthought.

Critical Factors

Having said all that, there was one thing that intrigued me.

I’m curious about the Developer Velocity Index benchmark. It has a horrendous name which I imagine puts a lot of people off even investigating it. However, it is described as something “which pinpoints the most critical factors (related to technology, working practices, and organizational enablement) in achieving Developer Velocity, as well as those that are not nearly as important as many executives and observers might believe”. With the caveat that we apply it to organisational productivity, I can imagine that being useful.

So, that’s my response. Just because we can measure software developer productivity, that does not mean we should. However, what is more important and more useful is to measure software development productivity, at the team, team of teams and organisational level.

The Evidence To Look For In A Successful Agile Transformation

Six forms of evidence which can indicate the success of an agile transformation.

I have recently been exploring in more detail how the X-Matrix might be used for an Agile Transformation. So far I have covered Aspirations, Strategies and Tactics. Following on from those, I will discuss Evidence in this post.

In some ways, this is probably the least controversial and builds on the work of two people who have influenced me greatly.

First is Larry Maccherone. I worked with Larry at Rally where he first published his research into The Impact of Agile Quantified. The second is Troy Magennis. Troy has built on Larry’s work and has written about the Six Dimensions of Team Performance. Consequently, it is these six dimensions that I use as the basis for describing evidence, albeit with some small tweaks in language. They describe important outcomes to look for as part of an Agile Transformation. The image to the right, which is one we use at TEKsystems Global Services, shows these dimensions.

Productivity

Evidence of productivity shows that work can be delivered in greater quantity. Throughput – the number of pieces of work per unit of time – is a good measure of productivity. For example, stories per week. Similarly, velocity can also be a proxy, although it can be too easily gamed. Additionally, the DORA metric of deployment frequency can also correlate to productivity.
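As a rough sketch of that calculation (my own illustration, not from the original post), weekly throughput can be derived simply by counting completed items per ISO week; the completion dates below are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical completion dates for finished stories
completed = [
    date(2023, 10, 2), date(2023, 10, 4), date(2023, 10, 5),
    date(2023, 10, 11), date(2023, 10, 12), date(2023, 10, 19),
]

# Count completions per ISO week: throughput in stories per week
throughput_per_week = Counter(d.strftime("%G-W%V") for d in completed)

for week, count in sorted(throughput_per_week.items()):
    print(f"{week}: {count} stories completed")
```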

Predictability

Evidence of predictability shows that work can be delivered consistently and reliably. I talk about this in more detail in my post on how to measure the predictability of Agile. In that post, I recommend several measures. They include variability of cycle time, the amount of ageing work in progress, and the number of blockers that are causing work to age.

Responsiveness

Evidence of responsiveness shows that work can be delivered quickly. Some form of cycle time is the most common way to measure responsiveness. That could be the full value stream cycle time. Or it could be a more qualified cycle time for a subset of the value stream. The DORA metric of lead time for changes is an example of this.

Quality

Evidence of quality shows that work is being delivered in the right way. The number of escaped defects that are reported after release is one simple way to measure quality. Additionally, some of the other DORA metrics are relevant here as well. Specifically, these are the mean time to restore and change failure rate. Finally, customer satisfaction can also correlate to quality, although this can also show evidence of value.

Sustainability

Evidence of sustainability shows that work can continue to be delivered in the long term. Two aspects of sustainability are important. Firstly, there is technical sustainability in terms of the state of the codebase and architecture. Code analysis tools which measure things like complexity, duplication or unit test coverage can be useful for this. A strong codebase and architecture will be more resilient to future changes. Secondly, there is human sustainability in terms of people’s ability to maintain their levels of performance. Employee or team engagement, or employee churn rates, can be useful measures for this. An unstable team with low morale runs the risk of losing valuable information and knowledge which will slow them down.

Value

Evidence of value shows that the right work is being delivered. Value as an aspiration overlaps somewhat with its use as evidence, especially as it can often be very contextual and subjective for each organisation. However, strategic alignment can be a good indicator that the right work is being done. That is the percentage of the teams’ work that can be traced back to strategies, both intentionally and explicitly. Additionally, as noted earlier, customer satisfaction can also correlate to value.

It’s worth emphasising a few things for all these forms of evidence. Most importantly, none of these indicators are intended to be used as targets or incentives. As such, there needs to be a balance between them so that no single metric gets too much individual focus. Therefore, it is important to consider the impact of focusing too much on a metric, as well as focusing too little. There will have to be tradeoffs and they can’t all be perfect!

I haven’t given an exhaustive list here, and I’m always interested in learning about alternative indicators. So, if you have your favourites that I haven’t mentioned, please let me know!

How to Measure the Predictability of Agile

This post follows up a Twitter thread I posted in November exploring ways of measuring the predictability of teams. I also discussed some of these ideas in a Drunk Agile episode.

Fortuneteller by Eric Minbiole

When I begin working with an organisation on an agile transformation, an early conversation is around successful outcomes. My work on Strategy Deployment is all about answering the question “how do I know if agile is working?”.

Sometimes the discussion is around delivering more quickly (i.e. being more responsive to the business and customer needs). Other times it is about delivering more often (i.e. being more productive and delivering more functionality). Both of these are relatively easy to measure. Responsiveness can be tracked in terms of Lead Time, and productivity can be measured in terms of throughput.

However, the area that regularly gets mentioned that I’ve not found a good measure for yet is predictability. In other words, delivering work when it is expected to be delivered.

Say/Do

Before I get into a few ideas, let’s mention one option that I’m not a big fan of – the “say/do” metric. This is a measure of the ratio of planned work to delivered work in a time-box.

Firstly, this relies on time-boxed planning. This means it doesn’t work if you’re using more of a flow- or pull-based process.

Secondly, the ratio is usually one of made-up numbers. Either story point estimates for Scrum or business value scores for SAFe’s Flow Predictability. This makes it far too easy to game, by either adjusting the numbers used or adjusting the commitment made. All it takes is to over-estimate and under-commit to make it look like you’re more predictable without actually making any tangible improvement.

System Stability

Another approach to predictability is to say that a system is either predictable, or it is not. With this frame, the concept of improving predictability is not valid. The idea builds on the work of Donald J. Wheeler, Walter A. Shewhart and W. Edwards Deming. These three statisticians would treat a system – in this case, a product delivery system – as either stable or unstable. An unstable system, with special cause variation, is unpredictable. By understanding and removing the special-cause variation, we are left with common-cause variation, and the system is now predictable.

For example, we can look at Lead Time data over time. For a stable system, WIP will be managed and Little’s Law will be adhered to. In this scenario, we can predict how long individual items will take by looking at percentiles. Different percentiles will give us different degrees of confidence. For the 90th percentile (P90), we can say that 90% of the time we will complete work within a known number of days.
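As a minimal sketch of that idea (my own illustration, with made-up numbers), the P90 can be read straight off a sorted list of historical Lead Times using the nearest-rank method.

```python
import math

def lead_time_percentile(lead_times_days, percentile):
    """Lead Time (in days) within which the given percentage of items
    completed, using the nearest-rank method."""
    ordered = sorted(lead_times_days)
    rank = math.ceil(percentile / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical Lead Times (in days) of recently completed work items
history = [3, 5, 6, 8, 9, 11, 12, 14, 21, 34]

p90 = lead_time_percentile(history, 90)
print(f"90% of items completed within {p90} days")  # 21 days for this sample
```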

With this approach, the tangible improvement work of making a system predictable is that of making the system stable. By removing the noise of special cause variation, we are able to make predictions on how long work will take and when it will be done.

Meeting Expectations

An alternative approach is to think of predictability as meeting expectations. Let’s assume I know my P90 Lead Time as described above. I know that there is only a 10% chance of delivering later than that time. However, there is still a 90% chance of delivering earlier than that time. Similarly, I might know my 10th percentile Lead Time (P10). This tells me that there is a 90% chance of delivering later, but only a 10% chance of delivering sooner.

If the distribution of the data is very wide, then there will be a wide range of possibilities. It is still difficult to predict a date between the “unlikely before” P10 date and the “unlikely after” P90 date. Thus it is difficult to set realistic expectations. Saying you might deliver anytime between 1 and 100 days is not being predictable. Julia Wester describes this well in her blog post on the topic using this diagram.

Seeing predictability at a glance on a Cycle Time Scatterplot from ActionableAgile

With this approach, the tangible improvement work is reducing the distribution of the data to remove the outliers.

Inequality

One way of measuring this variation of distribution is to simply look at the ratio of the P90 Lead Time to the P10 Lead Time. (Hat-tip to Todd Little for this suggestion). This is similar to how Income Inequality is measured. Thus if our P90 Lead Time is 100 days, and our P10 Lead Time is 5 days, we can say that our Lead Time Inequality is 20. However, if our P90 Lead Time is 50 and our P10 Lead Time is 25, our Lead Time Inequality is 2. We can say that the lower the Lead Time Inequality, the more predictable the system is.
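Here is a minimal sketch of that calculation (my own illustration, with hypothetical Lead Time samples chosen to reproduce the worked numbers above):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a list of Lead Times (in days)."""
    ordered = sorted(values)
    return ordered[math.ceil(p / 100 * len(ordered)) - 1]

def lead_time_inequality(lead_times_days):
    """Ratio of the P90 Lead Time to the P10 Lead Time."""
    return percentile(lead_times_days, 90) / percentile(lead_times_days, 10)

# Hypothetical samples: a wide distribution and a narrower one
wide = [5, 5, 10, 20, 30, 40, 55, 70, 100, 100]
narrow = [25, 25, 30, 32, 35, 38, 40, 45, 50, 50]

print(lead_time_inequality(wide))    # 100 / 5  = 20.0
print(lead_time_inequality(narrow))  # 50  / 25 = 2.0
```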

Coefficient of Variation

Another way is to measure the coefficient of variation (CV), which gives a dimensionless measure of how widely a distribution is spread around its central tendency (Hat-tip to Don Reinertsen for this suggestion). The coefficient of variation is the ratio of the standard deviation to the mean. A dataset with a wide variation would have a larger CV. A dataset of all equal values would have a CV of 0. Therefore, we can also say that the lower the Lead Time Coefficient of Variation, the more predictable the system is.
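And a similarly minimal sketch of the CV (my own illustration, with hypothetical data), using the population standard deviation divided by the mean:

```python
from statistics import mean, pstdev

def lead_time_cv(lead_times_days):
    """Coefficient of variation: population standard deviation / mean."""
    return pstdev(lead_times_days) / mean(lead_times_days)

wide = [5, 5, 10, 20, 30, 40, 55, 70, 100, 100]   # widely spread Lead Times
uniform = [12, 12, 12, 12, 12]                    # identical Lead Times

print(round(lead_time_cv(wide), 2))  # larger CV for the wider distribution
print(lead_time_cv(uniform))         # 0.0 when every value is equal
```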

Consistency

There are probably other statistical ways of measuring the distribution, which cleverer people than me will hopefully suggest. What I think they have in common is that they are actually measuring consistency (Hat-tip to Troy Magennis for this suggestion). A wide distribution of Lead Times might be mathematically predictable, but the individual times are not consistent with each other. A narrow distribution of Lead Times is more consistent, and thus allows for more reliable predictions.

Aging WIP

One risk with these measures of Lead Time consistency is that there are essentially two ways of narrowing the distribution. One is to look at lowering the upper bound and work to have fewer work items take a long time. This is almost certainly a good thing to do! The other is to look at increasing the lower bound and work to have more items take a short time. This is not necessarily a good thing to do! That raises a further question. How do we encourage more focus on decreasing long Lead Times and less focus on increasing short Lead Times?

The answer is to focus on Work in Process and the age of that WIP. We can measure how long work has been in the process (started but not yet finished). This allows us to identify which work is blocked or stalled and get it moving again. Thus we can get it finished before it becomes too old. Measuring Aging WIP encourages tangible improvements by actively dealing with the causes of aged work. This might be addressing dependencies instead of just accepting them. Or it could be encouraging breaking down large work items into smaller deliverables (right-sizing).
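As a final sketch (my own illustration, with hypothetical item names, dates and an arbitrary threshold), the age of each in-progress item is simply today’s date minus the date it was started, which makes stalled or blocked work easy to spot.

```python
from datetime import date

# Hypothetical work in process: item name -> date the item was started
wip = {
    "Checkout redesign": date(2023, 11, 1),
    "Payments API fix": date(2023, 11, 20),
    "Search spike": date(2023, 11, 27),
}

today = date(2023, 12, 1)  # fixed "today" so the example is reproducible

# Age of each in-progress item in days, oldest first
ages = sorted(((today - started).days, name) for name, started in wip.items())
for age, name in reversed(ages):
    # The 14-day threshold is an arbitrary example, not a recommendation
    flag = "  <-- at risk of becoming an outlier" if age > 14 else ""
    print(f"{name}: {age} days in process{flag}")
```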

In summary, I believe that measuring Aging WIP and Blocked Time will lead to greater consistency of Lead Times with reduced Lead Time Inequality and Coefficient of Variation, which will, in turn, lead to better predictions of when work will be done.

Caveat

A couple of final warnings to wrap up. The first is that these are just ideas at this stage. I’m putting them out here for feedback and in the hope that others will try them as well as me. Secondly, I am not promoting removing all variation completely. The following quote and meme seem appropriate given the seasonal timing of this post!

Deviation from the norm will be punished unless it is exploitable.

Measuring the X-Matrix

"Measure a thousand times, cut once"

Dave Snowden recently posted a series of blog posts on A Sense of Direction, about the use of goals and targets with Cynefin. As the X-Matrix uses measures in two of its sections (Aspirations and Evidence) I found that useful in clarifying my thinking on how I generally approach those areas.

Let’s start by addressing Dave’s two primary concerns: the tyranny of the explicit and a cliché of platitudes.

To avoid the tyranny of the explicit, I’ve been very careful to avoid the use of the word target. Evidence was a carefully chosen word (after trying multiple alternatives) to describe leading indicators of positive outcomes. The outcomes themselves are not specific goals, and can be either objective or subjective. They are things we want to see more of (or less of) and should be trends, suggesting an increased likelihood of meeting Aspirations. Aspirations again was chosen to suggest hope and ambition rather than prediction and expectation. While they define desired results, those should be considered to be challenges and not targets.

To avoid a cliché of platitudes we need to focus on Good Strategy, beginning with clear, challenging and ambitious Aspirations. I find it interesting that Dave cites Kennedy’s “man on the moon” challenge as a liminal dip into chaos, while Rumelt uses the same example as a Proximate Objective source of power for Good Strategy. An ambitious yet achievable Aspiration helps focus on the challenges and opportunities for change. With proximate Aspirations, and a clear Diagnosis of the current situation, we can set some Guiding Policies as enabling constraints with which to decide what Coherent Action to take. Thus we can avoid fluffy, wishful, aimless or horoscopic Bad Strategy.

Put together, we have a set of hypotheses which are specific enough to avoid a cliché of platitudes, yet are speculative enough to avoid the tyranny of the explicit. We believe the Aspirations are achievable. We believe our Strategies will help us be successful. We believe our Tactics will move us forward on a good bearing. We believe the Evidence will indicate favourable progress. The X-Matrix helps visualise the messy coherence of all these hypotheses with each other and Strategy Deployment is the means of continuously testing, adjusting and refining them as we navigate our way in the desired direction.