Failure Is Not An Option

Before reading this post, I encourage you to have a go at this quick puzzle to test your problem solving.

Go on, you won’t regret it!

How did you do?

I’ve used this challenge numerous times in various talks and workshops, and I find that the majority of people follow the same pattern: they only test their hypotheses by guessing sequences which they think will succeed in following the rule, even when I’ve primed them beforehand by talking about the need for failure. It’s only after repeated guesses that someone eventually proposes a sequence which does fail, and the “oohs” and “aahs” noticeably indicate learning.

This problem is a nice example to show that we learn more when we fail because we generate new information. Information Theory suggests maximum information is generated when the probability of failure is 50%. If we never fail then we must know everything already, and if we fail all the time then we must be repeating the same mistakes over and over again.
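
To make the information-theory claim a little more concrete: a trial that fails with probability p yields, on average, H(p) = −p·log₂(p) − (1−p)·log₂(1−p) bits of information, and this binary entropy reaches its maximum of exactly 1 bit when p = 0.5. When p is close to 0 or 1, the outcome is almost a foregone conclusion, so we learn next to nothing.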

Of course, that doesn’t necessarily mean we always want to fail 50% of the time. We wouldn’t want any plane we fly in to have a 50% probability of crashing, but then we’re not really trying to generate new information and learning when we fly. Context is important. However, when we are developing new products, we do want to learn, so a 50% failure rate is more appropriate. Failure is not an option, it’s a necessity!

That’s easier said than done though, so here are some things I have learnt which may help.

Run Experiments

Firstly, to be open to failure we need to consider the assumptions we have made in taking decisions on what to do. We should treat those assumptions as hypotheses, and come up with ways to test them. A simple template which can be useful as training wheels is this one:

  • We believe that <solution>
  • Will result in <outcome>
  • We will know we have succeeded when <evidence>
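
For example, a team might (hypothetically) write:

  • We believe that adding a one-click checkout
  • Will result in more visitors completing their purchase
  • We will know we have succeeded when the checkout conversion rate improves measurably within a month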

Having said that, the puzzle this post opened with showed how we need our experiments to fail as well as succeed, so we also need to intentionally try to falsify our hypotheses. As Karl Popper wrote in The Poverty of Historicism:

The discovery of instances which confirm a theory means very little if we have not tried, and failed, to discover refutations. For if we are uncritical we shall always find what we want: we shall look for, and find, confirmations, and we shall look away from, and not see, whatever might be dangerous to our pet theories. In this way it is only too easy to obtain . . . overwhelming evidence in favour of a theory which, if approached critically, would have been refuted.

Thus the evidence we look for should show not only that we have proved our hypotheses, but also that we have failed to disprove them. This means moving from a fail-safe approach, where we assume a low probability of failure, to a safe-to-fail approach, where we can quickly and cheaply recover from failure without putting lives, careers or reputations at risk.

Illuminate Feedback

We need to run experiments because we are generally dealing with complex problems, where cause and effect are not repeatable and we can only explain results with retrospective coherence. We cannot rely on experts to prescribe good practice to follow; instead, practice must emerge as a result of experimentation.

The consequence is that experts disagree, a tell-tale sign of a complex problem. Recognising a problem as complex, with no knowable solution, is the first step in breaking away from arguing over who is right or wrong. Instead we can use Chris Argyris’ Ladder of Inference to explore why we have differing opinions and how we might reconcile views and achieve mutual learning to allow solutions to emerge.

The ladder has the following rungs:

  • Take Actions
  • Adopt Beliefs
  • Draw Conclusions
  • Make Assumptions
  • Interpret Meaning
  • Select Reality
  • Observe Reality

We naturally tend to climb to the top of the ladder, quickly taking action without considering the beliefs, conclusions, assumptions, meaning and reality that have led to the action. By climbing down the ladder we can understand both why we take certain actions, and why others take different actions. It is probably due to different beliefs, conclusions, assumptions, interpretations or selections of data.

This is known as Assertive Inquiry, where we advocate for our view while at the same time seeking to understand an alternative view. Discovering why we might recommend different actions in this way generates understanding, which can lead to potential experiments to test the various views.

We can think of this as shining a light on a problem. If we try to solve problems in the dark, we don’t see the information and feedback from which we can learn. I like the analogy of practising a golf swing in the dark: we won’t see where the golf ball ends up, so we can’t adjust accordingly.

This is similar to the Streetlight Effect; searching for something and looking only where it is easiest. It’s like the “joke” about the drunk who has lost his car keys, and is looking for them under the lamppost. When asked why he is looking there, his response is that he won’t find his keys where there is no light.

Expect the Unexpected

Running experiments and searching for feedback in this way means intentionally widening the scanning range so that we are more likely to pick up on information that we might naturally ignore due to our inherent biases.

First we need to overcome the God Complex; an unshakable belief characterised by consistently inflated feelings of personal ability, privilege, or infallibility. In my last post, In the Lap of the Gods, I talked about how this can lead us to not even acknowledge the need to run experiments or search for feedback in the first place. It’s why disagreement can be healthy, and why we need to create safe environments where people can openly challenge our thinking.

Then there is Cognitive Dissonance; the inner tension we feel when our beliefs are challenged by evidence. Whenever I walk down a stationary escalator, something seems wrong. My brain is expecting movement but my body doesn’t experience any. It really should be just like walking down stairs, but it doesn’t feel that way. My belief is that the escalator is moving, but the evidence is that it is not, and the natural inclination is to trust the belief and ignore the evidence. Thus we dismiss any contrary feedback we receive as being wrong.

Related to this is Confirmation Bias; the tendency to favour information that confirms one’s pre-existing beliefs or hypotheses. This is what generally leads us to only try to prove our hypotheses, but it also means that we only notice feedback which does prove them. Again, any contrary feedback we receive is ignored.

The situation is further complicated by Survivorship Bias; concentrating on the people or things that made it past some selection process and overlooking those that did not. A great example of this is the story of Abraham Wald, a statistician during World War 2. He was tasked with prioritising where to reinforce planes with better armour to increase survival rates, given that the planes’ weight limited the amount of armour possible. Available data from surviving planes showed clear patterns of damage, and the common theory was to reinforce the most damaged areas.

However, Wald’s insight was that this damage was from planes which had returned, and therefore could survive damage in those areas. As a result, it was likely that planes which did not survive would probably have been hit in the undamaged areas, and this is where any reinforcement should go. Thus we need to pay attention not just to the information that we can see from our experiments, but also consider any information that we don’t see from failed experiments.

And then there is Availability Bias; relying on immediate examples that come to mind when evaluating a specific topic, concept, method or decision. An example is whenever someone mentions a new model of car to us, and suddenly we see that model everywhere. It’s not that it’s suddenly appearing more often; it’s just that our brain notices it more often because it’s more recently available for recall. Our preferred hypotheses will be more available, so we are more likely to notice feedback which relates to them.

So when considering a hypothesis it’s easy to notice information which is more immediately available, which survives experiments, and which confirms our opinions, formed from the belief that we are experts. And this is just a handful of the huge list of cognitive biases on Wikipedia!

Summary

There’s a nice acronym which suggests that a FAIL is a First Attempt In Learning. I use that to highlight that failure is not something that we should shy away from and treat as an enemy, but something we should embrace and befriend. That doesn’t mean encouraging and celebrating failure though. Too much failure is as bad as not enough.

What’s needed is Critical Thinking; the intellectually disciplined process of actively and skilfully conceptualising, applying, analysing, synthesising, and evaluating information to reach an answer or conclusion. The Backbriefing & Experiment A3s I use are intended to encourage this by helping focus on the three areas in this post: running experiments, illuminating feedback, and expecting the unexpected.

To close, I would recommend Black Box Thinking by Matthew Syed to explore these ideas in more depth. The metaphor comes from the aviation industry, and in particular comparing it to the healthcare industry. Syed cites a 2013 study published in the Journal of Patient Safety which put the number of premature deaths associated with preventable harm at more than 400,000 per year. That is the equivalent of two 747 airliners crashing every day. His compelling argument is that the aviation industry, where you really don’t want failures, is actually extremely safe because of the way it uses Black Boxes to conscientiously learn from any failures when they do occur. The healthcare industry, on the other hand, has a history of brushing failures under the carpet as inevitable and just the nature of the job.

We would do well to take note, and be more like the aviation industry, applying the same attitude and discipline so that we can befriend and learn from failure.

Insights from a Kanban Board

I was working with a team this week, part of which involved reviewing their kanban board and discussing how it was evolving and what they were learning. There was one aspect in particular which generated a number of insights, and is a good example of how visualising work can help make discoveries that you might not otherwise unearth.

The board had an “On Hold” section, where post-its were collected for work on which some time had been spent, but which was not actively being progressed, had no expectation of being progressed, and wasn’t blocked in any way. Quite a lot of work had built up in “On Hold”, which made it an obvious talking point, so we explored what that work was, and why it was there. These are the things we learnt:

  1. Some of the work wasn’t really “On Hold”, but was requests which had been “Assessed” (the first stage in the workflow) and deemed valid and important, but not urgent enough to schedule imminently. This led to a discussion about commitment points, and a better understanding of what the backlog and scheduling policies were. In this case, items had effectively been put back on the backlog rather than put “On Hold”. In addition, a cadence of cleaning the backlog was created to remove requests that were no longer valid or important.
  2. Some of the work was “On Hold” because, while it was valid and important, the urgency was unknown. It was an “Intangible” class of service. As a result it was always de-prioritised in favour of work where the urgency was clearer. For example, work to mitigate a security risk wasn’t going to be urgent enough until the risk turned into a genuine issue which needed resolving. To avoid these items building up, and generating even greater risk, a “Donkey Derby” lane was created as a converse of their “Fast Track” lane. Work in this lane would progress more slowly, but there would always be at least one “Intangible” item in process.
  3. A very few items were genuinely “On Hold”, but they were lost amidst the noise of all the other tokens, so any discussion about what to do, or what could be learned, was also being lost.

In summary, by visualising the “On Hold” work (however imperfect it was), and creating a shared understanding, the team were able to improve their knowledge creation workflow, better manage their risk and increase their ability to learn from their work.

Estimates as Sensors

Note: this is not an April Fool – honest!

I’ve been watching the #NoEstimates conversations with interest and decided it’s about time to pitch in with my own perspective. I don’t want to take any ‘side’ because, like most things, the answer is not black or white. Estimates can be used for both good and evil. For me they are useful as a sensing mechanism. Put another way, by estimating, I can get a sense of how well I know my actual capability.

Let’s take an example. I’m taking part in a 10K run this Sunday (*) and I am estimating that I will complete the distance in 55 minutes. This is based on an understanding of my capability from participating in regular 5K runs, and more general training runs over a 10K distance. I’ve never run an official 10K race, let alone this course, and I don’t know what the conditions will be like, but I’m aiming for 55 minutes. If I run quicker and do better than my estimate, then my actual 10K capability is better than I thought. If I run slower and do worse than my estimate, then my actual 10K capability is worse than I thought. Either way, I will learn something about how well I know my 10K capability.

What helps even more with that learning is regular feedback! I use MapMyRun on my phone to track progress and give me feedback every kilometre on total time, average pace and last split. This could be considered a distance-based cadence. I could probably also use a time-based cadence to get equivalent feedback every few minutes. This feedback on a regular cadence helps me decide whether my estimate was way off, whether I should slow down because my pace is probably unsustainable, or whether I can speed up because I feel I can push harder.

How does this relate to product development? Well, we can use estimates in the same way to get a sense of a team’s delivery capability, and use regular feedback to learn whether we’re making the progress we thought, and whether we need to re-plan, slow down or speed up. Time-boxing, with Story Point estimates and Velocity, can provide this time-based cadence and feedback. Alternatively, how long it takes to complete 10 User Stories can be used as a distance-based cadence and feedback.
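
As a rough illustration (made-up numbers, and a sketch rather than a recommended tool), here is the run example expressed as a sensor in code: the estimate implies a target, each cadence point provides feedback, and the drift between the two is the signal. The same loop applies to a team if you swap pace for velocity or stories completed per cadence.

    # A minimal sketch of an estimate used as a sensor, with hypothetical splits.
    estimate_minutes = 55      # estimated time for the whole 10K
    distance_km = 10

    # Cumulative elapsed minutes at each kilometre (made-up feedback)
    splits = {1: 5.2, 2: 10.3, 3: 15.1, 4: 20.0}

    for km, elapsed in splits.items():
        actual_pace = elapsed / km               # average pace so far (min/km)
        forecast = actual_pace * distance_km     # projected finish at this pace
        if forecast < estimate_minutes:
            signal = "ahead of estimate - capability better than I thought"
        else:
            signal = "behind estimate - capability worse than I thought"
        print(f"km {km}: forecast {forecast:.1f} min vs estimate {estimate_minutes} min -> {signal}")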

To sum up, I recommend using estimates to sense capability and create feedback for yourself. I don’t recommend using them to make promises and give guarantees to others. Or maybe we could just call them sensors instead of estimates?

(*) Of course this post is primarily an excuse to invite readers to sponsor me. If you’re so inclined, or would like to show support for this post in a charitable and financial way, then please head over to my JustGiving page and do so there 🙂

Update: My final time was 49:23 based on the timing chip, or 49:30 from the starting gun. I’ve learned that I’m better than I thought I was, and next time I’ll be estimating 50 minutes!

Kanban Values, Impacts and Heuristics

A recent thread discussing the values behind kanban on the kanbandev mailing list inspired a couple of great blog posts by Mike Burrows on “Introducing Kanban Through Its Values” and “Kanban: Values Understanding And Purpose”, which have in turn inspired me to make some updates to the Kanban Thinking model.

The key points for me in Mike’s second post are these. First,

We often say what the Kanban method is (an evolutionary approach to change) without saying what it is actually for! Change what? To what end?

and then,

The Kanban method is an evolutionary approach to building learning organisations.

Impact

I have a different take on the values discussion and how they help answer the question “to what end?” I’ve come to the view that articulating values is not a useful exercise because they often end up being things that anyone could espouse. One alternative is to use narratives and parables to describe the values in action. With Kanban Thinking, I prefer to talk about the desired impacts of a kanban system. Knowing what impact we want the kanban system to have, and how to measure that impact, will inform our system design decisions.

Thus, in answer to the question “to what end?”, Kanban Thinking suggests 3 impacts; improved flow (demonstrated in terms of productivity, predictability or responsiveness), increased value (demonstrated in terms of customer satisfaction, quality or productivity) and unleashed potential (demonstrated in terms of employee satisfaction, quality or responsiveness).

Heuristics

Mike suggests that the purpose of a kanban system is to learn, and in light of the above, that would be to learn how best to have maximum impact. Up until now, I have talked about five leverage points (or levers) on a kanban system, with Learn being one of those levers. As a result of the insights I had from Mike’s post I have switched to referring to those five elements as heuristics rather than levers, with the fifth heuristic changed from Learn to Explore.

This is one definition of heuristic:

involving or serving as an aid to learning, discovery, or problem-solving by experimental and especially trial-and-error methods.

and

of or relating to exploratory problem-solving techniques that utilize self-educating techniques (as the evaluation of feedback) to improve performance

Thus, the five (updated) heuristics of Study, Share, Limit, Sense and Explore help with the learning about a kanban system in order to have the desired impacts of improved Flow, increased Value and unleashed Potential.

Exploration is a more active description of what I originally intended by the Learn lever. Exploration requires curiosity (another value suggested by Mike) and experimentation to try things out, observe the results, and amplify or dampen accordingly.

That leaves the updated Kanban Thinking model looking like this:

[Image: the updated Kanban Thinking model]

The Science of Kanban – People

This is the second part of a write-up of a talk I gave at a number of conferences last year. The previous post was the Introduction.

Software and systems development is acknowledged to be knowledge work, performed by people with skills and expertise. This is the basis for the Software Craftsmanship movement, which focuses on improving competence, “raising the bar of software development by practicing it and helping others learn the craft.” A kanban system should, therefore, accept the human condition by drawing on sciences such as neuroscience and psychology.

Visualisation

Neuroscience helps us understand the need for strong visualisation in a kanban system. Creating visualisations takes advantage of the way we see with our brains, creating common, shared mental models and increasing the likelihood that the work and its status will be remembered and acted on effectively.

Vision trumps all our other senses because our brains spend 50% of the time processing visual input. Evidence of this can be found in an experiment run on wine-tasting experts in Bordeaux, France. Experienced wine-tasters use a specific vocabulary to describe white wine, and another to describe red wine. A group in Bordeaux were given a selection of wines to taste, where some of the white wines had an odourless and tasteless red dye added. These experts described the tainted white wine using red wine vocabulary, because seeing the wine as red significantly influenced their judgement. The same experiment has apparently also been run on novice wine-tasters, who were less fooled, showing the danger of experts becoming too entrained in their thinking.

Visual processing further dominates our perception of the world because of the way our brain combines the different inputs from each eye. For each index card on a board, instead of seeing two, one from each eye, the brain deconstructs and reconstructs the two inputs, synthesising them into a single image by filling in the blanks based on our assumptions and experiences. Thus what we end up with is a mental model created by the brain, and the kanban board helps make that mental model one that is shared across the team.

The more visual input there is, the more likely it is to be remembered, an effect known as the pictorial superiority effect (PSE). Tests have shown that about 65% of pictorial information can be remembered after 72 hours, compared to only 10% of oral information. Visualising work status and other dimensions on a kanban board can, therefore, increase the chances of that information having a positive influence on outcomes.

Multitasking

Neuroscience also explains one of the benefits of limiting WIP; that of reducing multitasking.

Multitasking, when it comes to paying attention, is a myth. The important detail here is that it is when it comes to paying attention. Clearly we can walk and talk at the same time, but if we have to concentrate on climbing over an obstacle we will invariably stop talking. The brain can only actively focus on one thing at a time, and studies have shown that being interrupted by multitasking results in work taking 50% longer with 50% more errors. For example, drivers using mobile phones have a higher accident rate than anyone except very drunk drivers. In other words, multitasking in the office can be as bad as being drunk at work!

Other studies have shown that effectiveness drops off with more tasks. In “Managing New Product and Process Development: Text and Cases”, Clark and Wheelwright show that when someone is given a second task, the percentage of time they spend on valuable work rises slightly, because they are able to keep themselves busy when they are blocked. Beyond that, however, each additional task reduces the time spent adding value.

[Chart: percentage of time spent on value-adding work against number of concurrent tasks, from Clark and Wheelwright]

Gerald Weinberg suggests similar results in “Quality Software Management: Systems Thinking” when he reports that for each additional project undertaken, 20% of time is lost to context switching.

[Chart: time lost to context switching as the number of simultaneous projects increases, from Weinberg]
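
Taken literally, the rule quoted above lends itself to a small back-of-the-envelope calculation. The following sketch is one interpretation of that 20% figure (an illustration, not Weinberg’s published table): each project beyond the first loses 20% of total time to context switching, and whatever remains is split across the projects.

    def value_adding_share(projects: int) -> float:
        """Fraction of time left for real work under the 20%-per-extra-project rule."""
        lost = 0.20 * (projects - 1)   # 20% lost for each project beyond the first
        return max(0.0, 1.0 - lost)

    for n in range(1, 6):
        share = value_adding_share(n)
        print(f"{n} project(s): {share:.0%} of time adds value, {share / n:.0%} per project")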

Learning

Recognising the way we deal with challenging situations, and how we can change, enables us to deal with the visibility and transparency that a kanban system creates in order for us to learn and improve.

As humans, we are natural explorers and learners. We evolved as the dominant species on the planet by constantly questioning and exploring our environment and trying out new ideas. However, when faced with difficult situations, we tend to react badly. Chris Argyris coined the term Action Science to describe how we act in these situations. The natural reaction is single loop learning, where we resort to our current strategies and techniques and try to implement them better. A more effective approach can be double loop learning, where we question our assumptions and mind-set in order to discover new strategies and techniques.

[Diagram: single loop and double loop learning]

Another relevant model is Virginia Satir’s Change Model, which describes how our performance dips into the valley of the ‘J-Curve’ while we adjust to a new way of being. Being aware of the dip, its depth, and our response to it can inform an appropriate approach to influencing change.

[Diagram: the Satir Change Model J-Curve]

Next we’ll cover the science of process.

Running the Ball Flow Game

I previously wrote about the Ball Flow Game I ran at the Scrum Gathering in Amsterdam. I’ve updated the game quite a bit since then and it’s stabilised into something I’m finding very useful when I work with Kanban teams to help them understand some of the concepts behind Kanban Thinking. I hope this write-up enables others to use and run the game.

To recap, the Ball Flow Game is based on the Ball Point Game, with the following rules:

  • Participants play as a whole team
  • Balls must have air time between people
  • No passing to a direct neighbour
  • The start person must be the end person

The aim is to complete 20 balls (as opposed to completing as many as possible in 2 minutes). Thanks to Rally colleague Eric Willeke I now add some fun context as well. I tell the team that they are producing magic balls. Magic is added to a ball only when everyone has touched it. If two people touch a ball at the same time, the magic is dispersed. Further, magnetic fields mean that passing a ball to a direct neighbour also stops magic being added. The start/end person is the customer wanting magic to be added to the balls. It’s silly, but it adds an extra element of fun, and reinforces that the rules are constraints in the context that can’t be changed.

I use a spreadsheet to help automatically capture data about the performance of the team. (Download the template). It works using four macros, which are assigned to buttons and hot-keys:

  • Begin (Ctl-B). Begins a round by starting a timer.
  • In (Ctl-I). Captures the time a ball is added into the system.
  • Out (Ctl-O). Captures the time a ball comes out of the system.
  • End (Ctl-E). Ends a round by calculating the metrics.

Once the metrics are calculated, three charts on the worksheet will populate themselves.

  • Lead Time. The time each ball took to work its way through the team (assuming balls enter and exit in the same order). The dotted line is the average.
  • Throughput. The number of balls the team completes every 10 seconds
  • Cumulative Flow Diagram. The number of balls either not started, in progress or done every 10 seconds

Upper and Lower Control Limits can also be calculated and displayed for Lead Time. They are currently commented out in the macro code because I found that they weren’t necessary for the core learning objectives. You may also need to tweak some of the macro functions for different versions of Excel (I use Windows Excel 2010). The template has worksheets for 5 rounds. For additional rounds, simply copy one of the existing worksheets.
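
For anyone who wants to sanity-check the numbers outside Excel, the three charts can be derived from the raw in/out timestamps alone. Here is a minimal Python sketch with hypothetical timestamps; it illustrates the calculations rather than reproducing the spreadsheet macros, and it assumes balls exit in the same order they enter.

    # Hypothetical timestamps in seconds from the start of a round
    in_times = [0, 4, 7, 11, 15, 20]       # when each ball entered the system
    out_times = [18, 22, 25, 30, 33, 38]   # when each ball came out

    # Lead Time: how long each ball spent in the system
    lead_times = [out - entered for entered, out in zip(in_times, out_times)]

    # Throughput: balls completed in each 10-second interval
    round_end = max(out_times)
    throughput = [sum(1 for t in out_times if start <= t < start + 10)
                  for start in range(0, round_end + 1, 10)]

    # Cumulative Flow Diagram: not started / in progress / done at each 10-second tick
    total = len(in_times)
    cfd = []
    for tick in range(0, round_end + 10, 10):
        started = sum(1 for t in in_times if t <= tick)
        done = sum(1 for t in out_times if t <= tick)
        cfd.append((tick, total - started, started - done, done))

    print("Lead times:", lead_times)
    print("Throughput per 10s:", throughput)
    print("CFD (tick, not started, in progress, done):", cfd)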

One of the things I’ve noticed is how the dynamics of the conversations are different from when I run the traditional Ball Point Game with teams. In particular, the team I was working with today really understood the idea of scientific management and made very small, quick and focussed changes each round as experiments, hypothesising on how the metrics would change. With the Ball Point Game I find teams want to debate in depth all the options in their attempts to get an improvement.

Note that, as I said in a post on Balanced Software Development, “scientific management is still relevant for knowledge work, when the workers are the scientists.”

Here are the results. (Apparently we started with only 19 balls due to a facilitator error!)

Round 1

In this round, the customer ‘pushed’ balls into the system when he could. You can see the lead time increase as the WIP increases in the CFD, up to a point when a natural system limit is reached. Towards the end the customer stops adding more balls to let the system flush itself out a bit, which shows as the step in the CFD at about 60 seconds, the drop in lead time, and the spike in throughput. The final two balls went through when the system was virtually empty. Notice the short lead times again!

[Charts: Round 1 lead time, throughput and cumulative flow diagram]

Round 2

In this round the team decided to limit the WIP to 6 – one per person. However, interestingly (to me at least) they decided to batch them, by getting all 6 through before they started the next 6. Lead time is much more stable this time. The spike is because the customer forgot to receive the last ball of the 1st batch before starting the 2nd batch, so it got stuck! Throughput is wildly variable though because nothing comes out for a while, and then all 6 come at once. You can see the batching in the ‘steps’ in the CFD.

[Charts: Round 2 lead time, throughput and cumulative flow diagram]

Round 3

In this round the team wanted to experiment with an extremely low WIP limit of 1. Lead time is significantly better, but throughput is low because there is too much slack in the system now. Notice the smooth flow in the CFD. The patches where WIP drops to zero are because the customer’s process between receiving a ball out, and adding a ball in, added a noticeable delay.

[Charts: Round 3 lead time, throughput and cumulative flow diagram]

Round 4

In this round the team increased the limit to 3, but continued to process those 3 in batches. Lead time remains the same, but with improved throughput. The CFD still shows signs of the batching with the customer delay between batches, but is much steeper due to the improved throughput.

[Charts: Round 4 lead time, throughput and cumulative flow diagram]

Round 5

Finally, the team decided to remove the batches and solve the customer’s delay issue by re-organising themselves. This time the customer added 3 balls into the system, and then only added another when one came out. Lead time is slightly increased, probably due to the fact that there were always 3 balls in process which added some complexity. Throughput is improved again though, and the CFD is looking very smooth.

[Charts: Round 5 lead time, throughput and cumulative flow diagram]

As you can see, the team made significant improvements over the five rounds by making small changes informed by the data they had about their flow and capability.

A Model for Creating a Kanban System

This post is a high level overview of the model I use when I think about Kanban Systems. As the saying goes, “all models are wrong, some are useful”. This is what I currently find useful based on working with teams and organisations in recent years.

At the heart of the model is Systems Thinking. Without looking at what we do as part of a system, with a purpose to be met by outcomes, we risk focusing too heavily on the activities and practices we perform. Having a clear understanding of a system’s purpose, from a customer’s perspective, helps us to design a method which serves that purpose.

The model then has three foundational building blocks which underpin an effective process; Flow, Value and Capability.

  • Flow – Keeping the work progressing and avoiding delays by focusing more on the movement of the work, and less on the movement of the worker.
  • Value – Ensuring that the work serves the system’s purpose, satisfying customers and stakeholders and resulting in successful organisations.
  • Capability – Creating knowledge of how well the work serves the system’s purpose in order to maintain and improve the system’s effectiveness over time.

In other words, we want to flow value through capability teams.

Finally, the model has five aspects, from which we can look at a process to help us understand and improve it; Workflow, Visualisation, Work in Process, Cadence and Learning.

  • Workflow – how does the work progress through the system? Understanding workflow helps improve how the work moves from concept to consumption.
  • Visualisation – where is the work in the system? Understanding visualisation helps create a common mental model of the current state of the work.
  • Work in Process – what work is in the system? Understanding Work In Process helps identify bottlenecks and impediments to improving flow.
  • Cadence – when does the work in the system happen? Understanding Cadence helps with co-ordinating the work and improving system reliability.
  • Learning – how does the system continuously improve? Understanding further models with which to view and explore the system ensures the system gets better at serving its purpose.

While this is only a model, and contains no specific practices, I believe that it can be useful in describing why some techniques work in some circumstances, and provide context for applying the right tool to the right job.