Why we will no longer be estimating work


I work as a project manager in the Foxhound dev team at Code. Andy from Foxhound recently published a post discussing some of our team metrics relating to estimates, and it has been a talking point in the team for a few weeks as we've considered what these metrics mean for us and how they might change our approach with clients.

Recent team history

Despite recent changes in personnel, the team has been on a steady journey of process and estimation improvement for the past 18 months, as Code has moved away from being a scrum agency working in two-week sprints to a lean studio using kanban and continuous integration.

I've been a follower of the #NoEstimates debate and am in agreement with the arguments against estimates and the harm they can cause. As a team, we've been moving towards #NoEstimates almost as a logical conclusion of the practices we're following, but this hasn't been an aim in its own right.

We have concluded that now is the right time for us to move away from estimating, based on evidence showing that estimates are now hampering our progress and failing in the remaining areas where we thought they were still useful.

How we use complexity points currently

We already take estimates with a heavy pinch of salt: we would never use them to fix a project's scope or cost, and we are clear that estimates are likely to change a lot as our understanding of a task increases. Even so, the ways I can think of that we might still use complexity scores are:

  • Planning approximately how much work we can do each week
  • Determining priority when two tickets of similar levels of urgency appear together
  • Tracking the team's velocity and using this as a target for future improvement
  • Determining if a ticket is 'too big' and needs further breakdown
  • Providing costs to clients to decide if work should be done

1. Planning approximately how much work we can do each week

We currently use complexity points to state that we can do approximately 30 points per week, and we use this value to line tickets up in weekly columns, giving us a quick view of what we roughly expect to achieve by the end of each week. Should it be needed, we can combine our average velocity with a ticket's position in the priority list to say which week it should be done in, whether that's this week or four weeks away.
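
To make that concrete, here's a minimal sketch of the forecasting arithmetic, assuming an invented backlog; only the approximately-30-points-per-week capacity figure comes from our actual planning.

```python
# Sketch of the week-planning arithmetic described above.
# The backlog and its sizes are hypothetical; only the ~30 points
# per week capacity figure comes from the post.

WEEKLY_CAPACITY = 30  # average complexity points completed per week

backlog = [5, 3, 2, 5, 3, 3, 2, 5, 5, 2, 3, 5]  # points, in priority order

def week_for_ticket(backlog, index, capacity=WEEKLY_CAPACITY):
    """Return the 1-based week in which the ticket at `index` should
    finish, assuming tickets are worked strictly in priority order."""
    points_up_to_ticket = sum(backlog[: index + 1])
    # Ceiling division: a ticket that tips over the weekly budget
    # lands in the following week.
    return -(-points_up_to_ticket // capacity)

for i, size in enumerate(backlog):
    print(f"ticket {i}: {size} pts -> week {week_for_ticket(backlog, i)}")
```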

The usefulness of complexity points here would be that they allow us to 'correct' or normalise for the fact that one week might contain just a few large tickets and another lots of small tickets - by stating sizes in advance, we can get a better balance of tickets together. In other words, we would expect to see higher variance in the number of tickets per week and lower variance in the total number of complexity points.

What we saw when we looked at the data was surprising: there was no such 'levelling out' of the complexity line. The two numbers were so closely linked every week that the correlation was quite startling.

[Chart: complexity points versus number of tasks, per week]

The conclusion to be drawn here is that planning using the number of tasks is just as reliable as planning using complexity points. (During the period recorded here, they were both rather unreliable - something we need to work on improving.)
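
For anyone who wants to run the same check on their own board, the comparison is a one-liner once you have weekly totals. The figures below are invented for illustration; statistics.correlation requires Python 3.10 or later.

```python
# Hypothetical weekly totals; substitute exports from your own board.
from statistics import correlation  # available from Python 3.10

points_per_week  = [28, 31, 19, 35, 26, 22, 30, 27]
tickets_per_week = [9, 10, 6, 12, 8, 7, 10, 9]

# A Pearson coefficient near 1.0 means the points totals carry no
# planning signal beyond simply counting tickets.
print(f"r = {correlation(points_per_week, tickets_per_week):.2f}")
```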

2. Determining priority when two tickets of similar levels of urgency appear together

I don't actually know how often this happens with our clients, but it is definitely one way that estimates are used elsewhere - and I'm sure it occasionally happens for us and our clients too. Given a situation where all else appears relatively equal and neither ticket is clearly 'urgent', we would likely prioritise the ticket with the lower estimate.

As Andy's previous post showed, our team's data indicates that if our estimates are being used to make such a decision, the 'small' ticket actually has a fairly high chance of taking longer than the 'big' one.

[Chart: time taken to code a story, against estimated size]

The bottom axis shows our story sizes:

  • 1 = a text change, for example. These are quite rare.
  • 8 = what we call 'too big'. We have a team rule not to let these through the board, but occasionally breaking the ticket down further is not possible and we proceed with caution. These are also rare.

87% of the tickets we've worked on are 2s, 3s or 5s, so these are the sizes we're interested in.

The chart above shows that there is almost no difference at all in how quickly we complete a story, regardless of whether it was estimated as a 2, 3 or 5. If our estimates were working as intended, we would see the 2s clustered lower on the dev-time axis and the 5s clustered higher - this is clearly not the case. The longest ticket in the sample was estimated as a '3'.

(We also have a small batch of data for doing no estimates already - the 0 column shows tickets that have progressed through dev without being estimated. This sample fits exactly the same pattern as the 2s, 3s, and 5s!)
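
As a rough sketch of how anyone could run this comparison on their own data, grouping dev times by estimate and comparing the medians is enough to see whether the sizes carry any signal. The (estimate, days-in-dev) pairs below are invented, not our board data.

```python
# Hypothetical (estimate, days_in_dev) pairs; the chart in the post
# is drawn from the team's real board, not from this data.
from collections import defaultdict
from statistics import median

tickets = [(2, 1.5), (3, 4.0), (5, 2.0), (2, 3.5), (3, 0.5),
           (5, 1.0), (0, 2.5), (2, 2.0), (3, 6.0), (5, 3.0)]

days_by_estimate = defaultdict(list)
for estimate, days in tickets:
    days_by_estimate[estimate].append(days)

# If estimates carried real information, the medians would climb with
# the estimate; heavily overlapping medians say they don't.
for estimate in sorted(days_by_estimate):
    print(f"size {estimate}: median {median(days_by_estimate[estimate]):.1f} days")
```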

The conclusion from this graph is that our estimates are not a good guide to how long a ticket will take to complete and, furthermore, if a client is basing their priorities on these estimates, we are actually misinforming them - the 'small' ticket they prioritise may well take longer than the 'big' ticket they set aside.

We are already using a much better way to prioritise work anyway - the ticket that has the most value should be done next, not the ticket that costs the least.

3. Tracking the team's velocity and using this as a target for future improvement

We track velocity as the number of complexity points per week, and this has really been our main measurement of team performance and improvement to date.

This is, however, the wrong metric to base success on, and moving away from complexity points is a good opportunity to refresh how we judge ourselves.

I think the reasons that velocity is a bad team performance indicator are best summed up by the ways that we can increase velocity:

  • go 'fast' - cut corners, make mistakes, lose morale
  • estimate higher - if we double all our estimates, we double our velocity. Lie, in other words
  • actually improve - increase platform knowledge, increase client knowledge, gain skills, work better as a team
  • decrease waste

Clearly, we're most interested in the bottom two. Using velocity as the measure doesn't feel like the best way to achieve those. Our challenge as a team as we move away from estimating will be to improve what we do track, and we will update on our choices once we have some data. As a start...

  • cycle time - a better measure of 'efficiency', because it folds ticket quality, how quickly we remove blockers and how quickly tickets get through QA into a single number. We can also use it to plan better (a minimal sketch of the calculation follows this list).
  • waste tracking - we currently tend to tackle waste only anecdotally via retros and process improvements; making waste reduction a formal team metric should help us go further in identifying and reducing it
  • consistency - the chart in point 1 above shows that we've had an inconsistent few months in terms of what we deliver, due in part to turnover in the team. Now that everybody is settled in, we'd like to see this level out. Tracking consistency will help us to see this and address it if we need to.
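
As a rough sketch of what the cycle time calculation could look like, assuming tickets record when they entered and left the board (the timestamps below are invented):

```python
# Minimal sketch of a cycle-time metric; all timestamps are invented.
from datetime import datetime
from statistics import mean, median

# (started, finished) pairs for completed tickets.
completed = [
    (datetime(2017, 5, 1), datetime(2017, 5, 3)),
    (datetime(2017, 5, 2), datetime(2017, 5, 9)),
    (datetime(2017, 5, 4), datetime(2017, 5, 5)),
    (datetime(2017, 5, 8), datetime(2017, 5, 12)),
]

# Cycle time: elapsed days from starting work to done - blockers,
# QA queues and all - which is exactly why it reflects the whole flow.
cycle_times = [(done - started).days for started, done in completed]
print(f"mean {mean(cycle_times):.1f} days, median {median(cycle_times)} days")
```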

Further reading: Velocity: the Killer of Agile Teams by Tomas Kejzlar

4. Determining if a ticket is 'too big' and needs further breakdown

This is still crucial, and we'll keep asking the question for every ticket we work on. We don't need to attach a value to a ticket to know it's too big, and the team's ability to slice stories in the best way is already well established.

The time we'll save by not estimating will be used instead for a more detailed discussion of the ticket's content, ensuring we have determined the value of the ticket, added test criteria and included all key stakeholders in the discussion (developer, tester, product owner).

5. Providing costs to clients to decide if work should be done

This one is actually not true for our team at all, but it does need to be addressed for this post to be useful to others.

Rather than cost on a per-project or per-feature basis, the clients in our team have committed to a fixed monthly cost for the rest of the year. This has been very successful and has saved a lot of admin time within the team which we've been able to offer back to those clients as a discount incentive for their commitment.

However, before this, we used the team's average weekly velocity to work out a 'cost per point' and costed each ticket (or batch of tickets) accordingly. Using a 12-week average gave us confidence that, as long as we didn't start to go consistently slower, our costs would be fair and our projects would stay close to on-budget.
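
To make the arithmetic concrete, here is a sketch of that cost-per-point calculation alongside its per-ticket equivalent; all the figures are invented, and only the 12-week averaging window comes from what we actually did.

```python
# Worked sketch of the 'cost per point' calculation. All figures are
# hypothetical; only the 12-week averaging window is from the post.

WEEKLY_COST = 10_000  # hypothetical team cost per week

velocity_last_12_weeks = [28, 31, 19, 35, 26, 22, 30, 27, 29, 24, 33, 26]
tickets_last_12_weeks  = [9, 10, 6, 12, 8, 7, 10, 9, 10, 8, 11, 9]

avg_velocity = sum(velocity_last_12_weeks) / len(velocity_last_12_weeks)
avg_tickets  = sum(tickets_last_12_weeks) / len(tickets_last_12_weeks)

cost_per_point  = WEEKLY_COST / avg_velocity   # the old approach
cost_per_ticket = WEEKLY_COST / avg_tickets    # just as reliable, per the data

print(f"cost per point:  {cost_per_point:,.0f}")
print(f"cost per ticket: {cost_per_ticket:,.0f}")
```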

This is the best approach to costs that I've used to date - and the data above shows that using the number of tickets per week or the average cycle time would be just as reliable a metric.

More comprehensive detail on providing costs while practising #NoEstimates is better covered elsewhere; a good starting point is the NoEstimates book by Vasco Duarte.

Conclusions

We have solid data showing that even in the few areas where we still thought estimates helped us out, they are actually having a negative effect: planning with points is no better than simply counting the number of tasks, prioritisation is best done on expected value rather than expected cost, and velocity is not the way we want to track performance in future.

We don't currently spend a huge amount of time estimating tickets, but the time we do spend would be much better used improving the quality of the ticket itself - adding acceptance criteria, completing deeper analysis and establishing how we will track the ticket's ongoing success once it is live.

Future posts will cover how we approach this with clients and updates on progress, particularly which metrics we use for team performance.