estimates – use-cases.org – Matthew Tippett's Blog

Estimating for Software is Like Estimating for Skinning a Cat

As I’ve mentioned a few times, estimation is an imprecise art. There are ways to increase accuracy of the estimation either through consensus based estimation or other methods. This post explores why estimations are hard and why the software world struggles to find tools, techniques and methods that provide for consistent and accurate estimations.

I’ve recently been playing with Codewars (connect with me there) and have been intrigued by the variance of the solutions that are provided. In particular, you have the “smart” developers who come up with one-liners that need a PhD to decode, tight code, maintainable code, and then clearly hacked till it works code. This variance is likely the underlying reason for the difficulty in getting consistently accurate estimates.

Read on for more some of the examples that I have pulled from the Convert Hex String to RGB kata. The variance is quite astonishing. When you dig deeper into the differences, you can begin to see the vast differences in approaches. I’m not going to dig deeper into my personal views of the pro’s and cons of each one, but it did provide me a lightbulb moment as to the how software in particular always going to be difficult to estimate accurately.

I came up with an interesting analogy when pushed by a colleague on systematic ways of estimating. His assertion was that other industries (building in his example) have systematic ways of doing time and cost estimates for doing work. While not diminishing the trades, there are considerably less degrees of freedom for, say, a framer putting up a house frame. The fundamental degrees of freedom are

experience – apprentice or master
tools – Nails or nail gun
quality of lumber – (not sure how to build this in)

For an master carpenter, framing will be very quick, for an apprentice, it will be slow. The actual person doing the framing will likely be somewhere in between. Likewise, using nail and hammer will be slow, and a nail gun will be fast. The combination of those two factors will be the prime determinant of how long a piece of framing will take to complete.

I code however, we bring in other factors that need to be include in estimates, but are typically not considered until the code is being written. Looking at the examples below, we see the tools that are available.

Array utility operators (ie: slice)
string operators (ie: substring)
direct array index manipulation (array[index])
regular expression (.match)
lookup Tables (.indexOf)

Using each of these tools has an general impact on the speed that the code can be written, and the speed that it can be debugged and the general maintainability of the code.

With this simple example, I feel that the current practice of “order of” that is popular in agile is sufficient for the type of accuracy that we will get. Fibonacci, t-shirt sizes, hours/days/weeks/months/quarters, are really the among the best classes of estimates that we can get.

function hexStringToRGB(h) {
  return {
    r: parseInt(h.slice(1,3),16),
    g: parseInt(h.slice(3,5),16),
    b: parseInt(h.slice(5,7),16)
  };
}

function hexStringToRGB(hexString) {
  return {
    r : getColor(hexString),
    g : getColor(hexString, 'g'),
    b : getColor(hexString, 'b')
  };
}

function getColor(string, color) {
  var num = 1;
  if(color == "g"){ num += 2; }
  else if(color == "b"){ num += 4; }
  return parseInt("0x" + string.substring(num, num+2));
}

function hexStringToRGB(hs) {
  return {r: parseInt(hs[1]+hs[2],16), g : parseInt(hs[3]+hs[4],16), b : parseInt(hs[5]+hs[6],16)}
}

function hexStringToRGB(s) {
  var foo=s.substring(1).match(/.{1,2}/g).map(function(val){return parseInt(val,16)})
  return {r:foo[0],g:foo[1],b:foo[2]}
}

function hexStringToRGB(hexString) {
  var hex = '0123456789ABCDEF'.split('');
   hexString = hexString.toUpperCase().split('');

  function hexToRGB(f,r){
    return hex.indexOf(f)*16 + hex.indexOf(r);
  }

  return {
    r: hexToRGB(hexString[1],hexString[2]),
    g: hexToRGB(hexString[3],hexString[4]),
    b: hexToRGB(hexString[5],hexString[6])
  }
}

About Skinning the Cats

“There are many ways to skin a cat” has it’s earliest print etymology in Money Diggers article in the 1840’s Gentleman’s Magazine, and Monthly America Review. Even that reference implies it is a phrase already in use before then. Generally the phrase implies there are many ways to achieve something. In the context of this article, the analogy is that even simple tasks like writing a hex to RGB converter can be achieved in may different ways.

As always, vehemently opposed positions are encouraged in the comments, you can also connect with me on twitter @tippettm, connect with me on LinkedIn via matthewtippett, and finally +Matthew Tippett on google+.

High Confidence/Low Information vs High Accuracy/Low Information Estimates

Quite often estimates are needed where there is low-information, but a high-confidence estimate is required. For a lot of engineers, this presents a paradox.

How can I present a high confidence estimate, when I don’t have all the information?

Ironically, this issue is solved fairly easy by noting the difference between high confidence and high accuracy estimate. A high confidence estimate is defined by likelihood that a task will be completed within a given timeframe, while a high accuracy estimate provides a prescribed level of effort to complete the task. This article presents a method of balancing a high confidence estimate balancing analysis effort against accuracy.

This is a refinement on the “Getting Good Estimates” posting from 2011.

The Estimation Model

The basis for this method is captured in the diagram below. The key measures on the diagram are

Confidence, the likelihood that the task will be completed by a given date (Task will be complete in 15 days at 90% confidence)
Accuracy, the range of effort for an estimate (Task will be complete in 10-12 days)
No Earlier Than, absolute minimum effort for a task.

In general, I never accept a naked estimate of a number of days. An estimate of a range will usually imply a confidence level. An estimate of a confidence level may or may not need an indication of accuracy – depending on context for the estimate.

Gaming out the Estimate

As a refinement to the method outlined in Getting Good Estimates, the same technique of calling out numbers can be used to pull out an estimation curve from an engineer. The method follows the same iterative method outlined in Getting Good Estimates By asking the question, “What is the confidence that the task would be complete by date xxx?, you will end up with a results similar to

Question	Answer
What’s the lowest effort for this task?	2 weeks
What’s the likelihood it will task 20 weeks	100% (Usually said very quickly and confidently)
What’s the likelihood it will take 10 weeks	95% (Usually accompanied a small pause for contemplation)
What’s the likelihood it will take 5 weeks	70% (Usually with a rocking head indicating reasonable confidence)
What’s the likelihood it will take 4 weeks	60%
What’s the likelihood it will take 3 weeks	30%
What’s the likelihood it will take 2 weeks	5%

That line of questions would yield the following graph.

I could then make the following statements based on that graph.

The task is unlikely to take less than 2 weeks. (No earlier than)
The task will likely take between 4 and 8 weeks (50-90% confidence)
We can be confident that the task will be complete within 8 weeks. (90% confidence)
Within a project plan, you could apply PERT (O=2, M=4[50%], P=8[50%]) and put in 4.3 weeks

Based on the estimate, I would probably dive into the delta between the 4 and 8 weeks. More succinctly I would ask the engineer, “What could go wrong that would cause the 4 weeks to blow out to 8 weeks?”. Most engineers will have a small list of items that they are concerned about, from code/design quality, familiarity with the subsystem to potentially violating performance or memory constraints. This information is critically important because it kick starts your Risk and Issues list (see a previous post on RAID) for the project. A quick and simply analysis on the likelihood and impact of the risks may highlight an explicit risk mitigation or issue corrective action task that should be added to the project.

I usually do this sort of process on the whiteboard rather than formalizing it in a spreadsheet.

Shaping the Estimate

Within the context of a singular estimate I will usually ask some probing questions in an effort to get more items into the RAID information. After asking the questions, I’ll typically re-shape the curve by walking the estimate confidences again. The typical questions are:

What could happen that could shift the entire curve to the right (effectively moving the No Earlier Than point)
What could we do to make the curve more vertical (effectively mitigate risks, challenge assumptions or correct issues)

RAID and High Accuracy Estimates

The number of days on the curve from 50% to 90% is what I am using as my measure of accuracy. So how can we improve accuracy? In general by working the RAID information to Mitigate Risks, Challenge Assumptions, Correct Issues, and Manage Dependencies. Engineers may use terms like “Proof of concept”, “Research the issue”, or “Look at the code” to help drive the RAID. I find it is more enlightening for the engineer to actually call out their unknowns, thereby making it a shared problem that other experts can help resolve.

Now the return on investment for working the RAID information needs to be carefully managed. After a certain point the return on deeper analysis begins to diminish and you just need to call the estimate complete. An analogy I use is getting an electrician to quote on adding a couple of outlets and then having the electrician check the breaker box and trace each circuit through the house. Sure it may make the accuracy of the estimate much higher, but you quickly find that the estimate refinement is eating seriously into the task will take anyway.

The level of accuracy needed for most tasks is a range of 50-100% of the base value. In real terms, I am comfortable with estimates with accuracy of 4-6 weeks, 5-10 days and so on. You throw PERT over those and you have a realistic estimate that will usually be reasonably accurate.

RAID and High Confidence Estimates

The other side of the estimate game deals with high confidence estimates. This is a slightly different kind of estimate that is used in roadmaps where there is insufficient time to determine an estimate with a high level of accuracy. The RAID information is used heavily in this type of estimate, albiet in a different way.

In a high confidence estimate, you are looking for something closer to “No Later Than” rather than “Typical”. A lot of engineers struggle with this sort of estimate since it goes against the natural urge to ‘pull rabbits out of a hat’ with optimistic estimates. Instead you are playing a pessimistic game where an usually high number of risks become realized into issues that need to be dealt with. By baking those realized risks into the estimate you can provide high confidence estimates without a deep level of analysis.

In the context of the Cone of Uncertianty, the high confidence estimate will always be slightly on the pessimistic side. This allows there to be a sufficient hedge against something going wrong.

If there is a high likelihood that a risk will become realized or an assumption is incorrect, it is well worth investing a balanced amount of effort to remove those unknowns. It tightens the cone of uncertainty earlier and allows you to converge faster.

Timeboxing and Prototypical Estimates

I usually place a timebox around initial estimates. This forces quick thinking on the engineers side. I try to give them the opportunity to blurt out a series of RAID items to help balance the intrinsic need to give a short estimate and the reality that there are unknowns that will make that short estimate wrong. This timebox will typically be measured in minutes, not hours. Even under the duress of a very small timebox, I find these estimates are usually reasonably accurate, particularly when estimates carry the caveats of risks and assumptions that ultimately are challenged.

There are a few prototypical estimates that I’ve seen engineers give out multiple times. My general interpretation of the estimate, and what refinement steps I usually take. These steps fit into the timebox I describe above.

Estimate Style	Interpretation	Refinement
The task will take between 2 days and 2 months	Low accuracy, Low Information	Start with the 2 day estimate and identify RAID items that push to 2 months
The task will take up to 3 weeks	Unknown accuracy, no lower bound	Ask for no-earlier-than estimate, and identify RAID items.
The task is about 2 weeks	Likely lower bound, optimistic	Identify RAID items, what could go wrong.

Agree? Disagree? Have an alternative view or opinion? Leave comments below.

If you are interested in articles on Management, Software Engineering or any other topic of interest, you can contact Matthew at tippettm_@_gmail.com via email, @tippettm on twitter, Matthew Tippett on LinkedIn, +MatthewTippettGplus on Google+ or this blog at https://use-cases.org/.

Updates on “Getting Good Estimates”

This posting is an update to the Getting Good Estimates article based on the comments received and further research from a number of sources. I include discussion on who should do the estimate, what’s included, references to other estimation techniques, refinements on the probabilistic estimation curve with contrasts to PERT and other techniques. New discussion is had on the “doubling of estimates”, effort vs calendar time, the funnel of uncertainty and finally thoughts on the experience level for shaping estimates against the engineer’s experience.

UPDATE: Improved approach to estimation in this posting.

Who Does the Estimate?

The first comment I saw was from logjam over at hacker news who posted the following (emphasis is mine):

When managers request software estimates from engineers, engineers should frown, look them dead in the eyes, and tell them that making estimates is a managerial/administrative task.

Interestingly is the polar opposite to Joel on Software

Only the programmer doing the work can create the estimate. Any system where management writes a schedule and hands it off to programmers is doomed to fail. Only the programmer who is going to implement a feature can figure out what steps they will need to take to implement that feature.

I tend to agree with Joel on this one. The person solving the problem is in the best position to determine how long a particular task will take. Moving away from software, people get very frustrated with cookie-cutter estimates from tradespeople, independent of the actual effort associated with the problem.

The estimate is not a negotiated value between the engineer and the manager, it is instead a shared consensus on the effort of the task. The engineer’s responsibility is to integrate the assumptions, risks and their individual capability into an estimate. The manager’s responsibility is to provide the bigger picture to provide the information needed for the engineer to integrate, and the commitment to

Another point that was brought up by my friend and colleague, Piranavan, is that estimates should be tempered by the individual’s strengths and experience. An architect who knows the system through and through will usually be able to deliver a task within a considerably shorter time than an intern that is new to a system. This underscores what Joel and I mention above. The estimate should really come from the person doing the work. It can be workable to have experienced engineers create estimates, but before the estimates become plans of record, the person making the estimate needs to temper the estimate with the individual doing the work.

What’s Included in an Estimate?

Immediately below the statement from Joel on Software, there was the following statement.

Fix bugs as you find them, and charge the time back to the original task. You can’t schedule a single bug fix in advance, because you don’t know what bugs you’re going to have. When bugs are found in new code, charge the time to the original task that you implemented incorrectly. This will help EBS predict the time it takes to get fully debugged code, not just working code.

This concurs with what I look for in estimates. Completed work, done… Done, done… Hands off keyboard, done… Delivered with minimal bugs, done…

I look the engineer dead in the eye, ask them to put their hand on their heart confirm that their estimate includes all work that’s needed for the task to be complete. Most engineers will pause and possibly realize that there is other work or unconsidered risks that might affect the estimate.

The intent isn’t to beat the engineer up, the intent is to dig down and expose any assumptions, concerns or other issues that may affect the estimate. Remember that the captured form of the estimate is either an explicit range or a single effort value with confidence interval applied

What Other Estimation Techniques Are There?

There are obviously many different techniques that can be used for estimations. A google query for “Software Estimation” yields 31,400,000 results. 10 pages worth of results in,

There are many, many different methods. Here are a couple of interesting and accessible ones.

Planning Poker is a group consensus system. There is a group discussion on the details regarding the task, and then everybody creates their estimates. These are then combined to determine a group estimate.

Evidence Based Scheduling by Joel Spolsky in 2007 predicates estimations down to less than 16 hours. This forces a level of design as part of the estimate. The estimates are not trusted until they get down to that timeframe. I’d imagine that the estimate/design is revised and improved over time. Jump down to the uncertainty funnel below for a discussion there.

Probabilistic Evaluation and Review Technique (or PERT) for short, provides a full system for estimation. The methodology also takes the probabilistic estimation curve and boils it down to 3 points on the curve: the optimistic, most likely and pessimistic. These are then calculated into a single estimate as shown below.

Probabilistic Estimation Curve

One of the key parts of the previous post presented the characteristic curve. As part of the research for this update post, I saw the curve in multiple places from papers on terminology to more NASA handbooks on estimation. The research provided a lot more nuance to estimation. It is also referenced heavily as the basis for the 3 point estimation technique used in PERT.

Although I haven’t confirmed, I believe that the probability function that closely matches this shape of a particular beta distribution

Refreshing with the graph.

Notice that I’ve marked the three critical parts.

Absolute Earliest	The absolutely earliest that the task could be completed. This is assuming perfect understanding of the task and no unrealized risks. Basically the impossible estimate. Way too many estimates are based on this value.
Highest Confidence (engineer’s estimate)	This represents the highest confidence, and the likely point at which the task will be completed.
Mean (planning estimate)	This represents the mean of the estimate. I’ll dig deeper into this shortly.

PERT provides a basis for determining the estimate based on the formula

Estimate = Mean = (optimistic + 4*Likely + pessimistic)/6.

This of course assumes that the least likely estimate is captured as a number, which in a lot of cases is quite hard to do.

Padding Estimates

If you’ve been in software engineering for a while, you probably have heard someone say “Take the estimate and double it”. The paper by Grimstad et al actually positions this in context. They make a similar explicit observation that the estimates for any task have a probabilistic shape with the two critical points. The highest confidence and the mean.

These two points carry particular value and should be used in two different scenarios. The highest confidence should be used by the engineering in tempering and improving their estimation. The mean should be used within the project management team to determine a likely cost or planned effort for the project. Both are rooted in the same estimation but are derived differently.

The estimation doubling is triggered by a gross simplification of the estimation process. Simplifying the estimation to a single scalar value from a probabilistic range makes it easier to aggregate numbers, however the aggregation should be the mean rather than the highest confidence. If you conflate the two values together you will end up with poor overall planned effort. Remember that the engineers will optimistically provide something between the absolute earliest and their highest confidence estimate so this is generally the number used as a scalar estimate and hence as the basis for estimate doubling.

Since there is the tendency to use the highest confidence estimate as a basis for planning and these estimates will be typically be lower than the planned effort we end up with a shortfall. To recover from this shortfall, the simplest model is to use an arbitrary multiple. Falling back to our probabilistic model, we see that the mean is a non-linear distance from the highest confidence estimate. The management of risks and unknowns shape the confidence associated with a task.

A well understood task may have a small difference between the absolute minimum, the highest confidence and the mean, a poorly understood task will have a greater spread. The size of the task (or the absolute minimum) carries no direct relationship to the spread of the estimates.

This removes the “double the estimate” for the purpose of planning. The use of the more nuanced mean or planning estimate should be used instead. Or put differently, the factor by which the engineers estimate is transformed into the planning estimate is proportional to the level of risk and number of unknowns. The higher the level of understanding of the risks and issues for a task, the lower the multiple should be.

Of course if this means that doubling the estimate may make sense in some environments, particularly when the estimate carries a lot of unknowns and is known to be optimistic or has not been tempered by the sorts of discussions suggested in the original article.

The Uncertainty Funnel

An implication of the shaping and discovery process I described earlier is that over time the estimates become more accurate as more information is discovered and as the project continues.

A number of papers show this in different forms. Page 7 of the NASA Handbook of Software Estimation shows a stylized funnel, and page 46 of Applied Software Project Management (physically page 15 in the Chapter 3 PDF referenced) book shows an iterative convergence of estimations in its discussion of the Delphi Estimation model.

Both these reference re-iterate that estimates are not static. Estimates should be revisited and re-validated at multiple stages within a project. New information, changes in assumptions and changes in risk profiles will shape the estimate overtime. I’d also suggest that the engineers quickly do a sanity check on the estimates they are working against before starting work on a new task. The estimate will generally improve over time as the risk discovery, problem understanding and task detail awareness increase the accuracy of the estimate.

Visionary Tools provides an interesting observation that if you don’t see estimate uncertainty reducing over time, it is likely that the task itself is not fully understood.

Interdependencies & Effort vs Wall Time.

Piranavan highlighted another area that I had left ambiguous. The discussion on estimates is focused on looking at the particular effort associated with a singular task. It does not expand into managing interdependencies and their effect on meeting an estimate. For the purposes of an estimate in these articles, it is the time applied to delivering the estimate. The external factors such as interdependencies, reprioritization, etc should not affect the value of the estimate. Here I do see the prime value of the manager being in the running defence for the engineer and ensuring that they have the capability and focus to successfully deliver the work with minimal interruption or distraction. This may mean delaying the delivery of the work, or assigning other work elsewhere. Always remember that sometimes you need to take the pills and accept late deliver.

A further subtlety on the interdependencies is that if an interdependent task is not well-defined or delivered cleanly or completely, that may force rework or rescoping of tasks and consequently it is likely that these external factors will inject bumps into the uncertainty funnel.

On Feedback Cycles and Historical References

logjam made the following point:

Managers should collect, maintain, and have access to substantial historical data upon which they can make estimates and other administrative trivia. What else are managers for? Of course, what engineers need to understand is the game at work here: making an estimate is primarily about making you commit to a date, with which you will be flogged by those asking for such an estimate.

Piranavan wrote

I also think that estimating work is something that needs to be adopted in a weekly cycle. Capturing the changing estimates are important to understand that things are changing and need to be accounted for as well as a strong feedback tool for engineers to understand where previous estimates went awry (estimates vs actual). It also gives managers a chance to understand how close engineers were and whether or not that was an estimation error or an outlier (external priority change for example).

Whilst I don’t agree with logjam’s assertion regarding managers making estimations in isolation of the engineers, both of the responding comments point out the need to capture, manage and maintain estimates throughout the life of a project, and if possible educate the engineer on how to improve the accuracy of their estimates. That’s a topic for a later article.

Comments, suggestions or pointers are welcome below.

Getting Good Estimates

Good estimates are hard to come by. They are typically too optimistic or too pessimistic or aren’t grounded in reality. Here is my approach to effort estimation. I’ve used it successfully in a number of roles and have seen engineers go from poor to reasonable to good estimators.

UPDATE: I have gathered some thoughts and comments and included them in this update.

UPDATE 2: I have an update on the methodology and some further insights in this blog post.

What I look for in Estimates

Typically when asked for an estimate, you will get a single value with no qualification. “The work will take 3 weeks”. Experience has shown me that the a single value implies a lack of understanding the nuance of the problems and issues that the task might have lurking just below the surface.

When asking for an estimate, I’m looking for two things. 1) A baseline effort, and 2) A confidence interval. This comes in one of two forms

4 weeks of effort with 60-70% confidence
3-6 weeks of effort

Both these values are effectively the same. I let the engineers choose which ever one they are comfortable with.

Characteristic Curve

I can’t recall when I began to understand the characteristic curve within the methodology I use for engineering. I’d say that a long-term colleague Larry Bonfada was a strong influence in the thought process and I have since seen similar characteristic curves in Waltzing with Bears: Managing Risk on Software Projects by Tom DeMarco and Timothy Lister. I don’t have sufficient a background in statistics to define the shape. Feel free to leave a comment to educate me on the distribution type.

The critical sections of the curve in the table below.

Section	Description	Confidence
Absolute Earliest	The absolute earliest date that the task can be complete.	0%
Highest Confidence	The date that represents the highest likelihood of being delivered on or around.	60%
Long Tail	Worst case scenarios, if things go wrong, this date will be hit.	<10%

Typically engineers will choose one of those sections for their estimates. Optimists will communicate the absolute earliest date, pessimists will go for the long tail and your more experienced realists will go for the point of highest confidence – somewhere in the middle.

Shaping the Curve

Quite possibly you are thinking that to get this curve you have to apply painful or difficult to use models; fortunately, it’s not rocket science. Most engineers actually have a strong gut feel for the shape of the curve, so it’s a matter of teasing out a good estimate.

The way it works is through a set of questions to the person providing the estimate.

Question	Answer
What’s the lowest effort for this task?	2 weeks
What’s the likelihood it will task 20 weeks	1%
What’s the likelihood it will take 10 weeks	5%
What’s the likelihood it will take 5 weeks	30%
What’s the likelihood it will take 4 weeks	50%
What’s the likelihood it will take 3 weeks	60%

I intentionally use an number of extreme points (10,20 weeks) to drive the shape of the curve. When graphed, it comes up similar to below.

I find that most engineers will naturally have a strong gut feel for the estimates and in the majority of times will give numbers that result in more or less the same shape.

Now of course, there are a class of engineers who are either so cautious that they always estimate in the long tail – or too optimistic (or naive to the real effort) that they will always resist this sort of analysis. My advice is to push through with them (or at least work out a way to interpret their estimates).

From the answers to the questions in the examples above, I’d walk away with either of the agreed to estimates of 2½ – 4 weeks of effort or 3 weeks with 60-70% confidence. Each team or organization will have it’s own sweet spot of acceptable range. Tightening and getting the estimates to the right shape, usually involves a mixture of analytics and soft management skills.

Tightening the Curve by Managing Unknowns

The uncertainty in the curve is representative of a number of different factors, be it experience, unknown complexity, inter-dependencies and so on.

A hallmark of a large amount of unknowns in this sort of analysis is overtly broad ranges. I’ve had engineers give a range of 2 weeks to 3 months. Obviously the estimate isn’t workable by any stretch of the imagination. The engineer in this case is either being obstructionist or hasn’t, or isn’t willing, to look at the unknowns that would drive such a broad estimate range.

The types of questions that I tend to ask the person giving the estimate will be along the lines of

What could happen that will prevent the absolute earliest time from occurring?
What could happen that would push you from the 60% confidence date to a later date?

As each of these questions are answered with the unknown factors becoming more visible, you can revisit the original estimating questions again after those factors have been determined. If you are lucky there are some factors that are either issues that can be dealt with or risks that can be mitigated or removed. In addition, it is worthwhile and discover these issues and risks and have them tracked formally as part of the greater project.

I generally find repeated cycles of this sort of analysis serve to improve the estimates to the point where I am comfortable to accept the estimate into the project. With each iteration the discovery process either moves the overall curve to the left or the right (smaller or large effort) or tightens the shape of the curve (increasing the confidence).

Feel free to provide feedback below on how you deal with estimates.