ng-Whatever

We’ve all done it, sat around a table dissing the previous generation of our product.  The previous set of engineers had no idea, made some stupid fundamental mistakes that we obviously wouldn’t have made.  They suck, we’re awesome.  You know what, in 3 or 5 years’ time, the next generation of stewards of the system you are creating or replacing now will be saying the same thing – of you and the awesome system that you are slaving over now.

So what changes?  Is the previous generation always wrong?  Are they always buffoons who had no idea how to write software?  Unlikely.  They were just like you at a different time, with a different set of contexts and a different set of immediate requirements and priorities.

Understanding Context

The context in which a system is created is the first critical ingredient for understanding it. Look to understand the priorities, the tradeoffs and the decisions that had to be made when the system was first created.  Were there constraints that you no longer have in place – were they restricted by infrastructure, memory or performance?  Were there other criteria driving success at that stage: shipping the product, managing technical debt, or making up for gaps in the organization?  What was the preferred type of system back then?

Understanding these items allows you to empathize with the system’s creators and understand some of the shortcuts they may have made.  Most engineers will attempt to do their best based on their understanding of the requirements, their competing priorities and their understanding of the best system that can be implemented in the time given.  Almost every one of these constraints forces some level of shortcut to be taken in delivering a system.

Seek first to understand the context before deciding that the previous team made mistakes.  When you hear yourself making comments about a previous team, a peer team or another group not doing things the way you would like to see them done, look for the possible reasons.  I’ve seen junior teams making rookie mistakes, teams focused on back-end architectures making front-end mistakes, and device teams making simple mistakes in back-end systems.  In each of these contexts, it is fairly obvious why the mistakes would be made.  Usually, it will be within your power to identify the shortcoming, determine a possible root cause by understanding the context, and shore up the effort or the team to help smooth things over and produce a better outcome.

Constraining Your ng-Whatever

When faced with frustration with a previous system, consider carefully whether to do a full re-write into an ng-whatever system, or to make incremental changes with some fundamental breakpoints that evolve, refactor and replace parts of the system.

It is almost guaranteed that the moment a system gets an “ng-Whatever” moniker attached to it, it becomes a panacea for all things wrong with the old system and begins to accrete not only the glorious fixes for the old system but also a persona of its own.  This persona will appear as “When we get the ng-whatever done, we won’t have this problem…”.

These oversized expectations begin to add more and more implicit requirements to the system.  Very few of these expectations will actually be fulfilled, leaving a perception of a less valuable ng-Whatever.

Common Defect Density

I’m going to come out and say that most engineering teams, no matter how much of an “Illusory Superiority” bias they may have, are going to be at best incrementally better than the previous team.  With that said, their likelihood of having defects in their requirements, design or implementation will be more or less even (depending on how the software is being written this time around).

The impact will typically be that the business is trading a piece of potentially battle-hardened software with known intractable deficiencies for a new piece of software with bugs that will only be ironed out in the face of production.  Even worse, there will always be a set of intractable deficiencies that are now unknown – only to be discovered once the new software is in production.

When the original system was created, it is highly unlikely that the engineering team deliberately baked in a set of annoying deficiencies.  Likewise, the new system will, to the best of your team’s understanding, not bake any deficiencies in either.  You need to make a conscious decision to take the risk that the new issues will be less painful than the old issues are.  If you can’t make that call, then refactoring and re-working parts of the system might sometimes be a better solution.

 

What have your experiences been with ng-Whatevers?  Have you found that your team can reliably replace an older system with a new one, and see that in a few years’ time the new system is held in higher esteem than the original?  Follow this blog for more posts, or post comments below on this topic.

 


Code and the Written Word

A curated commit history reads like a narrated history of the code.  The ability of git rebase to reorder, rework and polish commits allows a developer (and code reviewers) to curate the code history so that it tells a well-structured story.  This post will wander through how strongly the analogy holds.

TL;DR version in the slides.  Read on for the long form.

Continue reading “Code and the Written Word”

Ambiguous Requirements in the Simplest Places (and how to fix it)

Below is a photo from New Mongolian BBQ, a favorite dinner place for the family.  It is a really interesting example of an ambiguous requirement, as demonstrated by an ambiguous API.  As part of the instructions at the start of the line, patrons are encouraged to use two bowls – one for meat, and one for vegetables.

[Photo: IMG_20150201_184436]

When patrons get to the end of the line for their Mongolian to be cooked, they are presented with this spot for two sets of waiting customers.  The first question that comes to mind is what to do with my two bowls.

The two immediate options that I see for what this means are:

  • Customers front and back, bowl 1 and bowl 2.
  • Customer 1 and customer 2.

Judging from observation, customers choose randomly between the two options above.  I generally opt for bowl 1/bowl 2 if there aren’t any bowls already up when I arrive.

So how do we take the ambiguous requirement and make it mostly obvious to most patrons?  My suggestion would be to place a thick line separating the two customer spots.  This would rely on human nature wanting to keep bundled things together.  If you look carefully at the picture, this might already be the intent, since there is a slightly larger gap between the front and back positions.

Any other suggestions on how to resolve this ambiguous requirement?  Any similar simple but confounding ambiguous requirements issues that you have found?  Post a comment below.

High Confidence/Low Information vs High Accuracy/Low Information Estimates

Quite often an estimate is needed in a situation where there is little information, but a high level of confidence is required.  For a lot of engineers, this presents a paradox.

How can I present a high confidence estimate, when I don’t have all the information?

Ironically, this issue is solved fairly easily by noting the difference between a high confidence and a high accuracy estimate.  A high confidence estimate is defined by the likelihood that a task will be completed within a given timeframe, while a high accuracy estimate provides a prescribed level of effort to complete the task.  This article presents a method of producing a high confidence estimate while balancing analysis effort against accuracy.

This is a refinement on the “Getting Good Estimates” posting from 2011.

The Estimation Model

The basis for this method is captured in the diagram below. The key measures on the diagram are:

  • Confidence, the likelihood that the task will be completed by a given date (Task will be complete in 15 days at 90% confidence)
  • Accuracy, the range of effort for an estimate (Task will be complete in 10-12 days)
  • No Earlier Than, absolute minimum effort for a task.

[Figure: Estimate Confidence and Accuracy]

In general, I never accept a naked estimate of a number of days. An estimate given as a range will usually imply a confidence level. An estimate given as a confidence level may or may not need an indication of accuracy, depending on the context of the estimate.

Gaming out the Estimate

As a refinement to the method outlined in Getting Good Estimates, the same technique of calling out numbers can be used to pull an estimation curve out of an engineer.  The method follows the same iterative approach outlined in Getting Good Estimates.  By asking the question, “What is the confidence that the task would be complete by date xxx?”, you will end up with results similar to the following:

  • What’s the lowest effort for this task? – 2 weeks
  • What’s the likelihood it will take 20 weeks? – 100% (usually said very quickly and confidently)
  • What’s the likelihood it will take 10 weeks? – 95% (usually accompanied by a small pause for contemplation)
  • What’s the likelihood it will take 5 weeks? – 70% (usually with a rocking head indicating reasonable confidence)
  • What’s the likelihood it will take 4 weeks? – 60%
  • What’s the likelihood it will take 3 weeks? – 30%
  • What’s the likelihood it will take 2 weeks? – 5%

That line of questions would yield the following graph.

[Figure: Worked Estimate]
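
The read-offs can also be approximated mechanically.  Below is a minimal sketch (my own illustration, not from the original post) that linearly interpolates the answers in the table above; straight-line interpolation is an assumption, so its numbers land close to, but not exactly on, the values read off the hand-drawn curve.

    # Minimal sketch: linearly interpolate the engineer's answers to read off
    # confidence points from the estimation curve.
    answers = [  # (weeks, confidence that the task is complete by then)
        (2, 0.05), (3, 0.30), (4, 0.60), (5, 0.70), (10, 0.95), (20, 1.00),
    ]

    def weeks_at_confidence(target, points):
        """Return the number of weeks at which the curve crosses `target`."""
        for (w0, c0), (w1, c1) in zip(points, points[1:]):
            if c0 <= target <= c1:
                return w0 + (target - c0) / (c1 - c0) * (w1 - w0)
        raise ValueError("confidence outside the answered range")

    no_earlier_than = answers[0][0]            # 2 weeks (5% is effectively the floor)
    p50 = weeks_at_confidence(0.50, answers)   # ~3.7 weeks
    p90 = weeks_at_confidence(0.90, answers)   # ~9 weeks
    print(no_earlier_than, round(p50, 1), round(p90, 1))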

I could then make the following statements based on that graph.

  • The task is unlikely to take less than 2 weeks (no earlier than).
  • The task will likely take between 4 and 8 weeks (50-90% confidence).
  • We can be confident that the task will be complete within 8 weeks (90% confidence).
  • Within a project plan, you could apply PERT (O=2, M=4[50%], P=8[50%]) and put in 4.3 weeks (worked below).
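
The PERT figure in the last bullet is just the standard three-point weighted average; a quick worked check using the values above:

    # PERT expected value: (optimistic + 4 * most_likely + pessimistic) / 6
    O, M, P = 2, 4, 8                # weeks, from the statements above
    expected = (O + 4 * M + P) / 6   # (2 + 16 + 8) / 6
    print(round(expected, 1))        # 4.3 weeks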

Based on the estimate, I would probably dive into the delta between the 4 and 8 weeks. More succinctly, I would ask the engineer, “What could go wrong that would cause the 4 weeks to blow out to 8 weeks?”  Most engineers will have a small list of items that they are concerned about, from code/design quality and familiarity with the subsystem to potentially violating performance or memory constraints.  This information is critically important because it kick-starts your Risks and Issues list (see a previous post on RAID) for the project.  A quick and simple analysis of the likelihood and impact of the risks may highlight an explicit risk mitigation or issue corrective action task that should be added to the project.

I usually do this sort of process on the whiteboard rather than formalizing it in a spreadsheet.

Shaping the Estimate

Within the context of a singular estimate, I will usually ask some probing questions in an effort to get more items into the RAID information. After asking the questions, I’ll typically re-shape the curve by walking the estimate confidences again.  The typical questions are:

  • What could happen that would shift the entire curve to the right (effectively moving the No Earlier Than point)?
  • What could we do to make the curve more vertical (effectively mitigating risks, challenging assumptions or correcting issues)?

RAID and High Accuracy Estimates

The number of days on the curve from 50% to 90% is what I am using as my measure of accuracy.  So how can we improve accuracy?  In general, by working the RAID information to Mitigate Risks, Challenge Assumptions, Correct Issues, and Manage Dependencies. Engineers may use terms like “Proof of concept”, “Research the issue”, or “Look at the code” to help drive the RAID.  I find it is more enlightening for the engineer to actually call out their unknowns, thereby making it a shared problem that other experts can help resolve.

Now, the return on investment for working the RAID information needs to be carefully managed.  After a certain point the return on deeper analysis begins to diminish and you just need to call the estimate complete.  An analogy I use is getting an electrician to quote on adding a couple of outlets and then having the electrician check the breaker box and trace each circuit through the house.  Sure, it may make the estimate much more accurate, but you quickly find that the estimate refinement is eating seriously into the time the task itself would take anyway.

The level of accuracy needed for most tasks is a range of 50-100% of the base value.  In real terms, I am comfortable with estimates with an accuracy of 4-6 weeks, 5-10 days and so on.  Throw PERT over those and you have a realistic estimate that will usually be reasonably accurate.

RAID and High Confidence Estimates

The other side of the estimation game deals with high confidence estimates.  This is a slightly different kind of estimate, used in roadmaps where there is insufficient time to determine an estimate with a high level of accuracy.  The RAID information is used heavily in this type of estimate, albeit in a different way.

In a high confidence estimate, you are looking for something closer to “No Later Than” rather than “Typical”.  A lot of engineers struggle with this sort of estimate since it goes against the natural urge to ‘pull rabbits out of a hat’ with optimistic estimates.  Instead you are playing a pessimistic game in which an unusually high number of risks become realized as issues that need to be dealt with.  By baking those realized risks into the estimate you can provide high confidence estimates without a deep level of analysis.

In the context of the Cone of Uncertainty, the high confidence estimate will always sit slightly on the pessimistic side.  This allows a sufficient hedge against something going wrong.

[Figure: High Confidence Estimate]

If there is a high likelihood that a risk will become realized or an assumption is incorrect, it is well worth investing a balanced amount of effort to remove those unknowns.  It tightens the cone of uncertainty earlier and allows you to converge faster.

Timeboxing and Prototypical Estimates

I usually place a timebox around initial estimates.  This forces quick thinking on the engineer’s side.  I try to give them the opportunity to blurt out a series of RAID items to help balance the intrinsic urge to give a short estimate against the reality that there are unknowns that will make that short estimate wrong.  This timebox will typically be measured in minutes, not hours.  Even under the duress of a very small timebox, I find these estimates are usually reasonably accurate, particularly when they carry the caveats of risks and assumptions that are ultimately challenged.

There are a few prototypical estimates that I’ve seen engineers give out many times.  Below is my general interpretation of each estimate, and the refinement steps I usually take.  These steps fit into the timebox I describe above.

Estimate style – Interpretation – Refinement
  • “The task will take between 2 days and 2 months” – Low accuracy, low information – Start with the 2-day estimate and identify the RAID items that push it to 2 months.
  • “The task will take up to 3 weeks” – Unknown accuracy, no lower bound – Ask for a no-earlier-than estimate, and identify RAID items.
  • “The task is about 2 weeks” – Likely a lower bound, optimistic – Identify RAID items: what could go wrong?

Agree? Disagree? Have an alternative view or opinion?  Leave comments below.

If you are interested in articles on Management, Software Engineering or any other topic of interest, you can contact Matthew at tippettm_@_gmail.com via email,  @tippettm on twitter, Matthew Tippett on LinkedIn, +MatthewTippettGplus on Google+ or this blog at https://use-cases.org/.

Desk-Checks, Control Flow Graphs and Unit Testing

Recently, during a discussion on unit testing, I made an inadvertent comment about how unit testing is like desk-checking a function.  That comment was met with a set of blank stares from the room.  It seems desk-checking is no longer something that is taught in comp-sci education these days.  After explaining what it was, I felt like the engineers in the room were having moments similar to the ones I had, just after I entered the field, when a senior engineer would talk about their early days with punch cards. I guess times have changed…

Anyway…

What followed was a very interesting discussion on what Unit Testing is, why it is important, and how Mocking fills in one of the last gaps in function-oriented testing.  Through this discussion, I had my final Unit Testing light-bulb moment: it all came together and went from an abstract best practice to an absolutely sane and necessary one.  This article puts out a unified view on what Unit Testing is, what it is not, and how one can conceptualize unit tests.
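
As a taster for the full post, here is a minimal sketch of the idea (my own example, not code from the article): the function under test is exercised path by path, much as you would in a desk-check, with a mock standing in for its collaborator so the function is tested in isolation.  The names format_greeting and fetch_user are hypothetical.

    # Hypothetical function under test: two paths through its control flow,
    # exercised in isolation with a mocked collaborator.
    import unittest
    from unittest.mock import Mock

    def format_greeting(user_store, user_id):
        user = user_store.fetch_user(user_id)
        if user is None:
            return "Hello, guest"
        return "Hello, " + user["name"]

    class FormatGreetingTest(unittest.TestCase):
        def test_known_user(self):
            store = Mock()                                    # mock collaborator
            store.fetch_user.return_value = {"name": "Ada"}   # desk-check input
            self.assertEqual(format_greeting(store, 1), "Hello, Ada")

        def test_unknown_user(self):
            store = Mock()
            store.fetch_user.return_value = None
            self.assertEqual(format_greeting(store, 2), "Hello, guest")

    if __name__ == "__main__":
        unittest.main()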

Continue reading “Desk-Checks, Control Flow Graphs and Unit Testing”

Root Cause Analysis; Template and Discussion

A typical interpretation of a Root Cause Analysis (RCA) is that it exists to identify the parties responsible and apportion blame.  I prefer to treat a Root Cause Analysis as a tool to discover internal and external deficiencies and put changes in place to address them.  These deficiencies can span the entire spectrum of a system of people, processes, tools and techniques, all contributing to what is ultimately a regrettable problem.

Rarely is there a singular causal event or action that snowballs into a particular problem necessitating a Root Cause Analysis.  Biases, assumptions, grudges and viewpoints are typically hidden baggage when investigating root causes.  Hence it is preferable to use a somewhat analytical technique when faced with a Root Cause Analysis.  An objective analytical technique assists in removing the personal biases that make many Root Cause Analysis efforts less effective than they should be.

I present below a rationale and template that I have used successfully for conducting Root Cause Analysis.  This template is light enough to be used within a couple of short facilitated meetings.  This contrasts with exhaustive Root Cause Analysis techniques that take days or weeks of applied effort to complete.  On most occasions, the regrettable problem can be avoided in the future by making changes that become evident after a collective effort of a few hours to a few days.  With multiple people working on a Root Cause Analysis, this timebox allows the analysis to be completed within a day.

Continue reading “Root Cause Analysis; Template and Discussion”

Regression Isolation vs Code Diving

As developers we deal with regressions on a regular basis.  Regressions are changes introduced to a system that cause a potentially unwanted change in behaviour.  Engineers, being wired the way they are, have a tendency to want to fix first and understand later (or understand as part of the fix).  In a large number of cases, however, it is considerably more effective to isolate and understand the cause of the regression before even diving into the code to fix it.

This is a continuation of a series of blog postings I am making on regression isolation and bisection, the first of which was “A Visual Primer on Regression Isolation via Bisection”.  If bisection and regressions are terms that you don’t solidly understand, I strongly suggest you read the primer first.
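
For a flavour of why isolation pays off, here is a minimal sketch of the bisection idea (an assumed example, not code from the primer): given an ordered list of revisions and an automated check that passes on the oldest and fails on the newest, the first bad revision is found in O(log n) test runs.

    # Minimal bisection sketch: find the first revision where the check fails.
    def first_bad_revision(revisions, test_passes):
        lo, hi = 0, len(revisions) - 1     # lo is known good, hi is known bad
        assert test_passes(revisions[lo]) and not test_passes(revisions[hi])
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if test_passes(revisions[mid]):
                lo = mid                   # regression introduced after mid
            else:
                hi = mid                   # mid already shows the regression
        return revisions[hi]               # first revision with the regression

    # Usage: `revisions` could be commit identifiers in order, and `test_passes`
    # could build the project at that revision and run the reproduction case.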

Continue reading “Regression Isolation vs Code Diving”