Wednesday, October 31, 2012

Agile Metrics


Overview

There are no “standard” agile metrics, because what's easy to measure tends to distract us, and what's important to measure is hard to quantify. The most important thing about Agile metrics is that we need a clear objective for using them--normally we start with a hypothesis, say, that if defect count drops, then lead time will drop too. Many of the metrics below would be used for a short period of time and then dropped once the objective is reached.

With that said, I've seen clients dovetail push-and-pull metrics, like lead time & defect count, so that they can be used for a longer period of time: if someone starts gaming one number, they get penalized on the other.

The following may also be useful:

·       Lead Time
·       Defect count (at various phases; what’s a bug?)
·       Work in Progress
·       Code coverage
·       Unplanned Changes
·       Velocity (story points or story count per sprint)
·       Return on investment
·       Innovations per sprint
·       Artifacts generated
·       Slack time
·       Failure Load (firefighting time)
·       Iteration Burn-Down
·       Unfinished Stories
·       Customer Satisfaction
·       LOC (lines of code)
·       Un-deployed Stories
·       # Blocks
·       Budget/Schedule Compliance
·       Flow Efficiency (lead time / touch time)
·       Release Burn-Up

Definitions

Definitions and cross-references for all these metrics follow.

Lead Time
Defined:            Time from “concept to cash”, the total time it takes to develop an idea and sell it to a paying customer.
Caution:             It may be difficult to measure actual lead time, so many teams approximate it by capturing the time a request enters the development process and the time it reaches the definition of done. This approximation may be a reasonable place to start measuring, but it may encourage micro-optimization (changes that actually detract from corporate goals) or reduce customer discovery (learning what the customer would pay more for).
Side Effects:     If we blindly push this metric to a minimum, we may see:
·       increased defect count
·       reduced code coverage
·       increased failure load
Benefits:           customer satisfaction, flow efficiency, un-deployed stories, work in progress
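To make the approximation above concrete, here's a minimal sketch, assuming we log one timestamp when a request enters development and another when it reaches the definition of done (the function name and date format are just illustrative):

```python
from datetime import datetime

def lead_time_days(entered: str, done: str) -> int:
    """Approximate lead time: request entry to definition of done, in days."""
    fmt = "%Y-%m-%d"
    start = datetime.strptime(entered, fmt)
    end = datetime.strptime(done, fmt)
    return (end - start).days

# A request entered October 1 and finished October 15 has a 14-day lead time.
print(lead_time_days("2012-10-01", "2012-10-15"))  # 14
```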


Defect Count
Defined:            Total count of surprises, unexpected behavior, flaws, and shortcomings of the product identified during or after an iteration demo.
Caution:             Aggressive definitions of “defect” help everyone focus on customer satisfaction; anything short of the definition above will let quality problems go uncounted.
Side Effects:     If we blindly push this metric to a minimum, we may see:
·       reduced velocity
·       reduced innovation
·       more unfinished stories
·       more blocks
Benefits:           code coverage, unplanned changes, customer satisfaction, lines of code

Work In Progress (WIP)
Defined:            Number of items we are actively working on. The higher the WIP, the more multi-tasking hurts our efficiency.
Caution:             While a WIP limit of 1 per person may seem ideal, research suggests it’s closer to 2 per person, so that if a block prevents us from working the highest-priority item we can switch to the next one.
Side effects:     If we blindly decrease this metric, we may see:
·       excessive slack time
Benefits:           lead time, defect count, velocity, ROI, unfinished stories, un-deployed stories, blocks, budget/schedule compliance, flow efficiency
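One hedged way to quantify why lower WIP benefits lead time is Little's Law (average lead time = average WIP / average throughput); the numbers below are invented purely for illustration:

```python
def average_lead_time(avg_wip: float, throughput_per_week: float) -> float:
    """Little's Law: average lead time (weeks) = average WIP / throughput."""
    return avg_wip / throughput_per_week

# With 12 items in progress and 4 finishing per week, items average 3 weeks.
print(average_lead_time(12, 4))  # 3.0
# Halving WIP halves average lead time at the same throughput.
print(average_lead_time(6, 4))   # 1.5
```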

Code Coverage
Defined:            Percentage of production code tested by the automated regression suite.
Caution:             Static and dynamic code coverage tools cannot tell us whether the code just happened to be executed or whether its behavior was actually verified. The only strategy for full coverage of behavior is Test-Driven Development (TDD) / BDD. Short of automation, we cannot find regressions fast enough to keep up with development.
Side effects:     If we blindly increase this metric, we may see:
·       increased failure load
·       decreased velocity
·       reduced innovation
Benefits:           lead time, defect count

Unplanned Changes
Defined:            Number of unanticipated change requests we were able to include in this product increment. Since Agile is all about being more responsive, this is a metric that shows how adaptive we’ve become.
Caution:             Tracking this metric could be burdensome—what counts as a change request? A font style change on the UI? An increase in scope? Pick a granularity to track and stick with it.
Side effects:     If we blindly increase this metric, we may see:
·       decreased velocity (churn)
·       excessive innovation (lack of focus)
Benefits:           customer satisfaction, return on investment, lead time

Velocity
Defined:            Abstract quantity of work that can be completed in a given iteration. Velocity automatically accounts for regular meeting overhead and business-as-usual activities. Velocity is often reported in units of Story Points, Ideal Days, Ideal Hours, or Story Count. Story Points tend to encompass effort, doubt & complexity, so they’re packed with more information than a simple estimate.
Caution:             For large organizations, it helps to normalize Story Points on approximately 1 Ideal Day to simplify strategic & roadmap level planning. Story Points should not be used to evaluate past performance--they’re only intended for forward planning.
Side effects:     If we blindly increase this metric, we may see:
·       reduced customer satisfaction
·       increased failure load
·       reduced artifacts generated
·       reduced innovation
·       reduced slack time
Benefits:           lead time, budget/schedule compliance, flow efficiency, release burn-up
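As a sketch of the forward-planning use described above (averaging a few recent sprints; the sample numbers are invented):

```python
import math

def average_velocity(recent_sprints):
    """Mean story points completed over recent sprints."""
    return sum(recent_sprints) / len(recent_sprints)

def sprints_remaining(backlog_points, recent_sprints):
    """Forward planning only -- never a grade on past performance."""
    return math.ceil(backlog_points / average_velocity(recent_sprints))

# Velocities of 18, 22, 20 average 20 points; a 90-point backlog needs ~5 sprints.
print(sprints_remaining(90, [18, 22, 20]))  # 5
```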

Return On Investment
Defined:            Percent earnings based on revenue, capital investment and operational cost.
Caution:             Many teams don’t have access to this data, or don’t track it long enough to see the impact of their work on ROI. Yet it’s key to justifying investment in software.
Side effects:     If we blindly increase this metric, we may see:
·       reduced innovation
·       reduced customer satisfaction
·       increased failure load
Benefits:           lead time, budget/schedule compliance
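Formulas vary by organization; as one illustrative sketch, take ROI as net earnings over total cost (all figures here are invented):

```python
def roi_percent(revenue: float, capital: float, operational: float) -> float:
    """Percent return: (revenue - total cost) / total cost * 100."""
    cost = capital + operational
    return (revenue - cost) / cost * 100

# $150k revenue against $80k capital and $20k operational cost -> 50% ROI.
print(roi_percent(150_000, 80_000, 20_000))  # 50.0
```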

Innovations per sprint
Defined:            As an agile team becomes more cross-functional, the whole team gains a greater appreciation for what the customer finds valuable. When this results in feature ideas that the Product Owner selects for the backlog, we consider this a success of the whole team.
Caution:             Innovation must be customer-centric—in Kano’s terms, either a linear feature or an exciter/delighter.
Side effects:     If we blindly increase this metric, we may see:
·       reduced release burn-up
·       excessive unplanned changes
·       increased lead time
Benefits:           customer satisfaction, return on investment

Artifacts Generated
Defined:            Any document or non-source-code electronic file generated as a result of the software development process is an artifact. We may want to track help files generated to get a sense of whether our development is sustainable.
Caution:             Some artifacts were historically created for visibility into a long development cycle. If you can rely on automated customer tests instead, this type of “executable specification” will be demonstrably current.
Side effects:     If we blindly increase this metric, we may see:
·       increased lead time
·       increased work in progress
·       reduced budget/schedule compliance
Benefits:           n/a

Slack Time
Defined:            Buffer, maintenance, or creative work that is tangentially related to prioritized product backlog items. Just as a highway sees serious congestion above 80% utilization, software teams loaded above 80% see serious performance bottlenecks.
Caution:             Slack time is not vacation or goofing off. It is one of the only steps in an agile SDLC that consistently reduces technical debt.
Side effects:     If we blindly increase this metric, we may see:
·       reduced velocity
·       more unfinished stories
·       more un-deployed stories
Benefits:           lead time, failure load, innovations per sprint, customer satisfaction

Failure Load
Defined:            Percent of time spent fixing defects. Failure load is waste; it’s forcing our customers to pay for features twice. We want to avoid failure load whenever practical. You can’t go fast without high quality!
Caution:             n/a
Side effects:     If we blindly decrease this metric, we may see:
·       reduced velocity
·       reduced innovation
·       more unfinished stories
·       more blocks
Benefits:           code coverage, unplanned changes, customer satisfaction, lines of code

Iteration Burn-Down
Defined:            Bar chart showing hours or story points remaining per day of the iteration. The trajectory of the bars shows whether we’re on schedule or not.
Caution:             Without small enough stories, teams will see a “clumping” effect where most of the work tends to get finished at the end of the iteration. This is not desirable—find ways to get to done earlier so there is time to make unforeseen adjustments.
Side effects:     If we blindly improve this metric, we may see:
·       increased blocks
Benefits:           unfinished stories, work in progress, return on investment, customer satisfaction, budget/schedule compliance
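A sketch of the data behind the chart: points remaining per day compared against an ideal straight-line burn. The sample numbers are invented and deliberately show the “clumping” pattern cautioned against above:

```python
def ideal_burndown(total_points, days):
    """Straight-line reference: points remaining at the start of each day."""
    return [total_points * (days - d) / days for d in range(days + 1)]

actual = [30, 30, 28, 27, 25, 24, 22, 20, 14, 6, 0]  # clumped at the end
ideal = ideal_burndown(30, 10)                        # 30.0, 27.0, ... 0.0
behind = [a > i for a, i in zip(actual, ideal)]
print(behind)  # True on days where we sit above the ideal line (behind schedule)
```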

Unfinished Stories
Defined:            Any story that did not reach the “definition of done” in the same iteration in which it was begun is an unfinished story. A product owner may cancel, re-schedule, split, re-scope, or defer such a story.
Caution:             Unfinished stories come from a lack of discipline. There’s always a way to negotiate a good story so that it can be split or completed this iteration.
Side effects:     If we blindly decrease this metric, we may see:
·       n/a
Benefits:           customer satisfaction, lead time, release burn-up

Customer Satisfaction
Defined:            increased customer retention or increased revenue
Caution:             Learning about customer retention is slow, and we need safe sandboxes in which to experiment and learn more quickly (e.g., pilot markets or beta tests).
Side effects:     If we blindly increase this metric, we may see:
·       reduced innovation
·       reduced slack time
Benefits:           return on investment

Lines of Code (LOC)
Defined:            One source-code line; from an agile perspective, a line of code increases the risk of system failure and increases the cost of maintenance. We seek elegance, clean code, and avoid duplication in the code base.
Caution:             Mature software shouldn’t always grow. At some point, re-factoring will keep the LOC count stable while we continue to add features. At the same time, if we make code difficult to read or understand, we’ll introduce additional risk for system maintainers.
Side effects:     If we blindly decrease this metric, we may see:
·       increased defect count
Benefits:           lead time, failure load

Un-deployed Stories
Defined:            stories that have reached a team’s definition of done but are not yet actually earning money or being used by a customer
Caution:             until a paying customer uses our product increment, there is risk that delivery teams will need to get involved in supporting it
Side effects:     If we blindly decrease this metric, we may see:
·       decreased customer satisfaction (a product that changes too often?)
Benefits:           lead time, defect count, work in progress, unplanned changes, innovations

# Blocks
Defined:            The number of impediments that development teams have asked for help on.
Caution:             A large number of blocks may mean teams aren’t being as proactive as they could be, or they don’t have an adequate “definition of ready” before accepting work.
Side effects:     If we blindly decrease this metric, we may see:
·       unfinished stories
·       undeployed stories
Benefits:           lead time

Budget/Schedule Compliance
Defined:            compare the estimate of a strategic or roadmap level portfolio item with the team-level estimates (for completed stories only)
Caution:             until a product increment is considered deployable (a minimally marketable feature), we cannot make any assessment on its cost
Side effects:     If we blindly optimize this metric, we may see:
·       reduced innovation
·       fewer unplanned changes
·       reduced customer satisfaction
Benefits:           increased return on investment, reduced lead time, reduced work in progress

Flow Efficiency
Defined:            flow efficiency = lead time / touch time; that is, the amount of time to go through the whole system divided by the actual amount of time someone is actively working on it.
Caution:             Flow efficiency highlights wait time in the existing process, though we really need to focus on value added time. Use this to identify red flags but only as a secondary method to value-added optimization.
Side effects:     If we blindly decrease this metric, we may see:
·       excessive slack time
·       excessively limited WIP
Benefits:           lead time, return on investment
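Here's a minimal sketch following the definition above (lead time divided by touch time; note that some references define flow efficiency the other way up, as touch time over lead time):

```python
def flow_efficiency(lead_time_days: float, touch_time_days: float) -> float:
    """Flow efficiency as defined above: lead time divided by touch time.
    A value near 1.0 means little wait time; larger values mean more waiting."""
    return lead_time_days / touch_time_days

# A story that took 20 calendar days but only 4 days of hands-on work:
print(flow_efficiency(20, 4))  # 5.0 -- 16 of the 20 days were wait time
```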

Feedback

What's missing? What would you change?

Sunday, October 21, 2012

The Keystone Habit: Thin Vertical Slices

What's a keystone habit?

A keystone habit is a practice that can make-or-break an organizational change. In a real archway, pictured right, if you forget to add a keystone, or if you remove the keystone, everything else comes toppling down. Culture hackers learn to recognize keystone habits and focus their energies on doing the simplest thing possible. Often this means focusing on just one change at a time. If we've taught people what to do in times of stress, the new habit is more likely to stick. For example, if our project is running behind schedule, instead of mandating overtime, try thin vertical slicing!


What's a Thin Vertical Slice?

The biggest difference between plan-driven requirements and agile user stories is in the way we break up the work, as illustrated for this sample web shopping cart application, below. Plan-driven approaches divide the work by skill, for example, UX designers focus on the presentation layer, software developers focus on the application layer, architects focus on the services layer, database administrators focus on the data layer. Kick-start a project in a plan-driven fashion, and because of dependencies in the work, the only people that actually start working are business analysts, database administrators and architects (and maybe UX designers). Developers and QA staff sit waiting for the back-end systems to mature.
In contrast, agile teams break up the work by business value. As illustrated in the shopping cart example below, we could ask the whole team to focus on the Search Products part of our application. Initially teams will follow their old habits, and only the analysts & back-end folk will start working. No worries, since this is a keystone habit! Since our focus is narrow, we finish soon, and developers/QA get involved immediately. Inevitably someone discovers a design flaw--and while the system is still fresh in everyone's minds, we re-design it to fix the flaw.



A Rose by Any Other Name

Synonyms: Story Splitting, Who-What-Why, As-a / I want / So That, INVEST
This past week, when I ran a workshop for Agile Philly on Thin Vertical Slices, several people told me they were familiar with the idea but not by this name. They also never really understood it until they played the Story Splitting game (ask me, I'll be happy to run it for you). What I've found is that people get overwhelmed with the canonical story template:
As a fitness enthusiast
I want to buy my favorite style of footwear for my workout
So that I can save time & money as compared to retail store shopping
Instead of trying to answer all three parts of the story template above, I coach clients to focus on the what, or the title of the user story.  The title should be 1-3 words, so sticking with the example above, the story's title is buy favorite footwear.

Assumptions

  • Vertical slices must be INVEST-worthy: Independent, Negotiable, Valuable, Estimable, Small, and Testable. The focus on business value tends to lead us to good slices, but if this acronym is new to you read more about INVEST-worthy stories. 
  • We believe the design is never really done until the customer's problems are solved, so we optimize for validated learning--for end-to-end delivery of business value. If you know for certain that the market is willing to pay for your next feature set, and there is no risk in implementing it, don't slice--just go!
  • Slicing is a form of planning, and it takes time (1-2 hours per week for the whole team). We only want this overhead if it reduces our risk or doubt, that is if we can validate our learning by deploying & trialing working solutions.
  • Slicing depends on automated regression testing. Short of regression tests, we can't afford the re-work costs of updating the design as we move from one slice to another. Note: test automation may not be what you think. See James Shore's Customer Tests.

The Essence in Just One Slice

Given a big scary idea, a vision statement, or an epic user story--what is its core value--the essence? Focus on the WHAT. What could we do in the next hour to learn whether our solution is better than doing nothing? Of course we can't validate the entire solution in an hour--but if every hour we're validating some part of the next most valuable feature or risky assumption, then we're learning very quickly, and we'll deliver sooner. When we deliver sooner, we're agile. When we build upon everything we learned, we increment towards a full solution. I'd rather have something that works for a limited set of customers, than something that doesn't work at all. Now that we've pushed our thinking to an extreme, let's think bigger--what could we learn in a week? Could we make a product increment that we could demo to a friendly customer or business leader?

Why Slice So Thin?

Want greater productivity? A good slice helps us deliver less, better. Why less? In surveying a cross-section of the industry, the Standish Group reported in 2002 that 64% of software features are rarely or never used--see a summary of the report here. If only we knew in advance what that 64% was, we'd more than double our productivity! Wait a second--there is a way to find out sooner... Thin Vertical Slices! It's the keystone habit to agility!
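The "more than double" arithmetic works out like this (assuming the 64% figure holds and that effort is roughly proportional to feature count):

```python
unused_fraction = 0.64            # Standish 2002: features rarely or never used
useful_fraction = 1 - unused_fraction

# If we built only the useful 36%, the same effort would go ~2.8x as far.
productivity_multiplier = 1 / useful_fraction
print(round(productivity_multiplier, 2))  # 2.78
```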

What's a Good Slice?

  • focus on the WHAT. A good slice is the next most risky or valuable piece of our epic / vision / business ask / solution. (see Essence, above).
  • slice only what we need to understand next, that is, we're doing progressive elaboration. If we have external dependencies, vendor contracts, or long feedback cycles, we'll have to do some big slices up-front--just don't go thinner than necessary to coordinate with the plan-driven folk.
  • after you know WHAT to make, cross-check it: ask WHO cares about it. The WHO should be a paying customer, or someone outside the building. Even batch jobs and admin tools deliver value to the paying customer--they reduce the cost of ownership. Sometimes a story/slice is so granular the paying customer has no idea it exists--in which case I normally go up a level or so to be sure they care about a parent story/slice. The WHO of a parent is often the same for all children, and I may not go to the formality of tracking WHO for all children.
  • after we know WHO we're trying to satisfy, empathize. WHY is the seed for innovation--this is a huge difference between "requirements" and "stories". We're supposed to understand why the customer wants the WHAT, and look for a better way. As Henry Ford said, if you asked people WHAT they wanted, they'd just say "a faster horse". Asking WHY helps us understand that getting from point A to point B faster is something people would pay for, and leaves room to innovate on how to accomplish the goal.
  • a good slice is INVEST-worthy (see above)

The Keystone Habit of Agility: Thin Vertical Slices

If you get thin vertical slicing right, everything else agile comes along for the ride. When I was still learning how to explain this idea I asserted the following on twitter:
Minimalist Agile: thin, vertical slices + WIP limits + progressive elaboration (what's missing that won't be induced by these?)
@RonJeffries responded by asking about priority, tech practices and teamwork. My response today: Thin vertical slicing, as defined above, only slices off the next most important thing--that's value priority and progressive elaboration. As we push towards infinitesimally thinner stories, people are coerced into working on the same thing at the same time--both limiting WIP and forcing them to work together. Practically speaking, there may still be hand-offs, but it's still teamwork when everyone is focused on the same goal of completing a slice. I admit our whole archway will topple if we don't have a good foundation on technical practices. I've listed that as an assumption above.

The Keystone Habit At Scale

The beauty of simple truths is they apply at varying levels of hierarchy. While we get benefit from thin vertical slices at the team level, their true impact is realized with strategic & program level planning. Thin Vertical Slices are named in a language business leaders understand--and as Alistair Cockburn notes, this creates a Cooperative Game. Business leaders see the slices, and ruthlessly prioritize them--which is exactly what we need to build less, better. At some point, as Jim Highsmith reports in Agile Project Management, the business stops picking slices under a given epic--and they move on to slices under another epic.
Since slices are INVEST-worthy, we can schedule them in any order, making the Cooperative Game of prioritization easier--simple rules bring complex behavior. Business leaders start trading turns or pooling their budgets to get slices of mutual benefit. Vertical slices also help the businesses move to a SAFe (Scaled Agile Framework) mindset of release trains--if the features aren't in this release, they can be on the next one or the next one--we release early and often. Thin/small slices give us rapid feedback in terms of integration, customer reaction, and other learning.
At scale, I've personally coached a few customers with 300-400 developers who coordinate their work with Thin Vertical Slices. Executives like it since it promotes visibility, developers like it because it promotes autonomy and creativity.

A Keystone Habit is Just the Beginning

A keystone habit is not a panacea. It is just what we must hold on to in times of stress. We need plenty of agile practices to successfully launch a productive agile culture. If we stay focused on thin slicing, though, and use thin slicing to reduce our stress, I'm convinced we'll keep our agile culture alive and well.

Acknowledgements


Thanks to Bob Gower for giving me a name for this idea--his talk at CULTUREcon Philly, "Kicking the Habit", will be summarized on this blog soon.   Similar thinking can be found in Charles Duhigg's The Power of Habit, and Martin Seligman's focus on well-being instead of pathology. Michael Margolis also teaches us that our new story must be grounded in the old--we can't simply abandon or ignore the old ways of thinking. Instead, when we want to change an old habit, or a culture, we seek to understand the keystone habits that exist, and build from that foundation as we replace the keystone with something new. Thanks also go to my colleagues at Rally Software who have helped me think out loud as I've been testing and formalizing this idea--Mark Kilby, Yvonne Kish, Longda Yin, Ben Carey, Chris Browne, Ann Konkler, and others!