Measuring Scrum Team Performance: Beyond Velocity
Velocity -- the sum of story points completed in a Sprint -- has become the de facto performance metric for many Scrum teams. This is a problem. Velocity was designed as a Sprint planning tool: it helps a team estimate how much work they can take on in the next iteration, based on how much they completed in recent iterations. Using it as a performance metric distorts that purpose entirely, and the distortions are predictable and harmful.
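As a minimal sketch of velocity used the way it was intended, as a planning input, the snippet below averages the points completed over recent Sprints. The function name `forecast_capacity`, the three-Sprint window, and the sample history are assumptions made for illustration, not part of any Scrum standard.

```python
from statistics import mean


def forecast_capacity(completed_points, window=3):
    """Average story points completed over the most recent `window` Sprints."""
    if not completed_points:
        raise ValueError("need at least one completed Sprint")
    # Only the most recent Sprints are considered; older ones are ignored.
    recent = completed_points[-window:]
    return mean(recent)


# Hypothetical history of story points completed per Sprint, oldest first.
history = [21, 34, 30, 26]
print(forecast_capacity(history))
```

The result is a rough guide for the team's own planning conversation. Nothing in it says whether the work delivered was valuable.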
When velocity becomes a target, teams optimise for velocity rather than for outcomes. Story points inflate. Straightforward work gets estimated high. Refactoring, technical debt reduction, and quality investment -- work that does not produce story points in the short run -- gets deprioritised. Teams with "high velocity" may be generating a great deal of activity that produces little value, while teams with lower, steadier velocity may be delivering exactly what their customers need.
Better Measures: Lead Time and Deployment Frequency
Lead time from idea to production measures how long it takes from the moment a piece of work is conceived -- or a customer need is identified -- to the moment value reaches the end user. Short lead times indicate a flow-efficient organisation with limited queuing, small batches, and fast feedback loops. Long lead times indicate the opposite: queuing, large batches, handoffs, and slow feedback. Lead time is a customer-centred metric -- it measures the experience of waiting for value, which is what customers actually care about.
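One way to compute lead time, sketched under the assumption that each work item records two timestamps: when the need was identified and when the change reached users. The helper name `lead_times_days` and the sample data are hypothetical. The median is reported alongside the maximum because a handful of long-waiting items can hide behind an average.

```python
from datetime import datetime
from statistics import median


def lead_times_days(items):
    """Return the lead time in days for each (identified_at, released_at) pair."""
    return [
        (released - identified).total_seconds() / 86400
        for identified, released in items
    ]


# Hypothetical work items: (need identified, value reached the end user).
items = [
    (datetime(2024, 3, 1), datetime(2024, 3, 8)),
    (datetime(2024, 3, 4), datetime(2024, 3, 6)),
    (datetime(2024, 3, 5), datetime(2024, 3, 26)),
]

durations = lead_times_days(items)
print(f"median lead time: {median(durations):.1f} days")
print(f"longest lead time: {max(durations):.1f} days")
```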
Deployment frequency measures how often the team releases working software to production. High deployment frequency -- daily, or multiple times per day -- is associated with both higher quality and faster learning. When deployments are frequent, each deployment is small, the blast radius of any given failure is limited, and teams get rapid feedback on whether what they shipped achieved the intended effect. When deployments are infrequent, batches are large, failures are harder to diagnose, and learning cycles are slow.
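Deployment frequency can be derived from nothing more than a list of production deployment dates. The sketch below groups them by ISO week. The function name `deployments_per_week` and the dates are illustrative assumptions, and a real pipeline would read the dates from its own deployment log.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count production deployments per ISO (year, week) pair.

    Weeks with no deployments do not appear in the result.
    """
    return Counter(d.isocalendar()[:2] for d in deploy_dates)


# Hypothetical production deployment dates.
deploys = [
    date(2024, 3, 4),
    date(2024, 3, 5),
    date(2024, 3, 7),
    date(2024, 3, 12),
]

for (year, week), count in sorted(deployments_per_week(deploys).items()):
    print(f"{year}-W{week:02d}: {count} deployment(s)")
```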
Defect Escape Rate and Customer Satisfaction
Defect escape rate measures the proportion of defects that reach production rather than being caught during development or testing. A high escape rate signals that quality is being deferred rather than built in -- increasing the cost of quality, damaging customer trust, and consuming disproportionate remediation capacity. Tracking it over time reveals whether quality practices are improving or degrading as the team scales.
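The definition above reduces to a simple ratio: defects found in production divided by all defects found. A minimal sketch follows, with a hypothetical function name and made-up counts. Returning 0.0 when no defects were recorded at all is a design choice of this sketch, not a convention from the text.

```python
def defect_escape_rate(found_before_release, found_in_production):
    """Proportion of all known defects that were first found in production."""
    total = found_before_release + found_in_production
    if total == 0:
        # Assumption: with no recorded defects, report a rate of zero.
        return 0.0
    return found_in_production / total


# Hypothetical counts for one quarter: 45 defects caught before release,
# 5 reported from production.
rate = defect_escape_rate(found_before_release=45, found_in_production=5)
print(f"defect escape rate: {rate:.0%}")
```

Comparing the same ratio quarter over quarter is what shows the trend. A single value says little on its own.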
Customer satisfaction -- measured through NPS, CSAT, or direct usability feedback -- provides the most direct signal of whether the team is delivering value. Sprint Goals should connect to customer outcomes, and satisfaction metrics should be reviewed alongside Sprint retrospectives. A team that consistently meets its Sprint Goals but never improves its NPS may be optimising for delivery activity rather than customer value.
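For readers unfamiliar with how NPS is calculated, the standard formula is the percentage of promoters (ratings of 9 or 10 on a 0 to 10 scale) minus the percentage of detractors (ratings of 0 to 6). The sketch below applies it to a hypothetical set of survey responses. The function name is an assumption for illustration.

```python
def net_promoter_score(ratings):
    """NPS: percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical responses to "How likely are you to recommend us?" (0-10).
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 9]
print(net_promoter_score(ratings))
```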
The DORA Metrics Framework
The DORA metrics -- Deployment Frequency, Lead Time for Change, Change Failure Rate, and Mean Time to Restore (MTTR) -- emerged from research by the DevOps Research and Assessment team and are among the most rigorously validated engineering performance measures available. The research found that high-performing software delivery organisations significantly outperform low performers on all four dimensions, and that software delivery performance, as measured by these metrics, is predictive of organisational performance.
Deployment Frequency: How often does the organisation deploy to production? Elite performers deploy on-demand, multiple times per day.
Lead Time for Change: How long does it take to go from code commit to production? Elite performers measure this in hours or days, not weeks or months.
Change Failure Rate: What percentage of changes to production result in degraded service or require remediation? Elite performers see rates below 15 percent.
Mean Time to Restore: How long does it take to recover from a failure in production? Elite performers recover in under an hour.
The DORA framework is valuable precisely because it balances throughput (Deployment Frequency and Lead Time) with stability (Change Failure Rate and MTTR). Teams that optimise for speed alone tend to see their Change Failure Rate rise. Teams that optimise for stability alone tend to see their throughput slow. High performance requires both.
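The two stability metrics can be sketched from a simple deployment record. Everything named here (`Deployment`, its fields, the sample data) is hypothetical, and the sketch makes one simplifying assumption: a failure is treated as starting at the moment of the deployment that caused it, whereas real incident data would normally carry its own detection timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    deployed_at: datetime
    failed: bool = False                    # degraded service or needed remediation
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def change_failure_rate(deployments):
    """Fraction of deployments that resulted in a failure."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments):
    """Mean time from a failed deployment to restoration, or None if no data."""
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical week of deployments, one of which failed for 30 minutes.
deployments = [
    Deployment(datetime(2024, 3, 4, 10, 0)),
    Deployment(datetime(2024, 3, 5, 10, 0), failed=True,
               restored_at=datetime(2024, 3, 5, 10, 30)),
    Deployment(datetime(2024, 3, 6, 10, 0)),
    Deployment(datetime(2024, 3, 7, 10, 0)),
]

print(f"change failure rate: {change_failure_rate(deployments):.0%}")
print(f"mean time to restore: {mean_time_to_restore(deployments)}")
```

Reading these two figures next to the throughput figures is the point of the framework: a rising deployment count paired with a rising failure rate is a warning, not a win.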
Presenting These Metrics to Leadership Without Creating Perverse Incentives
Goodhart's Law -- when a measure becomes a target, it ceases to be a good measure -- applies to engineering metrics just as it does to velocity. Deployment frequency that is gamed by splitting trivial changes into separate deployments is not an improvement. Lead time that is artificially shortened by narrowing the definition of "idea" is not better flow.
Present these metrics to leadership as a system, not as individual targets. Show the relationship between Deployment Frequency and Lead Time -- the two should improve together, with lead time falling as deployment frequency rises. Show that a team deploying frequently and restoring quickly is resilient; one that deploys frequently and takes days to restore is not. Invite leadership to understand the system rather than to manage individual numbers. Sprint Goals tied to business outcomes -- rather than feature completion -- help bridge the gap between engineering metrics and leadership language: "enable customers to complete checkout without calling support" is a business outcome that leadership can engage with; "complete 45 story points" is not.
XNM Consulting helps organisations mature their agile delivery practices, including measurement frameworks that connect team activity to business outcomes. Learn more about our program and project delivery services.