Serverless Architecture and Scrum: Delivering Without Managing Infrastructure
For most of the history of software delivery, deploying code meant managing the infrastructure it ran on. Servers needed to be provisioned, configured, and maintained. Capacity had to be forecast and allocated ahead of demand. Scaling required intervention — either manual or scripted. Operations teams maintained the infrastructure layer, and development teams wrote code that ran on it. The boundary between development and operations was, in large part, the boundary between software and infrastructure.
Serverless computing dissolves that boundary from one direction. In a serverless model, code is deployed as discrete functions that are invoked on demand by events: an HTTP request, a message arriving in a queue, a file upload, a scheduled trigger. The cloud provider manages the underlying servers, the runtime environment, the scaling from zero to thousands of concurrent invocations, and the scaling back down when demand subsides. The development team writes functions; the infrastructure largely manages itself.
For Scrum teams, serverless architecture is not just a technical change. It changes how work is structured, how sprints are planned, how 'done' is defined, and where the new risks live. Understanding those changes is essential for a Scrum team adopting serverless, and for a Scrum Master or Product Owner working with a team that already has.
What Serverless Computing Actually Is
The term serverless is slightly misleading: there are still servers. What the development team does not do is manage them. The canonical form of serverless is a Functions-as-a-Service (FaaS) platform: AWS Lambda, Google Cloud Functions, Azure Functions, or Cloudflare Workers, among others. The team writes a function — a self-contained piece of code with a defined trigger and a defined output — and deploys it to the platform. The platform handles execution.
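As a minimal sketch of what such a function can look like, the example below uses the two-argument handler signature common to Python FaaS runtimes. The event shape (an HTTP-style payload carrying a JSON string under "body") and the greeting logic are assumptions made for illustration, not a prescribed structure.

```python
import json


def handler(event, context):
    """Entry point the platform calls once per triggering event."""
    # Assumed event shape: an HTTP-style payload with a JSON string in "body".
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    # The defined output: a status code and a JSON-encoded body.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```

Everything outside this function, including the process that hosts it and the number of copies running at once, is the platform's concern rather than the team's.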
Serverless is particularly well suited to event-driven workloads: processing incoming data, responding to user requests, triggering downstream processes. It is less naturally suited to workloads that require persistent state, long-running processes, or very tight latency requirements — though the ecosystem of complementary services (managed databases, caching layers, message queues) has expanded significantly to address these limitations.
The billing model is a defining feature: serverless platforms typically charge per invocation and per unit of compute duration, rather than per hour that a server is running, whether or not it is busy. For workloads with variable or unpredictable demand, this can mean substantial cost savings relative to always-on infrastructure. For workloads at very high sustained throughput, it can mean higher cost. The cost model is one of the considerations the team should evaluate during Sprint planning when deciding how to implement a feature.
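A rough model of this trade-off can be sketched in a few lines. The rates in Pricing are placeholder values chosen for illustration, not any provider's quoted prices, and the model ignores free tiers, data transfer, and the cost of downstream services. The function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Placeholder rates for illustration only; replace with the provider's published prices.
    per_million_requests: float = 0.20
    per_gb_second: float = 0.0000167


def monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                 pricing: Pricing = Pricing()) -> float:
    """Estimated monthly serverless cost: a per-request charge plus a compute charge."""
    request_cost = invocations / 1_000_000 * pricing.per_million_requests
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * pricing.per_gb_second


def break_even_invocations(fixed_monthly_cost: float, avg_duration_ms: float,
                           memory_mb: int, pricing: Pricing = Pricing()) -> float:
    """Monthly invocations at which serverless costs the same as a fixed monthly spend."""
    cost_per_invocation = monthly_cost(1, avg_duration_ms, memory_mb, pricing)
    return fixed_monthly_cost / cost_per_invocation
```

Comparing the break-even figure against the forecast invocation volume gives the team a first indication of which side of the trade-off a feature is likely to fall on.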
How Serverless Changes Sprint Planning
The reduction in infrastructure management work is real and significant. A Scrum team building on serverless does not need to spend Sprint capacity on server provisioning, patch management, capacity planning, or routine infrastructure maintenance. This frees capacity for feature development and improvement work — which is one of the primary reasons teams adopt serverless.
But serverless introduces new categories of work that must be accounted for in Sprint planning.
Cold start latency: serverless functions that have not been invoked recently may have a startup delay — the cold start — before they can respond. For user-facing applications with latency requirements, cold start behaviour must be understood, tested, and potentially mitigated (through provisioned concurrency, function warming, or architectural decisions about which functions are latency-critical). Work to characterise and manage cold start behaviour should appear in the backlog.
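One lightweight way to begin characterising cold starts is to have the function report them. The sketch below relies on the fact that module-level code runs once per execution environment, so a module-level flag is true only for the first invocation that environment serves. The handler name and the returned fields are hypothetical; in practice the flag would more likely be written to a log line or a metric.

```python
import time

# Module-level code runs once per execution environment, so this flag is
# True only for the first invocation that environment handles.
_is_cold_start = True
_module_loaded_at = time.monotonic()


def handle_request(event, context):
    """Hypothetical handler that reports whether this invocation was a cold start."""
    global _is_cold_start
    was_cold = _is_cold_start
    _is_cold_start = False
    return {
        "cold_start": was_cold,
        # Seconds since this environment loaded the module.
        "environment_age_s": time.monotonic() - _module_loaded_at,
    }
```

Counting how often the flag is set, and comparing durations for cold and warm invocations, gives the team data to decide whether mitigation is worth backlog space.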
Function timeout limits: serverless functions have maximum execution time limits. AWS Lambda functions, for example, have a maximum timeout of fifteen minutes. Work that might exceed these limits must be decomposed or rearchitected — which is itself a design task that belongs in the Sprint.
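One common decomposition pattern is to process a batch only while enough execution time remains and hand the remainder to a later invocation. The sketch below assumes a context object exposing get_remaining_time_in_millis(), as the AWS Lambda Python runtime does. The function name, the process_item callback, and the ten-second safety margin are illustrative assumptions.

```python
def process_batch(items, context, process_item, safety_margin_ms=10_000):
    """Process items until the remaining execution time drops below the margin.

    Returns the unprocessed remainder so the caller can re-enqueue it
    for a later invocation.
    """
    for index, item in enumerate(items):
        # Stop before the platform terminates the function mid-item.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return items[index:]
        process_item(item)
    return []
```

The caller is responsible for putting the returned remainder back on a queue, which keeps any single invocation well inside the platform limit.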
Cost at scale: the pay-per-invocation model means that cost scales with usage. During Sprint planning, the team should have at least a rough model of how the cost of a new feature will behave at expected production volumes, and flag features where cost at scale is uncertain for explicit analysis. A function invoked millions of times per day is a very different cost proposition from one invoked hundreds of times.
Vendor lock-in risk: serverless functions typically use platform-specific APIs and invocation models. Migrating from one cloud provider to another, or from serverless back to containerised or server-based infrastructure, involves significant rework. Teams should be deliberate about where they accept vendor-specific dependencies and where they abstract them behind interfaces that could be swapped.
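A sketch of what abstracting a dependency behind an interface can look like: the business logic depends on a small protocol, and a provider-specific adapter would implement it at the edge. ObjectStore, InMemoryStore, and save_report are hypothetical names introduced for this example. The in-memory implementation stands in for a real cloud storage adapter and is also useful in tests.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The only storage operations the business logic is allowed to rely on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in implementation; a provider-specific adapter would expose the same methods."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_report(store: ObjectStore, report_id: str, content: str) -> str:
    """Business logic written against the interface, not a vendor SDK."""
    key = f"reports/{report_id}.txt"
    store.put(key, content.encode("utf-8"))
    return key
```

The trade-off is extra indirection, so this is usually worth doing only for the dependencies the team has decided it may need to replace.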
The Definition of Done in a Serverless Context
The Definition of Done is the Scrum team's shared agreement on what 'done' means for a Product Backlog item before it can be considered complete and potentially releasable. In a serverless context, that definition needs to be expanded to reflect the new failure modes and operational requirements of the architecture.
A serverless-aware Definition of Done should include criteria in several areas.
Monitoring and alerting: every function deployed to production should have monitoring in place before the Sprint ends. This means invocation metrics, error rates, and duration distributions are being collected, and alerts are configured to notify the team when error rates exceed an agreed threshold or latency degrades. A function deployed without monitoring is not done in any meaningful operational sense.
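The alert rule itself is usually configured in the provider's monitoring service, but the logic it encodes is simple enough to sketch. The example below is an illustration only: the function names, the 1% error-rate threshold, and the one-second 95th-percentile latency threshold are all assumptions the team would replace with its own agreed values.

```python
def error_rate(invocations: int, errors: int) -> float:
    """Fraction of invocations that failed; zero when there was no traffic."""
    return errors / invocations if invocations else 0.0


def p95(durations_ms: list[float]) -> float:
    """95th-percentile duration using the nearest-rank method."""
    if not durations_ms:
        return 0.0
    ordered = sorted(durations_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(invocations: int, errors: int, durations_ms: list[float],
                 max_error_rate: float = 0.01, max_p95_ms: float = 1000.0) -> bool:
    """True when either the error rate or the p95 latency breaches its threshold."""
    return (error_rate(invocations, errors) > max_error_rate
            or p95(durations_ms) > max_p95_ms)
```

Writing the thresholds down in this form, even informally, makes the criterion checkable at Sprint Review instead of leaving "monitoring in place" open to interpretation.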
Cost monitoring: the team should be able to see the cost of each function in production. Most cloud providers offer per-function cost visibility through their billing dashboards or through cost allocation tags. Adding cost tags to every deployed function and confirming the team has visibility should be a Definition of Done (DoD) criterion for any serverless deployment.
Performance testing at scale: testing a serverless function at the invocation rates expected in production is a DoD criterion, not an afterthought. Cold start behaviour under load, concurrency limits, downstream dependencies under simultaneous invocation — these should be tested before the Sprint ends, not discovered in production.
Error handling and dead letter queues: serverless functions that fail silently — that consume an event from a queue and fail without surfacing the failure — create data loss or inconsistency that can be very difficult to detect. Every function that processes queued events should have a dead letter queue or equivalent failure handling configured before the Sprint ends.
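Most platforms let the team attach a dead letter queue through configuration. Where failure handling is done in code instead, the principle is the same: a failed record is captured with its error, never silently dropped. The sketch below assumes hypothetical handle and dead_letter callables; in a real deployment dead_letter would publish to a queue or topic.

```python
def process_records(records, handle, dead_letter):
    """Process each record; route failures to dead_letter with the error attached.

    Returns the number of records processed successfully.
    """
    succeeded = 0
    for record in records:
        try:
            handle(record)
            succeeded += 1
        except Exception as exc:
            # Surface the failure instead of swallowing it, keeping the
            # original record so it can be inspected or replayed later.
            dead_letter({"record": record, "error": repr(exc)})
    return succeeded
```

One failing record does not block the rest of the batch, and the dead letter entries give the team something concrete to alert on and replay.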
What the Scrum Master and Product Owner Need to Know
The Scrum Master does not need to be a serverless architect. But they do need to understand that serverless introduces new categories of technical work — monitoring setup, cost configuration, performance testing at scale — that belong in the Definition of Done and in Sprint planning. A Scrum Master who treats these as optional or as post-Sprint concerns is inadvertently creating technical debt and operational risk with every Sprint.
The Product Owner needs to understand the cost model well enough to make informed prioritisation decisions. In a server-based architecture, the cost of a feature is largely the development time to build it; the operational cost is absorbed in the fixed infrastructure spend. In a serverless architecture, each feature has an incremental operational cost that scales with usage. A feature that generates millions of invocations per day at a fraction of a cent each has a very different cost profile from a feature used by a few hundred users per week. Bringing the Product Owner into cost discussions during Sprint planning — rather than surfacing unexpected cloud costs after the fact — is a practice that pays dividends quickly.
Serverless architecture reduces a real and significant operational burden for Scrum teams. The teams that get the most from it are those that also account for the new categories of work it introduces: the operational hygiene, the cost awareness, and the performance testing that turn a deployed function into a production-ready, maintainable piece of a well-functioning system.
XNM Consulting works with Scrum teams on agile delivery practices, technical ways of working, and the integration of modern architecture patterns into effective delivery models.