Zero Downtime Deployments: How to Ship Without Stopping

By XNM Technologies · May 22, 2023 · 4 min read

The maintenance window was the defining ritual of enterprise software deployment for decades: a Friday evening notification to users, a change control process that took longer than the deployment itself, and a Saturday morning rollback drill when something went wrong. The costs were real — lost availability, compressed windows that discouraged frequent releases, and a risk-averse culture that accumulated technical debt rather than shipping improvements continuously. Zero downtime deployment is no longer reserved for hyperscalers. It is a set of well-documented techniques that any Scrum team can implement with the right infrastructure and process discipline.

The four core techniques

Blue-green deployments. Two identical production environments, blue and green, run in parallel. One is live; the other is idle. The new version is deployed to the idle environment and validated there. When validation is complete, traffic is switched to it. A load balancer switch takes effect in seconds; a DNS switch takes as long as record TTLs and client caches allow. If the new version has problems, the switch is reversed the same way. The overhead is two full environments; the benefit is the cleanest possible rollback path.
Canary releases. The new version is deployed alongside the current one and receives a small share of production traffic, typically one to five per cent, while the remainder continues on the current version. If the canary behaves correctly, its share of traffic is gradually increased until rollout is complete. If problems emerge, canary traffic is redirected back to the current version and the issues are resolved before the rollout resumes. Canary releases are especially valuable where performance at scale is difficult to validate in pre-production.
Rolling deployments. Instances are updated one by one or in small batches, with health checks between each step. At any point, some instances run the new version and some run the old. Traffic is continuously served because healthy instances are always available. The key requirement is that the new version must be backward-compatible with the old during the transition — which has implications for how database schemas and APIs evolve.
Feature flags. Feature flags decouple code deployment from feature activation. Code for a new feature is deployed with the feature disabled by default; the feature is enabled independently, for specific users or segments, when the team decides to activate it. This allows continuous code deployment while keeping features dark until they are ready. Flags also enable controlled rollouts and provide a fast kill switch if problems emerge. The operational overhead is managing the flag lifecycle — stale flags that are never cleaned up become technical debt.
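
The canary mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the stage percentages, the 1% error threshold, the salt value, and the function names `bucket`, `route`, and `next_percent` are all assumptions made for the example. It shows two ideas: users are assigned to the canary by a stable hash, so the same user sees the same version on every request, and the rollout either advances to the next stage or drops to zero when the observed error rate exceeds the threshold.

```python
import hashlib

# Hypothetical rollout stages: percentage of traffic sent to the canary.
STAGES = [1, 5, 25, 50, 100]


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, canary_percent: int, salt: str = "release-42") -> str:
    """Send the first `canary_percent` buckets to the canary, the rest to stable."""
    return "canary" if bucket(user_id, salt) < canary_percent else "stable"


def next_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Advance to the next stage, or drop to 0 if the canary looks unhealthy."""
    if error_rate > max_error_rate:
        return 0  # redirect all canary traffic back to the stable version
    for stage in STAGES:
        if stage > current:
            return stage
    return 100  # already fully rolled out
```

Because the bucket depends only on the user and the salt, a user who lands in the canary at 5% stays in it at 25%. The same bucketing function could plausibly gate a feature flag for a percentage of users, with the salt set to the flag name.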

Database migrations and observability

Database schema changes are the hardest part of zero downtime deployment because the database is shared between old and new application versions during the transition. The expand-contract pattern addresses this. In the expand phase, new schema structure is added alongside the existing structure: a new column is added next to the old one rather than replacing it. Both application versions operate against the expanded schema while existing data is copied into the new structure. Once the new version is fully deployed and nothing reads or writes the old structure, the contract phase removes it. Additive changes are generally safe to ship directly; destructive changes, such as dropping or renaming a column, require the expand-contract approach.

Observability completes the picture. Health check endpoints report which instances are fit to serve traffic; error rate and latency dashboards surface regressions within minutes of a canary receiving traffic; and automated rollback triggers revert a deployment when error thresholds are breached, without waiting for human intervention.
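
The expand-contract sequence described above can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `users` table, the `name` and `full_name` columns, and all function names are hypothetical, and a real migration would run each phase as a separate deployment step rather than in one script. The contract step rebuilds the table instead of dropping the column so that the example also runs on older SQLite versions.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Expand: add the new column next to the old one. Nothing is removed."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def old_version_insert(conn: sqlite3.Connection, name: str) -> None:
    """The old application version only knows about `name`."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_version_insert(conn: sqlite3.Connection, full_name: str) -> None:
    """The new version writes both columns while old instances still run."""
    conn.execute(
        "INSERT INTO users (name, full_name) VALUES (?, ?)", (full_name, full_name)
    )


def new_version_read(conn: sqlite3.Connection) -> list:
    """The new version prefers `full_name` and falls back to `name`."""
    rows = conn.execute("SELECT COALESCE(full_name, name) FROM users ORDER BY id")
    return [row[0] for row in rows]


def backfill(conn: sqlite3.Connection) -> None:
    """Copy existing data into the new column before the old one is removed."""
    conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


def contract(conn: sqlite3.Connection) -> None:
    """Contract: remove the old column once no running version uses it."""
    conn.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT);
        INSERT INTO users_new (id, full_name) SELECT id, full_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )


def run_migration_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    old_version_insert(conn, "Ada")
    expand(conn)
    old_version_insert(conn, "Grace")  # old code still works after expand
    new_version_insert(conn, "Linus")
    during = new_version_read(conn)    # both versions share one schema here
    backfill(conn)
    contract(conn)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    after = [r[0] for r in conn.execute("SELECT full_name FROM users ORDER BY id")]
    conn.close()
    return during, columns, after
```

The point of the ordering is that every intermediate schema is usable by both application versions: the old code never sees a missing column, and the new code never depends on data that has not yet been backfilled.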

From maintenance windows to continuous deployment

The journey from maintenance-window culture to continuous deployment is primarily a process and culture problem, not a technology problem. The tooling is the easier part. The harder part is building the organisational habits that make frequent deployment normal: automated test coverage high enough that teams trust their pipelines, change control processes that approve routine changes on the strength of passing automated tests rather than manual review, and deployment frequency high enough that any individual deployment is small and low-risk. Teams that deploy once a month experience each deployment as a major event; teams that deploy daily experience each as a routine operation. The goal is not to eliminate deployment risk but to make each deployment small enough that the risk is proportionate to the change.

If your Scrum teams are still operating with deployment windows, infrequent releases, or a deployment process that takes longer to prepare than to execute, XNM's program and project delivery advisory works with engineering teams to build the pipeline maturity, testing disciplines, and observability practices that make zero downtime deployment a normal part of how software ships.
