Why Deployment Still Feels Broken (And How to Fix It)

Most teams treat deployment as a ceremony rather than a default. Here is what is actually causing the friction, and what fixing it looks like in practice.

Deploying should feel like saving a file. For most teams, it still feels like launching a rocket.

There is a pre-flight checklist. There is coordination between teams. There is a shared staging environment that someone else has parked a half-finished feature in. There is an approval step that requires a human who is in a different timezone. And when something breaks in production, the process to roll back is itself a procedure.

The tools to make deployment effortless have existed for years. The problem is structural: the way most teams have organised their release process embeds the friction.

The gap is measurable

The 2024 DORA performance analysis shows what happens when teams get this right and when they do not. Elite performers deploy on demand, multiple times per day, with a lead time from commit to production of under one day. Low performers deploy once a month to once every six months, with lead times measured in months.

The gap is not marginal. Elite teams deploy 182 times more frequently than low performers, with lead times 127 times shorter, and they recover from failed deployments 2,293 times faster.

What is more concerning: the high-performance cluster shrank from 31% of organisations in 2023 to 22% in 2024. Teams are not getting better at delivery as a whole. The gap between the best and the rest is widening.

What is actually breaking deployment

Four structural problems account for most deployment friction. They are distinct, but they compound.

Slow CI pipelines

When a build takes 20 to 30 minutes, developers stop deploying frequently.

The consequence is not just slower feedback. Infrequent deploys mean larger batches of change per release. Larger batches mean harder debugging when something breaks. Harder debugging means more time spent on forensics and less on shipping. A slow pipeline does not only delay code: it changes how the whole team works.

Pipeline investment needs to be treated like product investment: profiled, owned, and iterated on, not inherited and ignored.
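
To make "profiled" concrete, here is a minimal sketch: wrap each stage, time it, and surface the slowest one. The stage names and commands are illustrative assumptions, not a prescribed pipeline.

```typescript
// pipeline-profile.ts — a sketch of treating the pipeline as a product: time each
// stage and make the slowest one visible. Stage names and commands are illustrative.
import { execSync } from "node:child_process";

const stages = [
  { name: "install", command: "npm ci" },
  { name: "lint", command: "npm run lint" },
  { name: "test", command: "npm test" },
  { name: "build", command: "npm run build" },
];

const timings: { stage: string; seconds: number }[] = [];

for (const { name, command } of stages) {
  const start = Date.now();
  execSync(command, { stdio: "inherit" }); // throws and stops the run on a non-zero exit
  timings.push({ stage: name, seconds: (Date.now() - start) / 1000 });
}

timings.sort((a, b) => b.seconds - a.seconds);
console.table(timings); // the top row is where pipeline investment goes first
```

Once the slowest stage has a number and an owner, it gets fixed the way any other product problem does.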

Config drift and environment parity

Most environment-specific bugs trace back to the same issue: production behaves differently from staging, and staging behaves differently from local.

This happens when configuration is managed manually rather than declared, and when environments diverge over time through small interventions that bypass the pipeline. Infrastructure-as-code practices and ephemeral, per-branch environments address the root cause. When every environment is spun up from the same declared config, drift becomes impossible by design.
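A minimal sketch of "every environment from the same declared config", with the spec shape, helper name, and connection strings as illustrative assumptions: production and every per-branch preview are derived from one declaration rather than hand-edited.

```typescript
// environments.ts — a sketch of deriving every environment from one declared config.
// The EnvironmentSpec shape and connection strings are illustrative assumptions.
interface EnvironmentSpec {
  name: string;
  imageTag: string;     // the same build artifact everywhere
  replicas: number;
  databaseUrl: string;
  ephemeral: boolean;   // per-branch previews are torn down automatically
}

const imageTag = process.env.GIT_SHA ?? "local";

export function environmentFor(branch: string): EnvironmentSpec {
  if (branch === "main") {
    return {
      name: "production",
      imageTag,
      replicas: 6,
      databaseUrl: process.env.PROD_DATABASE_URL ?? "",
      ephemeral: false,
    };
  }
  // Any other branch gets an isolated, disposable environment with its own data.
  const slug = branch.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  return {
    name: `preview-${slug}`,
    imageTag,
    replicas: 1,
    databaseUrl: `postgres://preview-db/${slug}`,
    ephemeral: true,
  };
}
```

Because nothing is edited by hand per environment, there is nothing to drift.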

Manual steps in the release path

Any human checkpoint in a standard deployment becomes a bottleneck.

The problem is not the human: it is the sequence. Code ready, waiting for review, waiting for a staging slot, waiting for sign-off, then deploying. Each wait is a queue. Queues absorb velocity. As we covered in the previous post on workflow bottlenecks, high-performing teams replace human gatekeeping with automated quality gates. Tests pass, coverage holds, build succeeds: the code ships. Human review happens in parallel with a preview environment, not in sequence with the deployment.
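As a sketch of what an automated gate can look like, assuming an npm-based project, an Istanbul-style coverage summary, and a hypothetical deploy command:

```typescript
// gate.ts — a sketch of an automated quality gate standing in for a human checkpoint.
// The commands, coverage path, and thresholds are illustrative, not prescriptive.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

function run(command: string): boolean {
  try {
    execSync(command, { stdio: "inherit" });
    return true;
  } catch {
    return false;
  }
}

function coverageHolds(minimum = 0.8): boolean {
  // Assumes the test run writes coverage/coverage-summary.json in Istanbul's format.
  const summary = JSON.parse(readFileSync("coverage/coverage-summary.json", "utf8"));
  return summary.total.lines.pct / 100 >= minimum;
}

// Tests pass, coverage holds, build succeeds: the code ships. Nobody is paged for sign-off.
if (run("npm test") && coverageHolds() && run("npm run build")) {
  run("npx deploy-tool release --env production"); // hypothetical deploy command
} else {
  process.exit(1); // the gate, not a person, blocks the release
}
```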

Toolchain sprawl

Completing a single deployment across five disconnected tools carries a compounding cost that does not show up in any sprint metric.

GitLab's research found that developers waste up to 75% of their time on toolchain maintenance rather than shipping code. Each context switch during a deploy, from issue tracker to CI dashboard to staging environment to deployment tool to communication thread, is a small tax. Across a week, it adds up to hours.

What fixing it looks like

The teams that have solved deployment share a pattern, not a specific stack.

Shopify ships approximately 40 times per day across a codebase where over 1,000 engineers contribute daily. Their pipeline is fully automated: a merge triggers a build, CI runs, then a 5% canary deployment runs for ten minutes. If no alerts fire, the change is automatically promoted to production. End-to-end, the process runs in around 15 minutes. Emergency deploys are available with full auditability, but standard work needs no human in the release path.
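Reduced to a sketch, the canary-then-promote pattern looks roughly like the following. This is not Shopify's tooling; the four helper functions are hypothetical stand-ins for whatever deployment and monitoring APIs a team actually runs.

```typescript
// canary.ts — a sketch of the canary-then-promote pattern, not Shopify's tooling.
// deployCanary, alertsFiring, promote, and rollBack are hypothetical stubs.
async function deployCanary(_buildId: string, _percent: number): Promise<void> {}
async function alertsFiring(_buildId: string): Promise<boolean> { return false; }
async function promote(_buildId: string): Promise<void> {}
async function rollBack(_buildId: string): Promise<void> {}

const CANARY_PERCENT = 5;   // share of traffic routed to the new build
const WATCH_MINUTES = 10;   // how long the canary must stay quiet

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function canaryThenPromote(buildId: string): Promise<void> {
  await deployCanary(buildId, CANARY_PERCENT);

  const deadline = Date.now() + WATCH_MINUTES * 60_000;
  while (Date.now() < deadline) {
    if (await alertsFiring(buildId)) {
      await rollBack(buildId);            // alerts fired: production never sees the change
      throw new Error(`canary for ${buildId} rolled back`);
    }
    await sleep(30_000);                  // re-check alerts every 30 seconds
  }

  await promote(buildId);                 // the full window passed quietly: ship it
}
```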

The design decisions are worth noting: a Merge Queue controls how many undeployed changes can accumulate on the main branch, preventing the situation where too many undeployed changes are in flight at once. Deployment frequency is tracked explicitly as a team health metric alongside developer satisfaction surveys.
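The merge queue cap reduces to a few lines; the threshold and the undeployedCommitsOnMain helper in this sketch are hypothetical, not Shopify's implementation.

```typescript
// merge-queue.ts — a sketch of capping undeployed changes on main.
// undeployedCommitsOnMain() is a hypothetical helper that would compare the
// latest deployed SHA against the head of main.
const MAX_UNDEPLOYED = 10; // illustrative threshold

async function undeployedCommitsOnMain(): Promise<number> {
  return 0; // stub for illustration
}

// When too many merged changes have not yet shipped, pause new merges until the
// pipeline catches up, so every deploy stays a small batch.
export async function canMerge(): Promise<boolean> {
  return (await undeployedCommitsOnMain()) < MAX_UNDEPLOYED;
}
```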

Vercel and Netlify operationalised similar principles for web teams. Push to branch, and a preview URL exists. Push to main, and production deploys atomically, with the old version staying live until the new build is fully ready. Rollback is the same operation as promoting, which means it is instant and low-risk.
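The "rollback is the same operation as promoting" idea comes down to a pointer swap over immutable builds. A minimal sketch, with an in-memory map standing in for whatever the platform actually uses to route traffic:

```typescript
// releases.ts — a sketch of atomic deploys: builds are immutable, and "deploy" just
// repoints an alias at one of them. The map below is an illustrative stand-in.
interface Release {
  id: string;
  createdAt: Date;
}

const releases = new Map<string, Release>();
let live: string | null = null;

export function registerBuild(id: string): void {
  releases.set(id, { id, createdAt: new Date() }); // old builds stay available
}

export function promoteRelease(id: string): void {
  if (!releases.has(id)) throw new Error(`unknown release ${id}`);
  live = id; // a single pointer swap; the old version serves traffic until this line runs
}

// Rollback is not a special procedure: it is promoteRelease() aimed at the previous build.
export function rollbackTo(id: string): void {
  promoteRelease(id);
}

export function liveRelease(): Release | null {
  return live ? releases.get(live) ?? null : null;
}
```

Because nothing is mutated in place, there is never a half-deployed state to recover from.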

The common thread across all three: git is the control plane. A push is the deployment trigger. Previews are the default review mechanism, not an optional extra. Automation handles the path to production; humans review on a preview environment, not a shared staging server.

The architecture behind the pattern

This mirrors what composable architecture did for frontend development, described in more detail in Jamstack Was Never About Static Sites. The principle is the same: separate the layers so they can move independently.

When deployment is a clean layer, isolated from the content and application above it, changing how you deploy does not require changing how you build or how you manage content. Teams can iterate on the delivery system without touching the product.

The teams moving fastest have built that separation deliberately. The teams still fighting deployment ceremonies have usually inherited a setup where hosting, deployment, and application concerns are tangled together, and untangling them feels like a project rather than a configuration change.

A different starting point

The DORA framework measures deployment frequency and lead time for changes precisely because they are proxies for organisational capability. When deploying is effortless, teams ship more often, which means smaller batches, faster feedback, and fewer catastrophic failures.
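Both metrics are cheap to compute from deploy records. A sketch, assuming a hypothetical DeployEvent shape that a team would derive from CI logs or a deployment API:

```typescript
// dora.ts — a sketch of computing deployment frequency and lead time for changes.
// The DeployEvent shape is an assumption, not a standard schema.
interface DeployEvent {
  commitAt: Date;   // when the change was committed
  deployedAt: Date; // when it reached production
}

export function meanLeadTimeHours(events: DeployEvent[]): number {
  if (events.length === 0) return 0;
  const totalHours = events
    .map((e) => (e.deployedAt.getTime() - e.commitAt.getTime()) / 3_600_000)
    .reduce((a, b) => a + b, 0);
  return totalHours / events.length; // mean commit-to-production time
}

export function deploysPerDay(events: DeployEvent[], windowDays: number): number {
  return events.length / windowDays;
}
```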

Deployment frequency is not a vanity metric. Shopify's 40 deploys per day is the output of deliberate architectural choices, not a cultural accident.

Fixing deployment is not a one-day task. But diagnosing what is broken usually is. Start with the question from the previous post: if your team shipped every day, what would break? The answers map directly to the structural problems worth solving first.

Forge's POV

Forge's developer platform is built on the premise that deployment should be a background event, not a coordinated one. Git-driven, branch-aware, and independent of the content or application layer it serves.

Every branch is deployable. Previews are the default. Rollback is atomic. The goal is for deploying to feel like saving a file, because for teams running a clean delivery layer, that is exactly what it is.