Stop-the-World Locks in PostgreSQL

PostgreSQL is built to let lots of things happen at once: reads while writes are happening, multiple users running queries, background maintenance cleaning up old rows, migrations rolling out without downtime. Most of the time it delivers on that promise because its concurrency model (MVCC) avoids blocking in many common cases.

But every Postgres operator eventually runs into a moment that feels like someone yanked the power cord on concurrency: queries pile up, the app starts timing out, dashboards show CPU is oddly calm, and yet nothing is “moving.” In reality the system is usually not crashed. It’s waiting. One session is holding (or waiting for) a lock that forces other sessions to queue behind it, and that queue can spread until the database appears frozen.

1. Introduction

“Stop-the-world” isn’t an official PostgreSQL term. It’s a useful mental label for incidents where a single lock or blocking chain effectively pauses progress for a large part of the workload. These events matter because:

  • The blast radius is huge. A lock taken on one table can stall unrelated requests if your app workflow touches that table frequently, or if connection pools fill up with waiting sessions.
  • They’re easy to trigger accidentally. Routine operations like certain schema changes, TRUNCATE, or an overlooked long-running transaction can create conditions where other sessions must wait.
  • They’re hard to diagnose under pressure. The database looks “idle” at the resource level, while the real problem is invisible unless you know where to look (pg_stat_activity, pg_locks, lock wait events).
  • They turn small mistakes into outages. One migration run at the wrong time can turn into cascading timeouts, retries, and even more load—making recovery slower.

The goal of this article is to make those moments predictable. We’ll map the kinds of locks that can cause system-wide stalls, show you how lock queues form and spread, and give you practical ways to detect, prevent, and safely execute the operations that are most likely to cause “stop-the-world” behavior in production.

2. What People Mean by “Stop-the-World” in a Database Context

“Stop-the-world” is borrowed from other areas of computing (like garbage collection) where the system briefly pauses normal work. In databases, it’s not usually a literal pause of the entire server process. Instead, it’s a concurrency pause that happens when enough sessions end up waiting on the same chokepoint that the workload behaves like it’s frozen.

In PostgreSQL, that chokepoint is most often a lock (or a chain of locks) that forces other sessions to wait.

It’s not “the database is down”

During a stop-the-world-style event, Postgres is typically still:

  • accepting connections,
  • running some background work,
  • responding to trivial queries that don’t touch the blocked objects.

The problem is that your important queries can’t make progress, because they need a lock that can’t be granted yet.

What it looks like from the outside

Common “everything stopped” symptoms include:

  • Requests timing out across many endpoints, even ones that are normally fast.
  • A sudden spike in the number of active sessions, many of them “doing nothing” except waiting.
  • Low CPU utilization (because waiting doesn’t burn CPU) paired with high app error rates.
  • A backlog in the connection pool: threads are stuck waiting for a DB connection that is itself waiting on a lock.

The mechanics: how a single blocker becomes “world-stopping”

A typical pattern is:

  1. One session takes a strong lock (or tries to) on a table or other object.
    Example: a DDL statement that requires an ACCESS EXCLUSIVE lock.
  2. Other sessions queue behind it, waiting for locks that conflict.
    These waiting sessions are now not completing and remain in your pool.
  3. The queue spreads outward as more application requests arrive and also block.
    You get a connection pileup, then timeouts, retries, and even more pressure.
  4. The real blocker might not even be the DDL.
    Often, the DDL is waiting behind a long-running transaction that holds a weaker lock. Because Postgres queues lock requests, a single “strong lock request” can cause other compatible lock requests to queue behind it as well, amplifying the impact.

So the “stop-the-world” feeling can be caused by either:

  • A session holding a lock that blocks many others, or
  • A session waiting for a lock in a way that causes everyone else to line up behind it (lock queue effects).

What qualifies as “stop-the-world” vs normal contention

Normal contention is localized: one hot row, one hot index, one busy table, and most other work continues.

A “stop-the-world” episode is characterized by:

  • High fan-out blocking: one blocker affects many sessions.
  • Lock queue amplification: a few lock waits turn into many.
  • Application-level stall: pools saturate, timeouts rise, and the app effectively stops serving.

Key takeaway

When people say “stop-the-world lock” in PostgreSQL, they usually mean:

  • “A lock situation (often involving ACCESS EXCLUSIVE or a lock queue behind it) that makes the database effectively unusable for a broad slice of traffic.”

The rest of this article focuses on turning that vague fear into concrete knowledge: which operations are risky, how to spot the true blocker, and how to run changes without triggering a production-wide stall.

3. PostgreSQL Locking Basics You Need for the Rest of This Article

PostgreSQL’s locking system is how it keeps data correct when multiple sessions do things at the same time. If you’re going to reason about “stop-the-world” incidents, you only need a handful of core ideas: what a lock protects, how long it lasts, how conflicts create waiting, and why waiting can cascade.

Locks exist to protect specific resources

In Postgres, a “lock” is not one single thing. It’s a family of mechanisms that protect different kinds of resources, such as:

  • Tables and other relations (tables, indexes, materialized views, sequences)
  • Rows (the specific tuples you update/delete)
  • Transactions and virtual transactions
  • Internal shared-memory structures (via lightweight locks)

For “everything is stuck” events, the most common culprits are locks on relations (especially tables) and the way lock requests queue.

Two big categories you’ll hear about

Row-level locks
These happen as part of UPDATE, DELETE, and SELECT ... FOR UPDATE/SHARE. They are usually narrow in scope (one row or set of rows), so they more often cause localized blocking.

Table-level (relation) locks
Every query takes some lock level on the relations it touches. Many of these are compatible (so they don’t block each other), but some operations need strong locks that block most concurrent access. These are the ones that feel “stop-the-world.”

Lock compatibility is the whole game

A lock only causes waiting when there is a conflict. Postgres defines lock modes (like ACCESS SHARE, ROW EXCLUSIVE, ACCESS EXCLUSIVE) and a compatibility matrix that determines which combinations can coexist.

The practical takeaway:

  • Many sessions can hold “weak” locks together.
  • “Strong” locks conflict with many other modes.
  • When a strong lock is involved, waiting can spread fast.

Locks are held for different durations

Lock duration depends on what kind of lock it is and what statement you ran, but two rules of thumb explain most surprises:

  • Locks taken by DDL are often held until the end of the transaction.
    If you run schema changes inside an explicit transaction block and keep it open, you keep the lock longer than you think.
  • Long transactions make everything worse.
    Even if a transaction is “idle,” if it’s still open it may hold locks (and it can also prevent cleanup via MVCC, which indirectly increases pressure).
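
A quick way to spot long-lived open transactions before they become blockers (a minimal sketch; it only shows sessions your role is allowed to see):

SELECT pid,
       state,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND state <> 'idle'
ORDER BY xact_age DESC;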

Postgres uses MVCC, so reads usually don’t block writes

Multi-Version Concurrency Control (MVCC) is why plain SELECT can run while other sessions are UPDATE-ing the same table. Readers see a consistent snapshot; writers create new row versions.

But MVCC doesn’t eliminate all blocking:

  • Certain reads explicitly request locks (SELECT ... FOR UPDATE).
  • Writes can block each other when they touch the same rows.
  • Many schema/maintenance operations still require strong relation locks.

Waiting isn’t just “slow”—it creates queues

A common misconception is: “If a query waits on a lock, only that query is affected.”

In real systems, waiting causes secondary failures:

  • connection pools fill with waiting sessions,
  • threads block upstream,
  • retries amplify load,
  • more sessions arrive and join the wait queue.

So a single blocked statement can become an application-wide outage even if the database is technically functioning.

The mental model you’ll use going forward

For the rest of this article, keep this simple loop in mind:

  1. A statement needs a lock on some object.
  2. If the lock conflicts with existing locks, it waits.
  3. Waiters accumulate, and the wait spreads via your app and pool.
  4. The “stop-the-world” moment ends only when the blocking lock is released (commit/rollback, statement finishes, session is canceled/terminated).

Once you have this model, the next sections (lock modes, ACCESS EXCLUSIVE, DDL pitfalls, diagnosis with pg_stat_activity/pg_locks) become much easier to follow.

4. A Quick Tour of PostgreSQL Lock Types

PostgreSQL uses multiple locking mechanisms at different layers. When people talk about “locks,” they’re often mixing these together, so it helps to name the main families and what they’re for. The sections later in the article will zoom in on the ones most likely to create stop-the-world behavior.

Relation locks (table/index locks)

These are locks on relations: tables, indexes, materialized views, sequences, etc. Almost every SQL statement takes at least a weak relation lock on the objects it touches.

  • Examples: ACCESS SHARE, ROW EXCLUSIVE, ACCESS EXCLUSIVE
  • Where you see them: pg_locks with locktype = 'relation'
  • Why they matter: strong relation locks (especially ACCESS EXCLUSIVE) can block huge parts of your workload.

Row locks (tuple locks)

These protect specific rows. They arise from:

  • UPDATE, DELETE
  • SELECT ... FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR KEY SHARE

Row locks tend to cause localized contention (hot rows) rather than “the whole DB froze,” but they can still contribute to large incidents when a hot table is involved or when many sessions pile up behind the same row(s).

  • Where you see them: you don’t get a neat “row id” in pg_locks the way you might expect; tuple locks are represented indirectly (and often show up via waits like tuple/transactionid depending on circumstances).

Transaction-related locks

Postgres also has locks that are effectively “wait for another transaction to finish.”

Two common ones you’ll encounter during diagnosis:

  • transactionid locks: a session is waiting for a particular transaction ID to commit/rollback (very common with row-level conflicts).
  • virtualxid locks: related to a session’s virtual transaction; these show up in some DDL/maintenance interactions and can surprise you when a seemingly harmless session blocks a stronger operation.

These are important because sometimes the visible blocked query is waiting on a lock that’s actually “owned” by a transaction you didn’t expect to matter.

Advisory locks

Advisory locks are application-defined locks. Postgres enforces them, but Postgres doesn’t take them automatically; your code does.

  • Functions: pg_advisory_lock, pg_try_advisory_lock, and transaction-scoped variants
  • Use cases: ensuring a singleton job runs once, preventing concurrent migrations, serializing access to a resource
  • Risk: you can create your own “stop-the-world” event if you take a global advisory lock and forget to release it (or if the session hangs around).
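
A minimal session-level sketch (the key 42 is arbitrary and only meaningful to your application):

SELECT pg_try_advisory_lock(42);   -- returns true only if the lock was acquired
-- ... do the work that must not run concurrently ...
SELECT pg_advisory_unlock(42);     -- release explicitly, or it persists until the session ends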

Lightweight locks (LWLocks) and other internal locks

Postgres has internal locks used to coordinate access to shared memory structures (buffer manager, WAL, lock manager structures, etc.). These are not SQL-level table/row locks and typically aren’t what you’ll see in pg_locks as relation locks.

  • Why they matter: under extreme contention or I/O pressure, waits on these internals can look like a “freeze,” but they’re a different class of problem than DDL-induced blocking.
  • Where you see them: wait event reporting (pg_stat_activity.wait_event_type / wait_event).

Predicate locks (Serializable Snapshot Isolation)

If you run at the SERIALIZABLE isolation level, Postgres uses predicate locking (implemented as SIREAD locks) to detect dangerous read/write patterns.

  • These don’t behave like typical blocking locks most of the time.
  • They’re primarily for correctness (and can lead to serialization failures rather than long blocking).
  • Where you see them: pg_locks with locktype = 'predicate' in serializable workloads.

Extension: object and metadata locks

Some operations also take locks on non-table objects:

  • databases (locktype = 'database')
  • pages or tuples (less commonly surfaced in the way people expect)
  • extensions, schemas, or system catalogs (often indirectly, but crucial during DDL)

These can matter because “stop-the-world” incidents are frequently triggered by metadata-changing statements that must coordinate globally.

What to remember for stop-the-world scenarios

If your goal is to prevent “everything is stuck” moments, focus mostly on:

  1. Relation locks (especially ACCESS EXCLUSIVE)
  2. Lock queues and lock waits (how one waiter can cause pileups)
  3. Long-running transactions (they keep locks and block cleanup)
  4. DDL and maintenance commands (they’re disproportionately likely to need strong locks)

Next up, we’ll break down lock modes in plain language so you can predict which statements block which workloads.

5. Lock Modes Explained: From ACCESS SHARE to ACCESS EXCLUSIVE

PostgreSQL relation (table/index) locks come in a set of modes. Every statement that touches a table takes one of these modes (sometimes more than one object per statement). The mode determines what else can run concurrently.

You do not need to memorize every edge case, but you do need a feel for the spectrum: most queries take weak locks that happily coexist, while a few operations require locks strong enough to stall almost everything.

The spectrum at a glance

From “least blocking” to “most blocking,” the commonly encountered relation lock modes are:

  • ACCESS SHARE
  • ROW SHARE
  • ROW EXCLUSIVE
  • SHARE UPDATE EXCLUSIVE
  • SHARE
  • SHARE ROW EXCLUSIVE
  • EXCLUSIVE
  • ACCESS EXCLUSIVE

Below is what each one generally means in practice, plus typical statements that take it.


ACCESS SHARE

What it’s for: ordinary reads.
Typically taken by: SELECT (plain selects, not FOR UPDATE variants).

Concurrency impact:

  • Compatible with other ACCESS SHARE locks and most write locks.
  • Does not block normal INSERT/UPDATE/DELETE.

Why it matters:
Large numbers of readers usually aren’t the “stop-the-world” problem—but long-running reads can still get in the way of certain DDL that needs stronger locks, or they can keep transactions open and indirectly cause pain.


ROW SHARE

What it’s for: reads that intend to lock rows.
Typically taken by: SELECT ... FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, and FOR KEY SHARE (the row-locking selects).

Concurrency impact:

  • Stronger than ACCESS SHARE.
  • Still allows a lot of concurrency, but it participates in conflicts with some stronger DDL-type locks.

ROW EXCLUSIVE

What it’s for: typical writes.
Typically taken by: INSERT, UPDATE, DELETE, MERGE.

Concurrency impact:

  • Compatible with reads (ACCESS SHARE) and with other writers’ ROW EXCLUSIVE locks.
  • Conflicts with the more DDL-ish “share/exclusive” modes above it.

Why it matters:
Most OLTP workloads live here. If you’re seeing widespread blocking and almost everything is waiting, it’s rarely because of ROW EXCLUSIVE alone—it’s usually because something stronger entered the picture.


SHARE UPDATE EXCLUSIVE

What it’s for: operations that need stability of the relation definition and coordination with other maintenance.
Often taken by: VACUUM (including autovacuum), ANALYZE, CREATE INDEX CONCURRENTLY (phases vary), and other maintenance-style commands.

Concurrency impact:

  • Allows reads and writes to continue.
  • Conflicts with some other schema/maintenance operations (and can queue behind or block them).

Why it matters:
This mode is usually “production-friendly,” but it can become part of a bigger blocking chain when combined with DDL or long transactions.


SHARE

What it’s for: stronger coordination than maintenance, often to protect against concurrent writes that would invalidate an operation’s assumptions.
Sometimes taken by: CREATE INDEX (without CONCURRENTLY) and certain other DDL/maintenance steps (exact statements vary by version and operation).

Concurrency impact:

  • Allows reads.
  • Tends to block writers (ROW EXCLUSIVE) or other operations that need to modify the table.

SHARE ROW EXCLUSIVE

What it’s for: “I need to do something substantial, and I don’t want concurrent schema-ish things or concurrent writes stepping on it.”
Sometimes taken by: some ALTER TABLE variants and other heavy operations.

Concurrency impact:

  • More blocking than SHARE.
  • Starts to feel “big” in production because it conflicts with common write locks.

EXCLUSIVE

What it’s for: very strong operations that need near-complete control, but still not quite the “stop all access” level.
Sometimes taken by: REFRESH MATERIALIZED VIEW CONCURRENTLY (few statements use this mode directly).

Concurrency impact:

  • Blocks a wide set of other operations.
  • Still not as absolute as ACCESS EXCLUSIVE.

ACCESS EXCLUSIVE

What it’s for: operations that must have the table to themselves.
Commonly taken by: many ALTER TABLE operations, DROP TABLE, TRUNCATE, some REINDEX/rewrite operations, and other commands that fundamentally change or replace the relation.

Concurrency impact (practically):

  • Conflicts with everything on that relation, including plain SELECT.
  • If it’s granted, it can make a hot table feel like it “went offline.”
  • If it’s waiting to be granted, it can still cause a pileup due to lock queueing effects.

Why it’s the headline “stop-the-world” lock:
If your application depends on the table (directly or through foreign keys, views, or ORM behavior), an ACCESS EXCLUSIVE lock request is one of the fastest ways to turn normal traffic into a wall of waiting sessions.


Two crucial behaviors that make this worse than it sounds

Lock waits form queues

If session A is waiting for a strong lock, and session B arrives later requesting a lock that would normally be compatible with the currently-held locks, B may still end up waiting because it’s behind A in the queue. This is one reason a single “big” lock request can make everything line up behind it.

Locks often live until transaction end

Especially with DDL, the lock may be held until COMMIT/ROLLBACK. That means even a quick operation can cause a long freeze if the transaction is left open (for example, a migration framework that wraps multiple steps in one transaction).


Practical takeaway

Most day-to-day queries use ACCESS SHARE (reads) or ROW EXCLUSIVE (writes) and coexist nicely. Stop-the-world moments nearly always involve:

  • an ACCESS EXCLUSIVE lock being held or requested on a busy table, and/or
  • a lock queue forming behind that request,
  • made worse by a long-running or idle transaction.

Next, we’ll connect these modes to real-life blocking by looking at relation locks vs row locks and what actually blocks what in practice.

6. Relation Locks vs Row Locks: What Can Block What

When PostgreSQL “freezes,” it’s tempting to think “some row is locked.” But the most disruptive incidents are usually about relation locks (table/index-level) and lock queues, not a single locked row. Understanding the difference makes diagnosis much faster.

Relation locks: the traffic lights on tables and indexes

Relation locks apply to an entire object (table, index, materialized view, sequence). They are taken in a specific lock mode (like ACCESS SHARE or ACCESS EXCLUSIVE) and conflicts are decided by the lock compatibility matrix.

Common examples:

  • Plain SELECT takes an ACCESS SHARE lock on the table.
  • INSERT/UPDATE/DELETE take ROW EXCLUSIVE.
  • Many ALTER TABLE operations take ACCESS EXCLUSIVE.

What relation locks block best:

  • DDL blocking DML: ALTER TABLE ... (often ACCESS EXCLUSIVE) blocks reads and writes.
  • DDL blocking reads: a granted ACCESS EXCLUSIVE blocks even plain SELECT.
  • Maintenance coordination: some operations (VACUUM/ANALYZE, concurrent index builds) use mid-strength locks that mostly allow traffic but can conflict with other schema/maintenance tasks.

Why relation locks cause “stop-the-world”:

  • The blocked unit is a whole hot table (or an index the hot table needs).
  • The lock request can create a queue that causes other sessions to wait even if they would otherwise be compatible.

You can observe relation locks directly in pg_locks (locktype = 'relation'), and identify the object via relation::regclass.


Row locks: conflicts on specific rows

Row locks (tuple locks) happen when statements need to change or explicitly lock rows:

  • UPDATE / DELETE lock the rows they modify.
  • SELECT ... FOR UPDATE/SHARE locks the rows it selects (depending on the clause).

Row locking conflicts are usually about two sessions touching the same rows, not about the whole table.

What row locks block best:

  • Writer vs writer on the same row:
    • Two UPDATEs hitting the same row → one waits.
  • Locker vs writer:
    • SELECT ... FOR UPDATE in one session can block an UPDATE in another session on the same rows.
  • Long transactions amplify row lock pain:
    • If a transaction updates rows and stays open, it holds those row locks longer.

How row lock waits show up:
Row lock waits often present as waiting on a transaction to finish, because the second session must wait until the first transaction commits/rolls back to know whether the row version is visible/updatable. In practice, you may see waits tied to transactionid (or related wait events) more than a neat “row lock entry.”


The key difference in “blast radius”

  • Row lock contention is usually surgical: it affects sessions that touch the same row(s). The database keeps serving other work.
  • Relation lock contention is often systemic: it can block every query that touches the table, which can quickly become “everything is stuck” in an application.

That’s why stop-the-world stories are disproportionately about ACCESS EXCLUSIVE (relation lock), not “a locked row.”


What can block what: common patterns you’ll actually see

1) DDL blocks everything on a table (classic stop-the-world)
  • Session A: ALTER TABLE big_table ... → requests ACCESS EXCLUSIVE
  • Sessions B..N: normal app queries (SELECT/INSERT/UPDATE) now wait

If the table is central, your app appears down.

2) Long-running read blocks DDL (then the queue blocks everyone else)
  • Session A: long SELECT (holds ACCESS SHARE)
  • Session B: ALTER TABLE ... requests ACCESS EXCLUSIVE and waits for A to finish
  • Session C: arrives later wanting a normal lock (e.g., ROW EXCLUSIVE for an update)
    Even though C is compatible with A, it can end up queued behind B’s pending ACCESS EXCLUSIVE request, so C waits too.

This is a common “it was just a SELECT, why did migrations take down prod?” incident.

3) Row lock pileups look like timeouts, not “global freeze”
  • Session A: updates a popular row and keeps the transaction open
  • Sessions B..N: try to update the same row → all wait on A
  • Other parts of the system may still work fine

It’s painful, but the impact is usually limited to the code paths that touch those rows.


Practical rules of thumb

  • If plain SELECTs are waiting, suspect a relation lock (often a DDL lock like ACCESS EXCLUSIVE) rather than a row lock.
  • If only updates to specific entities are timing out (e.g., “users table updates fail for one customer”), suspect row lock contention.
  • If many sessions are waiting and the blocker is “idle in transaction,” you may have a transaction holding locks open (row locks, relation locks, or both).
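
A quick check for that last case (a minimal sketch; note that pg_stat_activity.query shows the last statement the idle transaction ran, not what it is doing now):

SELECT pid,
       usename,
       now() - state_change AS idle_for,
       query                AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_for DESC;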

Next, we’ll look at the internal lock layers—lightweight locks vs heavyweight locks—and how to interpret what you see in Postgres’ monitoring views when you’re in the middle of an incident.

7. Lightweight Locks: The Hidden Locks Behind Core Operations

When most people say “Postgres locks,” they mean the SQL-visible ones: table locks, row locks, and the waits you can inspect via pg_locks. But PostgreSQL also relies heavily on lightweight locks (LWLocks)—short-lived internal locks used to coordinate access to shared in-memory structures.

LWLocks don’t usually show up as “blocking on a table,” yet under load they can be the reason your database feels stuck even when there’s no obvious ACCESS EXCLUSIVE drama.

What LWLocks are (and what they’re not)

LWLocks are:

  • Internal synchronization primitives used by the Postgres server
  • Held for very short periods in normal operation (microseconds to milliseconds)
  • Designed to protect shared memory structures where correctness matters

LWLocks are not:

  • the same as relation locks (ACCESS SHARE, ACCESS EXCLUSIVE, etc.)
  • row locks from UPDATE / SELECT ... FOR UPDATE
  • advisory locks

So if you’re hunting a “stop-the-world lock” and you only look at pg_locks, you can miss an LWLock bottleneck entirely.

What kinds of things LWLocks protect

LWLocks cover a wide range of core internals. Examples of areas that require LWLocks include:

  • Buffer manager and buffer mapping (finding/managing cached data pages)
  • WAL (Write-Ahead Log) coordination and related shared structures
  • Lock manager internal data structures (yes—locks have locks)
  • Replication and snapshot-related shared state
  • System catalog caches and shared metadata structures

The exact LWLock names and granularity can vary across Postgres versions, but the theme is consistent: they protect shared state that many backends may contend for.

How LWLock contention feels in production

LWLock contention often presents differently from classic DDL blocking:

  • CPU might be moderate to high (spinning, context switching, contention overhead)
  • Query latency increases broadly, not just for one table
  • Many sessions appear “active,” but they’re frequently waiting on internal events
  • You don’t necessarily see a clean “one blocker, many victims” chain

In other words, it can look like “Postgres is slow everywhere” rather than “one statement is blocked.”

How to observe LWLock waits

The most practical place to see LWLock pain is wait event reporting:

  • pg_stat_activity.wait_event_type
  • pg_stat_activity.wait_event

If you see many sessions with:

  • wait_event_type = 'LWLock' (on older versions, 'LWLockNamed' / 'LWLockTranche')
  • similar wait_event values across many sessions

…that’s a signal the system is contending on an internal shared resource.
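
A quick way to see which wait events dominate right now (a minimal sketch; a NULL wait_event means the backend is actually running, not waiting):

SELECT wait_event_type,
       wait_event,
       count(*) AS sessions
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;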

What you won’t get from this alone is a simple “kill this PID and you’re done.” LWLock contention is usually systemic: workload pattern, I/O pressure, checkpoint behavior, too many active backends, or a hot internal structure.

LWLocks vs “stop-the-world locks”

It’s useful to separate two incident archetypes:

A) Lock-queue freeze (SQL-level, usually DDL-related)

  • Many sessions waiting on relation locks
  • Clear blocker in pg_locks / pg_stat_activity
  • Often fixed by releasing/canceling the blocker or waiting for a long transaction to end

B) LWLock contention (internal bottleneck)

  • Many sessions waiting on LWLock-related wait events
  • No single SQL statement is “the blocker”
  • Fix usually involves reducing contention: tuning, schema/workload changes, connection management, checkpoint/WAL tuning, or scaling resources

Both can feel like “stop-the-world,” but they’re diagnosed and fixed differently.

Common drivers of LWLock contention

Without overfitting to any one case, LWLock hotspots are often associated with:

  • Too many concurrent active sessions (especially without a pooler)
  • High write rates that stress WAL and buffer management
  • Checkpoint or bgwriter pressure causing heavy buffer churn
  • Hot relations/indexes that concentrate access patterns
  • Catalog-heavy workloads (lots of DDL, many temporary objects, frequent plan invalidations)

The important point: LWLock problems are frequently about contention and throughput, not about a lock mode like ACCESS EXCLUSIVE.

What to do when you suspect LWLocks

If your incident looks systemic and wait events point to LWLocks:

  1. Confirm it’s not classic blocking
    Check whether there’s an obvious relation-lock queue (many sessions waiting on locks with a single root blocker).
  2. Look for a pattern in wait events
    Are most sessions waiting on the same LWLock tranche/event? That narrows the class of bottleneck.
  3. Reduce concurrency and smooth spikes
    Often the fastest mitigation is to reduce the number of concurrently active queries (pooler limits, app throttling, temporarily pausing heavy jobs).
  4. Then do the real fix
    The durable fix depends on the specific contention point: tuning checkpoints, WAL settings, pooling strategy, schema/index changes, query rewrites, or hardware/I/O improvements.

LWLocks are the “hidden gears” that keep Postgres correct and fast. Most days you’ll never notice them—but when they become a bottleneck, understanding that they’re internal contention (not “a table got locked”) is the difference between flailing and fixing.

8. Heavyweight Locks: The Ones You See in pg_locks

When you’re diagnosing a “stop-the-world” episode, the most actionable evidence usually comes from heavyweight locks—the SQL-level locks managed by Postgres’ lock manager. These are the locks that:

  • control concurrency for tables, rows (indirectly), transactions, and various objects,
  • can block other sessions for long periods,
  • and are visible in the system catalog view pg_locks.

If you can find a clear blocking chain, it almost always involves these locks.

What counts as a “heavyweight lock”

“Heavyweight” here doesn’t mean “slow.” It means:

  • the lock is tracked in the lock manager,
  • it can be waited on,
  • and it’s exposed to you for inspection.

Relation locks like ACCESS SHARE and ACCESS EXCLUSIVE are heavyweight locks. Advisory locks are also heavyweight (because they’re managed by the lock manager and visible in pg_locks).

Why pg_locks is so useful in incidents

pg_locks tells you, for each session/process:

  • what it is trying to lock (locktype + object identifiers)
  • which lock mode it holds or wants (mode)
  • whether the lock is granted or waiting (granted = true/false)

That’s the core of answering two questions under pressure:

  1. Who is blocked?
  2. Who is blocking them?

The most important columns in pg_locks

You’ll see many columns, but these are the ones you’ll use constantly:

  • pid: backend process ID holding or waiting on the lock
  • locktype: what kind of resource is locked (examples below)
  • mode: lock mode (e.g., AccessShareLock, AccessExclusiveLock, etc.)
  • granted: true if held, false if waiting
  • relation: OID of the locked relation (when locktype = 'relation')
  • database: database OID (useful for some locktypes)
  • transactionid / virtualxid: transaction-level lock identifiers
  • classid / objid / objsubid: used for some object locks

Tip: the lock mode names in pg_locks.mode are the internal names, but they map directly to the modes discussed earlier:

  • AccessShareLock ↔ ACCESS SHARE
  • RowExclusiveLock ↔ ROW EXCLUSIVE
  • AccessExclusiveLock ↔ ACCESS EXCLUSIVE
    …and so on.
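
For a raw first look during an incident, a minimal sketch (relation resolves to NULL for non-relation locktypes):

SELECT pid,
       locktype,
       mode,
       granted,
       relation::regclass AS relation
FROM pg_locks
ORDER BY granted, pid;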

Common locktype values you’ll actually encounter

relation

Locks on tables, indexes, materialized views, sequences, etc.

  • This is the main one for DDL/DML blocking analysis.
  • You can resolve the name with relation::regclass.

Stop-the-world pattern: a waiting or granted AccessExclusiveLock on a hot table.

transactionid

A backend waiting for a specific transaction to finish.

  • Common with row-level conflicts: “I need to update this row, but I must wait for the transaction that last touched it to commit/rollback.”
  • This often means the “blocker” is a long transaction, possibly “idle in transaction.”

virtualxid

A lock on a virtual transaction ID.

  • Often appears when certain operations need to ensure no conflicting snapshots/transactions are active.
  • Can be involved in DDL waiting patterns that are confusing if you only look for relation locks.

advisory

Advisory locks taken by application code.

  • Great tool when used intentionally.
  • Dangerous when a global advisory lock becomes a single point of failure.

Other types you may see

Depending on workload and features, you may run into:

  • database (locks at database scope)
  • extend (related to extending relation files)
  • object (miscellaneous catalog objects)
  • page / tuple (less commonly used in day-to-day diagnosis; row-level behavior is more often inferred via transactionid waits)

How to interpret “granted = false”

A row in pg_locks with granted = false means:

  • the backend has requested a lock,
  • it conflicts with an already-granted lock,
  • and it is waiting in the queue.

In incidents, it’s worth remembering a subtle but important point:

  • The session that’s waiting might not be the root cause.
    The root cause is often the session holding a lock (or holding a transaction open) that prevents the requested lock from being granted.
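
On PostgreSQL 9.6 and newer, pg_blocking_pids() resolves “who is blocking whom” for you (a minimal sketch):

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80)       AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;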

Turning pg_locks into something human-readable

Raw pg_locks is hard to read because you need to join it to other views to see query text, usernames, and relation names.

Here’s a practical “what’s blocked and by whom?” query that focuses on relation locks and shows the table names. (This won’t cover every possible locktype, but it’s very effective for classic stop-the-world events.)

SELECT
  a.pid                              AS waiting_pid,
  a.usename                          AS waiting_user,
  a.application_name                 AS waiting_app,
  a.state                            AS waiting_state,
  a.wait_event_type,
  a.wait_event,
  a.query                            AS waiting_query,
  now() - a.query_start              AS waiting_for,
  l.mode                             AS waiting_lock_mode,
  l.relation::regclass               AS locked_relation,
  b.pid                              AS blocking_pid,
  b.usename                          AS blocking_user,
  b.application_name                 AS blocking_app,
  b.state                            AS blocking_state,
  b.query                            AS blocking_query,
  now() - b.xact_start               AS blocking_xact_age
FROM pg_stat_activity a
JOIN pg_locks l
  ON l.pid = a.pid
 AND l.granted = false
 AND l.locktype = 'relation'
JOIN pg_locks bl
  ON bl.locktype = l.locktype
 AND bl.relation = l.relation
 AND bl.granted = true
JOIN pg_stat_activity b
  ON b.pid = bl.pid
ORDER BY waiting_for DESC;

Notes:

  • This highlights relation-level waits, which is where most stop-the-world DDL incidents live.
  • It doesn’t compute full lock compatibility (Postgres has its own rules), but in practice it quickly surfaces the likely blocker(s).
  • If you see many waiting PIDs on the same locked_relation, you’ve found your hotspot.

What heavyweight locks tell you during a “freeze”

Heavyweight locks are where you get your most operationally useful answers:

  • Which table is the choke point?
  • Is someone holding (or requesting) AccessExclusiveLock?
  • Is the real issue a long transaction holding locks open?
  • Did an advisory lock serialize more than you intended?

Next, we’ll take this from “I can see locks” to “I understand why the whole system queued up” by digging into lock queueing and the “one waiter can stall many” effect.

9. Lock Queuing and Why One Waiter Can Stall Many Others

The most confusing part of PostgreSQL locking—especially during “stop-the-world” incidents—is that the session causing the outage is often not the one holding a lock. It can be the one waiting for a strong lock, because PostgreSQL must preserve correctness and fairness by queueing lock requests.

This is how you get the classic situation: “A migration is waiting for a lock, but while it waits, the whole app starts timing out.”

The core idea: locks are granted in a queue

For any given lockable object (like a table), Postgres maintains a queue of lock requests. When a backend requests a lock:

  • If it is compatible with the locks currently granted and granting it wouldn’t violate the queue ordering rules, it can be granted.
  • Otherwise, it waits.

That queue ordering rule is the amplifier: once a conflicting request is waiting, later requests can end up waiting too, even if they would have been compatible with the locks that are currently granted.

A concrete scenario: the “pending ACCESS EXCLUSIVE” pileup

Imagine a busy table orders.

  1. Session A is running a long query:
SELECT * FROM orders WHERE created_at >= now() - interval '30 days';

It takes ACCESS SHARE on orders and keeps it for the duration of the query (and possibly longer if inside a long transaction).

  2. Session B starts a migration:
ALTER TABLE orders ADD COLUMN foo text;

This commonly requires ACCESS EXCLUSIVE on orders. It cannot be granted while A holds ACCESS SHARE, so B waits.

  3. Session C arrives and wants to do a normal write:
UPDATE orders SET status = 'paid' WHERE id = 123;

Normally, C’s ROW EXCLUSIVE is compatible with A’s ACCESS SHARE, so you’d expect it to proceed.

But because B is already waiting for ACCESS EXCLUSIVE, Postgres may prevent C from “cutting in line” if granting C would prolong B’s wait. The result: C queues behind B.

Now you’ve got:

  • A: granted ACCESS SHARE (the original long reader)
  • B: waiting ACCESS EXCLUSIVE (the “big request”)
  • C, D, E…: normal traffic also waiting because they’re stuck behind the pending big request

From the application’s perspective, it looks like the database suddenly stopped serving orders entirely.

Why Postgres behaves this way

Two design goals collide here:

  • Correctness: conflicts must be respected.
  • Fairness / starvation avoidance: a strong lock request should eventually run; otherwise, a busy table with constant writes could starve DDL forever.

If Postgres kept granting an endless stream of compatible “small” locks while a strong lock waits, the strong lock might never get a chance to acquire the table. Queueing prevents that.

The tradeoff: you can get a “stop-the-world” feeling when a strong lock request appears.

“The blocker” vs “the head-of-line waiter”

In these incidents, it helps to separate roles:

  • Root blocker: the session holding a conflicting lock (often a long-running query or long transaction). In the example, that’s session A.
  • Head-of-line waiter: the session waiting for a strong lock that causes others to queue behind it. In the example, that’s session B.
  • Victims: everyone else who now can’t get locks quickly. In the example, session C and friends.

Operationally:

  • Killing B (the waiter) can “unfreeze” the line immediately, because C and others can proceed with their compatible locks again.
  • But the underlying issue (A being long-running) still exists; you just avoided the queue amplification.

That’s why “cancel the migration” often restores service faster than “wait for it.”
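
If you choose that route, the standard tools are pg_cancel_backend() and pg_terminate_backend() (12345 below is a placeholder PID):

SELECT pg_cancel_backend(12345);     -- ask the backend to cancel its current statement
SELECT pg_terminate_backend(12345);  -- stronger: end the whole session if cancel isn't enough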

How to recognize lock-queue amplification

Signs you’re seeing queueing behavior (not just normal blocking):

  • Many sessions are waiting on the same relation.
  • The waiting lock mode of the head-of-line session is very strong (often AccessExclusiveLock).
  • The root blocker might be a completely different query than the one you expect.
  • You may see many sessions waiting even though, pairwise, their requested locks would be compatible with the currently held locks.

In monitoring terms:

  • pg_stat_activity shows lots of sessions in wait_event_type = 'Lock'.
  • pg_locks shows a mix of granted weak locks and one pending strong lock request.
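
To see that mix for a single table (a minimal sketch, reusing the orders table from the example above):

SELECT pid, mode, granted
FROM pg_locks
WHERE locktype = 'relation'
  AND relation = 'orders'::regclass
ORDER BY granted DESC, pid;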

Practical implications for production

This queueing behavior drives a lot of “rules of thumb” you’ll see from experienced Postgres operators:

  • Avoid strong locks on hot tables during peak traffic.
  • Prefer concurrent/online schema changes when possible (CREATE INDEX CONCURRENTLY, etc.).
  • Keep transactions short, especially around DDL and migrations.
  • Watch for long-running reads (analytics queries, report jobs) that inadvertently block migrations.

What this section should change in your thinking

A stop-the-world incident is not always “someone held an exclusive lock.” It’s often:

  • a long-running session holding a weak lock,
  • plus a strong lock request that starts waiting,
  • plus Postgres’ lock queue fairness turning that wait into a traffic jam.

Next, we’ll ground this with the compatibility rules (which lock modes conflict) so you can predict when a queued request is likely to trigger a pileup.

10. Compatibility Matrix: Which Locks Conflict in Practice

PostgreSQL decides whether a lock can be granted by checking compatibility: “Does the requested lock mode conflict with any lock modes currently granted on this same object?”

You don’t need to memorize the full official matrix to operate Postgres safely, but you do need a practical version of it—especially around the modes that show up in production: reads (ACCESS SHARE), writes (ROW EXCLUSIVE), maintenance (SHARE UPDATE EXCLUSIVE), and the “stop-the-world” lock (ACCESS EXCLUSIVE).

First: the mental model

For a given table:

  • If two lock modes are compatible, they can be held at the same time.
  • If they conflict, the later request waits until the conflicting lock is released.
  • Separately from pure compatibility, queueing can still make compatible requests wait if a conflicting request is already queued ahead of them (covered in the previous section). Compatibility explains “can these coexist,” while queueing explains “why are they waiting anyway?”

Practical compatibility rules you’ll use most

Reads don’t normally block writes (and vice versa)
  • ACCESS SHARE (plain SELECT)
    • Compatible with: ACCESS SHARE, ROW EXCLUSIVE, and most other common modes
    • Conflicts with: only ACCESS EXCLUSIVE

This is why normal reads and writes can run together under MVCC.

Writes can usually run together
  • ROW EXCLUSIVE (typical INSERT/UPDATE/DELETE)
    • Compatible with: ACCESS SHARE, ROW SHARE, SHARE UPDATE EXCLUSIVE, and other ROW EXCLUSIVE locks
    • Conflicts with: the stronger “share/exclusive” modes involved in DDL and heavy maintenance (SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE)

This is why OLTP workloads with lots of concurrent writes can work fine—until DDL enters the picture.

Maintenance tries to be friendly, but it’s not free
  • SHARE UPDATE EXCLUSIVE (often VACUUM/ANALYZE-ish)
    • Usually compatible with: reads and writes (ACCESS SHARE, ROW EXCLUSIVE)
    • Conflicts with: itself (another SHARE UPDATE EXCLUSIVE), many schema-change operations, and other “strong” modes, and it can be blocked by them

That’s why VACUUM typically doesn’t block your application, but certain migrations can interfere with it (and vice versa).

ACCESS EXCLUSIVE conflicts with everything
  • ACCESS EXCLUSIVE
    • Conflicts with: all other modes, including ACCESS SHARE
    • Effect: once granted, nobody else can read or write that table until it’s released

This is the lock mode that most closely matches “stop-the-world” behavior for a single relation.

A “useful in practice” mini-matrix

Below is a simplified compatibility table for the modes you’ll see most often. “✓” means compatible; “✗” means conflict (so one side must wait). This is not the full official matrix, but it matches the operational intuition you need for incident response and safe changes.

Requested \ Held            ACCESS SHARE    ROW EXCLUSIVE    SHARE UPDATE EXCLUSIVE    ACCESS EXCLUSIVE
ACCESS SHARE                     ✓                ✓                    ✓                       ✗
ROW EXCLUSIVE                    ✓                ✓                    ✓                       ✗
SHARE UPDATE EXCLUSIVE           ✓                ✓                    ✗                       ✗
ACCESS EXCLUSIVE                 ✗                ✗                    ✗                       ✗

(Note that SHARE UPDATE EXCLUSIVE conflicts with itself, which is why two such maintenance operations on the same table serialize.)

Two important notes:

  • The lock mode a given command actually takes can depend on the exact subcommand and phase (for example, some “concurrent” operations change lock levels during different phases).
  • Even when the matrix says “✓”, lock queueing can still cause waiting if there’s a conflicting request ahead in the queue.

Where people get surprised

Surprise #1: “Why did my ALTER TABLE block simple SELECTs?”

Because many ALTER TABLE operations require ACCESS EXCLUSIVE, and ACCESS EXCLUSIVE conflicts with ACCESS SHARE.

Surprise #2: “Why are writes blocked if the only thing running is a read?”

They usually aren’t—unless a strong lock request is queued behind that read, and now other sessions are queuing behind the strong request.

Surprise #3: “Why did a ‘concurrent’ command still block something?”

“Concurrent” usually means “minimizes blocking of normal reads/writes,” not “never takes locks.” Many concurrent operations still take locks briefly (or take different locks at different phases).

How to apply this during an incident

When you see a stall:

  1. Identify the hot relation(s) most waiters have in common.
  2. Look for any session holding or waiting for AccessExclusiveLock.
  3. If no AccessExclusiveLock is present, look for other strong modes (ExclusiveLock, ShareRowExclusiveLock, etc.) and check whether they conflict with typical workload locks (AccessShareLock, RowExclusiveLock).
  4. Remember the queue: the “wrong” session might be the head-of-line waiter, not the root blocker.

Reference note

PostgreSQL’s full lock compatibility matrix and lock mode definitions are documented in the official manual under explicit locking / table-level locks. For exact behavior by mode and for less common lock types, rely on the docs for your Postgres version.
Citations: PostgreSQL Documentation – Explicit Locking and Table-Level Locks (Lock Modes and Conflict Table) https://www.postgresql.org/docs/current/explicit-locking.html

11. The Real “Stop-the-World” Lock: ACCESS EXCLUSIVE and What It Blocks

If you had to pick one lock mode that most often creates the “everything just froze” feeling in PostgreSQL, it’s ACCESS EXCLUSIVE.

It is the strongest relation lock mode. When it’s granted on a table, it effectively makes that table unavailable to other sessions for the duration of the lock. When it’s waiting to be granted, it can still trigger a pileup because of lock queueing (the “head-of-line waiter” effect).

What ACCESS EXCLUSIVE actually blocks

ACCESS EXCLUSIVE conflicts with all other relation lock modes, including the weakest one:

  • Blocks plain SELECT (ACCESS SHARE)
  • Blocks INSERT/UPDATE/DELETE (ROW EXCLUSIVE)
  • Blocks row-locking reads (ROW SHARE)
  • Blocks maintenance modes (SHARE UPDATE EXCLUSIVE)
  • Blocks other DDL and schema changes (all the stronger modes)

Operationally: if a hot table is central to your app, an ACCESS EXCLUSIVE lock on it can feel like the app lost its database, even though the server is up.

Citations: PostgreSQL documentation, Table-Level Locks (lock modes and conflict behavior)
https://www.postgresql.org/docs/current/explicit-locking.html

Two ways it causes outages

1) It’s granted (hard block)

If a session successfully acquires ACCESS EXCLUSIVE on a busy table:

  • every other session that touches that table will wait,
  • connection pools fill with blocked sessions,
  • upstream timeouts and retries amplify load.

This is the obvious “stop-the-world” pattern: one session holds the table hostage until it commits/rolls back.

2) It’s waiting (queue amplification)

Even if ACCESS EXCLUSIVE is not granted yet, its presence in the lock queue can be damaging.

A common chain is:

  • Session A holds a weak lock for a long time (often just ACCESS SHARE from a long SELECT).
  • Session B requests ACCESS EXCLUSIVE (DDL) and must wait for A to finish.
  • Sessions C, D, E… arrive needing normal locks (reads/writes). Even though those locks might be compatible with A, they can end up queued behind B’s pending exclusive request to avoid starving B.

Result: it looks like “DDL took down the app,” but the root blocker might be a long-running read or an open transaction.

This behavior is part of PostgreSQL’s lock manager fairness rules and is a big reason production migrations can cause surprise outages.

What kinds of statements require ACCESS EXCLUSIVE

Many commands that change the physical structure or metadata of a table require ACCESS EXCLUSIVE, especially when they need to guarantee nobody can concurrently read/write the relation while it’s being rewritten or its definition is changing.

Common examples include:

  • Many ALTER TABLE variants (especially those that rewrite the table or change constraints in certain ways)
  • TRUNCATE
  • DROP TABLE / DROP INDEX (on the object being dropped)
  • Some forms of REINDEX (non-concurrent)
  • Operations that rewrite a table (often the “dangerous” ones)

Exactly which statements take ACCESS EXCLUSIVE and for how long can vary by PostgreSQL version and the specific subcommand. The safe operational assumption is: if it’s DDL on a hot table, assume it might request ACCESS EXCLUSIVE until proven otherwise.

Citations: PostgreSQL documentation, ALTER TABLE, TRUNCATE, and lock mode reference
https://www.postgresql.org/docs/current/sql-altertable.html
https://www.postgresql.org/docs/current/sql-truncate.html
https://www.postgresql.org/docs/current/explicit-locking.html

How long is it held?

A key operational gotcha: the lock is generally held for at least the duration of the statement, and often until the end of the transaction.

So these two are very different in impact:

ALTER TABLE orders ADD COLUMN foo text;
-- autocommit: lock held briefly (still risky on a hot table)

vs

BEGIN;
ALTER TABLE orders ADD COLUMN foo text;
-- ... do other stuff, wait for an app deploy, etc. ...
COMMIT;
-- lock can be held far longer than the DDL itself needed

Even if the DDL step is “fast,” holding the transaction open extends the lock’s blast radius.
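
One common mitigation (a sketch; the 2-second value is an arbitrary choice to tune for your workload) is to cap how long the DDL may wait for its lock, so it fails fast and can be retried instead of freezing the queue behind it:

SET lock_timeout = '2s';                  -- give up quickly instead of queueing indefinitely
ALTER TABLE orders ADD COLUMN foo text;   -- errors out if the lock isn't granted within 2s
RESET lock_timeout;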

Citations: PostgreSQL documentation, Explicit Locking and transactional behavior of DDL
https://www.postgresql.org/docs/current/explicit-locking.html

Why this lock is the one to respect

ACCESS EXCLUSIVE is not “bad”—Postgres needs it to safely perform certain changes. The problem is running it on hot objects at the wrong time or letting transactions linger.

In practical terms, whenever you see a stop-the-world incident, you should immediately ask:

  • Is there a session holding AccessExclusiveLock?
  • Is there a session waiting for AccessExclusiveLock that has become head-of-line and caused a queue?
  • What transaction is preventing it from being granted?

Next, we’ll enumerate the most common operations that take ACCESS EXCLUSIVE, so you can recognize risky commands before you run them in production.

12. Common Operations That Take ACCESS EXCLUSIVE Locks

ACCESS EXCLUSIVE is PostgreSQL’s strongest table-level lock mode. Many operations that change table structure or replace its underlying storage need it to guarantee that no other session is reading from or writing to the table while the change happens.

Exactly which statements take ACCESS EXCLUSIVE can vary by PostgreSQL version and by the specific subcommand, but the following list covers the most common real-world triggers you should treat as “production risky until proven otherwise.”

Citations: PostgreSQL docs on lock modes and SQL commands (see links in each subsection).


ALTER TABLE (many subcommands)

Many ALTER TABLE operations request ACCESS EXCLUSIVE, especially those that change physical layout, rewrite the table, or require strong metadata stability.

Common examples that are frequently ACCESS EXCLUSIVE in practice:

  • Adding/dropping columns (dropping is often cheap but still needs strong locking for catalog changes)
  • Changing a column type (often forces a rewrite, depending on the change)
  • Setting a column NOT NULL (often requires validation; may hold strong locks while it checks)
  • Adding/dropping constraints in ways that require validation or catalog updates
  • Enabling/disabling triggers (impactful and typically needs strong coordination)

Reference: ALTER TABLE
https://www.postgresql.org/docs/current/sql-altertable.html
Lock mode reference:
https://www.postgresql.org/docs/current/explicit-locking.html

Operational tip: even when an ALTER TABLE is “fast,” it can still block hard because it must acquire the lock first. The wait to get the lock is often the outage.


TRUNCATE

TRUNCATE is designed to be fast by deallocating or resetting storage rather than deleting row-by-row, but that speed comes with a big locking requirement: it needs to ensure no concurrent transaction can see a partial state.

  • TRUNCATE takes an ACCESS EXCLUSIVE lock on the table(s) it truncates.
  • If you truncate multiple tables in one statement, it locks all of them.

Reference: TRUNCATE
https://www.postgresql.org/docs/current/sql-truncate.html
Lock mode reference:
https://www.postgresql.org/docs/current/explicit-locking.html


DROP TABLE, DROP INDEX, DROP MATERIALIZED VIEW (on the object being dropped)

Dropping an object must prevent concurrent use of it and update catalogs safely, so it generally requires a very strong lock on that object.

Reference: DROP TABLE / DROP INDEX
https://www.postgresql.org/docs/current/sql-droptable.html
https://www.postgresql.org/docs/current/sql-dropindex.html
Lock mode reference:
https://www.postgresql.org/docs/current/explicit-locking.html


Non-concurrent index maintenance

When you rebuild an index without the concurrent option, PostgreSQL generally needs strong locks that block normal access patterns.

  • REINDEX INDEX ... (non-concurrent) can block operations on that index and may require strong locks on the underlying table depending on the exact command and version.
  • CREATE INDEX (non-concurrent) takes a SHARE lock: reads continue, but writes to the table are blocked for the duration of the build. The important operational point is: if you need an online build or rebuild, you typically want the CONCURRENTLY variants where available.

Reference: REINDEX / CREATE INDEX
https://www.postgresql.org/docs/current/sql-reindex.html
https://www.postgresql.org/docs/current/sql-createindex.html
Lock mode reference:
https://www.postgresql.org/docs/current/explicit-locking.html

(Note: locking details differ by “concurrent vs non-concurrent” and by phase; later sections on concurrent index operations cover this in depth.)


Table rewrite operations

Some commands effectively create a new copy of the table data and swap it in. Those operations almost always need ACCESS EXCLUSIVE for at least part of their execution because concurrent readers/writers can’t safely interact with a moving target.

Examples include:

  • VACUUM FULL
  • CLUSTER
  • Some ALTER TABLE subcommands that rewrite the heap

References:
VACUUM (incl. VACUUM FULL): https://www.postgresql.org/docs/current/sql-vacuum.html
CLUSTER: https://www.postgresql.org/docs/current/sql-cluster.html
Lock mode reference: https://www.postgresql.org/docs/current/explicit-locking.html


Partition maintenance that changes attachment/detachment (often strong)

Partitioning can reduce blast radius, but certain partition operations still require strong locks:

  • Attaching/detaching partitions
  • Moving data between partitions in ways that require metadata changes

Depending on the exact operation and version, these can require very strong locks (often on the partitioned table and/or partitions).

Reference: partitioning and ALTER TABLE ... ATTACH PARTITION
https://www.postgresql.org/docs/current/ddl-partitioning.html
https://www.postgresql.org/docs/current/sql-altertable.html


The “hidden” trigger: DDL inside a long transaction

A frequent outage pattern isn’t “the DDL took a long time.” It’s:

  • the DDL needed ACCESS EXCLUSIVE,
  • it waited (queue amplification),
  • and once it got the lock, it was held longer than expected because the transaction stayed open.

This is why migration frameworks that wrap many steps in a single transaction can be dangerous on hot tables.

Reference: Locking + transactional DDL behavior
https://www.postgresql.org/docs/current/explicit-locking.html


Checklist: treat these as ACCESS EXCLUSIVE until you verify

If you’re operating a busy production database, assume you need a plan (low-traffic window, online alternative, lock/statement timeouts, rollback strategy) before running:

  • ALTER TABLE ... (anything non-trivial)
  • TRUNCATE
  • DROP ...
  • VACUUM FULL
  • CLUSTER
  • non-concurrent REINDEX or heavy rebuild operations

Next, we’ll drill into DDL locking more systematically—what different schema changes do, why some are safe and others rewrite tables, and how to run the risky ones without triggering a lock queue pileup.

13. DDL and Locking: CREATE, ALTER, DROP and Their Impact

DDL (Data Definition Language) is where PostgreSQL most often shifts from “high concurrency” to “everyone wait.” That’s not because Postgres is fragile—it’s because changing schema safely requires strong guarantees about what other sessions can see and do while the metadata is being changed.

This section focuses on the practical reality: CREATE is often the least disruptive, ALTER is the most likely to hurt you, and DROP can be deceptively dangerous depending on what’s being dropped and how widely it’s referenced.

References for exact lock modes and command behavior (varies by version and subcommand):
https://www.postgresql.org/docs/current/explicit-locking.html
https://www.postgresql.org/docs/current/sql-altertable.html


Why DDL needs locks at all

DDL changes system catalogs (metadata) that every query relies on: table definitions, column types, constraints, indexes, dependencies. If those catalogs changed mid-query without coordination, you could get inconsistent plans, incorrect results, or crashes.

So Postgres uses locks to ensure:

  • other sessions don’t concurrently change the same objects in incompatible ways,
  • queries either see the “old” schema or the “new” schema in a consistent way,
  • table rewrites or file swaps happen safely.

CREATE: Usually low blast radius, with one big exception

CREATE TABLE

Creating a brand-new table is usually safe from a concurrency perspective because nothing depends on it yet. It still takes locks, but it rarely blocks application traffic unless you’re doing unusual catalog-heavy patterns (like creating/dropping many objects rapidly).

CREATE INDEX (non-concurrent)

A normal CREATE INDEX is often less disruptive than people fear, but it is not “free”:

  • It must prevent certain concurrent schema changes.
  • It can still be a performance event (I/O, CPU) even when locking impact is acceptable.

If you need to keep write traffic flowing, the “big exception” is that non-concurrent index creation blocks writes to the table while the index builds. For production OLTP tables, the safer default is often CREATE INDEX CONCURRENTLY (covered later).

Reference: CREATE INDEX (including CONCURRENTLY notes)
https://www.postgresql.org/docs/current/sql-createindex.html
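
If you need an index on a busy table without blocking writes, the concurrent form is usually the safer default. A minimal sketch (the table and index names are hypothetical):

-- Builds the index while normal reads and writes continue.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
  ON orders (customer_id);

-- If the build fails partway, it leaves an INVALID index behind;
-- drop it and retry:
--   DROP INDEX CONCURRENTLY idx_orders_customer_id;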

CREATE VIEW, CREATE FUNCTION, etc.

These usually have limited direct locking impact on hot tables unless they touch them in a way that forces dependency checks or compilation steps. The bigger operational risk tends to be deployment mistakes (bad plans, incompatible changes), not stop-the-world locking.


ALTER: The most common cause of stop-the-world incidents

ALTER TABLE is the DDL family most likely to require ACCESS EXCLUSIVE and trigger lock queue pileups.

The impact depends on what kind of alteration you’re doing:

Metadata-only ALTERs (often fast, but they still need strong coordination)

Examples: renaming a column, changing comments, some constraint changes.
These can be quick to execute, but:

  • they can still require strong locks to ensure no concurrent session uses an inconsistent definition,
  • the biggest risk is waiting to acquire the lock on a hot table.

Table-rewriting ALTERs (can be catastrophic if unplanned)

Some ALTERs force PostgreSQL to rewrite the entire table (copy data to a new physical structure). Common triggers:

  • many column type changes,
  • adding a column with a default in versions before PostgreSQL 11 (newer versions avoid the rewrite for constant, non-volatile defaults),
  • changing storage parameters that require rewriting,
  • other changes that affect the physical layout.

Table rewrite = longer lock hold time and huge I/O, which is the perfect recipe for a stop-the-world episode.

Reference: ALTER TABLE details and which subcommands rewrite tables
https://www.postgresql.org/docs/current/sql-altertable.html

Constraint validation and foreign keys

Some constraint operations aren’t just metadata flips—they require scanning data to prove validity. That can mean:

  • longer-running operations,
  • stronger locks held longer,
  • or strategies like adding constraints as NOT VALID and validating later (often used to reduce blocking).

(We cover FKs and “NOT VALID” patterns in a dedicated section later.)
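
As a preview of that pattern, here's a hedged two-step sketch (the table, column, and constraint names are hypothetical):

-- Step 1: add the constraint without scanning existing rows (fast, brief lock).
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_id_fkey
  FOREIGN KEY (customer_id) REFERENCES customers (id)
  NOT VALID;

-- Step 2: validate later; the validation scan runs under a weaker lock,
-- so normal reads and writes can continue while it checks existing rows.
ALTER TABLE orders
  VALIDATE CONSTRAINT orders_customer_id_fkey;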


DROP: Often quick, but can block in surprising ways

Dropping an object needs strong locking because it removes catalog entries and dependencies.

DROP TABLE
  • Requires preventing concurrent access to the table being dropped.
  • Can cascade to dependent objects (CASCADE), potentially widening the lock footprint.

Reference:
https://www.postgresql.org/docs/current/sql-droptable.html

DROP INDEX

Dropping an index is sometimes thought of as harmless. But a plain DROP INDEX takes ACCESS EXCLUSIVE on both the index and its table, so it can block queries on that table while it runs, and it can have big performance implications immediately after (plans change, workload shifts).

Reference:
https://www.postgresql.org/docs/current/sql-dropindex.html

Dependency surprises

A DROP might lock or invalidate more than you expect because of dependencies:

  • views, functions, triggers,
  • foreign keys,
  • materialized views,
  • ORMs and prepared statements that assume the old schema.

This is less “lock mode trivia” and more “blast radius awareness.”


The operational reality: DDL pain is often about lock acquisition, not execution

Even “fast DDL” can take your app down if:

  • the table is hot,
  • a long transaction is holding a weak lock,
  • your DDL requests a strong lock (often ACCESS EXCLUSIVE),
  • lock queueing causes everyone else to line up behind the DDL request.

So the risky part is frequently the wait to get the lock, not the milliseconds of catalog change once it’s acquired.


Safe patterns you should associate with DDL

When you run CREATE/ALTER/DROP in production, the difference between a routine deploy and an outage is usually process:

  • Use lock timeouts so DDL fails fast rather than queueing forever.
  • Use statement timeouts to avoid long-running surprises.
  • Prefer concurrent variants where available (CREATE INDEX CONCURRENTLY).
  • Keep transactions short; don’t leave migration transactions open.
  • Schedule high-risk ALTERs (especially table rewrites) intentionally.

(Those mitigation techniques get concrete in later sections.)
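
To make that concrete, here's a minimal sketch of a guarded migration step. The table name and timeout values are illustrative assumptions, not recommendations:

-- Fail fast instead of queueing behind a busy table.
SET lock_timeout = '2s';
SET statement_timeout = '30s';

BEGIN;
ALTER TABLE orders ADD COLUMN note text;   -- metadata-only change
COMMIT;                                    -- don't leave the migration transaction open

-- If this fails with "canceling statement due to lock timeout",
-- retry later instead of becoming the head-of-line blocker.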


Takeaway

  • CREATE is often low-lock-risk but can still be resource-heavy (indexes).
  • ALTER TABLE is the #1 source of stop-the-world behavior because it commonly requires ACCESS EXCLUSIVE and may rewrite data.
  • DROP can be quick but still requires strong locking and can have unexpected dependency blast radius.

Next, we’ll go deeper into the specific DDL operations most likely to cause widespread blocking—starting with TRUNCATE and why it’s fast but disruptive.

14. TRUNCATE Locking: Fast Operation, Big Blast Radius

TRUNCATE is one of those PostgreSQL commands that’s simultaneously a performance gift and an operational foot-gun. It’s extremely fast compared to DELETE FROM big_table, but the tradeoff is that it needs very strong locking to be safe.

Why TRUNCATE is so fast

DELETE removes rows one by one, generating lots of row versions, WAL, and vacuum work. TRUNCATE, on the other hand, is closer to “throw away the table’s contents efficiently”:

  • it doesn’t scan and delete row-by-row,
  • it resets the relation to empty (implementation details vary, but the important part is it’s not doing per-row work).

This is why it’s often used for staging tables, ETL pipelines, and periodic refresh workflows.

Reference: PostgreSQL TRUNCATE docs
https://www.postgresql.org/docs/current/sql-truncate.html

The lock it takes: ACCESS EXCLUSIVE

The cost of that speed is concurrency: TRUNCATE takes an ACCESS EXCLUSIVE lock on each table it truncates.

That means:

  • it blocks all reads (SELECT) and writes (INSERT/UPDATE/DELETE) on that table,
  • it conflicts with every other table lock mode.

If the table is touched by production traffic, TRUNCATE can instantly create a “stop-the-world” experience.


Why the blast radius can be bigger than “just that table”

There are a few common ways TRUNCATE spreads beyond what you intended:

1) Truncating multiple tables in one command

TRUNCATE a, b, c;

You’re now requesting ACCESS EXCLUSIVE on all of them. If any one of those tables is busy, the statement waits—and while it waits, it can become head-of-line and cause lock queue pileups on whichever table(s) it’s waiting for.

2) Foreign keys: TRUNCATE ... CASCADE

If other tables reference the target via foreign keys, Postgres won’t let you truncate safely without handling those references.

  • TRUNCATE ... CASCADE automatically truncates referencing tables too.
  • That means more tables locked ACCESS EXCLUSIVE.
  • It’s very easy to accidentally lock a large portion of your schema.

Reference: TRUNCATE and CASCADE behavior
https://www.postgresql.org/docs/current/sql-truncate.html

3) Sequences and identity columns: RESTART IDENTITY

When you do:

TRUNCATE my_table RESTART IDENTITY;

Postgres resets associated sequences. That’s usually what you want for staging tables, but it’s another reminder that TRUNCATE is a “reset state” operation, not just “delete data.”

Reference: RESTART IDENTITY
https://www.postgresql.org/docs/current/sql-truncate.html

The most common production failure mode

The outage pattern is rarely “TRUNCATE ran for a long time.” It’s usually:

  1. The app is actively using the table.
  2. Someone runs TRUNCATE.
  3. TRUNCATE requests ACCESS EXCLUSIVE and either:
    • gets it and instantly blocks everyone, or
    • waits to get it, becomes head-of-line, and triggers queue amplification.

Either way, the app sees timeouts and connection pileups.

Safer alternatives and patterns

Depending on what you’re trying to do, you can often avoid truncating a hot table:

  • For “clear and refill” workflows: use a new table and swap references (or partition swapping), rather than truncating the live table.
  • For batch pipelines: truncate only isolated staging tables not touched by OLTP traffic.
  • For large deletes in production: a DELETE in controlled batches may be slower, but it can be less disruptive because it doesn’t require ACCESS EXCLUSIVE (it uses row-level locks and ROW EXCLUSIVE on the table).

There isn’t a one-size-fits-all answer; the key is recognizing that TRUNCATE is structurally disruptive even if it’s computationally cheap.
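
For the batched-delete option, a hedged sketch (the events table, retention rule, and batch size are hypothetical):

-- Each batch takes row locks plus ROW EXCLUSIVE on the table,
-- so concurrent reads and unrelated writes keep flowing between batches.
DELETE FROM events
WHERE id IN (
  SELECT id
  FROM events
  WHERE created_at < now() - interval '30 days'
  LIMIT 10000
);

-- Repeat (from application code or a scheduled loop) until zero rows are deleted;
-- regular VACUUM then reclaims the dead tuples over time.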

Takeaway

TRUNCATE is fast because it avoids row-by-row work, but that speed comes from taking the strongest lock PostgreSQL has on the table: ACCESS EXCLUSIVE. On a hot table, that’s not a maintenance command—it’s an outage trigger unless you plan it carefully.

Next, we’ll look at VACUUM and autovacuum—operations that are designed to be online-friendly—and how their lock behavior differs from table-rewriting commands like VACUUM FULL.

15. VACUUM, Autovacuum, and Lock Behavior

VACUUM is PostgreSQL’s garbage collector for MVCC: it cleans up dead row versions so tables don’t bloat forever, and it helps keep query performance stable. Because it has to run all the time on busy systems, PostgreSQL designs regular VACUUM (and autovacuum) to be as non-blocking as possible.

But “non-blocking” doesn’t mean “no locks,” and it doesn’t mean vacuum can’t get stuck or contribute to production pain.



What regular VACUUM does (and doesn’t do)

A standard VACUUM:

  • removes dead tuples from visibility (so space can be reused),
  • updates visibility information (visibility map),
  • may update statistics if ANALYZE is included (VACUUM (ANALYZE)).

It does not rewrite the whole table and it does not require exclusive access to the table the way VACUUM FULL does (next section).

Reference: https://www.postgresql.org/docs/current/sql-vacuum.html


What locks does VACUUM take?

Regular VACUUM takes a lock mode that is designed to allow normal traffic to continue. In practical terms:

  • It does not block normal SELECT, INSERT, UPDATE, or DELETE on the table.
  • It can conflict with some DDL or table-rewrite operations that want very strong locks.

Under the hood, VACUUM takes a SHARE UPDATE EXCLUSIVE lock on the table, which allows concurrent reads and writes, and it also uses short-lived internal locks while it works through pages. The important operational point is that it’s intended to be “online.”

References: lock mode overview and conflict rules
https://www.postgresql.org/docs/current/explicit-locking.html
Vacuum’s concurrency notes: https://www.postgresql.org/docs/current/sql-vacuum.html


Autovacuum: same locking model, different operational risks

Autovacuum is just PostgreSQL automatically running vacuum/analyze in background workers. Lock behavior is similar to manual vacuum, but autovacuum adds two common operational dynamics:

1) It competes for resources

Even if it doesn’t block your queries via relation locks, it can still affect latency through:

  • extra I/O,
  • buffer cache churn,
  • CPU usage,
  • WAL generation (depending on workload).

So you can have “everything feels slower” without seeing obvious lock waits.

Reference: https://www.postgresql.org/docs/current/runtime-config-autovacuum.html

2) It can get starved or prevented from finishing

Autovacuum can be disrupted by things that keep old row versions “pinned” as still potentially visible:

  • long-running transactions
  • sessions stuck idle in transaction
  • long-running replication slots (in some setups) holding back cleanup

The result is classic bloat growth: vacuum runs but can’t reclaim enough, tables swell, indexes bloat, and performance degrades over time.

Reference: vacuum and MVCC horizon concepts are discussed across vacuum docs and MVCC/transactions docs
https://www.postgresql.org/docs/current/sql-vacuum.html


How VACUUM interacts with stop-the-world incidents

Regular vacuum is rarely the direct cause of a lock-based “stop-the-world” outage. More often, it’s involved indirectly:

  • A migration requests ACCESS EXCLUSIVE and waits.
  • Vacuum (or other sessions) continues taking locks compatible with current activity.
  • The lock queue behind the migration grows, and then suddenly everything is waiting (queue amplification).

Vacuum itself is usually not the villain, but it’s part of the lock ecosystem.


The two vacuum situations that do deserve extra attention

1) Anti-wraparound vacuum

To prevent transaction ID wraparound (a correctness issue), PostgreSQL will force more aggressive vacuuming when needed. These vacuums can be more urgent and more expensive.

  • The intent is safety, not convenience.
  • If you’re near wraparound, the system may spend significant effort vacuuming, which can cause major performance impact even if it isn’t “blocking” in the classic lock sense.

Reference: vacuum and wraparound notes
https://www.postgresql.org/docs/current/sql-vacuum.html

2) VACUUM on heavily contended tables

On extremely hot tables, vacuum may struggle to make progress efficiently (lots of churn). Again, this is more about throughput and contention than about ACCESS EXCLUSIVE, but the user-visible symptom can still be “the database feels jammed.”


What to check when you suspect vacuum-related pain

If you’re diagnosing a slowdown or weird waiting:

  • Look at pg_stat_activity wait events.
    If you see a lot of wait_event_type = 'Lock', that’s more likely DDL / relation locking than vacuum itself.
    If you see I/O-related waits or LWLock waits, vacuum may be contributing via resource pressure.
  • Identify long-running transactions.
    These are the #1 reason vacuum can’t reclaim space and bloat grows.
  • Inspect vacuum progress views when relevant.
    PostgreSQL provides progress reporting for vacuum (pg_stat_progress_vacuum) in supported versions, which helps you see whether vacuum is stuck or slow.
    Reference: vacuum progress reporting is described in the monitoring docs:
    https://www.postgresql.org/docs/current/progress-reporting.html
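
As an illustration, a minimal progress query against that view might look like this (a sketch; available in versions that provide pg_stat_progress_vacuum):

SELECT p.pid,
       p.relid::regclass AS table_name,
       p.phase,              -- e.g. 'scanning heap', 'vacuuming indexes'
       p.heap_blks_total,
       p.heap_blks_scanned,
       p.heap_blks_vacuumed
FROM pg_stat_progress_vacuum p;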

Takeaway

  • Regular VACUUM and autovacuum are designed to be online-friendly: they take locks that generally allow reads and writes to proceed.
  • Their biggest risks are resource contention and being blocked from effective cleanup by long-lived transactions, not classic stop-the-world locking.
  • The truly disruptive vacuum variant is VACUUM FULL, which rewrites the table and requires strong locks—covered next.

16. VACUUM FULL: Why It’s So Disruptive

VACUUM FULL is the “nuclear option” of vacuuming. It aggressively compacts a table by rewriting it, which can reclaim disk space back to the operating system. That’s sometimes exactly what you need—but it’s also why it’s one of the most disruptive maintenance commands you can run on a busy system.



What VACUUM FULL does differently from regular VACUUM

Regular VACUUM marks dead tuples as reusable inside the table and updates visibility metadata, but it generally does not shrink the table file on disk.

VACUUM FULL instead:

  • creates a new copy of the table containing only live rows,
  • compacts it (removing bloat),
  • and then swaps it in place of the old one.

This rewrite is what allows it to return disk space to the OS, but it also requires much stronger coordination.

Reference: https://www.postgresql.org/docs/current/sql-vacuum.html


The locking reason it’s disruptive: it needs exclusive access

Because VACUUM FULL is effectively replacing the underlying storage of the table, PostgreSQL must ensure no one is concurrently reading from or writing to that table during the critical parts of the operation.

In practice, VACUUM FULL takes a very strong table lock (documented as requiring an ACCESS EXCLUSIVE-level lock on the table), which:

  • blocks plain SELECT,
  • blocks INSERT/UPDATE/DELETE,
  • blocks most maintenance operations,
  • and can create lock-queue pileups if it has to wait.



Why it can “stop the world” even if it’s waiting

If VACUUM FULL can’t immediately acquire its exclusive lock (because the table is busy), it will wait. As covered earlier, a pending strong lock request can become a head-of-line waiter and cause other sessions to queue behind it.

So VACUUM FULL can hurt you in two ways:

  1. Granted: the table becomes unusable to other sessions until it finishes.
  2. Waiting: it can still cause broad stalls due to lock queueing rules on that table.

The other reason it’s disruptive: it’s expensive work

Even if you had perfect lock timing, VACUUM FULL is a heavyweight operation:

  • It reads and writes the entire table (big I/O).
  • It can generate significant WAL (relevant for replication lag and disk usage).
  • It can evict useful pages from cache (buffer churn).
  • It can take a long time on large relations.

That means it’s not just a locking event; it’s often a performance event for the whole instance.

Reference: https://www.postgresql.org/docs/current/sql-vacuum.html


When people reach for it (and what to consider first)

Common reasons teams run VACUUM FULL:

  • severe table bloat after large deletes/updates,
  • reclaiming disk space urgently,
  • fixing extreme fragmentation.

Before using it, consider less disruptive options depending on your goal:

  • If you mainly need query performance, regular vacuum plus proper autovacuum tuning (and possibly pg_repack as a third-party option) may be enough.
  • If you need online compaction, tools/strategies that avoid long ACCESS EXCLUSIVE holds are often preferred (e.g., rebuild-and-swap approaches).
  • If you need disk space returned to the OS, VACUUM FULL does that, but it’s best treated like a maintenance-window operation.

Which alternative fits depends on your environment, but the principle is the same: don’t treat VACUUM FULL as routine maintenance on hot tables.


Practical safety guardrails

If you must run VACUUM FULL in a production-like environment:

  • Run it during a low-traffic window.
  • Set lock_timeout so it fails fast rather than sitting in the queue and causing pileups.
  • Set statement_timeout to avoid an unexpectedly long stall.
  • Ensure you have sufficient disk headroom during the rewrite (you may need extra space for the new copy while the old still exists).
  • Watch replication lag if you stream WAL to replicas.

(We’ll cover timeouts and incident playbooks in later sections.)
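
Put together, a guarded run might look like this (a hedged sketch; the table name and timeout value are assumptions):

-- In a maintenance window, from a controlled session:
SET lock_timeout = '5s';        -- give up quickly instead of queueing others behind us
SET statement_timeout = 0;      -- the rewrite itself may legitimately take a long time
VACUUM (FULL, VERBOSE) bloated_table;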


Takeaway

VACUUM FULL is disruptive because it rewrites the table and therefore needs exclusive access (strong locking), and it also does heavy I/O work that can impact the whole system. Treat it as a last-resort maintenance operation for busy databases unless you have a plan to contain the blast radius.

17. REINDEX and Concurrency: When Index Maintenance Blocks Workloads

Index maintenance is one of the easiest ways to accidentally introduce “stop-the-world” behavior, because indexes sit on the critical path of reads and writes. Rebuilding them can require locks that block normal traffic—unless you use the concurrent variants and plan for their tradeoffs.



Why you run REINDEX in the first place

Common reasons include:

  • fixing index bloat or poor clustering,
  • recovering from index corruption (rare, but it happens),
  • after certain bulk operations where an index becomes inefficient,
  • as part of maintenance on heavily updated tables.

REINDEX rebuilds the index structure from the underlying table (or rebuilds all indexes on a table/database depending on the command variant).

Reference: https://www.postgresql.org/docs/current/sql-reindex.html


The concurrency question: does it block reads or writes?

It depends mainly on whether you use CONCURRENTLY.

REINDEX without CONCURRENTLY (the blocking default)

A plain REINDEX is generally a “maintenance window” operation for hot objects because it can take strong locks that block normal use of the index (and sometimes effectively block the table’s workload depending on what queries need that index).

Typical impact patterns:

  • Writes can be blocked if they need to update the index.
  • Queries that depend on that index can stall or switch to worse plans (and some operations can’t proceed safely without the index being in a stable state).
  • On busy tables, the time to acquire the needed locks can create queues and amplify into an outage.

While the exact lock modes and affected operations vary by the specific REINDEX form (INDEX vs TABLE vs DATABASE) and PostgreSQL version, the practical operational assumption is:

  • Non-concurrent REINDEX is potentially workload-blocking and should be treated like high-risk DDL.

Reference: https://www.postgresql.org/docs/current/sql-reindex.html

REINDEX CONCURRENTLY (online-friendly, slower, more complex)

REINDEX CONCURRENTLY is designed to rebuild indexes while allowing normal reads and writes to continue as much as possible. It does this by building a new index alongside the old and swapping them, similar in spirit to CREATE INDEX CONCURRENTLY.

Key characteristics:

  • Much lower chance of “stop-the-world” blocking for application traffic.
  • Usually holds weaker locks for longer, with brief stronger locks during swap/validation steps.
  • Uses more time and resources overall than non-concurrent reindexing.
  • Has restrictions (for example, you can’t run it inside a transaction block).

Reference: https://www.postgresql.org/docs/current/sql-reindex.html
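
A hedged sketch of an online rebuild (the index name is hypothetical):

SET lock_timeout = '2s';         -- avoid becoming a head-of-line waiter
REINDEX INDEX CONCURRENTLY idx_orders_created_at;

-- Cannot run inside a transaction block. If it fails partway, it can leave
-- an invalid index behind; check pg_index.indisvalid and drop the leftover.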


The stop-the-world failure modes with REINDEX

1) Reindexing a critical index without concurrency

If you REINDEX INDEX on an index that every transaction touches (common for primary keys and hot foreign key indexes), even brief blocking can cascade:

  • transactions queue,
  • connection pools saturate,
  • timeouts and retries amplify load.

This can look exactly like a global database outage even though the issue is localized to one table/index.

2) Waiting for locks can be worse than holding them

Just like with ACCESS EXCLUSIVE DDL, a reindex command can become the head-of-line waiter:

  • it requests a lock it can’t immediately acquire,
  • it waits,
  • and other sessions pile up behind it due to queue ordering.

This is why setting lock_timeout before running maintenance is so effective: you’d rather fail fast than become the reason everyone queues.

Lock timeout reference: PostgreSQL runtime settings (lock_timeout)
https://www.postgresql.org/docs/current/runtime-config-client.html

3) Doing it at the wrong scope

REINDEX TABLE, REINDEX SCHEMA, and REINDEX DATABASE expand the blast radius quickly:

  • you may rebuild many indexes you didn’t intend to touch,
  • the probability of hitting a hot index (and a hot lock) goes way up,
  • it becomes hard to reason about total impact.

Reference: https://www.postgresql.org/docs/current/sql-reindex.html


Choosing between REINDEX and alternatives

Depending on the goal:

  • If you’re trying to reduce bloat on a single index and you need uptime, prefer REINDEX CONCURRENTLY when it’s available and fits your constraints.
  • If you can tolerate downtime or are in a maintenance window, non-concurrent REINDEX may be acceptable and faster.
  • If you need a broader “rebuild and compact table + indexes” effect, commands like CLUSTER or VACUUM FULL also exist, but they typically have even worse locking characteristics (covered in other sections).

Operational safety checklist for index rebuilds

Before running any REINDEX on a production system:

  • Identify whether the target index is on a hot table / hot path.
  • Prefer REINDEX CONCURRENTLY for hot objects.
  • Set lock_timeout to prevent lock-queue pileups.
  • Set statement_timeout (or run from a controlled session) to bound worst-case impact.
  • Monitor pg_stat_activity and pg_locks during the operation.
  • Avoid massive-scope commands (REINDEX DATABASE) unless you truly mean it and can take the risk.

Takeaway

Index rebuilds can block workloads because indexes are involved in nearly every write and many reads. Non-concurrent REINDEX can be outage-inducing on hot objects, mostly due to strong locks and lock queue amplification. When uptime matters, REINDEX CONCURRENTLY is typically the safer tool—at the cost of longer runtime and higher overall resource usage.

18. CLUSTER and Table Rewrites: Understanding the Downtime Risk

Some PostgreSQL operations aren’t just “metadata changes.” They physically rewrite a table: they build a new copy of its data in a different order or format, then swap it in. These operations can be great for performance and space efficiency—but they are also among the most downtime-prone commands in Postgres because a rewrite typically requires strong locks for correctness.

CLUSTER is the poster child here.



What CLUSTER does

CLUSTER reorganizes a table’s physical order to match an index. The classic use case is improving locality:

  • if you often query ranges on (created_at) or (customer_id), clustering by an index on that column can make relevant rows sit closer together on disk,
  • which can reduce random I/O and improve cache efficiency.

But PostgreSQL tables do not stay clustered automatically. As inserts/updates happen, the physical order drifts again.

Reference: https://www.postgresql.org/docs/current/sql-cluster.html
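
Basic usage looks like this (table and index names are hypothetical):

-- Reorders the table to match the index; rewrites the table under ACCESS EXCLUSIVE.
CLUSTER orders USING idx_orders_created_at;

-- Later runs can omit USING; the table remembers its clustering index.
CLUSTER orders;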


Why CLUSTER is risky: it rewrites the whole table

Operationally, CLUSTER is similar to “build a new table and swap it in”:

  1. Read the original table.
  2. Write a new reordered copy.
  3. Swap the new copy into place (and rebuild/adjust indexes as needed).

This implies:

  • big I/O (read + write the entire table, often plus index work),
  • significant WAL generation (affects replicas and disk),
  • buffer churn (evicts useful cache),
  • and, crucially, strong locking while the swap happens (and often for the duration, depending on specifics).

Locking behavior: why it can cause real downtime

Because CLUSTER needs a stable target to rewrite and swap safely, it requires a strong table lock (commonly ACCESS EXCLUSIVE on the table). That means:

  • plain SELECT is blocked,
  • all writes are blocked,
  • other maintenance/DDL is blocked,
  • and if it has to wait to acquire the lock, it can trigger lock-queue pileups.

Even if the rewrite itself is “fast” for a small table, the attempt to acquire the lock on a hot table can cause a production-wide jam.



Table rewrites are more than just CLUSTER

The broader category “table rewrite operations” includes:

  • VACUUM FULL
  • some ALTER TABLE subcommands (especially those that change physical layout or require rewriting rows)
  • some “rebuild and swap” maintenance patterns

The common thread is: rewriting means exclusive coordination.



The real downtime risk: time scales with table size

A rewrite operation’s lock hold time tends to scale with:

  • table size (GB/TB matters),
  • I/O throughput,
  • WAL bandwidth and checkpoint behavior,
  • CPU (sorting / index scans),
  • background load (other queries fighting for cache and I/O).

So on large tables, these are not “a few seconds” operations—they can become minutes or hours, which is why they belong in planned windows unless you have an online alternative.

Exact runtimes depend on your data and hardware, but this scaling behavior is why these commands are treated with caution.


Safer approaches when you want the benefits

If your goal is “better locality” or “less bloat” but you can’t take downtime, common strategies include:

  • Partitioning to reduce the amount of data any one operation touches (smaller rewrites).
  • Online rebuild tools/approaches (e.g., build-and-swap patterns; some teams use third-party tooling such as pg_repack to rewrite with less blocking, accepting operational complexity).
  • Index-only improvements (sometimes creating a better index or changing query patterns yields more than clustering).

(These topics are covered in later sections like partitioning and safe migration playbooks.)


Practical guardrails if you must run CLUSTER

If you decide CLUSTER is worth it:

  • Run it during a low-traffic/maintenance window.
  • Use lock_timeout so it fails fast instead of sitting in the lock queue and stalling others.
  • Monitor replication lag (WAL) and disk headroom.
  • Consider doing it table-by-table, starting with the smallest/highest-impact targets.

Runtime settings reference (lock_timeout, statement_timeout):
https://www.postgresql.org/docs/current/runtime-config-client.html


Takeaway

CLUSTER is powerful because it rewrites the table into an order that can improve real workloads. The price is that it’s fundamentally a table rewrite, which typically implies strong locks and downtime risk that grows with table size. Treat CLUSTER (and other rewrite operations) as planned, high-blast-radius maintenance unless you have an online strategy.

19. Long-Running Transactions: The Silent Cause of “Everything Is Stuck”

If ACCESS EXCLUSIVE is the obvious “stop-the-world” lock, long-running transactions are the subtle villain that makes those incidents both more likely and harder to recover from.

They often don’t look scary: maybe it’s “just a SELECT,” or a session that’s “idle.” But an open transaction can quietly hold locks, keep snapshots alive, and block maintenance or DDL in ways that cascade into an app-wide stall.



What “long-running transaction” really means

It’s not about one slow statement. It’s about a transaction that stays open for a long time—seconds, minutes, hours—whether it’s actively running queries or doing nothing.

Examples:

  • An application begins a transaction, runs a query, then waits for user input before committing.
  • A background worker starts a transaction and keeps it open across many operations.
  • A reporting job runs a long SELECT inside an explicit transaction and forgets to commit.
  • A session is idle in transaction (we cover that explicitly next), meaning it ran something and then stopped sending commands without ending the transaction.

A long-running transaction can exist even when the connection appears “quiet.”


How a long transaction causes “everything is stuck”

There are three big mechanisms.

1) It holds locks longer than you think

Even a plain SELECT takes a relation lock (ACCESS SHARE), and relation locks are held until the transaction ends, not just until the statement finishes. That matters most for DDL, which takes much stronger locks.

So if a transaction:

  • takes row locks via UPDATE / SELECT ... FOR UPDATE, or
  • takes relation locks via DDL, or
  • does a mixture of reads and writes and stays open,

it can become the root blocker for other sessions that need conflicting locks.

Reference: transactional nature of locking and DDL is discussed in the locking docs
https://www.postgresql.org/docs/current/explicit-locking.html
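
A small illustration of that point, using a hypothetical accounts table:

BEGIN;
ALTER TABLE accounts ADD COLUMN flag boolean;   -- takes ACCESS EXCLUSIVE; the DDL itself is fast
-- ...every extra statement or pause here keeps that lock held...
COMMIT;                                         -- only now is the lock released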

2) It blocks DDL by keeping old snapshots alive

Postgres uses MVCC snapshots to decide what data is visible. A long-running transaction keeps an old snapshot around.

That matters because many schema changes and maintenance tasks need to ensure there aren’t transactions that could observe inconsistent states or depend on older metadata. So a seemingly harmless long-running transaction can:

  • delay ALTER TABLE acquiring the strong lock it needs,
  • cause DDL to wait,
  • and then trigger lock queue amplification when the DDL request becomes head-of-line.

Reference: MVCC snapshots and transaction visibility
https://www.postgresql.org/docs/current/mvcc.html

3) It prevents vacuum cleanup, leading to bloat and compounding pain

A transaction that stays open can prevent vacuum from reclaiming dead tuples, because those old row versions might still be visible to that old snapshot.

That doesn’t always cause an immediate “freeze,” but over time it causes:

  • table and index bloat,
  • slower queries,
  • higher I/O,
  • more frequent and heavier vacuuming,
  • and a system that’s easier to tip into a crisis.

Reference: vacuum behavior and MVCC interaction
https://www.postgresql.org/docs/current/sql-vacuum.html


Why long transactions create misleading incidents

In a typical outage story, the timeline looks like this:

  1. Someone runs a migration needing ACCESS EXCLUSIVE.
  2. It waits.
  3. The app starts stalling.
  4. Everyone blames the migration.

But the real root blocker is often:

  • a long-running transaction started earlier (sometimes hours),
  • possibly from a report tool, a psql session, or a misbehaving app instance,
  • holding locks or snapshots that prevent the migration from progressing.

So the migration becomes the head-of-line waiter, and the long transaction is the silent root cause.


How to spot them quickly

The fastest way during an incident is pg_stat_activity. You’re looking for:

  • old xact_start (transaction age),
  • state = 'idle in transaction',
  • long-running queries (now() - query_start),
  • sessions tied to unexpected applications/users.

pg_stat_activity reference:
https://www.postgresql.org/docs/current/monitoring-stats.html

A practical query to list the oldest open transactions:

SELECT
  pid,
  usename,
  application_name,
  client_addr,
  state,
  now() - xact_start AS xact_age,
  now() - query_start AS query_age,
  wait_event_type,
  wait_event,
  left(query, 200) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start ASC
LIMIT 50;

This doesn’t prove a session is the blocker, but it quickly surfaces “usual suspects.”


Operational takeaway

Long-running transactions are dangerous because they:

  • extend the lifetime of locks and snapshots,
  • make DDL lock acquisition unpredictable,
  • enable lock-queue pileups,
  • and gradually degrade the system through bloat.

If you want fewer stop-the-world incidents, reducing long-lived transactions (especially “idle in transaction”) is one of the highest-leverage changes you can make.

Next, we’ll zoom in on the worst offender in this category: idle in transaction, why it happens, and how to prevent it.

20. Idle in Transaction: How One Session Can Freeze Progress

“Idle in transaction” is one of the most common, most avoidable causes of PostgreSQL lock misery. It’s not a special mode Postgres chooses; it’s what happens when a client:

  1. starts a transaction (BEGIN happens explicitly or implicitly),
  2. runs one or more statements,
  3. then stops sending commands,
  4. and crucially: does not COMMIT or ROLLBACK.

At that point, PostgreSQL reports the session as idle in transaction.



Why “idle in transaction” is uniquely dangerous

An idle session that is not in a transaction is usually harmless. An idle session in a transaction can be a production incident waiting to happen because it can do three things at once:

1) Hold locks indefinitely

Depending on what it already executed, an idle-in-transaction session may be holding:

  • row locks (e.g., it ran an UPDATE or SELECT ... FOR UPDATE)
  • relation locks (e.g., it ran DDL, or it’s mid-migration)
  • other lock types that block unrelated work

And because the transaction never ends, those locks can stick around until someone intervenes.

Reference: https://www.postgresql.org/docs/current/explicit-locking.html

2) Block DDL and trigger lock-queue pileups

Even if it’s “only” holding a weak relation lock from a query, it can still prevent a later ACCESS EXCLUSIVE request from being granted. Then that ACCESS EXCLUSIVE request becomes head-of-line, and suddenly normal traffic starts queueing behind it.

This is the classic pattern where one forgotten session causes the app to “freeze” during a migration.

3) Prevent vacuum from reclaiming dead tuples

An open transaction holds an old snapshot. While it remains open, vacuum may be unable to remove dead tuples that are still potentially visible to that snapshot. Over time that causes:

  • bloat,
  • slower queries,
  • heavier vacuum work,
  • and higher I/O.

Reference: MVCC and vacuum behavior
https://www.postgresql.org/docs/current/mvcc.html
https://www.postgresql.org/docs/current/sql-vacuum.html


How it happens in real life

Common causes include:

  • Application code paths that start a transaction and then do network I/O (calling another service, waiting for a file upload, waiting for user input).
  • Connection pool misuse, where a transaction is started and the connection is returned to the pool without committing (buggy code or exception paths).
  • Interactive sessions (psql, BI tools) where someone runs BEGIN; SELECT ...; and then goes to lunch.
  • ORM defaults that open transactions automatically, plus a slow/failed request path that doesn’t close them.

Postgres isn’t “being picky” here—your client asked it to keep the transaction open, so it does.


How to identify idle-in-transaction sessions fast

pg_stat_activity includes both the state and transaction timing fields. The telltales are:

  • state = 'idle in transaction' (or 'idle in transaction (aborted)')
  • an old xact_start
  • often a query that is the last statement executed, not necessarily the statement causing current blocking

A practical query:

SELECT
  pid,
  usename,
  application_name,
  client_addr,
  now() - xact_start AS xact_age,
  now() - state_change AS idle_in_xact_for,
  left(query, 200) AS last_query
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
ORDER BY xact_start ASC;

Reference: session state fields in pg_stat_activity
https://www.postgresql.org/docs/current/monitoring-stats.html


How one idle transaction “freezes progress”: a typical chain

  1. Session A: BEGIN; UPDATE accounts SET ... WHERE id=...;
    (row locks are now held)
  2. Session A goes idle—transaction remains open.
  3. Session B: tries to update the same row → waits on A (localized pain).
  4. Later, Session C: migration tries ALTER TABLE accounts ... → requests ACCESS EXCLUSIVE and waits.
  5. Sessions D..N: normal traffic arrives; because a strong lock request is queued, they can start queuing too.
  6. The app looks “down.”

One session, one forgotten transaction—big blast radius.


Prevention: make “idle in transaction” impossible to ignore

You want both application discipline and database guardrails.

Application-level
  • Keep transactions short and tightly scoped.
  • Never wrap user think-time or network calls inside a DB transaction.
  • Ensure exception paths always ROLLBACK.
  • Use poolers correctly: return connections only after the transaction is closed.

Database-level guardrail: idle_in_transaction_session_timeout

PostgreSQL can automatically terminate sessions that stay idle in a transaction longer than a configured threshold:

  • Setting: idle_in_transaction_session_timeout
  • Behavior: if a session is idle in a transaction past the timeout, the backend is terminated (transaction rolled back).

This is one of the most effective “stop-the-world prevention” settings because it turns “infinite lock hostage” into “bounded damage.”

Reference:
https://www.postgresql.org/docs/current/runtime-config-client.html
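
A hedged example of wiring in that guardrail (the role name and threshold are illustrative):

-- Per session:
SET idle_in_transaction_session_timeout = '5min';

-- Per role, so new sessions for that role inherit it:
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';

-- Or instance-wide via postgresql.conf / ALTER SYSTEM:
--   idle_in_transaction_session_timeout = '5min'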


Incident response: what you do when you find one

Operationally you have two escalating actions:

  • Cancel the current query (pg_cancel_backend(pid)) — often not enough if it’s idle (there may be no running query).
  • Terminate the session (pg_terminate_backend(pid)) — ends the transaction and releases locks, but will roll back work.

Use termination carefully, but in a lock-induced outage it’s frequently the fastest way to restore service once you’ve identified the true root blocker.

Reference: server signaling functions
https://www.postgresql.org/docs/current/functions-admin.html
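
For example, once you've confirmed that a PID (12345 here is a placeholder) is the root blocker:

SELECT pg_cancel_backend(12345);     -- cancels the running query only (no effect if the session is idle)
SELECT pg_terminate_backend(12345);  -- ends the session; its open transaction rolls back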


Takeaway

Idle-in-transaction sessions are dangerous because they can hold locks and snapshots indefinitely, blocking DDL and vacuum and triggering lock-queue pileups. The fix is mostly cultural (short transactions) plus one strong safety net: set idle_in_transaction_session_timeout so a single stuck session can’t freeze your database for hours.

21. Lock Escalation Myth in PostgreSQL: What Actually Happens Instead

A lot of database folks come to PostgreSQL expecting “lock escalation”: the idea that if you lock enough rows, the database upgrades that into a table lock automatically to reduce overhead. That’s a real behavior in some systems.

PostgreSQL is different: it does not automatically escalate row locks into a table lock in the classic sense. When Postgres appears to “escalate,” what you’re usually seeing is one of a few other mechanisms: normal table locks that every statement takes, long transactions holding locks, lock queue amplification, or maintenance/DDL operations that require strong table locks.



What people mean by “lock escalation”

In the typical “escalation” story:

  • many row locks are acquired,
  • the DB decides tracking so many locks is expensive,
  • it replaces them with a single table-level lock,
  • which suddenly blocks everything.

This can sound like what’s happening in Postgres when an app gets slow after large updates. But that slowdown is almost never because Postgres escalated locks.


What PostgreSQL actually does

1) PostgreSQL always takes a relation lock for a statement anyway

Even if you’re updating rows, Postgres also takes a relation-level lock on the table (commonly ROW EXCLUSIVE for INSERT/UPDATE/DELETE), because it must prevent conflicting schema changes while the statement runs.

That relation lock:

  • is normal,
  • is compatible with many other operations (including reads and other writes),
  • and is not an escalation—it’s simply part of statement execution.

Reference: lock modes and what statements take them
https://www.postgresql.org/docs/current/explicit-locking.html
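
You can observe this directly: even a huge UPDATE holds only ROW EXCLUSIVE at the table level. A sketch, assuming a hypothetical events table:

-- Session 1:
BEGIN;
UPDATE events SET processed = true
WHERE created_at < now() - interval '1 day';   -- may touch millions of rows

-- Session 2: inspect the relation-level lock session 1 holds.
SELECT l.pid, l.relation::regclass AS relation, l.mode, l.granted
FROM pg_locks l
WHERE l.locktype = 'relation'
  AND l.relation = 'events'::regclass;
-- Expect RowExclusiveLock, not an escalated table-wide exclusive lock.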

2) Row-level contention can look like a table-level freeze

If lots of transactions contend on the same set of rows (hot keys), you can get:

  • long wait chains,
  • many sessions blocked,
  • timeouts and pool exhaustion.

From the application’s point of view, it looks like “the table is locked,” but the actual issue is row-level conflicts and transactions waiting on each other (often visible as transactionid waits).

This is not escalation; it’s just many sessions colliding on the same data.

3) Lock queueing can make compatible locks wait

This is the sneaky one. Suppose:

  • a long SELECT holds ACCESS SHARE,
  • then a migration requests ACCESS EXCLUSIVE and waits,
  • and now other sessions queue behind the pending exclusive request.

Suddenly reads and writes are stalled, and it feels like a giant table lock appeared “out of nowhere.”

Again, not escalation. It’s queue fairness plus a strong lock request sitting in line.

(We covered this mechanism in the lock-queuing section.)

4) DDL and table rewrites legitimately require strong locks

When you run operations like:

  • many ALTER TABLE subcommands,
  • TRUNCATE,
  • VACUUM FULL,
  • CLUSTER,
  • non-concurrent index rebuilds in some forms,

Postgres may request ACCESS EXCLUSIVE (or similarly strong modes). That’s not escalation; it’s the correct lock mode for the operation.



So why does “it escalated” feel true?

Because the symptom is similar:

  • lots of sessions waiting,
  • throughput collapses,
  • and it happens suddenly.

But the cause is usually one of these:

  • a long transaction holding locks open,
  • a hot-row contention storm,
  • a queued ACCESS EXCLUSIVE request causing head-of-line blocking,
  • a rewrite/DDL command that legitimately blocks.

Once you stop looking for escalation, you start looking for the real root blocker, which is much easier to fix.


Practical takeaway for operators

When someone says “Postgres escalated the lock,” translate it into a checklist:

  1. Is there a session holding or waiting for AccessExclusiveLock on the relation? (pg_locks)
  2. Is there a long-running or idle-in-transaction session? (pg_stat_activity.xact_start, state)
  3. Are many sessions waiting on transactionid (row contention)? (pg_locks.locktype = 'transactionid' and wait events)
  4. Did a migration or maintenance job start right before the stall?

That’s where the real answers usually are.
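
For item 1 on that checklist, a quick pg_locks starting point (a sketch, not a full diagnostic):

SELECT l.pid,
       l.relation::regclass AS relation,
       l.granted,
       a.state,
       now() - a.xact_start AS xact_age,
       left(a.query, 120)   AS query_preview
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.mode = 'AccessExclusiveLock';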


Takeaway

PostgreSQL doesn’t do classic lock escalation from rows to tables. When you see “it feels escalated,” it’s almost always normal relation locks + queueing + long transactions, or a DDL/maintenance operation that truly needs a strong lock. Understanding that distinction helps you debug faster and choose safer operational patterns.

22. Blocking Chains: How to Identify the “Root Blocker”

When PostgreSQL feels stuck, the most important diagnostic skill is finding the root blocker: the session that, if it ended (commit/rollback/terminate), would cause the whole backlog to drain.

In real incidents you’ll usually see a blocking chain (or a blocking tree), not a single pair of blocker/victim:

  • Session C is waiting on B
  • Session B is waiting on A
  • Session A is running (or idle in transaction) holding the lock everyone needs

Your job is to identify session A.



Key concepts: blocker, victim, and root blocker

  • Victim (waiter): a session that cannot proceed because it’s waiting for a lock.
  • Blocker: a session currently holding a conflicting lock the victim needs.
  • Root blocker: the blocker that is not itself waiting on anyone else (or is the earliest cause in the chain). Removing the root blocker breaks the whole chain.

A common trap: the “obvious bad query” is a head-of-line waiter (like a migration waiting for ACCESS EXCLUSIVE) that is causing queue amplification. That query might be causing the symptom, but the root cause is often a different session holding a lock or holding an open transaction.


Fastest path in practice: use pg_blocking_pids()

PostgreSQL provides a built-in helper that tells you which PIDs are blocking a given PID: pg_blocking_pids(pid).

Start from the victims (sessions waiting) and walk upstream until you find sessions that are not blocked by anyone.

Find currently waiting sessions and their blockers
SELECT
  a.pid                                AS waiting_pid,
  a.usename                            AS waiting_user,
  a.application_name                   AS waiting_app,
  now() - a.query_start                AS waiting_for,
  a.wait_event_type,
  a.wait_event,
  left(a.query, 200)                   AS waiting_query,
  pg_blocking_pids(a.pid)              AS blocking_pids
FROM pg_stat_activity a
WHERE a.wait_event_type = 'Lock'
ORDER BY waiting_for DESC;

Interpretation:

  • blocking_pids is an array; often it has 1 PID, sometimes more.
  • If it’s empty but the session is waiting, you may be dealing with a wait that’s not a heavyweight lock (or the snapshot is changing quickly). Still, for classic lock incidents, this is very effective.

Reference: pg_blocking_pids()
https://www.postgresql.org/docs/current/functions-info.html


Build the blocking tree to find the root blocker(s)

This recursive query expands blockers until it reaches the top of each chain, producing a readable “who blocks whom” tree.

WITH RECURSIVE chain AS (
  -- Start with sessions that are waiting on a lock
  SELECT
    a.pid,
    a.usename,
    a.application_name,
    a.state,
    a.wait_event_type,
    a.wait_event,
    a.query,
    a.query_start,
    a.xact_start,
    pg_blocking_pids(a.pid) AS blockers,
    0 AS depth,
    ARRAY[a.pid] AS path
  FROM pg_stat_activity a
  WHERE a.wait_event_type = 'Lock'
  UNION ALL
  -- Walk to the blockers
  SELECT
    b.pid,
    b.usename,
    b.application_name,
    b.state,
    b.wait_event_type,
    b.wait_event,
    b.query,
    b.query_start,
    b.xact_start,
    pg_blocking_pids(b.pid) AS blockers,
    c.depth + 1 AS depth,
    c.path || b.pid
  FROM chain c
  JOIN LATERAL unnest(c.blockers) AS bp(pid) ON true
  JOIN pg_stat_activity b ON b.pid = bp.pid
  WHERE NOT b.pid = ANY (c.path)  -- prevent loops
)
SELECT
  depth,
  pid,
  usename,
  application_name,
  state,
  now() - COALESCE(xact_start, query_start) AS age,
  wait_event_type,
  wait_event,
  left(query, 200) AS query_preview,
  blockers
FROM chain
ORDER BY depth DESC, age DESC;

How to use it:

  • The highest depth entries are closer to the root blockers.
  • Root blockers often have wait_event_type NULL (they’re running) or are idle in transaction.
  • Look for very old xact_start and suspicious application_name (BI tool, migration runner, ad-hoc psql).

Validate what object they’re fighting over

Once you have candidate blocker PIDs, confirm what they’re locking.

A practical relation-lock focused view:

SELECT
  a.pid,
  a.usename,
  a.application_name,
  a.state,
  now() - a.xact_start AS xact_age,
  l.locktype,
  l.mode,
  l.granted,
  l.relation::regclass AS relation,
  left(a.query, 200) AS query_preview
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'relation'        -- both granted locks and pending (waiting) requests appear here
ORDER BY relation, l.granted DESC, l.mode;

This helps you spot the classic stop-the-world signature:

  • one session holding a weak lock for a long time,
  • another waiting for AccessExclusiveLock,
  • and many victims behind it.

Reference: pg_locks
https://www.postgresql.org/docs/current/view-pg-locks.html


Common root blockers (what they look like)

In real production incidents, root blockers often fall into a few buckets:

  1. Idle in transaction
    • state = 'idle in transaction'
    • old xact_start
    • query text shows the last statement, not the one causing current pain
  2. Long-running read
    • SELECT running for minutes/hours
    • often from analytics/BI/reporting
  3. Migration/DDL transaction left open
    • DDL completed quickly but the transaction wasn’t committed
    • AccessExclusiveLock might be held longer than expected
  4. Advisory lock “singleton” gone wrong
    • app took a global advisory lock and got stuck

What to do once you find the root blocker

From least to most disruptive:

  • Ask the app/job owner to end it cleanly (commit/rollback/stop the job) if you can coordinate quickly.
  • Cancel the querySELECT pg_cancel_backend(<pid>);
    Useful when it’s actively running.
  • Terminate the sessionSELECT pg_terminate_backend(<pid>);
    Ends the transaction and releases locks (rolls back). Often the fastest way to restore service.

Admin function reference:
https://www.postgresql.org/docs/current/functions-admin.html


Takeaway

Stop-the-world lock incidents are usually blocking chains. The winning move is to stop guessing and systematically:

  1. list the waiters,
  2. follow pg_blocking_pids() upstream,
  3. find the root blocker(s),
  4. confirm the locked object in pg_locks,
  5. then choose the least risky intervention (cancel → terminate).

Next, we’ll distinguish blocking chains from deadlocks, which look similar at first but behave differently and require different responses.

23. Deadlocks vs “Stop-the-World” Blocking: How to Tell the Difference

Both deadlocks and stop-the-world-style blocking can look like “everything is waiting,” but they’re fundamentally different problems:

  • Stop-the-world blocking is usually a queue: one or a few sessions are blocking many others, and progress resumes when the blocker releases a lock.
  • Deadlock is a cycle: sessions are waiting on each other in a loop, so none of them can ever make progress unless the database breaks the cycle.

PostgreSQL can automatically detect deadlocks and resolve them by aborting one participant. It does not automatically “fix” stop-the-world blocking, because that’s often a legitimate long wait rather than a logical impossibility.



What a deadlock is (in practical terms)

A deadlock happens when there’s a cycle like:

  • Session A holds lock X and waits for lock Y
  • Session B holds lock Y and waits for lock X

Neither can proceed, ever.

Postgres detects this (after deadlock_timeout) and raises an error:

  • ERROR: deadlock detected

One transaction gets canceled/aborted, releasing its locks so the others can proceed.

Reference: deadlock discussion and parameters
https://www.postgresql.org/docs/current/explicit-locking.html
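
A classic two-session example, assuming a hypothetical accounts(id, balance) table:

-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
UPDATE accounts SET balance = balance + 10 WHERE id = 2;   -- waits for session 2

-- Session 2 (running concurrently):
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;   -- waits for session 1: a cycle

-- After deadlock_timeout, PostgreSQL aborts one of the two:
--   ERROR: deadlock detected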


What “stop-the-world” blocking is

Stop-the-world blocking is usually:

  • a long-running transaction holding a lock (or holding a snapshot),
  • plus a strong lock request queued behind it,
  • plus many sessions queued behind the strong lock request.

There’s no cycle. There’s just a chokepoint and a queue. If the root blocker ends, the queue drains.

This can last indefinitely if the blocker never finishes (for example, “idle in transaction”), but it’s not logically impossible the way a deadlock is.


How to tell them apart in the moment

1) Does it eventually resolve by itself?
  • Deadlock: Postgres will resolve it by aborting one participant, usually within seconds (depending on deadlock_timeout).
  • Stop-the-world blocking: It can continue indefinitely until the blocker commits/rolls back or is terminated.

Caveat: both can cause timeouts at the application layer before the DB resolves anything, so don’t rely on app errors alone.


2) Look for the deadlock error and log entry

Deadlocks generate a specific SQLSTATE and a detailed log message (if logging captures it). You’ll see:

  • deadlock detected
  • a description of the processes and the locks involved

Reference: deadlock logging and error code
https://www.postgresql.org/docs/current/errcodes-appendix.html


3) Shape of the wait graph: cycle vs tree

If you graph “who waits on whom”:

  • Deadlock: cycle (A → B → A, or longer cycles).
  • Blocking incident: tree/chain with one or few roots (many → one).

In Postgres terms, if you use pg_blocking_pids() and recursively expand blockers (as in the previous section):

  • cycles are a deadlock signature,
  • a single root blocker with lots of descendants is stop-the-world blocking.

Reference: pg_blocking_pids()
https://www.postgresql.org/docs/current/functions-info.html


4) Wait events and lock types can hint, but aren’t definitive

Both scenarios often show wait_event_type = 'Lock'.

The distinction is not “Lock wait vs not”—it’s the relationship pattern:

  • Deadlock: multiple sessions each waiting and holding locks needed by the others.
  • Blocking: most sessions are waiting, but one session is making progress (or is idle holding locks).

You can often spot stop-the-world blocking because the root blocker is:

  • running a long query, or
  • idle in transaction, or
  • holding AccessExclusiveLock or waiting for it and causing queueing.

Why deadlocks can look less “world-stopping” than blocking

Deadlocks usually involve a small number of sessions (2–5 is common) and Postgres resolves them. So they often show up as:

  • sporadic transaction failures,
  • retries succeeding,
  • a few error spikes.

Stop-the-world blocking is more likely to involve dozens/hundreds of sessions and cause widespread timeouts, because it jams a critical table or saturates the pool.

That said, a deadlock inside a hot path can still cause lots of user-visible errors even if it’s quickly resolved.


Response differences: what you do about each

If it’s a deadlock
  • Expect Postgres to abort one transaction.
  • Fix is usually application-level ordering: take locks in a consistent order, keep transactions short, avoid “lock A then lock B” in one code path and “lock B then lock A” in another.
  • Retries are often appropriate for deadlock errors.

Reference: deadlock prevention guidance
https://www.postgresql.org/docs/current/explicit-locking.html

If it’s stop-the-world blocking
  • Find the root blocker and end it (commit/rollback/terminate) or cancel the head-of-line strong lock request to let compatible traffic proceed.
  • Prevent recurrence with: timeouts (lock_timeout, statement_timeout, idle_in_transaction_session_timeout), safer migration patterns, and reducing long transactions.

Takeaway

Deadlocks are cycles that Postgres can detect and break by aborting one participant; they show up as deadlock detected errors. Stop-the-world incidents are queues caused by strong lock requests, long transactions, and lock-queue amplification; they can persist indefinitely until you remove the root blocker or the head-of-line waiter.

24. Symptoms Checklist: What You’ll See When the System “Freezes”

A PostgreSQL “freeze” is usually not a crash. It’s the system doing exactly what it was designed to do: protect correctness by making sessions wait. The hard part is that lock-driven waiting often looks like a total outage from the application side.

This checklist is what you typically observe when you’re in a stop-the-world-style lock incident, plus what each symptom usually implies.



Application-level symptoms

□ Sudden spike in request timeouts across many endpoints
  • Often indicates pool saturation: threads are waiting on DB connections that are themselves waiting on locks.
  • Especially common if most requests touch the same hot table.
□ Retry storms / cascading failures
  • Clients retry timeouts, increasing concurrency, which increases lock queue depth and makes the stall worse.
  • Watch for a sharp increase in request volume without a corresponding increase in completed requests.
□ Connection pool exhaustion
  • Pool at max, many app threads blocked waiting for a free connection.
  • From Postgres, you’ll see lots of backends “active” but not progressing.

Database-level “it looks weirdly calm” symptoms

□ CPU is not pegged (sometimes it’s surprisingly low)
  • Lock waits don’t burn CPU.
  • This is a classic clue that the bottleneck is waiting, not compute.
□ I/O is not obviously saturated
  • In pure lock incidents, disk may look normal.
  • (Contrast: table rewrites, vacuum full, cluster, massive sort operations—those often do show I/O pressure.)

What you’ll see in pg_stat_activity

□ Many sessions with wait_event_type = 'Lock'

This is the most direct “lock incident” indicator.

  • state may be active (a query is running but currently waiting)
  • wait_event may indicate the specific lock class being waited on (implementation detail; don’t overinterpret the exact string)

Reference: wait event fields
https://www.postgresql.org/docs/current/monitoring-stats.html

□ A small number of sessions running for a long time

Look for:

  • very old query_start (long-running statement),
  • very old xact_start (long-running transaction),
  • state = 'idle in transaction' (particularly dangerous).
□ A migration/DDL statement sitting “active” but not progressing

Often an ALTER TABLE, TRUNCATE, DROP, REINDEX, VACUUM FULL, CLUSTER, etc. It might be:

  • the head-of-line waiter (waiting for ACCESS EXCLUSIVE), or
  • the holder (already has the exclusive lock).

What you’ll see in pg_locks

□ A pile of waiting locks (granted = false)

If the freeze is lock-driven, pg_locks will show many entries where granted = false.

Reference: pg_locks columns and meaning
https://www.postgresql.org/docs/current/view-pg-locks.html

□ Many waits on the same relation

A telltale sign is many waiting locks tied to one table:

  • locktype = 'relation'
  • relation::regclass is the same for most waiters

That’s your chokepoint table.

□ One session holding or waiting for AccessExclusiveLock

Common patterns:

  • one backend holds AccessExclusiveLock on a hot table (hard freeze), or
  • one backend is waiting for AccessExclusiveLock and many others are queued behind it (queue amplification).

Reference: lock modes and conflicts
https://www.postgresql.org/docs/current/explicit-locking.html

□ Lots of transactionid waits

If you see many waits with:

  • locktype = 'transactionid'

…that’s often a sign of heavy row-level contention or many sessions waiting for one long transaction to commit/rollback.

This tends to be more “hot-row storm” than “DDL stop-the-world,” but in high-traffic systems it can still feel like an outage.


“Smell tests” that narrow the cause quickly

□ SELECTs are timing out too

If plain reads are timing out on a particular table, suspect:

  • a strong relation lock (ACCESS EXCLUSIVE or similar), or
  • lock queueing behind a pending exclusive request.
□ Only certain writes/time-critical paths are failing

If only specific entity updates fail (e.g., one customer or one order ID), suspect:

  • row-level contention,
  • long transactions holding row locks,
  • or a deadlock pattern (which usually resolves with errors).
□ The database is reachable, but everything is slow

If you can connect and run SELECT 1; quickly, but application queries hang, suspect:

  • relation lock on a hot object,
  • lock queueing,
  • or severe internal contention (LWLocks / I/O) rather than a full outage.

A quick “freeze triage” checklist

If you want a tight checklist to run mentally:

  1. □ Are many sessions waiting with wait_event_type = 'Lock'? (pg_stat_activity)
  2. □ Is there a head-of-line ACCESS EXCLUSIVE request? (pg_locks on relation)
  3. □ Is there an old open transaction / idle in transaction session? (pg_stat_activity.xact_start, state)
  4. □ Is the pain localized to one relation? (relation::regclass)
  5. □ Is it actually a deadlock (errors) rather than blocking (waiting)? (logs + deadlock errors)
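
If you want a single starting query, here is a minimal sketch covering checks 1 and 3 in one pass (the 5-minute threshold is an arbitrary example):

SQL
-- Lock waiters plus old or idle-in-transaction sessions (sketch).
SELECT
  pid,
  state,
  wait_event_type,
  now() - xact_start AS xact_age,
  left(query, 120)   AS query_preview
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
   OR state LIKE 'idle in transaction%'
   OR xact_start < now() - interval '5 minutes'
ORDER BY xact_age DESC NULLS LAST;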

Takeaway

A lock-driven “freeze” is characterized by lots of lock waits, queueing, and often one hot relation with either a held or pending strong lock. The fastest path out is usually identifying the root blocker (often a long transaction or idle-in-transaction session) and understanding whether a queued ACCESS EXCLUSIVE request is amplifying the stall.

25. Measuring Impact: Latency, Throughput, and Connection Pileups

Lock incidents are sneaky because “the database is waiting” doesn’t always look like “the database is busy.” To manage an incident (and to justify preventive work later), you need a way to quantify impact in three dimensions:

  • Latency: how long requests take
  • Throughput: how many requests complete per second
  • Connection pileups: how many sessions are stuck waiting, and where



Latency: “how long are we waiting?”

In a lock freeze, the single most useful latency metric is time spent waiting for locks, because that translates directly into user-facing response time.

What to look at:

  • Application request latency (p50/p95/p99)
    • p99 usually explodes first in lock incidents.
  • Database query latency
    • slow query logs help if they include lock waits (often the query isn’t “slow,” it’s “blocked”).

Inside Postgres, use pg_stat_activity to spot who has been waiting the longest:

SQL
SELECT
  pid,
  usename,
  application_name,
  now() - query_start AS query_age,
  wait_event_type,
  wait_event,
  left(query, 200) AS query_preview
FROM pg_stat_activity
WHERE wait_event_type IS NOT NULL
ORDER BY query_age DESC;

Interpretation:

  • A large query_age combined with wait_event_type = 'Lock' usually means lock waiting, not execution, is dominating the statement’s time.
  • If query_age is large but wait events aren’t lock-related, you may have I/O or internal contention instead.

Reference: pg_stat_activity fields
https://www.postgresql.org/docs/current/monitoring-stats.html


Throughput: “are we still completing work?”

A system can have thousands of “active” sessions and still complete almost nothing if most are waiting.

What to look at:

  • Completed requests per second at the app/load balancer
  • Transactions committed/rolled back in Postgres

From Postgres, a quick sense of throughput comes from pg_stat_database:

SQL
SELECT
  datname,
  xact_commit,
  xact_rollback,
  blks_read,
  blks_hit,
  tup_returned,
  tup_fetched,
  tup_inserted,
  tup_updated,
  tup_deleted
FROM pg_stat_database
WHERE datname = current_database();

This is cumulative. For incident measurement you typically sample it twice (e.g., 10–30 seconds apart) and compute deltas to get rates. The key signal in a lock freeze is:

  • request volume remains high,
  • but commit rate drops sharply (or stalls),
  • rollbacks/timeouts may rise.
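
For example, a minimal manual approach is to run the same counter query twice and divide the difference by the elapsed seconds (a sketch; most monitoring systems do this sampling for you):

SQL
-- Run twice, e.g. ~30 seconds apart; (second xact_commit - first xact_commit) / elapsed
-- seconds approximates the commit rate for the current database.
SELECT now() AS sampled_at,
       xact_commit,
       xact_rollback
FROM pg_stat_database
WHERE datname = current_database();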

Reference: pg_stat_database
https://www.postgresql.org/docs/current/monitoring-stats.html


Connection pileups: “where are we stuck?”

Lock incidents become outages because waiting sessions accumulate faster than they drain. That’s the connection pileup.

What pileups look like
  • Many sessions in pg_stat_activity with wait_event_type = 'Lock'
  • Connection pool is maxed out
  • New requests can’t even get a DB connection
  • Sometimes max_connections pressure shows up too (more on that if you’re running without a pooler)

Measure pileup size and shape

Count sessions by state and wait type:

SQL
SELECT
  state,
  wait_event_type,
  count(*) AS sessions
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state, wait_event_type
ORDER BY sessions DESC;

This tells you whether you have:

  • a lock wait storm (wait_event_type = 'Lock'),
  • a CPU/I/O issue (other wait event types),
  • or simply too many idle connections.

Reference: monitoring stats and wait events
https://www.postgresql.org/docs/current/monitoring-stats.html

Identify the hot relation causing the pileup

If it’s a relation-lock incident, group by relation:

SQL
SELECT
  l.relation::regclass AS relation,
  l.mode,
  l.granted,
  count(*) AS cnt
FROM pg_locks l
WHERE l.locktype = 'relation'
GROUP BY 1, 2, 3
ORDER BY cnt DESC, relation;

Typical stop-the-world signature:

  • many granted = false rows on a single relation,
  • often with someone waiting for (or holding) AccessExclusiveLock.

Reference: pg_locks
https://www.postgresql.org/docs/current/view-pg-locks.html


Translating DB symptoms into user impact

A useful mental mapping:

  • Latency spikes happen immediately when sessions start waiting.
  • Throughput collapse happens when the backlog saturates the pool and commits slow down.
  • Pileup growth happens when arrival rate > completion rate (classic queueing).

So during the incident, track:

  • p95/p99 request latency,
  • commit rate (or completed request rate),
  • number of waiting sessions (and on what).

That combination tells you if the situation is:

  • stabilizing (pileup shrinking),
  • steady-state (pileup flat, still bad),
  • or runaway (pileup growing, timeouts rising).

Takeaway

Impact measurement in lock incidents is about confirming the “waiting outage” pattern:

  • Latency: wait times dominate query duration (pg_stat_activity.query_age + lock waits).
  • Throughput: commits per second drop (sample pg_stat_database deltas).
  • Pileups: waiting sessions explode and concentrate on a relation (pg_locks grouped by relation).

Once you can quantify those three, the next step is observability: knowing exactly which views, joins, and queries to use to find the root blocker quickly.

26. Observability Primer: The Must-Know Catalogs and Views

When a lock incident hits, you don’t want to “browse around.” You want a small set of Postgres views you trust, and you want to know what question each one answers.

This section is a cheat sheet for the must-know catalogs and views for lock and “stop-the-world” diagnosis.



pg_stat_activity: what each session is doing (and waiting on)

This is your primary incident dashboard.

Use it to answer:

  • Who is currently running a query?
  • Who is waiting, and on what (wait_event_type, wait_event)?
  • Which sessions have long-lived transactions (xact_start)?
  • Who is idle in transaction?
  • What is the query text and which app/user is it from?

Key columns you’ll use constantly:

  • pid, usename, application_name, client_addr
  • state, state_change
  • query, query_start
  • xact_start
  • wait_event_type, wait_event

Reference: https://www.postgresql.org/docs/current/monitoring-stats.html


pg_locks: what locks are held and what locks are waiting

This is the ground truth for heavyweight locks.

Use it to answer:

  • Which locks are granted vs waiting (granted)?
  • Which lock mode is involved (mode)?
  • What object is locked (locktype, relation, transactionid, etc.)?
  • Is there a pending AccessExclusiveLock on a hot table?

Pro tip:

  • For relation locks, resolve names with relation::regclass.

Reference: https://www.postgresql.org/docs/current/view-pg-locks.html


pg_blocking_pids(pid): who is blocking this session right now

This is the fastest way to move from “lots of waiters” to “who do I talk to / kill?”

Use it to answer:

  • For this waiting PID, which PIDs are directly blocking it?

It’s especially useful combined with pg_stat_activity and a recursive query to find root blockers.
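
A minimal non-recursive sketch of that idea: sessions that appear in someone’s pg_blocking_pids() output but are not themselves waiting on a heavyweight lock are good root-blocker candidates.

SQL
-- Likely root blockers: they block others but are not lock-waiting themselves (sketch).
WITH blockers AS (
  SELECT DISTINCT unnest(pg_blocking_pids(pid)) AS pid
  FROM pg_stat_activity
  WHERE wait_event_type = 'Lock'
)
SELECT a.pid,
       a.state,
       now() - a.xact_start AS xact_age,
       left(a.query, 120)   AS query_preview
FROM blockers b
JOIN pg_stat_activity a ON a.pid = b.pid
WHERE a.wait_event_type IS DISTINCT FROM 'Lock'
ORDER BY xact_age DESC NULLS LAST;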

Reference: https://www.postgresql.org/docs/current/functions-info.html


pg_stat_database: high-level health and throughput counters

This view is about “is the system making progress?”

Use it to answer:

  • Are commits happening?
  • Are rollbacks spiking?
  • Are we reading from disk vs cache?
  • Is activity concentrated in one database?

It’s cumulative, so you typically sample it twice to get rates.

Reference: https://www.postgresql.org/docs/current/monitoring-stats.html


Progress reporting views: “is the big operation still moving?”

When the problem involves long maintenance or DDL-like operations, progress views help distinguish “blocked” from “slow but progressing.”

Common ones:

  • pg_stat_progress_vacuum (vacuum progress)
  • pg_stat_progress_create_index (index build progress)
  • pg_stat_progress_cluster (cluster progress)
  • pg_stat_progress_copy (COPY progress)

Use them to answer:

  • Is the operation advancing or stuck waiting?
  • Which phase is it in?
  • How much work remains?

Reference: https://www.postgresql.org/docs/current/progress-reporting.html


Catalogs for resolving OIDs to names

Lock rows often contain OIDs. You need to translate those quickly.

Most common conversions:

  • relation::regclass (fastest for relation OIDs)
  • pg_class (tables, indexes, relkinds)
  • pg_namespace (schemas)
  • pg_database (databases)
  • pg_roles / pg_authid (roles; pg_roles is safer for most users)
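
For example, a sketch that resolves waiting relation locks to schema-qualified names via the catalogs:

SQL
-- Translate relation OIDs in pg_locks into schema.table names (sketch).
SELECT l.pid,
       l.mode,
       n.nspname AS schema,
       c.relname AS relation,
       c.relkind
FROM pg_locks l
JOIN pg_class c     ON c.oid = l.relation
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE l.locktype = 'relation'
  AND NOT l.granted;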



pg_stat_statements: who is generating pain over time

This isn’t a “find the blocker right now” view, but it’s crucial for prevention and postmortems.

Use it to answer:

  • Which query patterns are slow or frequent?
  • Which statements contribute most total time?
  • Are there recurrent queries that often get stuck behind locks?

It requires enabling the extension and configuration, so it may not be available in every environment.
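
If it is available, a minimal “top offenders by total time” sketch looks like this (column names vary by version; total_exec_time and mean_exec_time assume PostgreSQL 13 or later):

SQL
-- Statements contributing the most total execution time (sketch).
SELECT left(query, 100) AS query_preview,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;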

Reference: pg_stat_statements docs
https://www.postgresql.org/docs/current/pgstatstatements.html


Wait events: the “what are we waiting for” taxonomy

Wait events are not a lock list. They’re a classification of why a backend isn’t running on CPU right now.

Use wait events to distinguish:

  • lock waits (wait_event_type = 'Lock') vs
  • internal contention (LWLock / LWLockTranche) vs
  • I/O waits vs
  • client/network waits.

This matters because the fix path is different.

Reference: wait event reporting in monitoring stats
https://www.postgresql.org/docs/current/monitoring-stats.html


A minimal “incident toolkit” query set

If you only memorize three things, make them these:

  1. Who’s waiting and on what? (pg_stat_activity)
  2. What lock is involved and on which relation? (pg_locks with relation::regclass)
  3. Who is blocking whom? (pg_blocking_pids())

Everything else is supporting detail.


Takeaway

PostgreSQL gives you enough observability to solve most lock incidents quickly, but only if you know where to look. pg_stat_activity tells you who is waiting, pg_locks tells you what they’re waiting on, and pg_blocking_pids() tells you who’s responsible. Add progress views for long operations and catalog lookups for OID resolution, and you have a complete “stop-the-world” debugging toolkit.

27. Using pg_stat_activity to Spot Blockers and Victims

When a PostgreSQL system “freezes,” pg_stat_activity is the fastest place to separate:

  • victims: sessions waiting and piling up
  • suspects: sessions running long or holding transactions open
  • likely blockers: sessions that victims say are blocking them (via pg_blocking_pids())

pg_stat_activity won’t tell you everything about locks by itself (that’s pg_locks), but it gives you the operational context you need: user, app, client address, query text, and timing.



The columns that matter most in an incident

In pg_stat_activity, these columns are your “blockers vs victims” toolkit:

  • pid: the session identifier you’ll use everywhere else
  • state: active, idle, idle in transaction, etc.
  • query, query_start: what it’s doing (or last did) and how long it’s been running
  • xact_start: how long the transaction has been open (critical)
  • wait_event_type, wait_event: what it’s waiting for right now
  • usename, application_name, client_addr: who/what to blame (or contact)

Reference: https://www.postgresql.org/docs/current/monitoring-stats.html


Step 1: Find the victims (the waiters)

Victims are sessions that are currently waiting. In lock incidents, the most common signal is:

  • wait_event_type = 'Lock'

Query:

SQL
SELECT
  pid,
  usename,
  application_name,
  client_addr,
  state,
  now() - query_start AS query_age,
  now() - xact_start  AS xact_age,
  wait_event_type,
  wait_event,
  left(query, 200)    AS query_preview
FROM pg_stat_activity
WHERE datname = current_database()
  AND wait_event_type IS NOT NULL
ORDER BY query_age DESC;

How to interpret:

  • Lots of rows with wait_event_type = 'Lock' → lock storm.
  • If wait_event_type is mostly LWLock* or I/O-related, you may have internal contention rather than classic blocking.

Step 2: Attach “who is blocking me?” to each victim

pg_stat_activity doesn’t have a direct “blocked by” column, but Postgres provides pg_blocking_pids(pid).

SQL
SELECT
  a.pid AS waiting_pid,
  a.usename,
  a.application_name,
  now() - a.query_start AS waiting_for,
  left(a.query, 200) AS waiting_query,
  pg_blocking_pids(a.pid) AS blocking_pids
FROM pg_stat_activity a
WHERE a.datname = current_database()
  AND a.wait_event_type = 'Lock'
ORDER BY waiting_for DESC;

Interpretation:

  • If most victims list the same blocker PID(s), you’re close to the root cause.
  • If victims show different blockers, you may have multiple hot spots or a more complex chain.

Reference: https://www.postgresql.org/docs/current/functions-info.html


Step 3: Find the “classic” blockers in pg_stat_activity

Even without pg_locks, pg_stat_activity often makes the blocker obvious because blockers are usually:

A) Long-running transactions

Look for old xact_start:

SQL
SELECT
  pid,
  usename,
  application_name,
  client_addr,
  state,
  now() - xact_start AS xact_age,
  now() - query_start AS query_age,
  wait_event_type,
  wait_event,
  left(query, 200) AS query_preview
FROM pg_stat_activity
WHERE datname = current_database()
  AND xact_start IS NOT NULL
ORDER BY xact_start ASC
LIMIT 50;

Red flags:

  • very large xact_age (minutes/hours)
  • state = 'idle in transaction' (often the real villain)

B) Idle in transaction

This is the “held locks forever” pattern:

SQL
SELECT
  pid,
  usename,
  application_name,
  client_addr,
  now() - xact_start AS xact_age,
  now() - state_change AS idle_in_xact_for,
  left(query, 200) AS last_query
FROM pg_stat_activity
WHERE datname = current_database()
  AND state LIKE 'idle in transaction%'
ORDER BY xact_start ASC;

If you find one of these during a lock storm, treat it as a prime suspect.

C) The head-of-line DDL waiter

Migrations often show up as a single DDL statement that’s been “running” for a long time, but it’s actually waiting.

Look for:

  • state = 'active'
  • wait_event_type = 'Lock'
  • query text includes ALTER TABLE, TRUNCATE, DROP, REINDEX, VACUUM FULL, CLUSTER, etc.

This session may be the symptom amplifier rather than the root blocker.
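
A quick way to spot such a session is to filter lock-waiting backends whose query text looks like DDL or maintenance (a sketch; the pattern list is illustrative, not exhaustive):

SQL
-- Lock-waiting sessions that look like DDL or maintenance commands (sketch).
SELECT pid,
       now() - query_start AS waiting_for,
       left(query, 200)    AS query_preview
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND query ~* '^\s*(alter|drop|truncate|reindex|vacuum|cluster|create index)'
ORDER BY waiting_for DESC;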


Step 4: Label sessions as blocker vs victim (quick triage rule)

A useful incident triage heuristic:

  • Victim: wait_event_type = 'Lock' and pg_blocking_pids(pid) is not empty
  • Likely blocker: appears inside other sessions’ blocking_pids, often has:
    • no lock wait event (it’s running or idle),
    • old xact_start,
    • or state = 'idle in transaction'

You can pull the blocker details directly:

SQL
WITH waiters AS (
  SELECT
    pid AS waiting_pid,
    unnest(pg_blocking_pids(pid)) AS blocking_pid
  FROM pg_stat_activity
  WHERE datname = current_database()
    AND wait_event_type = 'Lock'
)
SELECT
  b.pid,
  b.usename,
  b.application_name,
  b.client_addr,
  b.state,
  now() - b.xact_start AS xact_age,
  now() - b.query_start AS query_age,
  left(b.query, 200) AS query_preview,
  count(*) AS blocked_sessions
FROM waiters w
JOIN pg_stat_activity b ON b.pid = w.blocking_pid
GROUP BY 1,2,3,4,5,6,7,8
ORDER BY blocked_sessions DESC, xact_age DESC;

This gives you a prioritized “top blockers” list.


What pg_stat_activity can’t tell you (and when to switch to pg_locks)

pg_stat_activity will not reliably tell you:

  • which relation (table) is the hotspot,
  • what lock mode is involved (AccessExclusiveLock vs others),
  • or the full lock compatibility picture.

Once you have candidate blocker PIDs, the next step is usually to join to pg_locks to confirm:

  • the locked object,
  • the lock mode,
  • granted vs waiting.

(That’s the next section.)


Takeaway

Use pg_stat_activity to quickly:

  • identify victims (wait_event_type = 'Lock'),
  • attach blockers (pg_blocking_pids()),
  • find the usual root causes (old xact_start, idle in transaction, long-running reads),
  • and produce a “top blockers” list you can act on.

Then use pg_locks to confirm exactly what’s being locked and why.

28. Reading pg_locks Without Losing Your Mind

pg_locks is the most accurate view of heavyweight locking in PostgreSQL—and also one of the easiest to misread. It’s low-level, full of OIDs, and a single incident can produce thousands of rows.

The trick is to stop thinking “I will understand every lock” and instead use pg_locks to answer three questions:

  1. What is waiting? (granted = false)
  2. What object is the bottleneck? (usually a relation)
  3. Is there a strong lock involved? (often AccessExclusiveLock)



The columns that matter (ignore the rest at first)

For lock incidents, start with:

  • pid: who holds/wants the lock
  • locktype: what kind of thing is being locked
  • mode: lock mode (internal names like AccessExclusiveLock)
  • granted: true = held, false = waiting
  • object identifiers:
    • relation (for relation locks)
    • transactionid / virtualxid (for transaction waits)
    • classid/objid/objsubid (misc object locks)

Reference: https://www.postgresql.org/docs/current/view-pg-locks.html


Step 1: Filter to the waits (the incident surface area)

Most of the noise in pg_locks is granted locks that are normal and harmless. During a “freeze,” the signal is in the waits:

SQL
SELECT
  pid,
  locktype,
  mode,
  relation::regclass AS relation,
  transactionid,
  virtualxid
FROM pg_locks
WHERE NOT granted
ORDER BY locktype, relation, mode;

If this returns many rows, you have real contention. If it returns almost none, your problem is probably not heavyweight locks (maybe LWLocks, I/O, or client-side issues).


Step 2: Identify the hotspot relation (most common stop-the-world pattern)

In stop-the-world DDL incidents, most waits will be locktype = 'relation', concentrated on one or a few relations.

Group them:

SQL
SELECT
  relation::regclass AS relation,
  mode,
  count(*) AS waiting
FROM pg_locks
WHERE locktype = 'relation'
  AND NOT granted
GROUP BY 1, 2
ORDER BY waiting DESC, relation;

What you’re looking for:

  • a single table with a large number of waiters, and/or
  • a waiting AccessExclusiveLock request on a hot table.

Step 3: Learn the 5 locktypes you’ll actually use

pg_locks.locktype can be many things, but in practice, most incident work involves these:

relation

Locks on tables/indexes/etc. The most actionable for “stop-the-world.”

  • Resolve with relation::regclass.
transactionid

Usually means: “I need to know whether transaction X commits/rolls back before I can proceed.”
Common with row-level conflicts and hot-row storms.

virtualxid

Often shows up when operations need to coordinate with active transactions/snapshots. Can be involved in DDL waiting patterns.

advisory

App-controlled locks. Great when intentional; nasty when a session holds one forever.

object

Misc catalog objects. Important sometimes, but start with relation locks unless you have evidence otherwise.

Reference: pg_locks lock types
https://www.postgresql.org/docs/current/view-pg-locks.html


Step 4: Interpret lock mode names in pg_locks

In SQL you say ACCESS EXCLUSIVE. In pg_locks.mode you’ll see:

  • AccessShareLock
  • RowExclusiveLock
  • ShareUpdateExclusiveLock
  • AccessExclusiveLock
  • etc.

Operationally:

  • If you see AccessExclusiveLock involved on a hot table, treat it as a high-likelihood stop-the-world trigger.
  • If you see lots of RowExclusiveLock waits, you may be looking at DDL conflict or queue amplification (writes usually don’t block each other at the table-lock level).

Reference: lock modes
https://www.postgresql.org/docs/current/explicit-locking.html


Step 5: Always join to pg_stat_activity for “who/what/why”

A lock row without the query text is half the story. Join on pid:

SQL
SELECT
  a.pid,
  a.usename,
  a.application_name,
  a.client_addr,
  a.state,
  a.wait_event_type,
  a.wait_event,
  now() - a.xact_start AS xact_age,
  now() - a.query_start AS query_age,
  l.locktype,
  l.mode,
  l.granted,
  l.relation::regclass AS relation,
  left(a.query, 200) AS query_preview
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE a.datname = current_database()
ORDER BY l.granted ASC, query_age DESC NULLS LAST;

Now you can see:

  • which app/user holds the lock,
  • whether it’s idle in transaction,
  • and how long it’s been open.

Reference: pg_stat_activity
https://www.postgresql.org/docs/current/monitoring-stats.html


Common “gotchas” that make people misdiagnose pg_locks

Gotcha 1: “There are tons of locks, so that must be the problem”

No. A healthy system has lots of granted locks. The problem is waiters.

Gotcha 2: “The DDL statement is the blocker”

Often the DDL is waiting (not granted). The root blocker is commonly a long transaction holding a weaker lock.

Use:

  • granted to distinguish holder vs waiter
  • pg_blocking_pids() to find the real blocker

Reference: https://www.postgresql.org/docs/current/functions-info.html

Gotcha 3: “I don’t see row locks, so it can’t be row contention”

Row-level waits often show up as transactionid waits rather than neat tuple identifiers. You may see many transactionid waiters tied back to one long-running transaction.

Gotcha 4: “Everything is waiting on one table, but I don’t see AccessExclusiveLock”

It might be:

  • another strong mode (ExclusiveLock, ShareRowExclusiveLock),
  • an object-level lock,
  • or queueing behind a pending strong request.

A simple workflow that keeps you sane

When you’re under pressure, do this in order:

  1. Count waiters: are there many NOT granted locks?
  2. Find the hotspot relation: group waiting relation locks by relation::regclass.
  3. Find the strong request: look for AccessExclusiveLock waiting/held on that relation.
  4. Identify the blocker PID(s): join to pg_stat_activity + use pg_blocking_pids().
  5. Decide action: cancel the head-of-line waiter, terminate the root blocker, or fix the app behavior.

Takeaway

Reading pg_locks is manageable if you focus on waits, not totals; relations, not every locktype; and lock modes that matter operationally (AccessExclusiveLock especially). Combine it with pg_stat_activity for context, and you can move from “Postgres is frozen” to “this PID on this table is causing it” quickly.

29. Joining pg_locks and pg_stat_activity: A Practical Blocking Query

When you’re in the middle of a lock incident, you want one query that answers, in one screen:

  • who is waiting,
  • what they’re waiting on (which relation + which lock mode),
  • who is blocking them,
  • and what the blocker is doing (query text, transaction age, idle-in-transaction status).

There isn’t a single “perfect” query for every locktype, but the one below is a very practical starting point for the most common stop-the-world scenario: relation lock contention (hot table + strong lock request + queue pileup).



Practical “show me blockers and victims” query (relation locks)

This version uses pg_blocking_pids() to avoid re-implementing lock compatibility rules, then enriches with relation lock details from pg_locks.

SQL
WITH waiting AS (
  SELECT
    a.pid                            AS waiting_pid,
    a.usename                        AS waiting_user,
    a.application_name               AS waiting_app,
    a.client_addr                    AS waiting_client,
    a.state                          AS waiting_state,
    a.wait_event_type,
    a.wait_event,
    now() - a.query_start            AS waiting_query_age,
    now() - a.xact_start             AS waiting_xact_age,
    left(a.query, 200)               AS waiting_query,
    unnest(pg_blocking_pids(a.pid))  AS blocking_pid
  FROM pg_stat_activity a
  WHERE a.datname = current_database()
    AND a.wait_event_type = 'Lock'
),
waiting_rel_locks AS (
  SELECT
    l.pid,
    l.mode,
    l.relation,
    l.relation::regclass AS relation_name
  FROM pg_locks l
  WHERE l.locktype = 'relation'
    AND l.granted = false
)
SELECT
  w.waiting_pid,
  w.waiting_user,
  w.waiting_app,
  w.waiting_client,
  w.waiting_state,
  w.waiting_query_age,
  w.wait_event_type,
  w.wait_event,
  rl.mode                AS waiting_lock_mode,
  rl.relation_name       AS locked_relation,
  w.blocking_pid,
  b.usename              AS blocking_user,
  b.application_name     AS blocking_app,
  b.client_addr          AS blocking_client,
  b.state                AS blocking_state,
  now() - b.xact_start   AS blocking_xact_age,
  now() - b.query_start  AS blocking_query_age,
  left(b.query, 200)     AS blocking_query
FROM waiting w
LEFT JOIN waiting_rel_locks rl
  ON rl.pid = w.waiting_pid
LEFT JOIN pg_stat_activity b
  ON b.pid = w.blocking_pid
ORDER BY
  w.waiting_query_age DESC,
  locked_relation NULLS LAST,
  w.waiting_pid,
  w.blocking_pid;

What this gives you:

  • A row per (waiting PID, blocking PID) pair.
  • The relation (when the waiting is on a relation lock) plus the waiting lock mode.
  • Full context about the blocker (transaction age, last query, state).

Why pg_blocking_pids() is helpful:

  • It uses PostgreSQL’s actual lock conflict rules, including edge cases, so you don’t need to infer blockers by “same relation” alone.

Reference: pg_blocking_pids()
https://www.postgresql.org/docs/current/functions-info.html


How to use the output during an incident

1) Find the hotspot relation quickly

Sort/scan for locked_relation that appears repeatedly. If dozens/hundreds of waiters point at the same relation, that’s your blast radius.

2) Identify the root blocker vs head-of-line waiter
  • If the blocker is idle in transaction with a huge blocking_xact_age, it’s often the root cause.
  • If the blocker is itself waiting on locks (check its wait_event_type in a separate query), you may be in a chain.
  • If many sessions are waiting and you see one DDL query waiting for a strong lock, that DDL may be the head-of-line waiter amplifying the incident.
3) Decide intervention

Common quick wins:

  • cancel/stop the head-of-line migration to remove queue amplification, or
  • terminate the root blocker that’s holding locks open.

(Your incident policy decides which is acceptable.)


Two small upgrades that often help

Upgrade A: collapse to “top blockers”

If you want a prioritized list:

SQL
WITH w AS (
  SELECT unnest(pg_blocking_pids(pid)) AS blocking_pid
  FROM pg_stat_activity
  WHERE datname = current_database()
    AND wait_event_type = 'Lock'
)
SELECT
  b.pid,
  b.usename,
  b.application_name,
  b.client_addr,
  b.state,
  now() - b.xact_start AS xact_age,
  left(b.query, 200) AS query_preview,
  count(*) AS blocked_sessions
FROM w
JOIN pg_stat_activity b ON b.pid = w.blocking_pid
GROUP BY 1,2,3,4,5,6,7
ORDER BY blocked_sessions DESC, xact_age DESC;

Upgrade B: show the blocker’s relation locks too

Sometimes you want to confirm the blocker is holding (not just blocking via transactionid semantics). Add a join to pg_locks for the blocker’s granted relation locks.
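
A minimal sketch of that extra check, given a suspected blocker PID (12345 is a placeholder):

SQL
-- Granted relation locks held by a suspected blocker (sketch; replace 12345 with the PID).
SELECT l.relation::regclass AS relation,
       l.mode,
       l.granted
FROM pg_locks l
WHERE l.pid = 12345
  AND l.locktype = 'relation'
  AND l.granted;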

Reference for join logic: pg_locks
https://www.postgresql.org/docs/current/view-pg-locks.html


Limitations to be aware of (so you don’t misread it)

  • This is optimized for lock waits (wait_event_type = 'Lock'). If the system is “frozen” due to LWLocks or I/O, you need a different approach (wait events, not pg_locks).
  • The relation info here is pulled from the waiting session’s relation lock request; some waits (especially row/transactionid waits) won’t have a relation row to display.
  • A blocker can be “blocking” due to transaction-level dependencies even if it doesn’t show an obvious relation lock row you expect.

Takeaway

Join pg_stat_activity (context + wait state) with pg_locks (lock details) and use pg_blocking_pids() (true blocking relationships) to get a high-signal incident view: victims → locked relation → blocking PID → blocker transaction/query. This is usually enough to identify the root blocker and stop the pileup fast.

30. Lock Timeouts: Preventing Infinite Waiting

A big reason lock incidents turn into outages is not that locks exist—it’s that sessions can wait forever by default. One DDL statement that sits in the lock queue for minutes can become head-of-line and trigger a pileup, saturating your pool and taking the app down.

lock_timeout is the simplest guardrail against that: it makes lock acquisition fail fast instead of waiting indefinitely.



What lock_timeout does

lock_timeout sets a maximum amount of time a statement will wait to acquire a lock. If the lock cannot be acquired within that time, PostgreSQL cancels the statement with an error.

Key properties:

  • It applies to waiting for locks, not executing once you have them.
  • The limit applies separately to each lock acquisition attempt within a statement, not cumulatively.
  • It helps prevent queue amplification because the head-of-line waiter will give up instead of collecting victims behind it.

Reference: https://www.postgresql.org/docs/current/runtime-config-client.html


Why this matters specifically for stop-the-world incidents

Most production “freeze” stories have this shape:

  1. A hot table is being used normally.
  2. A migration or maintenance command requests a strong lock (ACCESS EXCLUSIVE or similar).
  3. It waits.
  4. Other sessions start queuing behind it.
  5. The pool fills, timeouts explode.

With lock_timeout, step 3 turns into:

  • “migration fails quickly,” not “migration becomes the reason everyone queues.”

That’s often the difference between:

  • an annoying failed deploy, and
  • a cascading outage.

How to set it (session, transaction, or role/database level)

Session-level (common for migration scripts)
SQL
SET lock_timeout = '2s';

Transaction-level (only for the current transaction)
SQL
BEGIN;
SET LOCAL lock_timeout = '2s';
-- your DDL here
COMMIT;

SET LOCAL is nice for migrations because it doesn’t leak into other transactions on the same connection.

Reference: GUC scoping rules and runtime settings
https://www.postgresql.org/docs/current/runtime-config-client.html

Role/database defaults (use carefully)

You can set a default for a user or database so “dangerous clients” can’t wait forever:

SQL
ALTER ROLE app_user SET lock_timeout = '1s';
-- or
ALTER DATABASE mydb SET lock_timeout = '1s';

This is powerful but can surprise applications that aren’t written to handle lock-timeout errors cleanly.
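
To see what a session actually ends up with, and which per-role or per-database overrides exist, a quick sketch:

SQL
-- Effective value in the current session:
SHOW lock_timeout;

-- Stored per-role / per-database overrides (setdatabase/setrole of 0 means “all”):
SELECT setdatabase, setrole, setconfig
FROM pg_db_role_setting;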


What error you’ll get (and what your app should do)

When lock_timeout triggers, Postgres raises an error like:

  • ERROR: canceling statement due to lock timeout

Apps should generally treat this as a retryable failure for idempotent operations, with backoff/jitter. For migrations, it usually means “abort and try again during a quieter window.”

(Exact SQLSTATE is documented, but the key is to handle it intentionally rather than letting it bubble up as a generic 500.)

Reference: error behavior tied to lock timeout setting
https://www.postgresql.org/docs/current/runtime-config-client.html


Choosing a value: what works in practice

A good operational stance is:

  • Short for DDL/migrations on hot tables (e.g., 1–5 seconds): fail fast, avoid pileups.
  • Longer (or default) for batch jobs where waiting is acceptable and you control concurrency.

There is no universal best value, but the rule is: the closer an operation is to “might request ACCESS EXCLUSIVE on a hot table,” the more you want a short lock_timeout.


lock_timeout vs statement_timeout

These are complementary:

  • lock_timeout limits waiting to acquire locks.
  • statement_timeout limits total statement runtime (including waiting + execution).

In stop-the-world scenarios, lock_timeout is often the more important safety valve because the worst incidents come from lock queueing, not from long CPU execution.

Reference: runtime client settings
https://www.postgresql.org/docs/current/runtime-config-client.html


A safe migration pattern using lock timeout

A common pattern is:

SQL
BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30s';
-- DDL that might block
ALTER TABLE orders ADD COLUMN foo text;
COMMIT;

Outcome:

  • If the table is busy and the lock isn’t available quickly, the migration fails fast.
  • If it does acquire the lock, the statement has a bound on how long it can run.

This doesn’t make risky DDL “safe,” but it makes its failure mode much less catastrophic.


Takeaway

Lock timeouts turn “infinite waiting” into “bounded damage.” Set lock_timeout (especially for migrations and maintenance commands) so a single strong lock request can’t sit in the queue long enough to stall your entire application.

31. Statement Timeouts: Putting an Upper Bound on Damage

If lock_timeout prevents a statement from waiting forever to get a lock, statement_timeout prevents a statement from running (or waiting) forever in total. It’s your “hard ceiling” safety valve: no single SQL statement should be allowed to consume unlimited time and tie up a connection indefinitely.



What statement_timeout does

statement_timeout sets the maximum time any single statement may run. If it exceeds the threshold, PostgreSQL cancels it.

Important details:

  • The timer includes time spent waiting (locks, I/O, etc.) plus actual execution.
  • It applies per statement (not per transaction).
  • When triggered, you’ll see an error like:
    • ERROR: canceling statement due to statement timeout

Reference: https://www.postgresql.org/docs/current/runtime-config-client.html


Why it matters in stop-the-world incidents

Lock-related outages often involve “stuck” statements that hold onto connections:

  • victims waiting on locks,
  • DDL waiting to acquire ACCESS EXCLUSIVE,
  • long-running queries holding locks/snapshots,
  • maintenance commands running far longer than expected.

Even if you set lock_timeout, you can still be harmed by statements that do acquire the lock but then run too long, or by long-running reads that indirectly block DDL. statement_timeout bounds both.

This is especially important if your application uses a connection pool: every stuck statement consumes a pool slot.


How to set it (session, transaction, role/database)

Session-level
SQL
SET statement_timeout = '30s';

Transaction-scoped (recommended for migrations)
SQL
BEGIN;
SET LOCAL statement_timeout = '30s';
-- your changes here
COMMIT;

Defaults for an app role
SQL
ALTER ROLE app_user SET statement_timeout = '5s';

Be cautious with role/database defaults: they can break legitimate long reports or maintenance tasks unless those tasks use a different role or override the setting.

Reference: https://www.postgresql.org/docs/current/runtime-config-client.html


statement_timeout vs lock_timeout: how to use both

They solve different failure modes:

  • lock_timeout: “Don’t wait too long to acquire locks.”
    Stops queue amplification and lock pileups.
  • statement_timeout: “Don’t let a statement run too long overall.”
    Caps total damage from runaway queries, slow plans, heavy I/O, or unexpectedly slow maintenance.

For production changes (especially DDL), using both is common:

SQL
BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30s';
ALTER TABLE orders ADD COLUMN foo text;
COMMIT;

If this fails, it fails predictably, without turning into a multi-minute outage.


Picking values that won’t backfire

There’s no single correct value, but the safe approach is to pick timeouts based on intent:

  • Interactive OLTP queries: low seconds (e.g., 1–5s) often makes sense if your app can retry.
  • Background jobs: longer timeouts or explicit overrides, because they may legitimately scan lots of data.
  • Migrations/DDL: short lock_timeout, moderate statement_timeout, and run in controlled steps.

If your app doesn’t handle statement timeout errors, turning this on globally can cause new failures. The best rollout is usually:

  • start with migration tooling and batch jobs you control,
  • then consider app-wide defaults with proper retry/backoff.

Reference: runtime configuration discussion
https://www.postgresql.org/docs/current/runtime-config-client.html


What happens when a statement times out

When statement_timeout triggers:

  • the current statement is canceled,
  • the transaction is left in an aborted state if you were inside an explicit transaction block (you must ROLLBACK),
  • locks already held by the transaction remain held until you end it with ROLLBACK (another reason to keep transactions short and handle errors cleanly).

This is a common gotcha: timeouts are protective, but only if your client cleans up by rolling back promptly.
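
A sketch of that failure sequence inside an explicit transaction block (pg_sleep stands in for a slow statement):

SQL
BEGIN;
SET LOCAL statement_timeout = '1s';
SELECT pg_sleep(5);  -- ERROR: canceling statement due to statement timeout
SELECT 1;            -- ERROR: current transaction is aborted, commands ignored ...
ROLLBACK;            -- required before this connection can do useful work again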

Reference: transactional behavior and statement cancellation
https://www.postgresql.org/docs/current/runtime-config-client.html


Takeaway

statement_timeout is the “upper bound on damage” setting: it prevents runaway or unexpectedly slow statements from tying up connections indefinitely. Combined with lock_timeout, it gives you predictable failure modes for risky operations—especially migrations—so a single statement can’t quietly turn into a stop-the-world incident.

32. DDL Safely in Production: Patterns That Reduce Lock Pain

Running DDL on a busy PostgreSQL database is less about memorizing lock modes and more about adopting patterns that keep the “blast radius” small when locks are contested. The goal is to make schema changes:

  • short-lived (don’t hold strong locks long),
  • retryable (fail fast instead of queueing),
  • incremental (avoid table rewrites and full validations in one step),
  • observable (you can tell if it’s blocked and back out safely).



Pattern 1: Fail fast on lock contention (don’t become head-of-line)

The most important production DDL habit is: never wait forever for a lock.

In migration sessions, set timeouts:

SQL
BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30s';
-- DDL here
COMMIT;
  • lock_timeout prevents queue amplification: if you can’t get the lock quickly, you abort rather than jamming the table.
  • statement_timeout ensures the statement doesn’t run unbounded after it does get the lock.

Reference: https://www.postgresql.org/docs/current/runtime-config-client.html


Pattern 2: Keep the transaction scope tiny

DDL is transactional in PostgreSQL, which is great for correctness—but it also means locks can be held until COMMIT.

Rules that prevent accidental lock hostage situations:

  • Don’t bundle many unrelated DDL steps into one long transaction on hot tables.
  • Don’t do “DDL + data backfill + deploy + wait” in one transaction.
  • Avoid leaving a migration session open after the DDL runs.

If you use a migration tool that wraps everything in a single transaction by default, be extra cautious with operations that request strong locks.

Reference: transactional DDL and locking behavior
https://www.postgresql.org/docs/current/explicit-locking.html


Pattern 3: Prefer “online” variants when available

Some operations have online-friendly forms specifically to reduce lock pain:

  • CREATE INDEX CONCURRENTLY
  • REINDEX CONCURRENTLY (where supported)
  • Adding constraints in a way that avoids full blocking (e.g., NOT VALID then VALIDATE CONSTRAINT for certain constraints)

These aren’t magic—they still take locks—but they’re designed to avoid long ACCESS EXCLUSIVE holds.



Pattern 4: Avoid table rewrites whenever possible

The operations that hurt most are the ones that rewrite the entire table:

  • VACUUM FULL
  • CLUSTER
  • some ALTER TABLE changes (especially certain type changes, defaults, storage changes)

For production systems, the question to ask isn’t “will it work?” but “does it rewrite the table, and how long will it hold an exclusive lock?”

If you can’t avoid a rewrite, treat it as a maintenance-window change or use an online rebuild approach.

Reference: ALTER TABLE rewrite behavior varies by subcommand; see docs
https://www.postgresql.org/docs/current/sql-altertable.html


Pattern 5: Use expand/backfill/contract instead of “big bang” changes

For application-facing schema changes, the safest pattern is almost always:

  1. Expand: add new nullable columns / new tables / new indexes (online-friendly)
  2. Backfill: populate data in batches (outside of strong DDL locks)
  3. Contract: enforce constraints, switch reads/writes, remove old columns later

This pattern reduces lock pain because the slow part (backfill) is not done under an exclusive schema lock.

(We’ll lay out this playbook in detail later, but it’s worth calling out as the default mindset.)
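
A condensed sketch of the shape (column names are hypothetical; the batch size is an example):

SQL
-- 1) Expand: cheap, online-friendly metadata change.
ALTER TABLE orders ADD COLUMN customer_ref text;

-- 2) Backfill: repeat in small batches, each in its own short transaction,
--    so no long-lived locks or huge row-lock footprints build up.
UPDATE orders
SET customer_ref = legacy_customer_id::text
WHERE id IN (
  SELECT id FROM orders
  WHERE customer_ref IS NULL
  LIMIT 1000
);

-- 3) Contract: only after the backfill completes, enforce constraints,
--    switch application reads/writes, and drop old columns later.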


Pattern 6: Separate “validation” steps from “metadata” steps

Some operations look like metadata but require scanning data:

  • adding a NOT NULL constraint,
  • validating foreign keys,
  • validating CHECK constraints.

When possible, do:

  • add constraint as NOT VALID,
  • then validate it later in a controlled way.

This shifts risk from “instant lock outage” to “controlled background work,” usually with less blocking.
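
For example, a foreign key added in two steps (table and constraint names are hypothetical):

SQL
-- Step 1: add the constraint without scanning existing rows.
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_id_fkey
  FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Step 2: validate later; the scan runs under a weaker lock,
-- so normal reads and writes can generally continue.
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_id_fkey;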

Reference: constraint validation options in ALTER TABLE
https://www.postgresql.org/docs/current/sql-altertable.html


Pattern 7: Reduce the blast radius with partitioning or smaller units

If a single table is enormous and hot, any strong lock on it is scary. Partitioning can:

  • limit operations to a smaller partition,
  • make maintenance and index builds more targeted,
  • reduce the number of concurrent sessions impacted by a single lock.

Partitioning isn’t free (it changes query planning and operational complexity), but it’s a powerful tool to avoid “one table locks the world.”

Reference: partitioning overview
https://www.postgresql.org/docs/current/ddl-partitioning.html


Pattern 8: Make DDL observable and abortable

Before and during production DDL:

  • watch pg_stat_activity for lock waits and long transactions,
  • watch pg_locks for a pending AccessExclusiveLock,
  • use progress views for long operations (pg_stat_progress_create_index, etc.),
  • be ready to abort quickly if it can’t get locks.
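
For example, a quick pre-flight check for locks already held or queued on the target table (the table name is a placeholder):

SQL
-- Anything holding or waiting for locks on the table we are about to alter? (sketch)
SELECT l.pid,
       l.mode,
       l.granted,
       a.state,
       now() - a.xact_start AS xact_age
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'relation'
  AND l.relation = 'public.orders'::regclass;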



A practical “safe DDL” checklist

Before you run DDL on a hot table:

  • □ Does it require ACCESS EXCLUSIVE (or rewrite the table)?
  • □ Do I have lock_timeout set so it won’t queue forever?
  • □ Is there any long-running / idle-in-transaction session right now?
  • □ Is the change deployable via expand/backfill/contract?
  • □ Do I have a rollback plan and a way to stop quickly?

Takeaway

Safe production DDL is about engineering the failure mode. Use timeouts to fail fast, keep transactions short, prefer concurrent/online operations, avoid rewrites, and roll schema changes out in phases (expand/backfill/contract). Those patterns turn lock behavior from “surprise outage” into “predictable, manageable risk.”

33. Using CREATE INDEX CONCURRENTLY to Avoid Blocking Writes

Creating an index on a busy table is a classic way to cause pain: a standard CREATE INDEX can block writes at the wrong time, and even when it doesn’t “stop the world,” it can create enough contention to trigger timeouts and pileups.

CREATE INDEX CONCURRENTLY is PostgreSQL’s built-in, production-friendly way to build an index while allowing normal reads and writes to keep flowing.



What “concurrently” really means

CREATE INDEX CONCURRENTLY builds the index in a way that avoids holding locks that would block normal DML for long periods.

In practical terms:

  • Your application can keep doing INSERT/UPDATE/DELETE while the index is being built.
  • The index build takes longer and does more work than a normal CREATE INDEX.
  • There are still locks involved, but they’re chosen to minimize disruption to reads/writes.

Reference: https://www.postgresql.org/docs/current/sql-createindex.html


The big rule: you can’t run it inside a transaction block

This is the #1 operational gotcha:

SQL
BEGIN;
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);
COMMIT;

This will fail. PostgreSQL requires CREATE INDEX CONCURRENTLY to run outside an explicit transaction block.

So your migration tooling must support “non-transactional” migrations for this step.

Reference: CREATE INDEX restrictions
https://www.postgresql.org/docs/current/sql-createindex.html


How it avoids blocking writes (high-level)

A standard index build blocks writes to the table (though not reads) for the entire build, so it can work from a consistent set of rows.

The concurrent build instead:

  • does an initial build while allowing writes,
  • then performs additional passes to catch rows that changed during the build,
  • and finally performs a short “swap/validation” step to make the index usable.

You don’t need every internal detail to operate it safely; what matters is the tradeoff:

  • less blocking
  • more time + more work

Reference: https://www.postgresql.org/docs/current/sql-createindex.html


What it does (and still doesn’t) block

Operationally, CREATE INDEX CONCURRENTLY is designed to avoid long write blocks, but you should still expect:

  • brief lock steps where it coordinates with other DDL on the same table
  • conflicts with some schema changes or other concurrent index operations

So “concurrently” is best read as:

  • “won’t take the table down for normal DML,” not
  • “never waits for locks.”

Reference: lock mode conflicts and index build notes
https://www.postgresql.org/docs/current/explicit-locking.html
https://www.postgresql.org/docs/current/sql-createindex.html


Practical usage patterns

Create the index
SQL
CREATE INDEX CONCURRENTLY idx_orders_created_at
ON orders (created_at);

Combine with IF NOT EXISTS (when appropriate)
SQL
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_created_at
ON orders (created_at);

Whether you can use IF NOT EXISTS depends on Postgres version and your migration style; it’s supported for CREATE INDEX in modern versions, but always check your target version behavior.

Reference: CREATE INDEX syntax
https://www.postgresql.org/docs/current/sql-createindex.html

Partial indexes are still a great fit
SQL
CREATE INDEX CONCURRENTLY idx_orders_open_recent
ON orders (created_at)
WHERE status = 'open';

Partial indexes often provide big wins with smaller build time and lower write overhead.

Reference: partial indexes (covered under indexing docs)
https://www.postgresql.org/docs/current/indexes-partial.html


Safety guardrails for production

Even though it’s online-friendly, you still want guardrails:

  • Set a reasonable statement_timeout in the session running the migration (so it doesn’t run for hours unexpectedly).
  • Consider lock_timeout to avoid it sitting behind conflicting DDL and becoming part of a queue pileup.
  • Monitor progress via pg_stat_progress_create_index.

Example session setup:

SQL
SET lock_timeout = '2s';
SET statement_timeout = '30min';
CREATE INDEX CONCURRENTLY idx_orders_created_at
ON orders (created_at);



Monitoring a concurrent index build

Use the progress view to see whether it’s moving and what phase it’s in:

SQL
SELECT *
FROM pg_stat_progress_create_index;

On a busy system, this is extremely helpful for distinguishing:

  • “it’s blocked waiting on a lock” vs
  • “it’s working, just slow.”

Reference: progress reporting
https://www.postgresql.org/docs/current/progress-reporting.html


Failure modes you should expect (and plan for)

  • If the statement is canceled or fails mid-way, you can end up with an invalid index that must be dropped or rebuilt.
  • It may take significantly longer than a normal index build, especially under heavy write load.
  • Running multiple concurrent index builds on the same table can amplify resource usage and increase the chance of contention.
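
If a concurrent build does fail, the leftover index is flagged as invalid in pg_index; a quick sketch to find such leftovers:

SQL
-- Indexes left invalid (for example, after a failed CREATE INDEX CONCURRENTLY).
SELECT n.nspname AS schema,
       t.relname AS table_name,
       i.relname AS index_name
FROM pg_index x
JOIN pg_class i     ON i.oid = x.indexrelid
JOIN pg_class t     ON t.oid = x.indrelid
JOIN pg_namespace n ON n.oid = t.relnamespace
WHERE NOT x.indisvalid;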

Reference: CREATE INDEX notes on concurrent builds and invalid indexes
https://www.postgresql.org/docs/current/sql-createindex.html


Takeaway

CREATE INDEX CONCURRENTLY is the default choice for adding indexes to hot production tables because it avoids long write-blocking locks. The tradeoff is longer runtime and added complexity (notably: it can’t run inside a transaction block, and you should monitor/guard it with timeouts). Used correctly, it’s one of the highest-leverage tools for avoiding lock-driven outages during migrations.

34. Using REINDEX CONCURRENTLY When You Can

When an index needs rebuilding (bloat, corruption recovery, performance issues), the operational question is almost always: “Can we do this without blocking production traffic?”

If your PostgreSQL version and the specific index support it, REINDEX CONCURRENTLY is usually the best answer. It rebuilds indexes in an online-friendly way, similar in spirit to CREATE INDEX CONCURRENTLY, aiming to avoid long blocking of normal reads/writes.



What REINDEX CONCURRENTLY is for

Use it when:

  • the table is hot and you can’t afford a blocking reindex,
  • you need to rebuild an index to reduce bloat or restore performance,
  • you want a safer alternative to REINDEX (non-concurrent), VACUUM FULL, or CLUSTER for index-specific problems.

It rebuilds the index while keeping the old index usable until a final swap step.

Reference: https://www.postgresql.org/docs/current/sql-reindex.html


The big restriction: not inside a transaction block

Just like CREATE INDEX CONCURRENTLY, you generally cannot run REINDEX CONCURRENTLY inside an explicit transaction block:

SQL
BEGIN;
REINDEX INDEX CONCURRENTLY idx_orders_created_at;
COMMIT;

This will fail. Plan migrations accordingly (a separate non-transactional migration step, or a migration tool feature for “no transaction”).

Reference: REINDEX restrictions
https://www.postgresql.org/docs/current/sql-reindex.html


What it buys you (and what it doesn’t)

Benefits
  • Minimizes disruption to normal reads/writes.
  • Reduces the risk of stop-the-world lock incidents on hot tables.
  • Often the safest way to rebuild a critical index in production.
Tradeoffs
  • Takes longer than non-concurrent reindexing.
  • Uses more resources while it runs (extra work, extra I/O).
  • Still takes locks at various phases; it’s “online-friendly,” not “lock-free.”
  • Can be blocked by conflicting DDL on the same table/index, and can wait for locks.



Typical usage

Reindex a single index concurrently:

SQL
REINDEX INDEX CONCURRENTLY idx_orders_created_at;

Reindex all indexes on a table concurrently:

SQL
REINDEX TABLE CONCURRENTLY orders;

Whether you should use TABLE scope depends on blast radius: it’s often safer to start with the single worst index and expand only if needed.

Reference: https://www.postgresql.org/docs/current/sql-reindex.html


Guardrails: avoid becoming the head-of-line waiter

Even concurrent maintenance can cause trouble if it waits too long on locks and triggers queueing effects. Use timeouts:

SET lock_timeout = '2s';
SET statement_timeout = '30min';
REINDEX INDEX CONCURRENTLY idx_orders_created_at;

  • lock_timeout ensures you don’t sit in the queue forever if someone is doing DDL.
  • statement_timeout bounds the operation’s total time.

Reference: runtime settings
https://www.postgresql.org/docs/current/runtime-config-client.html


What to check before you run it

A quick pre-flight checklist:

  • Are there long-running transactions or idle in transaction sessions?
    Those can block key phases and make “online” operations painful.
  • Is there other DDL scheduled on the same table?
    Avoid overlapping schema changes with concurrent reindexing.
  • Is the index truly the problem?
    If the underlying issue is table bloat or bad query patterns, reindexing may not help much.
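For the first check, a quick pre-flight query might look like this (a sketch against pg_stat_activity; the 5-minute threshold is arbitrary):

SELECT pid,
       state,
       now() - xact_start AS xact_age,
       left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_age DESC;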

Failure handling you should plan for

If a concurrent reindex is interrupted, you may end up with an index state that requires cleanup (for example, rebuilding again). Operationally:

  • be prepared to retry during a quiet window,
  • confirm index validity afterwards (your operational tooling may check this; PostgreSQL also tracks index validity internally).

Exact details and edge cases depend on version and circumstances; treat concurrent reindex as safer, not risk-free.

Reference: REINDEX documentation notes
https://www.postgresql.org/docs/current/sql-reindex.html


Takeaway

Use REINDEX CONCURRENTLY whenever you need to rebuild indexes on hot production tables and your environment supports it. It’s designed to avoid the “maintenance took down prod” pattern by keeping reads/writes flowing, at the cost of longer runtime and more complexity (notably: no transaction block, and you still need timeouts and monitoring).

35. Adding Columns Safely: Fast Defaults and Avoiding Table Rewrites

“Add a column” sounds harmless, but in PostgreSQL it can range from a near-instant metadata change to a table-rewriting, outage-prone operation—depending on how you do it and which version/feature behavior you’re relying on.

The safe goal in production is usually:

  • avoid a full table rewrite,
  • avoid holding strong locks for long,
  • make the change deployable with expand/backfill/contract.



The baseline: adding a nullable column with no default

This is typically the safest and fastest form:

ALTER TABLE orders ADD COLUMN foo text;

Why it’s usually safe:

  • It’s primarily a catalog (metadata) change.
  • PostgreSQL doesn’t need to rewrite every existing row just to “have a new column that is NULL for old rows.”

What can still hurt:

  • It may still require a strong table lock to perform the metadata change. On a hot table, the risk is often waiting to acquire the lock, not the execution time once acquired.

So still treat it like production DDL:

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30s';
ALTER TABLE orders ADD COLUMN foo text;
COMMIT;



The foot-gun: adding a column with a default (historically rewrite-prone)

The risky pattern is:

ALTER TABLE orders ADD COLUMN foo text DEFAULT 'x';

Historically, PostgreSQL implemented this by rewriting the entire table to physically store the default value in every existing row—meaning:

  • big I/O,
  • long runtime,
  • and often long-lived strong locks (stop-the-world potential).

Modern PostgreSQL versions (11 and later) introduced an optimization often called “fast defaults”, where certain constant defaults can be stored in metadata and treated as if they exist for old rows without rewriting the table. However, behavior depends on version and the exact default expression.

Because this is version- and expression-dependent, the safest operational stance is:

  • assume ADD COLUMN ... DEFAULT ... might rewrite unless you’ve verified your target Postgres version and the default expression qualifies for the fast-path.
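If you want to see whether the fast path was actually used, newer versions expose it in the catalogs. A sketch, assuming PostgreSQL 11 or later and the hypothetical orders.foo column used in this section; atthasmissing = true means the default was stored as metadata rather than written into every row:

SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
  AND attname = 'foo';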

Reference: ALTER TABLE behavior and notes are version-specific; always confirm for your Postgres version
https://www.postgresql.org/docs/current/sql-altertable.html


The safe production pattern: add it nullable, backfill, then enforce

If you need a non-null column with a default value, the most robust pattern is:

Step 1: Add column as nullable, no default (fast metadata change)

ALTER TABLE orders ADD COLUMN foo text;

Step 2: Backfill existing rows in batches (no ACCESS EXCLUSIVE lock)

Do this in controlled chunks to avoid long transactions and hot-row storms:

UPDATE orders
SET foo = 'x'
WHERE foo IS NULL
  AND id >= 100000
  AND id < 110000;

Repeat in batches with application-side loop logic and sleep/backoff as needed.
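If you'd rather drive the loop from inside the database, here's a minimal sketch assuming PostgreSQL 11+ (procedures can COMMIT between batches); the procedure name, batch size, and sleep interval are illustrative, and an application-side loop works just as well:

CREATE PROCEDURE backfill_orders_foo(batch_size integer DEFAULT 5000)
LANGUAGE plpgsql
AS $$
DECLARE
  rows_updated integer;
BEGIN
  LOOP
    UPDATE orders
    SET foo = 'x'
    WHERE id IN (
      SELECT id FROM orders
      WHERE foo IS NULL
      LIMIT batch_size
    );
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    EXIT WHEN rows_updated = 0;
    COMMIT;                 -- each batch gets its own transaction
    PERFORM pg_sleep(0.5);  -- throttle between batches
  END LOOP;
END;
$$;

CALL backfill_orders_foo();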

Step 3: Add a default for new rows

Once old rows are backfilled, set a default for inserts going forward:

ALTER TABLE orders ALTER COLUMN foo SET DEFAULT 'x';

Step 4: Enforce NOT NULL (after you’re confident backfill is complete)

ALTER TABLE orders ALTER COLUMN foo SET NOT NULL;

This pattern reduces the highest-risk part (rewriting and long exclusive lock holds) by moving the expensive work into ordinary updates you can throttle.

Reference: ALTER TABLE for defaults and NOT NULL
https://www.postgresql.org/docs/current/sql-altertable.html


Avoiding the “NOT NULL validation outage”

Setting NOT NULL can require validating that all rows comply, which can take time on large tables. Two practical ways to reduce risk:

  • Ensure the backfill is truly complete before you attempt it.
  • Run it with tight timeouts so it fails fast rather than waiting and queueing:

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30s';
ALTER TABLE orders ALTER COLUMN foo SET NOT NULL;
COMMIT;
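A minimal completeness check before attempting SET NOT NULL (same hypothetical foo column; you want this to return zero):

SELECT count(*) AS rows_missing_value
FROM orders
WHERE foo IS NULL;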

Even if it’s “fast” on your table, treat it as potentially disruptive until proven otherwise.

Reference: lock timeouts and DDL locking
https://www.postgresql.org/docs/current/runtime-config-client.html
https://www.postgresql.org/docs/current/explicit-locking.html


Special case: adding a column with a volatile or non-constant default

Even on versions with fast defaults, defaults like now(), random(), or expressions that depend on other columns may not qualify for the metadata-only optimization and can force heavier work or different semantics.

Rule of thumb:

  • If the default isn’t a simple constant, treat it as higher risk and prefer the expand/backfill approach.

Reference: expression semantics and ALTER TABLE notes
https://www.postgresql.org/docs/current/sql-altertable.html


Takeaway

The safest way to add columns in production is usually:

  • ADD the column nullable with no default
  • BACKFILL in batches
  • SET DEFAULT for future rows
  • ENFORCE NOT NULL last

This avoids table rewrites in the common case, reduces lock hold time, and turns a potentially stop-the-world migration into a controlled, observable rollout.

36. Changing Column Types: What Forces a Rewrite and What Doesn’t

ALTER TABLE ... ALTER COLUMN ... TYPE ... is one of the most dangerous “simple” migrations in PostgreSQL. Sometimes it’s almost instant. Other times it rewrites the entire table, holds strong locks for a long time, and becomes a stop-the-world event.

The difference comes down to whether PostgreSQL must touch every row to convert or re-encode values.



The big idea: metadata-only vs table rewrite

A type change can fall into two categories:

Metadata-only (usually fast)

Postgres updates the column’s type metadata, but does not need to rewrite each tuple.

This is possible when the on-disk representation is compatible or the conversion is essentially a “relabeling” that doesn’t require per-row transformation.

Table rewrite (potentially catastrophic)

Postgres must scan every row, compute a new representation, and write a new copy of the table (or otherwise rewrite the affected tuples). This is slow, I/O-heavy, and typically requires strong locks for the duration.

Reference: ALTER TABLE notes on type changes and table rewrites
https://www.postgresql.org/docs/current/sql-altertable.html


What usually forces a rewrite

Treat these as “rewrite likely” unless you’ve verified otherwise:

  • Conversions that require computing a new value per row
    e.g., text → integer, jsonb → typed columns, timestamp → timestamptz with a USING expression, etc.
  • Changes that alter physical storage format
    For example, changing between types with different binary representations.
  • Any type change using a non-trivial USING clause
    If you need:
    ALTER TABLE t ALTER COLUMN c TYPE newtype USING some_expression(c);
    …you should assume it will touch every row.
  • Type changes on large, hot tables
    Even if the conversion is “simple,” if it’s not metadata-only, the runtime scales with table size.

What sometimes does not force a rewrite

There are type changes that can be cheap because they don’t require rewriting stored values (or can be performed without visiting all rows). Common examples people rely on:

  • Increasing the limit of a varchar(n) or numeric(p,s) in certain cases.
  • Changing varchar(n) to text (often compatible in practice).

However, exact behavior depends on PostgreSQL version and details, and the safe operational posture is:

  • Don’t assume a type change is metadata-only just because it “feels compatible.”
    Confirm with your version’s documentation and test on a production-like copy of the table.
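One practical test on a disposable copy: a table rewrite assigns a new relfilenode, so comparing it before and after the ALTER tells you whether a rewrite happened. A sketch (table and column names are placeholders):

SELECT relfilenode FROM pg_class WHERE relname = 't';

ALTER TABLE t ALTER COLUMN c TYPE text;   -- the change under test

SELECT relfilenode FROM pg_class WHERE relname = 't';
-- If the value changed, the ALTER rewrote the table.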

Reference: ALTER TABLE type change behavior and restrictions
https://www.postgresql.org/docs/current/sql-altertable.html


Locking: why type changes are “stop-the-world” prone

Even if the conversion itself is fast, ALTER TABLE ... TYPE typically requires a strong lock on the table (often ACCESS EXCLUSIVE) because:

  • the table’s definition is changing,
  • concurrent queries must not see mixed type semantics,
  • indexes and constraints may need updates.

So the risk is both:

  • lock acquisition (can cause queue amplification), and
  • lock hold time (if it rewrites, it can hold the lock for a long time).



Safer alternatives to “big bang” type changes

If the table is hot or large, the safest approach is usually to avoid in-place conversion.

Option A: Expand/backfill/contract with a new column
  1. Add a new column with the desired type:

ALTER TABLE t ADD COLUMN c_new newtype;

  2. Backfill in batches:

UPDATE t
SET c_new = transform(c)
WHERE c_new IS NULL
  AND id BETWEEN ...;

  3. Update the application to read from c_new (or dual-read).
  4. Dual-write for a period (write to both).
  5. Validate and swap (rename columns, drop old later).

This turns one scary rewrite into controlled DML you can throttle.

Option B: Build a new table and swap

For very large transformations, it can be safer to:

  • create a new table with the correct schema,
  • copy/transform data (possibly with logical replication or triggers),
  • cut over with a controlled swap.

This is more complex but can avoid long exclusive locks.


Practical guardrails if you must do an in-place type change

If you decide to run ALTER COLUMN TYPE directly:

  • Set lock_timeout so it fails fast rather than sitting in the queue.
  • Set statement_timeout so it can’t run forever if it rewrites.
  • Run it during a low-traffic window if there’s any chance it rewrites.

Example:

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '5min';
ALTER TABLE t
  ALTER COLUMN c TYPE newtype
  USING c::newtype;
COMMIT;

Reference: timeouts
https://www.postgresql.org/docs/current/runtime-config-client.html


How to know for sure

The only reliable way to know if your exact change will rewrite (for your Postgres version, column, data distribution, indexes, and constraints) is to:

  • read the version-specific docs for the exact subcommand, and
  • test it on a production-sized staging environment.

I can’t infer rewrite behavior perfectly without those details, but the risk management approach remains: assume rewrite unless verified, and prefer phased migrations for hot tables.


Takeaway

Changing column types is risky because many conversions require a full table rewrite and strong locks (ACCESS EXCLUSIVE). If you can’t prove the change is metadata-only for your version and case, treat it as a rewrite and use safer patterns like expand/backfill/contract. This is one of the highest-leverage places to avoid accidental stop-the-world outages.

37. Dropping Columns and Constraints: Lock Implications

Dropping things feels like it should be easy: remove a column, remove a constraint, done. In PostgreSQL, these operations are often fast in terms of CPU/I/O because they’re mostly catalog changes, but they can still be high impact because they require strong locks and can trigger dependency cascades.

The practical rule: drops are usually “quick to execute, dangerous to schedule.”



Why drops can be disruptive even when they’re fast

Two reasons:

  1. Strong lock requirement
    Most ALTER TABLE operations, including many drops, require strong relation locks (commonly ACCESS EXCLUSIVE) to ensure nobody is concurrently using a schema that is being changed.
  2. Lock acquisition is the real risk
    On a hot table, the DDL might spend 99.9% of its wall-clock time waiting to acquire the lock. While it waits, it can become head-of-line and cause lock queue pileups (the stop-the-world symptom).

Reference: lock behavior and conflict rules
https://www.postgresql.org/docs/current/explicit-locking.html


Dropping columns

ALTER TABLE ... DROP COLUMN

Example:

ALTER TABLE orders DROP COLUMN legacy_code;

What happens:

  • PostgreSQL removes the column from the table’s definition (catalog metadata).
  • This does not rewrite the table: the column is only marked as dropped in the catalogs, and the old values are reclaimed gradually as rows are later updated or the table is rewritten for other reasons. It does, however, require catalog updates and dependency cleanup.

Why it still hurts:

  • It typically needs a strong lock on the table while the catalog change is made.
  • If the table is hot, acquiring that lock is the outage risk.

References:
https://www.postgresql.org/docs/current/sql-altertable.html
https://www.postgresql.org/docs/current/explicit-locking.html

DROP COLUMN ... CASCADE

If you do:

ALTER TABLE orders DROP COLUMN legacy_code CASCADE;

You may remove dependent objects too (views, triggers, constraints, etc.). That expands the blast radius:

  • more catalog work,
  • more dependencies changed,
  • more surprising downstream breakage.

Even if locking remains focused on the table, the operational impact can be much larger.

Reference: CASCADE behavior in ALTER TABLE
https://www.postgresql.org/docs/current/sql-altertable.html


Dropping constraints

ALTER TABLE ... DROP CONSTRAINT

Example:

ALTER TABLE orders DROP CONSTRAINT orders_status_check;

Constraints are part of the table’s definition, and dropping them requires strong coordination:

  • queries planning/using the constraint must see a consistent schema,
  • other sessions must not simultaneously alter the same metadata.

So again: usually fast, but often requires a strong lock and can queue.

Reference:
https://www.postgresql.org/docs/current/sql-altertable.html

Special note: dropping uniqueness / primary keys

Dropping constraints that are backed by indexes (like PRIMARY KEY or UNIQUE) can have bigger effects than “just schema”:

  • The supporting index might be dropped (or become unused) depending on how it was created and referenced.
  • Query plans can change immediately.
  • Writes might get faster, but reads and join performance can tank if the index was crucial.

So the “impact” is often performance and correctness semantics as much as locking.


Dependency and blast radius pitfalls

Dropping schema objects often triggers surprises through dependencies:

  • views and materialized views,
  • foreign keys (both referencing and referenced sides),
  • triggers,
  • generated columns or expressions,
  • application prepared statements expecting the old schema.

From a lock perspective, the risk is still usually “strong lock on a hot table,” but from an outage perspective the risk is “we broke code paths immediately.”
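Before scheduling a drop, it helps to enumerate dependents explicitly instead of discovering them via CASCADE. For views that reference the column, a sketch using information_schema (it won't show every kind of dependent, e.g. triggers or foreign keys):

SELECT view_schema, view_name
FROM information_schema.view_column_usage
WHERE table_name = 'orders'
  AND column_name = 'legacy_code';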

Reference: dependency behavior is discussed throughout DDL docs; CASCADE is a major lever
https://www.postgresql.org/docs/current/sql-altertable.html


How to drop safely in production

These patterns reduce the chances of lock pain and surprise breakage:

1) Make it an explicit “contract” step

Don’t drop immediately after adding the replacement. Instead:

  • stop reads/writes to the old column in application code,
  • deploy and observe,
  • then drop in a later release.

This reduces the chance you need an emergency rollback that depends on the old schema.

2) Use timeouts so drops don’t become head-of-line

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30s';
ALTER TABLE orders DROP COLUMN legacy_code;
COMMIT;

If the table is busy, it fails fast instead of queueing and amplifying a stall.

Reference: timeouts
https://www.postgresql.org/docs/current/runtime-config-client.html

3) Avoid CASCADE unless you truly want the cascade

Prefer explicit drops of dependent objects so you can predict impact. CASCADE is convenient, but it’s a classic way to widen the change beyond what you intended.


Takeaway

Dropping columns and constraints is often quick in execution but can be high-risk in production because it typically requires strong locks and can trigger lock queue pileups on hot tables. Treat drops as planned “contract” steps, use timeouts to fail fast, and be cautious with CASCADE due to dependency blast radius.

38. Foreign Keys and Locking: Validations, NOT VALID Constraints, and Backfills

Foreign keys are essential for data integrity, but they’re also one of the easiest ways to accidentally introduce painful locking and long-running table scans during a migration. The good news: PostgreSQL gives you tools to make FK rollouts much safer, especially NOT VALID constraints and separate validation.

This section focuses on how FK creation/validation interacts with locking and how to roll them out with minimal production impact.



Why foreign keys are operationally “heavy”

A foreign key has two tables:

  • referencing table (child): has the FK column(s)
  • referenced table (parent): has the primary key/unique key being referenced

Adding a foreign key isn’t just metadata. PostgreSQL must ensure:

  1. existing rows in the child all have matching parent rows (unless the FK allows NULLs and the values are NULL), and
  2. future changes keep that property true.

That first part—checking existing data—is the operational hazard: it can require scanning a large child table and probing the parent index many times.

Reference: FK behavior and constraint validation
https://www.postgresql.org/docs/current/ddl-constraints.html


Locking: what gets locked, and why it matters

At a high level, FK operations require table locks to prevent schema changes and ensure the validation sees a stable definition.

Practical implications:

  • The child table is where you feel most impact because validation touches its rows.
  • The parent table matters too because inserts/updates/deletes must be coordinated with referential checks, and the parent must have a suitable unique index/constraint.

Exact lock modes vary by operation and PostgreSQL version, but the key operational takeaway is:

  • FK creation/validation can take long enough that even “non-stop-the-world” lock modes become a production issue, and
  • if anything requests ACCESS EXCLUSIVE on either table while validation is happening (or queued), you can get lock queue pileups.

Reference: lock modes and DDL locking model
https://www.postgresql.org/docs/current/explicit-locking.html
FK DDL reference: https://www.postgresql.org/docs/current/sql-altertable.html


The safe rollout pattern: ADD CONSTRAINT NOT VALID, then VALIDATE

PostgreSQL allows you to add a foreign key without immediately scanning and validating existing rows:

Step 1: Add the FK as NOT VALID

ALTER TABLE child
  ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id)
  REFERENCES parent(id)
  NOT VALID;

What this does:

  • Adds the constraint metadata.
  • Enforces the FK for new rows (and updates that touch the FK columns) going forward.
  • Does not scan/validate all existing child rows yet.

Reference: NOT VALID constraints
https://www.postgresql.org/docs/current/sql-altertable.html

Step 2: Backfill / fix existing data

Now you can clean up any orphan rows safely, using controlled batch jobs. Common approaches:

  • delete or archive orphan rows,
  • update them to a valid parent,
  • insert missing parent rows (if that’s correct),
  • or set FK column to NULL if semantics allow it.

Example orphan finder:

SELECT c.*
FROM child c
LEFT JOIN parent p ON p.id = c.parent_id
WHERE c.parent_id IS NOT NULL
  AND p.id IS NULL
LIMIT 100;

Then fix in batches.
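If deleting orphans is the right fix for your data, one batched sketch (assuming child has an id primary key; repeat until it deletes zero rows):

WITH orphan_batch AS (
  SELECT c.id
  FROM child c
  LEFT JOIN parent p ON p.id = c.parent_id
  WHERE c.parent_id IS NOT NULL
    AND p.id IS NULL
  LIMIT 1000
)
DELETE FROM child
WHERE id IN (SELECT id FROM orphan_batch);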

Step 3: Validate the constraint later

ALTER TABLE child
  VALIDATE CONSTRAINT child_parent_fk;

Validation scans the table to ensure all existing rows comply. It is still a heavyweight operation, but the critical improvement is that you can schedule it deliberately and you’ve already cleaned up the data, so it should complete predictably.

Reference: VALIDATE CONSTRAINT
https://www.postgresql.org/docs/current/sql-altertable.html


Why this pattern reduces lock pain

Compared to “add FK and validate immediately,” the NOT VALID approach:

  • decouples “start enforcing correctness for new data” from “scan the entire history,”
  • lets you do the expensive cleanup as throttled DML instead of one giant DDL event,
  • reduces the chance of a long-running validation being the trigger for lock queues at peak traffic.

It doesn’t make FK changes zero-risk, but it converts “surprise table scan during deploy” into “planned validation step.”


Indexing: the hidden prerequisite for safe FK rollout

A major operational mistake is adding an FK without the right indexes.

  • The parent side must have a PRIMARY KEY or UNIQUE constraint/index on the referenced columns (that’s required).
  • The child side should have an index on the FK column(s) in most real systems, because otherwise:
    • deletes/updates on the parent may require scanning the child table to check for referencing rows,
    • which can create huge performance spikes and long-running locks.

This is not just performance trivia; it affects how long FK-related operations hold locks and how likely they are to trigger pileups.
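On a hot child table, that index is best added with a concurrent build (index name hypothetical; remember CREATE INDEX CONCURRENTLY can't run inside a transaction block):

CREATE INDEX CONCURRENTLY idx_child_parent_id ON child (parent_id);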

Reference: FK constraint requirements and general indexing guidance
https://www.postgresql.org/docs/current/ddl-constraints.html

(Postgres docs discuss the parent uniqueness requirement explicitly; the “index the child FK columns” guidance is a widely followed operational best practice because of how referential checks are performed.)


Guardrails: timeouts and sequencing

When adding or validating FKs in production:

  • Set lock_timeout so the operation fails fast instead of waiting and amplifying a queue.
  • Consider a moderate statement_timeout for validation so it can’t run unbounded.
  • Prefer running VALIDATE CONSTRAINT during a low-traffic window on very large tables.
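Putting those together, a guarded validation run might look like this (a sketch reusing the constraint from the earlier example; VALIDATE CONSTRAINT can run inside a transaction block):

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '30min';
ALTER TABLE child VALIDATE CONSTRAINT child_parent_fk;
COMMIT;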

Timeout settings reference:
https://www.postgresql.org/docs/current/runtime-config-client.html


Common failure modes (and how to avoid them)

Failure mode 1: Long transaction blocks validation

A long-running transaction (especially idle in transaction) can make validation take longer or cause lock acquisition delays. Always check for old xact_start sessions before running validation.

Reference: pg_stat_activity
https://www.postgresql.org/docs/current/monitoring-stats.html

Failure mode 2: Orphans discovered mid-deploy

If you add the FK without NOT VALID, the statement can fail after doing a lot of work—or worse, it can sit scanning while holding locks. Using NOT VALID moves the “do we have orphans?” discovery earlier and makes it manageable.

Failure mode 3: Parent updates/deletes become slow (missing child index)

If you validate and ship an FK without an index on child FK columns, you may create a performance regression that feels like random lockups later when parent rows are updated/deleted.


Takeaway

Foreign keys are correctness-critical but operationally tricky because validation can require large scans and prolonged locking interactions. The production-safe pattern is:

  1. ADD FK … NOT VALID
  2. backfill/fix existing data in batches
  3. VALIDATE CONSTRAINT in a planned step
  4. ensure indexes (especially on child FK columns) so ongoing enforcement doesn’t create future lock pain.

Done this way, FK rollouts stop being “deploy roulette” and become predictable, observable operations.

39. Safe Migration Playbook: Expand/Backfill/Contract

The most reliable way to avoid stop-the-world lock incidents during schema changes is to stop treating migrations as “one big DDL event” and instead roll them out in phases that are:

  • backward compatible,
  • retryable,
  • and reversible.

The canonical pattern is Expand → Backfill → Contract. It trades “one risky, blocking change” for “several small, controlled changes.”



Why this playbook works (in locking terms)

Stop-the-world incidents usually happen when you combine:

  • strong locks (often ACCESS EXCLUSIVE),
  • long-running operations (table rewrites, full validations),
  • long transactions,
  • lock queue amplification.

Expand/backfill/contract reduces those risk multipliers by:

  • making “strong-lock” steps short,
  • doing expensive work (moving data) as throttled DML instead of a rewrite,
  • and postponing irreversible steps until you’ve proven the system is stable.

Phase 1: Expand

Goal: introduce new schema elements in a way that is compatible with the old application code.

Typical Expand actions (low risk when done carefully):

  • Add a new nullable column (no default):

ALTER TABLE t ADD COLUMN new_col text;

  • Add a new table, or new enum values (version-dependent risk)
  • Add an index concurrently (for hot tables):

CREATE INDEX CONCURRENTLY idx_t_new_col ON t(new_col);

  • Add constraints as NOT VALID (when supported):

ALTER TABLE t ADD CONSTRAINT t_new_col_check CHECK (new_col <> '') NOT VALID;

Operational guardrails:

  • Use short lock_timeout so Expand steps fail fast instead of queueing.
  • Keep Expand transactions tiny.



Phase 2: Backfill

Goal: populate or transform existing data without relying on long exclusive locks.

Backfill is intentionally DML-heavy and time-flexible. You do it in batches so you can control load and avoid long transactions.

Example batch backfill:

UPDATE t
SET new_col = old_col
WHERE new_col IS NULL
  AND id >= 100000 AND id < 110000;

Best practices for backfill:

  • Small batches (bounded time per transaction).
  • Commit frequently (avoid long transactions).
  • Throttle between batches to reduce I/O and lock pressure.
  • Monitor dead tuples/bloat and autovacuum behavior.

Backfill is where many teams accidentally create a different kind of incident (hot-row contention or massive write amplification). The key is pacing and observability.

References: MVCC/locking concepts
https://www.postgresql.org/docs/current/mvcc.html


Phase 3: Contract

Goal: make the new schema the source of truth and remove the old schema, but only after you’ve proven correctness and compatibility.

Typical Contract actions:

  • Switch application reads to new column/table.
  • Switch writes to new column/table (or stop dual-writing).
  • Validate constraints added as NOT VALID:

ALTER TABLE t VALIDATE CONSTRAINT t_new_col_check;

  • Enforce NOT NULL (after backfill is complete):

ALTER TABLE t ALTER COLUMN new_col SET NOT NULL;

  • Drop old columns, old constraints, or old indexes later:

ALTER TABLE t DROP COLUMN old_col;

Contract is where you re-introduce stronger locking risk (drops, NOT NULL enforcement, validation). The difference is:

  • by now, the data is ready, so operations are faster and more predictable,
  • you can schedule the small “strong lock” steps carefully,
  • and rollback is easier because you’ve already run both schemas for a while.

References: ALTER TABLE, validation
https://www.postgresql.org/docs/current/sql-altertable.html


A concrete example: change column type safely

Problem: change customer_id from text to uuid on a hot table.

Expand

ALTER TABLE orders ADD COLUMN customer_id_uuid uuid;
CREATE INDEX CONCURRENTLY idx_orders_customer_id_uuid
  ON orders(customer_id_uuid);

Backfill

UPDATE orders
SET customer_id_uuid = customer_id::uuid
WHERE customer_id_uuid IS NULL
  AND id BETWEEN ...;  -- batch window

Deploy app changes:

  • dual-write: write both customer_id and customer_id_uuid
  • gradually switch reads to customer_id_uuid

Contract

ALTER TABLE orders ALTER COLUMN customer_id_uuid SET NOT NULL;
-- later
ALTER TABLE orders DROP COLUMN customer_id;
ALTER TABLE orders RENAME COLUMN customer_id_uuid TO customer_id;

This avoids the in-place ALTER COLUMN TYPE rewrite-risk path.


Guardrails that make this playbook reliable

No matter the schema change, these practices keep the phases safe:

  • Fail fast: lock_timeout for DDL steps.
  • Bound runtime: statement_timeout for riskier operations.
  • Keep transactions short: especially during backfill.
  • Separate concerns: schema first, data later, enforcement last.
  • Observe continuously: watch pg_stat_activity, pg_locks, and progress views.
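One way to make the timeout guardrails hard to forget is to attach them to the role your migration tooling connects as (a sketch; the migrator role name is hypothetical, and SET LOCAL inside a migration still overrides these defaults):

ALTER ROLE migrator SET lock_timeout = '2s';
ALTER ROLE migrator SET statement_timeout = '15min';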

Timeout reference:
https://www.postgresql.org/docs/current/runtime-config-client.html


Takeaway

Expand/backfill/contract is the default “production-safe” migration strategy because it minimizes long exclusive locks and turns risky schema changes into small, controllable steps. It doesn’t eliminate locking—but it makes locking predictable, limits queue amplification, and gives you safe rollback points.

40. Partitioning and Locks: How Partitions Can Limit Blast Radius

Partitioning is often sold as a performance feature, but for stop-the-world prevention it has an equally important benefit: it can reduce lock blast radius. If your workload naturally divides into slices (time, tenant, region, etc.), partitions let many operations target a smaller physical object, so a lock (even a strong one) affects less of your traffic.

Partitioning does not eliminate locking, and some partition maintenance operations still require strong locks. But it can turn “locking the world” into “locking one shard.”

References:


How partitioning changes the “unit of locking”

Without partitioning, a hot table is a single choke point. Many disruptive operations need strong locks on that one object.

With partitioning, you have:

  • partitioned parent table (mostly routing/metadata)
  • many child partitions (where the actual rows live)

Many DML operations (inserts/updates/queries) touch only the specific partition(s) relevant to the rows. That means:

  • row locks are naturally localized,
  • relation locks may also localize to the partition being accessed (depending on the operation),
  • and some maintenance can be done partition-by-partition.

Reference: partitioning concepts
https://www.postgresql.org/docs/current/ddl-partitioning.html


Ways partitions can limit lock blast radius

1) Maintenance can be done one partition at a time

Instead of:

  • rebuilding a huge index on one monolithic table,

you can:

  • rebuild indexes on one partition, then move to the next.

Even if an operation needs a strong lock, it impacts only a fraction of the table’s total traffic.

Examples:

  • REINDEX (CONCURRENTLY) a single partition’s index
  • VACUUM a single partition
  • CREATE INDEX CONCURRENTLY on a single partition (concurrent builds aren’t supported directly on a partitioned parent; the common pattern is per-partition concurrent builds that are then attached to a parent index)

Caveat: behavior varies by version and by whether you create indexes on the parent or per partition. Always validate in your version.


2) Data lifecycle operations become safer

For time-based data, partitions enable “drop old data” as metadata operations:

  • detach/drop an old partition rather than deleting millions of rows.

This can be dramatically faster and can avoid long row-locking delete storms. But it can still involve strong locks for attach/detach/drop steps—so it’s safer, not free.
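A sketch of the lifecycle pattern, assuming a hypothetical monthly partition name and PostgreSQL 14+ for DETACH ... CONCURRENTLY (which cannot run inside a transaction block):

ALTER TABLE events DETACH PARTITION events_2025_11 CONCURRENTLY;
DROP TABLE events_2025_11;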

Reference: partition management concepts
https://www.postgresql.org/docs/current/ddl-partitioning.html

3) Hot spots are contained

In multi-tenant systems, a single tenant can become a hot spot. Partitioning by tenant can isolate:

  • lock contention,
  • vacuum pressure,
  • index bloat,
  • and even some kinds of DDL/maintenance scheduling.

This is only worth it if tenant distribution and query patterns make it viable; otherwise you can create too many partitions and hurt planning/maintenance.


The locking reality: partition operations can still be “big locks”

Some partition maintenance operations can request strong locks, sometimes on:

  • the partitioned parent,
  • the partition being attached/detached,
  • and potentially related objects.

Examples that often require careful scheduling:

  • ALTER TABLE parent ATTACH PARTITION ...
  • ALTER TABLE parent DETACH PARTITION ...
  • dropping partitions
  • changing constraints/indexes that propagate or require parent-level coordination

If your parent table is extremely hot, even short-lived strong locks on the parent can cause a noticeable stall.

Reference: partition operations are ALTER TABLE subcommands
https://www.postgresql.org/docs/current/sql-altertable.html
Lock modes overview: https://www.postgresql.org/docs/current/explicit-locking.html


A practical “blast radius” example

Imagine an events table partitioned by day:

  • Without partitioning: a lock on events blocks most event reads/writes.
  • With daily partitions: a lock on events_2025_12_21 blocks only traffic touching today’s partition; older partitions are unaffected, and many queries (especially historical ones) continue.

Even more importantly, if you’re doing maintenance on an old partition (say last month), you can often do it with negligible impact on “today’s hot path.”


Tradeoffs and pitfalls (so you don’t partition just for locks)

Partitioning adds complexity:

  • more objects to manage (tables, indexes),
  • more planning considerations for queries,
  • operational work to create future partitions and manage retention,
  • potential performance issues if you have too many partitions or poor pruning.

So the decision should be driven by clear needs: retention management, query pruning, and/or lock blast radius reduction.

Reference: partitioning caveats and pruning
https://www.postgresql.org/docs/current/ddl-partitioning.html


Operational guidance: using partitioning to reduce stop-the-world risk

If your main goal is “avoid one table locking everything”:

  • Partition along a dimension that aligns with most queries (time is the most common).
  • Keep partitions “small enough” that maintenance and rewrites are tolerable per partition.
  • Run strong-lock operations (even ACCESS EXCLUSIVE-requiring ones) on a single partition at a time.
  • Use lock_timeout on partition attach/detach/drop steps to avoid queue amplification.
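For example, an attach step with a fail-fast lock timeout might look like this (a sketch; assumes events is range-partitioned by a date column and the partition names are hypothetical):

BEGIN;
SET LOCAL lock_timeout = '2s';
ALTER TABLE events ATTACH PARTITION events_2025_12_22
  FOR VALUES FROM ('2025-12-22') TO ('2025-12-23');
COMMIT;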

Timeout reference:
https://www.postgresql.org/docs/current/runtime-config-client.html


Takeaway

Partitioning can limit lock blast radius by changing the unit of contention from “one giant table” to “many smaller partitions.” That makes maintenance and even some disruptive operations safer because you can target one partition at a time. But partition management itself can require strong locks—so treat attach/detach/drop as planned operations with timeouts and careful scheduling.

41. Advisory Locks: When to Use Them and When Not To

Advisory locks are PostgreSQL’s “bring your own mutex” feature. They let your application (or ops tooling) coordinate work using locks that PostgreSQL enforces, but PostgreSQL does not take automatically. Used well, advisory locks prevent whole classes of race conditions. Used poorly, they create self-inflicted stop-the-world incidents.



What advisory locks are

Advisory locks are:

  • explicit: you request them via functions (pg_advisory_lock, pg_try_advisory_lock, etc.)
  • named by integers: either one bigint key or two int keys
  • tracked by Postgres like other heavyweight locks (they show up in pg_locks)
  • independent of tables/rows: they don’t lock data; they lock a key you choose

They’re called “advisory” because Postgres won’t automatically tie them to data correctness—you must design the protocol.

Reference: https://www.postgresql.org/docs/current/functions-admin.html


Two flavors: session-level vs transaction-level

This distinction matters a lot operationally.

Session-level advisory locks
  • Functions: pg_advisory_lock, pg_try_advisory_lock, pg_advisory_unlock
  • Held until you explicitly unlock or the session ends.

Risk: if your app crashes, loses network, or forgets to unlock, you can block other workers indefinitely (until the connection is dropped).

Transaction-level advisory locks
  • Functions: pg_advisory_xact_lock, pg_try_advisory_xact_lock
  • Held until the transaction ends (commit/rollback).

These are usually safer because they can’t outlive the transaction. If you’re choosing between the two, transaction-level is often the better default.

Reference: advisory lock function list
https://www.postgresql.org/docs/current/functions-admin.html


When advisory locks are a good idea

1) Singleton jobs / leader election (lightweight coordination)

Examples:

  • “Only one instance of the cron job runs at a time.”
  • “Only one worker does the daily aggregation.”

Pattern:

SELECT pg_try_advisory_lock(42) AS got_lock;

If got_lock is false, exit quickly.

Why it’s good:

  • Simple
  • No extra coordination service
  • Lock is released if the DB session dies
2) Serialize a critical section that spans multiple tables

If you need a cross-table invariant and don’t want to invent a “lock table” row, advisory locks can be a clean way to guarantee only one worker executes a section at once.

3) Prevent concurrent migrations

Many teams use an advisory lock key like hashtext('migrations') to ensure only one migrator runs at a time—especially important if you have multiple deploy pipelines.
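A minimal sketch of that pattern, taken at the start of the migration transaction so the lock disappears automatically at commit or rollback:

BEGIN;
SELECT pg_advisory_xact_lock(hashtext('migrations'));
-- run the migration statements here
COMMIT;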


When advisory locks are a bad idea

1) Using them as a “global DB mutex” in the hot path

If you do something like:

  • every request tries to take the same advisory lock key

…you’ve created a single-threaded bottleneck. Under load, this looks exactly like a stop-the-world event: huge waits, pool exhaustion, timeouts.

If you must coordinate per-entity, use a keyed lock (e.g., one key per account id), not one global key.

2) Holding session-level locks across network calls or user think-time

This is the advisory-lock version of “idle in transaction”:

  • worker takes lock
  • does HTTP call / waits for user / waits on queue
  • forgets to release promptly

Result: other workers block behind it.

If the critical section includes any unpredictable wait, avoid session-level advisory locks or use strict timeouts and try-style acquisition.

3) Using advisory locks to “fix” data consistency issues

If you’re using advisory locks to paper over missing constraints, missing unique indexes, or flawed transaction design, you’ll eventually lose. Advisory locks can help coordinate workflows, but they’re not a substitute for database constraints and correct transaction logic.


Best practices that keep advisory locks safe

Prefer pg_try_* over blocking locks in most application flows

Blocking lock acquisition can create pileups, so prefer the non-blocking try variant:

SELECT pg_try_advisory_xact_lock(12345);

If false, back off and retry later.

Use transaction-level locks unless you truly need session scope

Transaction-level locks naturally bound the blast radius.

Design your lock key scheme carefully
  • Pick a stable namespace (e.g., one bigint per resource).
  • Avoid collisions across features.
  • Consider using two-key form: (feature_id, entity_id) for readability.

Reference: function signatures
https://www.postgresql.org/docs/current/functions-admin.html

Add timeouts and observability

Advisory locks show in pg_locks with locktype = 'advisory'. You can detect stuck holders and measure waiters.

Reference: pg_locks
https://www.postgresql.org/docs/current/view-pg-locks.html


How to troubleshoot advisory lock stalls

If you suspect advisory locks are the bottleneck:

  1. Find advisory locks and who holds them:
SELECT
  l.pid,
  a.usename,
  a.application_name,
  a.client_addr,
  l.mode,
  l.granted,
  now() - a.xact_start AS xact_age,
  left(a.query, 200) AS query_preview
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'advisory'
ORDER BY l.granted DESC, xact_age DESC NULLS LAST;
  2. Identify whether you have:
  • one “global” key everyone wants, or
  • many keys (per-entity contention), or
  • leaked session locks (holders idle for a long time).



Takeaway

Advisory locks are great for coordinating work (singleton jobs, migration serialization, cross-table critical sections) when used with scoped keys, transaction-level locks, and try + backoff patterns. They’re dangerous when used as a global mutex or held across unpredictable waits, because they can create self-inflicted stop-the-world behavior.

42. Application-Level Tactics: Retry Logic, Jitter, and Backpressure

Database locking is only half the story. The other half is what your application does when it hits a lock wait. Most “stop-the-world” incidents become outages because applications respond to timeouts and transient lock failures by doing the worst possible thing: piling on.

The goal at the application layer is to convert “lock contention” into “controlled slowdown,” using:

  • retry logic (for the right errors),
  • jittered backoff (so retries don’t synchronize),
  • backpressure (so you don’t overload the database and your pool).



Why naive retries make lock incidents worse

During a lock freeze:

  • many requests are already waiting,
  • the pool is near saturation,
  • latency is spiking.

If every request times out and instantly retries, you increase concurrency precisely when the system is least able to handle it. This creates:

  • more waiting sessions,
  • longer queues,
  • more timeouts,
  • and a “retry storm” that can keep the system unstable even after the root blocker is gone.

So the problem is not “retries are bad,” it’s “unbounded, synchronized retries are bad.”


Retry logic: what to retry and what not to retry

Retryable categories (common)

These are often safe to retry if the operation is idempotent (or you have an idempotency key):

  • Lock timeout: statement canceled due to lock_timeout
  • Statement timeout: statement canceled due to statement_timeout (sometimes retryable, sometimes indicates a real performance issue)
  • Deadlock detected: Postgres aborts one participant; retry is the normal pattern
  • Serialization failures at SERIALIZABLE (retry is required by design)

Postgres documents SQLSTATE codes; in practice you usually match on SQLSTATE rather than parsing text.

Reference: SQLSTATE appendix (including deadlock 40P01, serialization failures, etc.)
https://www.postgresql.org/docs/current/errcodes-appendix.html

Usually not retryable (or only with human-level caution)
  • Syntax/permission errors
  • Constraint violations (unless your business logic expects it)
  • “relation does not exist” / schema mismatch (deploy ordering issue)
  • Out-of-disk / corruption / admin shutdown type errors

A helpful rule: retry only when the failure is plausibly transient and retrying won’t amplify harm.


Backoff and jitter: how to retry without stampeding

Exponential backoff

Instead of retrying immediately, wait longer after each failure:

  • attempt 1: wait ~50ms
  • attempt 2: wait ~100ms
  • attempt 3: wait ~200ms
  • cap at some maximum (e.g., 1–2 seconds)
Add jitter (randomness)

If every worker uses the same backoff schedule, they wake up together and collide again. Jitter desynchronizes retries.

Two common approaches:

  • full jitter: pick a random value between 0 and the current backoff cap
  • equal jitter: wait half the cap plus a random half

Either is fine; the important point is “not synchronized.”

Practical example pseudocode:

base = 50ms
cap  = 2000ms
for attempt in 1..N:
  try operation
  if success: return
  sleep random(0, min(cap, base * 2^(attempt-1)))

Backpressure: stop feeding the queue

Retries handle individual failures; backpressure protects the system.

1) Bound concurrency to the database

Your app should have a hard upper bound on concurrently in-flight DB work. Usually that’s your connection pool size, but also consider:

  • limiting the number of worker threads that can simultaneously execute DB-heavy code paths,
  • using bulkheads: separate pools for “critical path” and “background jobs.”

The aim is: a lock incident in a background job should not starve your user-facing traffic.

2) Fail fast when the pool is exhausted

If a request can’t get a DB connection quickly, it’s often better to:

  • return a fast error (or degraded response),
  • rather than wait, time out, and retry.

This prevents “threads stuck waiting for pool” from taking down the whole app server.

3) Adaptive throttling

If you detect high DB latency or lock timeout rates, dynamically reduce request rate for heavy endpoints (queue, shed load, or degrade). This is especially valuable during migrations or partial outages.


Align app behavior with database timeouts

A healthy pattern is:

  • Postgres lock_timeout is short for contentious operations, so DDL fails fast.
  • App request timeout is slightly larger than normal DB query budget.
  • App retries are bounded and jittered.

If your application timeout is shorter than lock_timeout or statement_timeout, you can end up in a weird state:

  • the app gives up and retries,
  • but the original statement is still waiting in Postgres,
  • multiplying load.

So you want your timeout layers to be coherent.

Reference: timeouts
https://www.postgresql.org/docs/current/runtime-config-client.html


Use try-style locks for singleton workflows

If you use advisory locks or other “only one worker should do this” patterns, prefer try-lock + backoff:

  • pg_try_advisory_lock / pg_try_advisory_xact_lock

This avoids having dozens of workers blocked on the same lock key.

Reference: advisory lock functions
https://www.postgresql.org/docs/current/functions-admin.html


Practical checklist for production apps

  • Retry only retryable errors (use SQLSTATE).
  • Make retries bounded (max attempts, max total time).
  • Add jitter to every retry.
  • Cap concurrency per DB (pool + bulkheads).
  • Fail fast if the pool is exhausted.
  • Use idempotency keys for write retries.
  • Avoid long transactions in request flows (no user think-time / network calls inside a transaction).

Takeaway

Locking incidents become outages when applications amplify them. Good application behavior—bounded retries with jitter, plus backpressure and sensible timeouts—turns “database waiting” from a cascading failure into a controlled degradation that recovers quickly once the root blocker is gone.

43. Connection Poolers and Lock Storms: What PgBouncer Changes

Connection poolers (most famously PgBouncer) are often deployed to protect PostgreSQL from having to manage thousands of client connections. They can dramatically improve stability and throughput. But they also change the shape of lock incidents—sometimes for the better, sometimes in surprising ways.

This section focuses on how poolers interact with lock storms and “stop-the-world” events, and what to configure so the pooler doesn’t become an outage amplifier.


(PgBouncer behavior is defined by its own docs; Postgres docs don’t fully describe it.)


What a pooler changes at a high level

Without a pooler, every application connection is usually a PostgreSQL backend process. During a lock storm:

  • requests hang,
  • apps open more connections (or keep them stuck),
  • Postgres ends up with too many backends,
  • memory grows, context switching increases, and recovery gets harder.

With a pooler:

  • the app can have many client connections,
  • but the database sees a smaller, controlled number of server connections.

That means a lock storm is less likely to become “we hit max_connections and everything fell over.”


The big concept: poolers cap concurrency (this is usually good)

Lock storms are queueing problems: too many requests pile up behind a blocker.

A pooler helps by limiting how many queries can be simultaneously waiting inside Postgres. Instead of 2,000 app threads all waiting on locks in the database, you might have:

  • 100 server connections waiting in Postgres,
  • and 1,900 client connections queued in PgBouncer.

That can be a major stability win because Postgres isn’t forced to context-switch and manage lock state for thousands of waiting backends.


But the queue doesn’t disappear: it moves

A pooler doesn’t fix blocking—it relocates the backlog:

  • without a pooler: queue is mostly inside Postgres
  • with a pooler: queue is partly inside PgBouncer

Operational implication:

  • Your app may still time out, but Postgres metrics might look “less insane.”
  • You must monitor PgBouncer too, or you’ll miss the pileup.

Pool modes: how PgBouncer changes transaction behavior

Session pooling
  • One client connection maps to one server connection for the life of the client session.

Pros:

  • Works with session features: temp tables, prepared statements (server-side), session variables, etc.
  • Least surprising.

Cons:

  • Less efficient at sharing server connections.
  • During lock storms, server connections can still get “stuck” per client.
Transaction pooling
  • A server connection is assigned to a client only for the duration of a transaction, then returned to the pool.

Pros:

  • Greatly improves server connection reuse.
  • Often the best mode for OLTP workloads.

Cons (very important):

  • You cannot rely on session state being preserved (session-level prepared statements, SET variables, temp tables, advisory session locks, etc.).
  • If your app assumes session affinity, transaction pooling can cause correctness bugs.
Statement pooling
  • Even more aggressive reuse (per statement). Rarely used for typical apps because it breaks too many assumptions.

PgBouncer reference: pool modes
https://www.pgbouncer.org/


Lock storms under PgBouncer: what gets better

1) You’re less likely to melt Postgres with connection count

By capping server connections, you reduce:

  • per-backend memory overhead,
  • CPU spent on managing thousands of idle/waiting processes,
  • the “death spiral” where timeouts cause more connection attempts.
2) You can protect critical traffic with separate pools

A powerful pattern is to run multiple logical pools / users:

  • a small, protected pool for migrations/maintenance,
  • a primary pool for OLTP traffic,
  • a background pool for batch jobs.

This prevents a batch backfill from consuming all server connections during a lock incident.

(Exact implementation depends on your PgBouncer config.)


Lock storms under PgBouncer: what can get worse or more confusing

1) More client timeouts while Postgres looks “fine”

Because the queue is now in PgBouncer, clients may time out waiting for a server connection even though Postgres isn’t at max connections. This is still an outage, just with different symptoms.

2) “Idle in transaction” is still deadly

PgBouncer doesn’t magically prevent long transactions. In session pooling, an idle-in-transaction client can tie up a server connection and hold locks. In transaction pooling, a client that holds a transaction open still holds the server connection for that transaction duration.

So the classic root causes remain: long transactions + strong lock requests.

Reference: idle_in_transaction_session_timeout
https://www.postgresql.org/docs/current/runtime-config-client.html

3) Prepared statements and session state assumptions can break

If you switch to transaction pooling but your application uses:

  • server-side prepared statements,
  • temp tables,
  • session-level advisory locks,
  • session-local SET values,

you can get functional bugs that are hard to trace because they appear only under load or failover.


Operational advice: settings and tactics that help during lock storms

Use database timeouts anyway

Even with PgBouncer, set:

  • lock_timeout for DDL/migrations
  • statement_timeout for runaway queries
  • idle_in_transaction_session_timeout to kill “lock hostage” sessions

Reference:
https://www.postgresql.org/docs/current/runtime-config-client.html

Size pools intentionally
  • Too many server connections → you lose the main benefit.
  • Too few server connections → you can create artificial bottlenecks and longer client queues.

A good configuration is workload-specific, but the principle is: use PgBouncer to cap and shape concurrency, not to maximize it.

Monitor both sides
  • Postgres: pg_stat_activity, pg_locks, commit rate
  • PgBouncer: client queue length, wait times, pool saturation

(PgBouncer exposes its own stats via its admin console; see its docs.)
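For example, PgBouncer's admin console answers to SHOW commands once you connect to its special pgbouncer database (a sketch; port and admin user depend on your configuration):

-- e.g. psql -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;   -- per-pool client/server counts, including waiting clients
SHOW STATS;   -- per-database traffic and wait statistics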


Takeaway

PgBouncer doesn’t prevent locks, but it changes lock storms from “Postgres has 2,000 stuck backends” to “a controlled number of Postgres backends are stuck and the rest are queued in the pooler.” That usually improves stability and recovery, but it also makes incidents easier to misdiagnose unless you monitor PgBouncer. Choose the right pool mode (session vs transaction), avoid session-state assumptions in transaction pooling, and keep database-level timeouts in place to prevent lock queue amplification.

44. High Availability Considerations: Replication, Failover, and DDL

High availability (HA) changes the operational meaning of “stop-the-world.” A lock incident might be isolated to the primary, but its effects can propagate through replication lag, failover behavior, and read-scaling patterns. DDL in particular has sharp edges in HA setups because schema changes are both lock-heavy and replication-visible.

This section covers the main HA interactions you need to think about: streaming replication lag, failover timing/consistency, and how to run DDL without turning replicas (or failovers) into a second incident.



Replication reality: locks don’t replicate, but their effects do

Locks are local to a server. A lock queue on the primary does not “replicate” to a standby.

But the work that causes the lock incident often has downstream effects:

  • DDL and big maintenance operations can generate lots of WAL.
  • Heavy write bursts (including backfills and rewrites) can increase replication lag.
  • If your app reads from replicas, lag can become an availability problem: stale reads, inconsistent behavior, or read timeouts.

Reference: replication and WAL
https://www.postgresql.org/docs/current/high-availability.html
https://www.postgresql.org/docs/current/wal-intro.html


How DDL behaves with streaming replication

Most DDL is transactional: it commits on the primary and then the commit record replicates. Standbys apply WAL in order, so:

  • the schema change appears on the standby after it replays the commit,
  • but only once it reaches that WAL position.

This creates a few practical hazards:

1) Schema drift during lag

If you deploy app code expecting the new schema immediately, but replicas lag, you can get:

  • “column does not exist” errors on read replicas,
  • or ORM query failures if reads go to standbys.

This is why migration sequencing matters more with replicas.

Reference: transactional behavior and HA consistency topics
https://www.postgresql.org/docs/current/high-availability.html
https://www.postgresql.org/docs/current/sql-altertable.html

2) Rewrites and large backfills increase lag

Operations that rewrite tables or generate heavy WAL (large updates, index builds, VACUUM FULL, CLUSTER) can push replicas behind. Even if the primary survives the lock risk, your read capacity might collapse because replicas can’t keep up.

Reference: WAL generation concepts
https://www.postgresql.org/docs/current/wal-intro.html


Failover considerations: what happens to locks and in-flight DDL?

During failover:

  • locks are local to each server, so anything held or queued on the old primary is simply gone from the new primary’s perspective.
  • transactions that were running on the old primary (including DDL) end up either:
    • preserved (their commit record reached the standby and was replayed before promotion), or
    • lost (they never committed, or their commit record never reached the standby in time).

Practical implications:

1) “Did the migration happen or not?”

After failover, you can end up in an uncertain state if:

  • the migration client saw an error/disconnect mid-flight,
  • and you don’t know whether the DDL committed.

Because DDL is transactional, the truth is in the new primary’s catalogs: check schema state rather than assuming success/failure from the client’s perspective.
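
A minimal sketch of that check, assuming a hypothetical migration that added a delivery_notes column to orders (swap in your own table and column):

SELECT EXISTS (
  SELECT 1
  FROM information_schema.columns
  WHERE table_schema = 'public'
    AND table_name   = 'orders'
    AND column_name  = 'delivery_notes'
) AS migration_applied;

Run this against the new primary; if it returns false, re-run the migration rather than assuming the earlier attempt succeeded.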

Reference: HA and transactional guarantees
https://www.postgresql.org/docs/current/high-availability.html

2) Prepared statements / cached plans across failover

Applications often cache prepared statements or schemas. After failover + DDL changes, apps can see plan invalidation issues or errors until they reconnect or refresh. This is an app/driver concern, but it’s commonly triggered by DDL rollouts in HA environments.

(Postgres handles plan invalidation internally on one server; across failover you’re effectively connecting to a different server.)


Read replicas and “stop-the-world”: the gotcha with locking + stale reads

If you do “expand/backfill/contract” migrations with read replicas:

  • Expand (add new nullable columns, new tables, new indexes) is usually safe.
  • During lag, some readers might still be on the old schema.
  • If your app uses replicas for reads, you must ensure your code path is compatible with both schemas until replicas catch up.

This is why expand/backfill/contract is not just about locks—it’s also about replication-safe compatibility.

Reference: high availability and replication concepts
https://www.postgresql.org/docs/current/high-availability.html


Operational patterns that reduce HA pain during DDL

1) Prefer backward-compatible schema changes

Expand/backfill/contract is especially valuable in HA because it tolerates replication lag and rolling deploys.

Reference: ALTER TABLE and constraint validation options
https://www.postgresql.org/docs/current/sql-altertable.html

2) Avoid rewrite-heavy operations on the primary during peak

Even if you can tolerate a lock, you might not tolerate the replication lag.

Treat these as high risk for HA:

  • VACUUM FULL
  • CLUSTER
  • large backfills without throttling
  • non-concurrent index rebuilds
  • large ALTER TABLE ... TYPE rewrites

References:
https://www.postgresql.org/docs/current/sql-vacuum.html
https://www.postgresql.org/docs/current/sql-cluster.html
https://www.postgresql.org/docs/current/sql-altertable.html

3) Use timeouts to keep DDL from becoming a long lock queue

HA doesn’t prevent queue amplification on the primary. Use:

  • lock_timeout (fail fast)
  • statement_timeout (cap runtime)

Reference: https://www.postgresql.org/docs/current/runtime-config-client.html

4) Plan cutovers with replication state in mind

If your app uses replicas:

  • don’t switch reads/writes to the new schema until replicas have replayed the DDL commit (or route that code path to the primary temporarily).
  • in some setups, teams gate deploy phases on replication lag thresholds.

(Postgres provides replication status views; exact choice depends on your HA tooling.)
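
For built-in streaming replication, a minimal lag check on the primary looks like this (a sketch; pg_stat_replication’s lag columns assume PostgreSQL 10+, and managed/HA platforms may expose their own equivalents):

SELECT application_name,
       state,
       replay_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

Gating a deploy phase on this query returning a small replay_lag for every standby is one way to implement the “wait for replicas to replay the DDL commit” rule above.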

Reference: replication monitoring views (in HA docs and monitoring stats)
https://www.postgresql.org/docs/current/high-availability.html
https://www.postgresql.org/docs/current/monitoring-stats.html


Takeaway

In HA systems, lock incidents are only part of the risk. DDL and heavy maintenance can also create replication lag, schema visibility gaps between primary and replicas, and uncertainty during failover about whether a migration committed. The safest approach is to use backward-compatible migrations (expand/backfill/contract), avoid rewrite-heavy operations on hot primaries, and keep strict timeouts so DDL never becomes a long-lived head-of-line blocker.

45. Scheduling Risky Operations: Maintenance Windows vs Online Techniques

Not every PostgreSQL operation can be made “online,” and not every operation needs a midnight maintenance window. The art is knowing which changes are inherently downtime-prone (strong locks, table rewrites) versus which can be engineered to be low impact (concurrent builds, phased migrations).

This section gives you a scheduling framework: when to insist on a maintenance window, when to use online techniques, and how to avoid becoming the head-of-line lock waiter either way.



The decision: “online” vs “window” is about failure mode

Ask two questions:

  1. Does this operation require a strong lock for a meaningful duration?
    If yes, it’s downtime-prone.
  2. Can we redesign it into smaller steps that hold strong locks only briefly?
    If yes, you can usually do it online.

If the answer to (2) is “no,” you either accept a maintenance window or accept outage risk.


Operations that usually belong in a maintenance window

These commonly cause stop-the-world behavior because they rewrite tables or need ACCESS EXCLUSIVE in ways that can last:

  • VACUUM FULL and CLUSTER (full table rewrites)
  • large ALTER TABLE ... TYPE changes that rewrite the table
  • non-concurrent index builds or rebuilds on large, hot tables
  • TRUNCATE on tables with wide dependencies (especially with CASCADE)

Even in a window, still set lock_timeout and statement_timeout so a “window change” doesn’t become an all-night incident.

Reference: timeouts
https://www.postgresql.org/docs/current/runtime-config-client.html


Operations that are good candidates for online techniques

These are commonly doable during business hours if you do them correctly:

  • CREATE INDEX CONCURRENTLY / REINDEX CONCURRENTLY
  • adding constraints as NOT VALID, with validation as a separate step
  • expand/backfill/contract column and table changes
  • small metadata-only DDL steps run with a tight lock_timeout

Online doesn’t mean “no risk.” It means “bounded and observable risk,” where the worst case is typically “it fails fast and you try later,” not “it locked the table for 20 minutes.”


The “gray zone”: depends on table hotness and your tooling

Some operations can be either windowed or online depending on traffic, table size, and appetite for risk:

  • ALTER TABLE ... ADD COLUMN ... DEFAULT ...
    On PostgreSQL 11+, adding a column with a constant default is a metadata-only “fast default”; volatile defaults (or older versions) force a table rewrite, and a brief ACCESS EXCLUSIVE lock is still required either way. Verify the behavior on your version and table; otherwise use expand/backfill/contract.
  • DROP COLUMN / DROP CONSTRAINT
    Often quick but still needs strong locking; safe if you can acquire the lock quickly, risky if the table is extremely hot.
  • Partition attach/detach operations
    Can be quick but can still require strong coordination locks.



Scheduling guidance that works in real systems

1) Prefer “online” when you can bound the lock wait

If you can run the operation with:

  • short lock_timeout (so it won’t queue),
  • and you have a rollback/retry plan,

then you can often schedule it during normal hours, because the worst case is a quick failure.

Example pattern:

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '60s';
-- small DDL step
COMMIT;

Reference: https://www.postgresql.org/docs/current/runtime-config-client.html

2) Use maintenance windows when the lock hold time is inherently long

If the operation:

  • must rewrite a very large table,
  • will generate huge I/O/WAL,
  • or must hold strong locks for minutes,

then a window is the honest choice. Trying to “online” it usually just means “we risk a surprise outage during peak.”

3) Separate risk: do the dangerous part when traffic is low

Even within “online” techniques, you can schedule the riskiest micro-steps (like constraint validation or the final cutover/rename/drop) during a lower-traffic period, while doing the long backfill earlier in the day with throttling.
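
A minimal sketch of a throttled backfill batch, assuming a hypothetical orders table that gained a new_status column during the expand phase; the loop, sleep, and stop condition live in your job runner:

UPDATE orders
SET new_status = legacy_status
WHERE id IN (
    SELECT id
    FROM orders
    WHERE new_status IS NULL
    ORDER BY id
    LIMIT 5000
);
-- repeat until 0 rows are updated, pausing between batches to limit WAL volume and replica lag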

4) Don’t overlap risky operations

Avoid running:

  • backfills,
  • index builds,
  • vacuum-heavy maintenance,
  • and schema changes

at the same time on the same hot tables. Overlap increases contention and can push you into LWLock/I/O bottlenecks even if locks are “fine.”


A practical pre-flight checklist before you schedule

  • □ Is the target table on a critical path and continuously written?
  • □ Do you have long-running transactions or “idle in transaction” sessions regularly?
  • □ Can you use CONCURRENTLY or NOT VALID to reduce blocking?
  • □ Have you set lock_timeout so you won’t become head-of-line?
  • □ What’s the rollback plan if it fails halfway?
  • □ For HA setups: can your replicas absorb the WAL (lag risk)?



Takeaway

Maintenance windows are for operations that are inherently exclusive or rewrite-heavy. Online techniques are for operations you can decompose into short-lock steps plus throttled background work. The safest scheduling strategy is to choose the approach that gives you a predictable failure mode—then enforce that predictability with timeouts, phased rollouts, and “no overlap” discipline.

46. Incident Response: Step-by-Step When the Database Appears “Hung”

When PostgreSQL “hangs,” it’s usually not dead. It’s waiting—on locks, I/O, internal contention, or downstream dependencies. The fastest recoveries come from running a consistent playbook that answers, in order:

  1. Is it locks or something else?
  2. If it’s locks, what is the hot object and who is the root blocker?
  3. What is the least risky intervention that restores throughput?



Step 0: Stabilize the system so you can diagnose

Before you dive into SQL:

  • Pause/slow any known deploys, migrations, or batch jobs if you control them.
  • If your application is retrying aggressively, reduce retry rate or shed load if possible (to stop the pileup from growing).

This isn’t “fixing the lock,” it’s preventing amplification so your actions have effect.


Step 1: Confirm whether you have a lock incident

Run:

SELECT
  wait_event_type,
  count(*) AS sessions
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY wait_event_type
ORDER BY sessions DESC;

Interpretation:

  • Lots of wait_event_type = 'Lock' → classic lock storm.
  • Mostly LWLock* → internal contention.
  • Mostly I/O waits → storage/throughput issue.
  • Mostly Client → clients not sending/receiving (less likely “DB hung”).

Reference: wait events
https://www.postgresql.org/docs/current/monitoring-stats.html

If it’s not “Lock,” your lock playbook may not help; you need the wait-event-specific path. If it is “Lock,” continue.


Step 2: Identify the worst waiters (what’s piling up)

SELECT
  pid,
  usename,
  application_name,
  state,
  now() - query_start AS query_age,
  wait_event_type,
  wait_event,
  left(query, 200) AS query_preview
FROM pg_stat_activity
WHERE datname = current_database()
  AND wait_event_type = 'Lock'
ORDER BY query_age DESC
LIMIT 50;

This shows:

  • how long sessions have been stuck,
  • what kinds of queries are waiting (often points to the hot table or DDL).

Step 3: Find the hot relation (the blast radius)

If it’s relation-lock driven, pg_locks will show many waiters on one table:

SELECT
  relation::regclass AS relation,
  mode,
  count(*) AS waiting
FROM pg_locks
WHERE locktype = 'relation'
  AND NOT granted
GROUP BY 1, 2
ORDER BY waiting DESC, relation;

If you see one relation dominating, that’s your choke point.

Reference: pg_locks
https://www.postgresql.org/docs/current/view-pg-locks.html


Step 4: Identify the root blocker(s)

Use pg_blocking_pids() to jump from victims to blockers:

WITH waiters AS (
  SELECT
    pid AS waiting_pid,
    unnest(pg_blocking_pids(pid)) AS blocking_pid
  FROM pg_stat_activity
  WHERE datname = current_database()
    AND wait_event_type = 'Lock'
)
SELECT
  b.pid,
  b.usename,
  b.application_name,
  b.client_addr,
  b.state,
  now() - b.xact_start  AS xact_age,
  now() - b.query_start AS query_age,
  left(b.query, 200) AS query_preview,
  count(*) AS blocked_sessions
FROM waiters w
JOIN pg_stat_activity b ON b.pid = w.blocking_pid
GROUP BY 1,2,3,4,5,6,7,8
ORDER BY blocked_sessions DESC, xact_age DESC NULLS LAST;

What to look for:

  • high blocked_sessions
  • old xact_age
  • state = 'idle in transaction' (very common root blocker)

Reference: pg_blocking_pids()
https://www.postgresql.org/docs/current/functions-info.html


Step 5: Determine if you’re seeing “head-of-line DDL” amplification

Now check if there’s a strong lock request queued (common amplifier):

SELECT
  a.pid,
  a.usename,
  a.application_name,
  now() - a.query_start AS waiting_for,
  l.mode,
  l.relation::regclass AS relation,
  left(a.query, 200) AS query_preview
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'relation'
  AND NOT l.granted
ORDER BY waiting_for DESC;

If you see a long-waiting AccessExclusiveLock request on a hot table, you likely have:

  • a root blocker (long transaction) plus
  • a head-of-line waiter (DDL) causing queue amplification.

Reference: lock modes
https://www.postgresql.org/docs/current/explicit-locking.html


Step 6: Choose the least risky intervention

Your options, from least to most disruptive:

A) Stop the amplifier (cancel the head-of-line DDL)

If a migration is waiting for AccessExclusiveLock and causing pileups, canceling it can immediately let compatible reads/writes proceed.

SELECT pg_cancel_backend(<ddl_pid>);

Reference: admin functions
https://www.postgresql.org/docs/current/functions-admin.html

B) End the root blocker (terminate the session holding locks)

If the root blocker is idle in transaction or running an unacceptably long query, you may need:

SELECT pg_terminate_backend(<blocking_pid>);

This rolls back its transaction and releases locks, usually restoring service quickly.

Reference: https://www.postgresql.org/docs/current/functions-admin.html

C) Reduce load while you work

If you can throttle background jobs, pause deploys, or shed non-critical traffic, do it—especially if you’re seeing multiple blockers or broad contention.


Step 7: Verify recovery (don’t stop at “we killed the blocker”)

After intervention, confirm the system is draining:

  • lock wait count drops:
    SELECT count(*) FROM pg_stat_activity
    WHERE datname = current_database() AND wait_event_type = 'Lock';
  • throughput returns (sample pg_stat_database deltas):
    SELECT xact_commit, xact_rollback FROM pg_stat_database
    WHERE datname = current_database();
  • pool returns to normal and p95/p99 latency drops (app metrics).

Reference: monitoring stats
https://www.postgresql.org/docs/current/monitoring-stats.html


Step 8: Capture evidence for the postmortem

Before everything disappears, capture:

  • top blockers and their queries (pg_stat_activity)
  • relevant lock snapshots (pg_locks)
  • whether idle in transaction was involved
  • the command that triggered the queue (often DDL)
  • timestamps and affected relations

This is what lets you fix the root cause instead of repeating the incident.
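
One low-tech way to do that capture, assuming you have a safe place to write during the incident (the table names are illustrative):

CREATE TABLE incident_snapshot_activity AS
SELECT now() AS captured_at, * FROM pg_stat_activity;

CREATE TABLE incident_snapshot_locks AS
SELECT now() AS captured_at, * FROM pg_locks;

Even pasting the blocker-query output into the incident channel is better than nothing; the point is to preserve the blocking chain before the sessions disappear.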


Takeaway

A “hung” Postgres is usually “waiting Postgres.” The fastest incident response is a disciplined sequence: confirm lock waits, find the hotspot relation, identify blockers via pg_blocking_pids(), remove the head-of-line amplifier if needed, terminate the true root blocker when necessary, and verify that waiters and latency are draining.

47. What to Kill First: Canceling Queries vs Terminating Sessions

When a lock incident is actively causing timeouts, you often have to intervene. In PostgreSQL you have two primary “stop it” tools:

  • Cancel the current query: pg_cancel_backend(pid)
  • Terminate the session: pg_terminate_backend(pid)

They are not interchangeable. The right choice depends on whether you’re dealing with:

  • a long-running query that’s safe to stop, or
  • a transaction that is holding locks/snapshots and won’t go away unless the session ends.



The difference in one sentence

  • Cancel stops the currently running statement but keeps the session alive.
  • Terminate kills the session, rolling back any open transaction and releasing its locks.

Canceling queries: pg_cancel_backend(pid)

When cancel is the right first move

Cancel is ideal when the problem is an actively running statement that you want to stop with minimal collateral damage, for example:

  • a runaway SELECT doing a huge scan and holding resources,
  • a long-running report query that is blocking DDL (or contributing to contention),
  • a DDL statement that is waiting and acting as head-of-line (canceling it can remove queue amplification),
  • a batch update that you’d rather retry later in smaller chunks.

How to do it:

SELECT pg_cancel_backend(<pid>);

What happens:

  • Postgres sends an interrupt to that backend.
  • If the backend is in a cancelable state, the current statement ends with an error (commonly “canceling statement due to user request”).

Important caveats:

  • If the session is inside an explicit transaction, canceling a statement typically leaves the transaction in an aborted state; the client must ROLLBACK before it can do more work.
  • Canceling does not necessarily release locks if the session remains in a transaction holding them. Some locks (especially those acquired by DDL or row locks held by an open transaction) may remain until transaction end.

Reference: admin functions and cancellation behavior
https://www.postgresql.org/docs/current/functions-admin.html


Terminating sessions: pg_terminate_backend(pid)

When termination is the right move

Termination is the “release the locks now” tool. It’s appropriate when the root blocker is:

  • idle in transaction (classic lock hostage)
  • holding row locks from an unfinished write transaction
  • holding a strong relation lock (e.g., DDL transaction left open)
  • a session that won’t respond to cancel quickly enough to restore service

How to do it:

SELECT pg_terminate_backend(<pid>);

What happens:

  • The backend process is terminated.
  • Any open transaction is rolled back.
  • Locks held by that transaction/session are released.

This is often the fastest way to unstick a system, but it has higher risk:

  • rolled-back work (writes undone),
  • potential partial application workflows,
  • possible user-visible errors for whatever was using that session.

Reference: admin functions
https://www.postgresql.org/docs/current/functions-admin.html


A practical “what to kill first” decision tree

1) If there’s a queued ACCESS EXCLUSIVE DDL acting as head-of-line
  • Cancel the DDL waiter first (often lowest collateral damage).
  • This can immediately allow compatible traffic (reads/writes) to proceed again.

Why: the DDL may not have changed anything yet; it’s just waiting and amplifying the queue.

Then, separately, fix the root cause (often a long transaction) before retrying the migration.

2) If the root blocker is idle in transaction
  • Terminate the session.
  • Canceling often does nothing because there may be no running statement to cancel, and the transaction (and its locks) will remain.

Use pg_stat_activity.state and xact_start to confirm.
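
A quick confirmation query (a sketch; tighten the filters for your environment):

SELECT pid,
       usename,
       state,
       now() - xact_start   AS xact_age,
       now() - state_change AS idle_for,
       left(query, 100)     AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;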

Reference: session states
https://www.postgresql.org/docs/current/monitoring-stats.html

3) If the root blocker is a long-running query (active)
  • Try cancel first.
  • If it doesn’t stop quickly enough (or keeps coming back), terminate.
4) If you’re unsure whether a session holds locks past statement end
  • Prefer terminate for true “lock hostage” situations (especially when you’ve identified it as the root blocker via pg_blocking_pids()).
  • Prefer cancel when you’re addressing the head-of-line waiter or a clear runaway statement and want to minimize rollback impact.

Reference: lock behavior and transaction scope
https://www.postgresql.org/docs/current/explicit-locking.html


How to avoid making things worse

Before you kill anything, confirm you’re targeting the right PID(s):

  • Use pg_blocking_pids() to identify blockers from victims.
    Reference: https://www.postgresql.org/docs/current/functions-info.html
  • Check pg_stat_activity for:
    • state (idle in transaction vs active),
    • xact_start age,
    • query text and application name.
  • If possible, kill the smallest blast radius first:
    • cancel head-of-line DDL,
    • terminate a single idle-in-transaction blocker,
    • avoid mass-killing many victims (it often just creates retry storms).

Takeaway

In lock incidents, “what to kill first” is about minimizing collateral damage while restoring throughput:

  • Cancel (pg_cancel_backend) is best for stopping a currently running statement—especially a head-of-line DDL waiter or runaway query.
  • Terminate (pg_terminate_backend) is best for ending lock-hostage sessions—especially idle in transaction—because it rolls back the transaction and releases locks immediately.

Use pg_blocking_pids() + pg_stat_activity to be sure you’re acting on the root blocker, not just the loudest victim.

48. Post-Incident Review: How to Prevent the Next Lock Freeze

The goal of a post-incident review isn’t to write a story about what happened. It’s to extract a small set of changes that make the next lock event:

  • less likely to occur, and
  • less damaging if it does occur.

For lock freezes, the highest-leverage fixes usually live in four places:

  1. transaction discipline,
  2. safer migrations and DDL practices,
  3. timeouts/guardrails,
  4. observability and runbooks.



1) Reconstruct the blocking chain (root blocker vs head-of-line waiter)

For lock incidents, the most important postmortem artifact is:

  • Which PID was the root blocker?
  • Which statement was the head-of-line waiter (often DDL)?
  • Which relation(s) were the hotspots?

If you didn’t capture it live, it’s worth adding “snapshot locks and activity” to the incident runbook so you always have evidence next time (e.g., dumps of pg_stat_activity, pg_locks, and a blocker summary query).

Reference: pg_stat_activity / pg_locks
https://www.postgresql.org/docs/current/monitoring-stats.html
https://www.postgresql.org/docs/current/view-pg-locks.html


2) Fix the #1 root cause: long-lived and idle-in-transaction sessions

If the root blocker was:

  • a long-running transaction,
  • or idle in transaction,

then the prevention work is usually clear:

  • make transactions shorter in the app,
  • ensure exception paths always rollback,
  • don’t keep transactions open across network calls or user think-time,
  • set a defensive database timeout:
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '60s';

This converts “infinite lock hostage” into “bounded damage.”

Reference: idle_in_transaction_session_timeout
https://www.postgresql.org/docs/current/runtime-config-client.html


3) Make migrations fail fast instead of queueing

If a migration/DDL step was involved (common), standardize a migration policy:

  • Every migration session sets:
    • lock_timeout (short)
    • statement_timeout (reasonable)
  • Prefer online primitives:
    • CREATE INDEX CONCURRENTLY
    • REINDEX CONCURRENTLY (when supported)
    • NOT VALID constraints + later validation
    • expand/backfill/contract

A concrete standard:

BEGIN;
SET LOCAL lock_timeout = '2s';
SET LOCAL statement_timeout = '60s';
-- small DDL step
COMMIT;



4) Add backpressure and sane retry behavior in the application

If the incident involved retry storms or pool exhaustion, the fix is often in the app:

  • treat lock timeouts and deadlocks as retryable,
  • use bounded exponential backoff with jitter,
  • cap concurrency and isolate background work into separate pools,
  • fail fast when the pool is exhausted rather than letting threads pile up.

Postgres reference for SQLSTATE codes (deadlock, serialization, etc.):
https://www.postgresql.org/docs/current/errcodes-appendix.html


5) Reduce the blast radius of “strong lock” operations

If the hotspot table is central and huge, consider structural changes:

  • partitioning (so maintenance and some locks are per-partition),
  • moving heavy reporting workloads off the primary,
  • redesigning workflows to avoid large rewrites.

Reference: partitioning overview
https://www.postgresql.org/docs/current/ddl-partitioning.html


6) Observability upgrades that pay off next time

Make sure you can answer these quickly during the next incident:

  • How many sessions are waiting on locks right now?
  • Which relation is the hotspot?
  • Who are the top blockers, and how old are their transactions?
  • Is there a queued AccessExclusiveLock?

Practical upgrades:

  • ensure you collect pg_stat_activity and lock snapshots during incidents,
  • enable and use pg_stat_statements for recurring “slow/blocked query” patterns (if feasible),
  • log slow statements (and consider logging lock waits via log_lock_waits).
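
A minimal sketch of the logging side, assuming you have permission to change cluster-wide settings (the thresholds are illustrative):

ALTER SYSTEM SET log_lock_waits = on;                  -- log any lock wait longer than deadlock_timeout
ALTER SYSTEM SET log_min_duration_statement = '1s';    -- log statements slower than 1 second
SELECT pg_reload_conf();                               -- apply without a restart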

Reference: runtime logging settings (including lock wait logging)
https://www.postgresql.org/docs/current/runtime-config-logging.html
pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html


7) Turn the incident into a checklist change

The most effective postmortems end with concrete, testable changes like:

  • “All migrations must set lock_timeout and run non-transactional steps for concurrent index builds.”
  • “We enforce idle_in_transaction_session_timeout in production.”
  • “We added pool isolation: background jobs can’t consume OLTP connections.”
  • “We added an on-call query pack to identify blockers in <2 minutes.”

This is how you reduce recurrence, not by hoping people remember.


Takeaway

To prevent the next lock freeze, focus on the systemic amplifiers: long-lived transactions (especially idle in transaction), migrations that wait forever for locks, and application retry storms that turn blocking into outages. Add guardrails (timeouts), adopt online migration patterns (expand/backfill/contract, concurrently, NOT VALID), and improve observability and runbooks so diagnosis and recovery are fast and repeatable.

49. Checklist: Pre-Flight Questions Before Running DDL

Use this checklist before you run any DDL on a production (or production-like) PostgreSQL system—especially on hot tables. It’s designed to prevent the two most common failure modes:

  • lock queue amplification (your DDL waits and stalls everyone), and
  • long lock hold (your DDL gets the lock and blocks traffic longer than expected).



Scope and risk

  • □ What exact objects will this touch (tables, indexes, partitions, referenced tables via FKs, views)?
  • □ Is the target relation on a critical path (high QPS, heavy writes, core endpoints)?
  • □ Does this DDL rewrite the table or do a full scan/validation (high downtime risk)?

Locking expectations

  • □ What lock mode will this DDL acquire, and which normal operations does that mode conflict with?
  • □ Could the DDL queue behind a long-running transaction and become a head-of-line blocker?

Safety guardrails

  • □ Have you set a short lock_timeout and a reasonable statement_timeout for the migration session?
  • □ Is idle_in_transaction_session_timeout enforced so an abandoned transaction can’t hold the lock queue hostage?

Online alternatives and phased rollouts

  • □ Can you use CREATE INDEX CONCURRENTLY, NOT VALID + later validation, or an expand/backfill/contract rollout instead?
  • □ Can the change be decomposed into smaller steps that each hold strong locks only briefly?

Operational timing and coordination

  • □ Is this being run during an appropriate traffic window (or with a clear abort plan)?
  • □ Are other risky operations running at the same time (backfills, large batch jobs, heavy maintenance)?
  • □ In HA setups, have you considered replication lag and replica schema visibility?

Observability and rollback

  • □ Do you have blocker-identification queries ready (pg_stat_activity, pg_locks, pg_blocking_pids())?
  • □ What is the rollback/abort plan if the step fails or times out halfway?

Application readiness

  • □ Is the app compatible with the new schema (and the old schema during rollout)?
  • □ Are retries/backoff sane so transient lock timeouts don’t cause a retry storm?
  • □ Are you sure no code relies on the thing you’re about to drop?

Minimal “pre-flight” SQL snippet (optional)

If you want a tiny set of checks you can run right before DDL:

-- 1) Look for long/idle transactions (common root blockers)
SELECT pid, usename, application_name, state,
       now() - xact_start AS xact_age,
       left(query, 200) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start ASC
LIMIT 20;
-- 2) Confirm whether there is already lock contention
SELECT count(*) AS lock_waiters
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
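
You might also check, right before running the DDL, whether anything already holds or is waiting for locks on the target relation (the table name here is illustrative):

-- 3) Existing lock activity on the target table
SELECT l.pid, l.mode, l.granted, a.state,
       now() - a.query_start AS age,
       left(a.query, 120) AS query_preview
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'relation'
  AND l.relation = 'orders'::regclass;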

References:
https://www.postgresql.org/docs/current/monitoring-stats.html


Takeaway

If you can answer these pre-flight questions—especially around lock mode, table rewrites, timeouts, and long transactions—you’ll avoid the majority of production DDL lock incidents. The checklist is intentionally conservative: it’s cheaper to abort a migration than to recover from a lock-driven outage.

50. FAQ: Common Misconceptions About PostgreSQL “Stop-the-World” Locks

Below are the misconceptions that most often cause teams to get surprised by lock incidents in PostgreSQL—and the reality behind each one.



“Postgres has lock escalation like SQL Server / Oracle”

No. PostgreSQL does not do classic automatic lock escalation from many row locks to a table lock. When it feels like escalation, it’s usually one of:

  • a strong table lock request (often DDL) entering the queue,
  • long transactions holding locks open,
  • lock queue amplification,
  • or heavy row contention causing widespread waits.

Reference: locking behavior
https://www.postgresql.org/docs/current/explicit-locking.html


“MVCC means reads never block writes”

MVCC means plain reads (SELECT) usually don’t block writes (INSERT/UPDATE/DELETE) on the same rows. But reads still take relation locks (ACCESS SHARE) and can participate in conflicts with DDL and table-rewrite operations.

Also, reads that explicitly lock rows (SELECT ... FOR UPDATE/SHARE) absolutely can block writers.
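
For example (a sketch with a hypothetical accounts table):

-- session A: takes a row lock that writers must wait for
BEGIN;
SELECT * FROM accounts WHERE id = 42 FOR UPDATE;

-- session B: this blocks until session A commits or rolls back
UPDATE accounts SET balance = balance - 10 WHERE id = 42;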

Reference: MVCC overview and explicit locking
https://www.postgresql.org/docs/current/mvcc.html
https://www.postgresql.org/docs/current/explicit-locking.html


“If I’m not running DDL, I can’t cause a stop-the-world event”

You can. The most common non-DDL trigger is a long-running transaction, especially one that becomes idle in transaction. It can:

  • hold row locks indefinitely,
  • block DDL lock acquisition,
  • prevent vacuum cleanup,
  • and enable queue pileups when a strong lock request shows up later.

Reference: monitoring sessions and transaction behavior
https://www.postgresql.org/docs/current/monitoring-stats.html
https://www.postgresql.org/docs/current/mvcc.html


“The query holding ACCESS EXCLUSIVE is always the culprit”

Sometimes, but not always.

Often the real problem is:

  • a session holding a weaker lock for a long time (root blocker), plus
  • a DDL statement waiting for ACCESS EXCLUSIVE (head-of-line waiter), plus
  • many sessions queueing behind that request.

Canceling the waiting DDL may immediately restore traffic, even though it wasn’t “holding” the lock yet.

Reference: lock conflict rules and lock wait visibility
https://www.postgresql.org/docs/current/explicit-locking.html
https://www.postgresql.org/docs/current/view-pg-locks.html


“If I don’t see many locks in pg_locks, it’s not a lock problem”

Not necessarily.

  • You need to look specifically for waiters (granted = false), not total lock rows (see the query after this list).
  • Some contention shows up as transactionid waits rather than obvious relation locks.
  • And some “hangs” aren’t heavyweight locks at all—they’re LWLocks, I/O waits, or client/network waits, which won’t appear the same way in pg_locks.
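
A quick way to check for actual waiters rather than total lock rows:

SELECT locktype, mode, count(*) AS waiting
FROM pg_locks
WHERE NOT granted
GROUP BY locktype, mode
ORDER BY waiting DESC;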

References:
https://www.postgresql.org/docs/current/view-pg-locks.html
https://www.postgresql.org/docs/current/monitoring-stats.html


“CREATE INDEX CONCURRENTLY means ‘no locks’”

No. It means “minimizes blocking of normal reads/writes,” not “lock-free.”

Concurrent builds still take locks at different phases and can still wait behind conflicting DDL. They are usually much safer for production, but you still want timeouts and monitoring.

Reference: CREATE INDEX CONCURRENTLY
https://www.postgresql.org/docs/current/sql-createindex.html


“Adding a column is always safe”

Adding a nullable column with no default is usually a fast metadata change. But:

  • it can still require a strong lock to change the table definition,
  • and adding a column with a default or enforcing NOT NULL can be much riskier depending on version and the operation.

Reference: ALTER TABLE
https://www.postgresql.org/docs/current/sql-altertable.html


“TRUNCATE is just a fast DELETE”

It’s fast, but it’s not “just a delete.”

TRUNCATE takes an ACCESS EXCLUSIVE lock on the table(s), which blocks reads and writes. With CASCADE, it can lock and truncate additional referencing tables—expanding blast radius quickly.

Reference: TRUNCATE
https://www.postgresql.org/docs/current/sql-truncate.html


“If something is blocked, Postgres will eventually resolve it like a deadlock”

Deadlocks are cycles and Postgres detects them and aborts a participant. Lock blocking is usually a queue, not a cycle, and can last indefinitely if the blocker doesn’t finish.

So “wait long enough” is not a strategy if the blocker is idle in transaction or a migration is queued behind a long-running query.

Reference: deadlocks and explicit locking
https://www.postgresql.org/docs/current/explicit-locking.html


“Killing random waiting queries will fix it”

Usually it won’t. Killing victims often creates a retry storm and doesn’t remove the root cause.

The effective fix is typically:

  • cancel the head-of-line DDL waiter (if it’s amplifying), and/or
  • terminate the root blocker holding locks or holding a transaction open.

Reference: admin functions
https://www.postgresql.org/docs/current/functions-admin.html


“If CPU is low, Postgres must be fine”

Low CPU during an outage is a classic sign of waiting. Lock waits don’t burn CPU. The database can be “healthy” in resource metrics while being completely unusable for your workload.

Use wait events and lock views to confirm what you’re waiting on.

Reference: pg_stat_activity wait events
https://www.postgresql.org/docs/current/monitoring-stats.html


“Setting timeouts is dangerous; it’ll just cause errors”

Timeouts do cause errors—and that’s the point. They convert “infinite waiting that takes down the whole system” into “bounded failures that your app/migration can retry safely.”

The tradeoff is that your application must handle these errors intentionally (retries with backoff/jitter for idempotent work, and clean rollbacks).

Reference: timeouts (lock_timeout, statement_timeout, idle_in_transaction_session_timeout)
https://www.postgresql.org/docs/current/runtime-config-client.html


Takeaway

Most PostgreSQL “stop-the-world” lock incidents come from misconceptions: assuming MVCC eliminates blocking, assuming Postgres escalates locks, assuming the waiting DDL is always the culprit, or assuming resource metrics reflect lock health. The reliable approach is to observe waits (pg_stat_activity), confirm lock types/modes (pg_locks), identify root blockers (pg_blocking_pids()), and enforce guardrails (timeouts + safe migration patterns).

51. Conclusion: Designing for Zero-Surprise Locking in PostgreSQL

PostgreSQL isn’t “bad at concurrency.” Most of the time it’s excellent at it—MVCC lets reads and writes coexist in ways that feel almost magical. The outages happen when a small number of predictable conditions line up:

  • a hot relation that everyone touches,
  • a long-running or idle-in-transaction session,
  • a strong lock request (often ACCESS EXCLUSIVE) that enters the queue,
  • and an application/pool that amplifies waiting into timeouts and retries.

None of that is random. The goal of “zero-surprise locking” is to make these events boring: visible early, bounded by guardrails, and avoidable through standard practices.



The core design principles

If you remember only a few principles, make them these:

  1. Keep transactions short.
    Long transactions—and especially idle in transaction—are the silent enablers of lock freezes. They hold locks and snapshots open and make every DDL operation unpredictable.
    Reference: https://www.postgresql.org/docs/current/mvcc.html
  2. Assume hot-table DDL is dangerous until proven otherwise.
    Many schema changes need strong locks, and the wait to acquire them is often the real outage.
    Reference: https://www.postgresql.org/docs/current/sql-altertable.html
  3. Engineer failure modes with timeouts.
    lock_timeout prevents queue amplification; statement_timeout bounds total damage; idle_in_transaction_session_timeout prevents indefinite lock hostages.
    Reference: https://www.postgresql.org/docs/current/runtime-config-client.html
  4. Use online and phased techniques by default.
    Prefer CREATE INDEX CONCURRENTLY, REINDEX CONCURRENTLY, NOT VALID constraints + later validation, and expand/backfill/contract migrations.
  5. Make the application part of the solution.
    Bounded retries with jitter and real backpressure prevent lock incidents from turning into retry storms and pool exhaustion.

What “zero-surprise” looks like operationally

In a mature setup:

  • migrations either succeed quickly or fail quickly (never “hang for 20 minutes”),
  • one bad session can’t hold locks indefinitely,
  • on-call can identify the root blocker in minutes using pg_stat_activity, pg_locks, and pg_blocking_pids(),
  • dangerous operations are either decomposed (online) or scheduled intentionally (window),
  • and post-incident reviews produce concrete guardrails, not folklore.



The final takeaway

Stop-the-world moments in PostgreSQL are rarely mysterious. They’re the result of understandable lock rules interacting with real-world deployment and traffic patterns. If you design around those rules—short transactions, safe migrations, strict timeouts, good observability, and sane app backpressure—you can run PostgreSQL at high concurrency with very few “surprise freezes,” and when they do happen, you’ll have a predictable way to unwind them.

52. Further Reading: Official Docs and Deep Dives

Official PostgreSQL documentation

  • Explicit locking: https://www.postgresql.org/docs/current/explicit-locking.html
  • MVCC and transaction isolation: https://www.postgresql.org/docs/current/mvcc.html
  • Monitoring, wait events, pg_stat_activity: https://www.postgresql.org/docs/current/monitoring-stats.html
  • pg_locks: https://www.postgresql.org/docs/current/view-pg-locks.html
  • ALTER TABLE: https://www.postgresql.org/docs/current/sql-altertable.html
  • CREATE INDEX (including CONCURRENTLY): https://www.postgresql.org/docs/current/sql-createindex.html
  • Client connection settings (timeouts): https://www.postgresql.org/docs/current/runtime-config-client.html
  • Error reporting and logging (log_lock_waits): https://www.postgresql.org/docs/current/runtime-config-logging.html
  • System administration functions (cancel/terminate): https://www.postgresql.org/docs/current/functions-admin.html
  • System information functions (pg_blocking_pids): https://www.postgresql.org/docs/current/functions-info.html
  • High availability and replication: https://www.postgresql.org/docs/current/high-availability.html
  • Table partitioning: https://www.postgresql.org/docs/current/ddl-partitioning.html
  • Error codes (SQLSTATE): https://www.postgresql.org/docs/current/errcodes-appendix.html
  • pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html

Deep dives and ecosystem resources


Suggested “study path” if you want mastery

  1. Read Explicit Locking and understand ACCESS EXCLUSIVE + lock conflicts.
    https://www.postgresql.org/docs/current/explicit-locking.html
  2. Learn incident inspection with pg_stat_activity and pg_locks.
    https://www.postgresql.org/docs/current/monitoring-stats.html
    https://www.postgresql.org/docs/current/view-pg-locks.html
  3. Learn migration-safe primitives: CONCURRENTLY, NOT VALID, and timeouts.
    https://www.postgresql.org/docs/current/sql-createindex.html
    https://www.postgresql.org/docs/current/sql-altertable.html
    https://www.postgresql.org/docs/current/runtime-config-client.html