Should I use nodetool setcompactionthroughput or edit cassandra.yaml?

Use nodetool setcompactionthroughput for tuning: it applies live, needs no restart, and is trivial to revert. Edit cassandra.yaml only to persist the chosen value across restarts, and roll it out through config management with serial: 1 so nodes never restart together.

Did the parameter name change between Cassandra 4.0 and 4.1?

Yes. compaction_throughput_mb_per_sec was renamed compaction_throughput in 4.1 and now takes a size string such as 64MiB/s in cassandra.yaml. Automation that writes the YAML must branch on version, but nodetool setcompactionthroughput accepts a plain integer of MB/s on both 4.x and 5.x.

What value should I set compaction throughput to?

Keep it at or below 50% of the data volume's sustained sequential write throughput and approach the target in increments. Setting 0 disables throttling entirely and should never be used in production without explicit capacity validation, since it lets compaction saturate the disk and starve foreground reads and writes.

Why does raising throughput not clear my compaction backlog?

A large pending-tasks count is usually structural SSTable accumulation driven by strategy choice or tombstone load, not I/O throttling. If the queue will not drain even at a safe ceiling, address the backlog and strategy directly rather than pushing throughput higher, which only increases write amplification.

Is it safe to tune throughput while a repair is running?

No. Compaction and anti-entropy repair share the same I/O scheduler, and running them together at a high throughput ceiling risks OutOfMemoryError and SocketTimeoutException. Confirm nodetool netstats shows NORMAL mode with no sending or receiving streams before adjusting, since NORMAL alone does not mean the node is idle.

How to Tune `compaction_throughput_mb_per_sec` Safely

In Apache Cassandra 4.x and 5.x, background compaction remains the dominant consumer of disk I/O bandwidth. The compaction_throughput_mb_per_sec parameter — renamed compaction_throughput in 4.1, where it now takes a size string such as 64MiB/s — dictates the aggregate write throughput allocated to compaction threads across every keyspace on a node. Set it too high during a backlog or an active repair and you starve foreground reads and writes; set it too low and SSTables accumulate until the read path degrades. This page gives you a copy-paste, idempotent workflow to move that value in production without destabilizing the deployment or corrupting storage state. It assumes Cassandra 4.0, 4.1, or 5.0, nodetool on PATH, and a node you can reach over JMX. It sits under Compaction Error Categorization & Logging; read that first so you can tell a transient I/O stall apart from a structural failure before you touch the throttle.

Throughput tuning is never a standalone knob. The right ceiling depends on the compaction strategy in play — STCS, LCS, or TWCS — because each generates a different write-amplification profile, and on whether tombstone accumulation is inflating the bytes each compaction must rewrite. Treat the number below as a target you approach in increments, not a value you slam in once.

Pre-conditions & safety gates

Blindly increasing throughput during an active compaction backlog or a concurrent repair guarantees I/O starvation. Every gate below must pass before you adjust anything. Each lists the runnable command, the output to expect, and the rollback path if the gate fails.

Gate 1 — Compaction queue health

nodetool compactionstats -H

Safety check: Inspect pending tasks and the per-row completed/total columns. Abort if pending tasks are high relative to active compactions, or if the remaining bytes (sum of total - completed across rows) cannot drain within a few hours at the current throughput. A high pending count is structural SSTable accumulation, not I/O throttling — raising throughput will not fix it.

Expected output:

pending tasks: 12
id                                   compaction type  keyspace  table    completed     total          unit   progress
a1b2c3d0-1f2e-11ef-9a3b-0f1e2d3c4b5a  Compaction       ks1       events   1073741824    4294967296     bytes  25.00%
b2c3d4e1-1f2e-11ef-9a3b-0f1e2d3c4b5a  Compaction       ks1       sessions 536870912     1610612736     bytes  33.33%
Active compaction remaining time :   0h05m12s

On failure: Do not proceed. Investigate tombstone ratios with nodetool tablestats (nodetool cfstats is deprecated on 4.x and removed on 5.x) or trigger a targeted nodetool cleanup on over-provisioned nodes. On 5.0 you can cross-check the same state from the system_views.sstable_tasks virtual table.
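
The "cannot drain within a few hours" test in Gate 1 can be made concrete with a little arithmetic. The sketch below is one possible way to do it, not part of nodetool: it sums total - completed over the rows of the sample output above and divides by the current ceiling. It assumes the byte columns are raw integers as in the sample (that is, captured without human-readable size formatting) and treats the ceiling as MiB/s; the function names are hypothetical.

```python
import re

# Sample copied from the expected output above.
SAMPLE = """\
pending tasks: 12
id                                   compaction type  keyspace  table    completed     total          unit   progress
a1b2c3d0-1f2e-11ef-9a3b-0f1e2d3c4b5a  Compaction       ks1       events   1073741824    4294967296     bytes  25.00%
b2c3d4e1-1f2e-11ef-9a3b-0f1e2d3c4b5a  Compaction       ks1       sessions 536870912     1610612736     bytes  33.33%
Active compaction remaining time :   0h05m12s
"""


def pending_tasks(stats: str) -> int:
    """Read the 'pending tasks: N' line; 0 if it is absent."""
    match = re.search(r"pending tasks:\s*(\d+)", stats)
    return int(match.group(1)) if match else 0


def remaining_bytes(stats: str) -> int:
    """Sum (total - completed) over every row whose unit column is 'bytes'."""
    remaining = 0
    for line in stats.splitlines():
        fields = line.split()
        # Data rows end with: completed, total, unit, progress.
        # Assumption: completed/total are raw integers; int() raises otherwise.
        if len(fields) >= 8 and fields[-2] == "bytes" and fields[-1].endswith("%"):
            completed, total = int(fields[-4]), int(fields[-3])
            remaining += max(total - completed, 0)
    return remaining


def drain_hours(stats: str, throughput_mib_s: int) -> float:
    """Hours needed to rewrite the remaining bytes at the given ceiling."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive (0 means unthrottled)")
    return remaining_bytes(stats) / (throughput_mib_s * 1024 * 1024) / 3600


if __name__ == "__main__":
    print(pending_tasks(SAMPLE), remaining_bytes(SAMPLE), drain_hours(SAMPLE, 64))
```

For the sample rows the remaining work is 4 GiB, which drains in about a minute at 64 MiB/s, so the byte-based half of the gate passes. The estimate only covers compactions already running; it says nothing about the pending tasks still queued behind them.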

Gate 2 — Active repair & streaming state

nodetool netstats

Safety check: Verify Mode: NORMAL and that the output contains Not sending any streams. with no indented Receiving or Sending progress lines beneath a stream plan. On 4.x and 5.x nodetool prints that single line when no stream sessions exist in either direction; there is no separate Not receiving any streams. line. Mode: NORMAL alone does not mean the node is idle, because anti-entropy repair streams can run while the node reports NORMAL. Compaction and repair share the same I/O scheduler; running them together invites java.lang.OutOfMemoryError or java.net.SocketTimeoutException.

Expected output:

Mode: NORMAL
Not sending any streams.

On failure: Defer tuning until nodetool repair -pr completes, or terminate in-flight sessions via the JMX StorageService.forceTerminateAllRepairSessions operation. There is no nodetool repair --abort.

Gate 3 — Storage subsystem saturation

iostat -x 1 5 | grep -E "^Device|^nvme|^sd"

Safety check: Ensure await < 20ms and %util < 75% on the data volume. Sustained %util > 85% indicates physical disk saturation; raising throughput will only worsen write amplification and latency.

Expected output:

Device   rrqm/s wrqm/s   r/s   w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1    0.00   0.00  12.0  45.0  120.0  890.0    35.41     0.12   2.10    1.05    2.50  0.80  4.50

On failure: Halt tuning. Investigate filesystem fragmentation, RAID controller cache policies, or migrate data to NVMe-backed volumes.
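
Gate 3 is easy to misread by eye because the first iostat report is an average since boot. The sketch below is an illustrative parser, not an official tool: it keeps the last report for one device and applies the await < 20ms and %util < 75% thresholds from the safety check. The function name and constants are hypothetical, and it assumes the column layout shown in the sample (newer sysstat releases drop the combined await column, so the code falls back to the larger of r_await and w_await).

```python
# Sample copied from the expected output above.
SAMPLE = """\
Device   rrqm/s wrqm/s   r/s   w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1    0.00   0.00  12.0  45.0  120.0  890.0    35.41     0.12   2.10    1.05    2.50  0.80  4.50
"""

MAX_AWAIT_MS = 20.0  # Gate 3 threshold from the text
MAX_UTIL_PCT = 75.0  # Gate 3 threshold from the text


def disk_gate(iostat_out: str, device: str) -> tuple[bool, float, float]:
    """Return (passes, await_ms, util_pct) for the last report of `device`."""
    header: list[str] = []
    await_ms = util = None
    for line in iostat_out.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields  # remember column names for the rows that follow
        elif header and fields[0] == device and len(fields) == len(header):
            row = dict(zip(header[1:], map(float, fields[1:])))
            # Older sysstat prints a combined await; newer only r_await/w_await.
            fallback = max(row.get("r_await", 0.0), row.get("w_await", 0.0))
            await_ms = row.get("await", fallback)
            util = row["%util"]  # later reports overwrite earlier ones
    if await_ms is None or util is None:
        raise ValueError(f"device {device!r} not found in iostat output")
    return (await_ms < MAX_AWAIT_MS and util < MAX_UTIL_PCT), await_ms, util


if __name__ == "__main__":
    print(disk_gate(SAMPLE, "nvme0n1"))
```

Pass the device that backs the Cassandra data directory, since a healthy commit-log or OS disk says nothing about the data volume.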

Gate 4 — Log baseline correlation

Cross-reference system.log and debug.log for CompactionExecutor saturation, DiskFull warnings, or SSTableRewriter stalls. A transient I/O stall differs fundamentally from structural corruption, and only the parent guide’s failure taxonomy tells them apart deterministically. If any line resolves to a corruption or disk-full category rather than throttling, stop here and remediate that first.

Implementation

With every gate green, apply the change. Prefer the live nodetool path for agility; use the cassandra.yaml path only when you want the value to survive a restart.

Live throughput adjustment

nodetool setcompactionthroughput 128   # value is MB/s on both 4.x and 5.x
nodetool getcompactionthroughput        # always read it back; the setter is silent on success

Safety check: Keep the new value at or below 50% of the disk’s sustained sequential write throughput. Never set 0 (unlimited) in production without explicit capacity validation.

Expected output:

Current compaction throughput: 128 MB/s

Rollback: Revert immediately with nodetool setcompactionthroughput 64 (or your previous baseline) and watch compactionstats for 60 seconds to confirm the queue stabilizes.
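
The 50% rule and the advice to approach the target in increments can be combined into a small planner. This is a minimal sketch under stated assumptions: disk_seq_write_mb is a sequential write figure you have measured yourself for the data volume, and step_mb is an arbitrary increment chosen for illustration; neither name nor default comes from Cassandra.

```python
def plan_steps(current: int, target: int, disk_seq_write_mb: int, step_mb: int = 32) -> list[int]:
    """Values to apply in order, capped at 50% of measured sequential write throughput.

    Only plans increases; returns an empty list when no raise is needed.
    """
    if min(current, target, disk_seq_write_mb, step_mb) <= 0:
        raise ValueError("inputs must be positive; 0 (unthrottled) is not planned here")
    ceiling = disk_seq_write_mb // 2  # the 50% rule from the safety check
    goal = min(target, ceiling)       # never plan past the ceiling
    steps: list[int] = []
    value = current
    while value < goal:
        value = min(value + step_mb, goal)
        steps.append(value)
    return steps


if __name__ == "__main__":
    # 400 MB/s disk: ceiling is 200, so the 128 target is reachable in two steps.
    print(plan_steps(64, 128, 400))
    # 200 MB/s disk: ceiling is 100, so the plan stops short of the 128 target.
    print(plan_steps(64, 128, 200))
```

Apply one value from the list at a time with nodetool setcompactionthroughput, read it back, and re-check the gates before moving to the next.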

Persistent configuration (rolling restart required)

# cassandra.yaml — Cassandra 4.0
compaction_throughput_mb_per_sec: 128

# cassandra.yaml — Cassandra 4.1 / 5.0 (renamed, size-string form)
compaction_throughput: 128MiB/s

Safety check: Apply via configuration management (Ansible/Puppet) one node at a time (serial: 1 in Ansible) so nodes never restart simultaneously. Verify that the rendered cassandra.yaml checksum matches across nodes of the same release before rollout.

Expected output: The node restarts cleanly and nodetool getcompactionthroughput confirms Current compaction throughput: 128 MB/s.

Rollback: Revert the YAML value, restart the node, and watch for java.lang.RuntimeException: Unable to acquire compaction semaphore during startup.
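
Automation that renders cassandra.yaml has to choose between the two forms above based on the release. The helper below is a hypothetical sketch of that branch, not something Cassandra ships; it accepts either a bare version string or the ReleaseVersion line printed by nodetool version.

```python
import re


def throughput_yaml(version: str, mib_per_s: int) -> str:
    """Return the cassandra.yaml line for the given release.

    4.0 uses the integer key; 4.1 and later use the renamed size-string key.
    """
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse a major.minor version from {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (4, 1):
        return f"compaction_throughput: {mib_per_s}MiB/s"
    return f"compaction_throughput_mb_per_sec: {mib_per_s}"


if __name__ == "__main__":
    for release in ("4.0.11", "4.1.4", "5.0.2"):
        print(release, "->", throughput_yaml(release, 128))
```

Branching on the parsed (major, minor) tuple rather than on string prefixes keeps a future 4.10 or 10.x release from being misclassified.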

Idempotent Python tuner with auto-rollback

The script below re-runs the safety gates, applies the adjustment, validates the post-state, and reverts automatically if a threshold breaches. It is safe to run repeatedly: it never pushes a value outside its hard bounds and treats a stable post-state as a no-op success.

#!/usr/bin/env python3
# requirements: Python 3.10+, Cassandra 4.0/4.1/5.0 with nodetool on PATH.
"""Idempotent compaction-throughput tuner for Cassandra 4.x/5.x.

Enforces safety boundaries, validates I/O and repair state, applies the change
via nodetool, and rolls back automatically if the post-state is not stable.
"""
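import re
import shlex
import shutil
import subprocess
import sys
import time

# Prefer nodetool from PATH (the stated requirement); fall back to a common install path.
NODETOOL = shutil.which("nodetool") or "/opt/cassandra/bin/nodetool"
MAX_SAFE_THROUGHPUT_MB = 256
MIN_SAFE_THROUGHPUT_MB = 16


def run_cmd(cmd: str, timeout: int = 30) -> tuple[int, str]:
    """Run a command without a shell; return (exit code, stdout or error text)."""
    try:
        result = subprocess.run(
            shlex.split(cmd), capture_output=True, text=True, timeout=timeout, check=True
        )
        return 0, result.stdout.strip()
    except subprocess.CalledProcessError as exc:
        return exc.returncode, (exc.stderr or exc.stdout or "").strip()
    except subprocess.TimeoutExpired:
        return 1, "Command timed out"
    except FileNotFoundError:
        # nodetool is missing; report it instead of crashing with a traceback.
        return 127, f"Executable not found: {cmd.split()[0]}"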


def validate_pre_flight() -> bool:
    """Re-check compaction queue and repair/streaming state before adjusting."""
    rc, out = run_cmd(f"{NODETOOL} compactionstats -H")
    if rc != 0:
        print(f"[FAIL] nodetool compactionstats failed: {out}")
        return False

    # Parse the real compactionstats format: a "pending tasks: N" line, then one
    # row per active compaction. Abort when pending work dwarfs what is draining.
    pending_match = re.search(r"pending tasks:\s*(\d+)", out)
    pending = int(pending_match.group(1)) if pending_match else 0
    active = len(re.findall(r"\bCompaction\b", out))
    if pending > 16 and pending > 4 * max(active, 1):
        print(f"[ABORT] Backlog too high: {pending} pending, {active} active")
        return False

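    rc, out = run_cmd(f"{NODETOOL} netstats")
    if rc != 0:
        print(f"[FAIL] nodetool netstats failed: {out}")
        return False
    # Mode: NORMAL alone does NOT mean idle. An idle node prints "Not sending any
    # streams." and shows no per-session "Receiving"/"Sending N files" lines.
    streaming_idle = "Not sending any streams." in out and not re.search(
        r"^\s+(Receiving|Sending) \d+ files", out, re.MULTILINE
    )
    if "Mode: NORMAL" not in out or not streaming_idle:
        print("[ABORT] Node not NORMAL or active streaming detected")
        return False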

    print("[OK] Pre-flight validation passed.")
    return True


def apply_throughput(value: int, rollback_value: int) -> bool:
    """Apply a new throughput, validate the post-state, roll back on failure."""
    if not (MIN_SAFE_THROUGHPUT_MB <= value <= MAX_SAFE_THROUGHPUT_MB):
        print(f"[FAIL] {value} outside [{MIN_SAFE_THROUGHPUT_MB}-{MAX_SAFE_THROUGHPUT_MB}]")
        return False

    rc, out = run_cmd(f"{NODETOOL} setcompactionthroughput {value}")
    if rc != 0:
        print(f"[FAIL] Adjustment failed: {out}")
        return False

    # Idempotency guard: the setter is silent, so read the value back to confirm.
    # Compare the parsed integer; a substring test would accept 1280 for 128.
    rc, readback = run_cmd(f"{NODETOOL} getcompactionthroughput")
    match = re.search(r"throughput:\s*(\d+)", readback) if rc == 0 else None
    if match is None or int(match.group(1)) != value:
        print(f"[ROLLBACK] Read-back mismatch. Reverting to {rollback_value} MB/s")
        run_cmd(f"{NODETOOL} setcompactionthroughput {rollback_value}")
        return False

    # Post-flight window: let the scheduler settle, then confirm the node still
    # answers compactionstats. This is a liveness check, not a queue-depth test.
    time.sleep(15)
    rc, out = run_cmd(f"{NODETOOL} compactionstats -H")
    if rc == 0 and "pending tasks:" in out:
        print("[OK] Throughput applied; post-state stable.")
        return True

    print(f"[ROLLBACK] Post-validation failed. Reverting to {rollback_value} MB/s")
    run_cmd(f"{NODETOOL} setcompactionthroughput {rollback_value}")
    return False


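def main() -> None:
    target = int(sys.argv[1]) if len(sys.argv) > 1 else 128
    # Roll back to the value the node reports now; 64 MB/s is only a fallback.
    rc, out = run_cmd(f"{NODETOOL} getcompactionthroughput")
    match = re.search(r"throughput:\s*(\d+)", out) if rc == 0 else None
    baseline = int(match.group(1)) if match else 64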

    if not validate_pre_flight():
        sys.exit(1)
    if not apply_throughput(target, baseline):
        sys.exit(1)

    print("[COMPLETE] Tuning cycle finished. Monitor the queue for 300s.")


if __name__ == "__main__":
    main()

Verification steps

After applying the value, confirm it took effect and that the node is draining rather than choking.

# 1. The node reports the new ceiling.
nodetool getcompactionthroughput
# Expected: Current compaction throughput: 128 MB/s

# 2. Pending tasks trend down over ~5 minutes rather than climbing.
watch -n 30 'nodetool compactionstats -H | grep "pending tasks:"'
# Expected: a monotonically non-increasing count

# 3. Disk await stays inside budget under the new load.
iostat -x 5 3 | grep -E "^nvme|^sd"
# Expected: await < 20ms, %util < 85%

Feed the same signals into continuous telemetry: track org.apache.cassandra.metrics:type=Compaction,name=PendingTasks and disk await in Prometheus/Grafana. If pending tasks fall below 5 while await stays under 10ms, the adjustment is optimal. If await climbs above 25ms or foreground write latency rises more than 15%, back the value off in 25% increments. Wiring these MBeans into a poller is covered end to end in Python monitoring for Cassandra compaction.
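
The thresholds in the paragraph above translate directly into a decision rule that a poller could evaluate each cycle. The sketch below is one possible encoding, with hypothetical function names and verdict labels; the "hold" outcome is an assumption for readings that fall between the stated thresholds, and the 16 MB/s floor mirrors the lower bound used by the tuner script.

```python
def verdict(pending: int, await_ms: float, write_latency_rise_pct: float) -> str:
    """Classify one telemetry reading using the thresholds from the text."""
    # Back off first: disk await above 25ms or foreground write latency up > 15%.
    if await_ms > 25 or write_latency_rise_pct > 15:
        return "back_off"
    # Optimal: queue nearly empty while the disk stays comfortable.
    if pending < 5 and await_ms < 10:
        return "optimal"
    # Assumption: anything in between means keep the value and keep watching.
    return "hold"


def back_off(value_mb: int, floor_mb: int = 16) -> int:
    """Reduce the ceiling by 25%, never dropping below the floor."""
    return max(int(value_mb * 0.75), floor_mb)


if __name__ == "__main__":
    reading = verdict(pending=40, await_ms=30.0, write_latency_rise_pct=4.0)
    print(reading, back_off(128) if reading == "back_off" else 128)
```

Checking the back-off condition before the optimal one means a node with an empty queue but a struggling disk is still throttled down.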

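The safe-tuning procedure is an iterative loop: pass every gate, apply a small increment, verify the post-state, then either hold the value, step again, or roll back.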

Troubleshooting

java.lang.OutOfMemoryError / java.net.SocketTimeoutException shortly after raising throughput. Root cause: compaction and a repair stream contended for the same I/O scheduler because Gate 2 was skipped. Fix: nodetool setcompactionthroughput 16 to restore foreground priority, let the repair finish, then re-tune from a genuinely idle node. Never raise throughput while netstats shows active streams.
CorruptSSTableException during or after the change. Root cause: a compaction thread was killed with kill -9 (or the node crashed) mid-write, leaving a partial SSTable — not a symptom of the throughput value itself. Fix: never kill -9 a draining compaction; let it complete, then run nodetool verify -e on the affected keyspace and escalate to nodetool scrub only if verify confirms unrecoverable corruption.
java.lang.RuntimeException: Unable to acquire compaction semaphore at startup after a cassandra.yaml edit. Root cause: a malformed or out-of-range persistent value (for example a plain integer where 4.1/5.0 expects a MiB/s size string). Fix: revert the YAML to the last known-good value, restart, and confirm the correct form for the release — integer-MB on 4.0, size-string on 4.1 and 5.0.

If dynamic tuning has already triggered I/O starvation or heap pressure, run the recovery in order: throttle to 16 MB/s, let pending compactions drain naturally, run nodetool verify to confirm SSTable integrity, revert cassandra.yaml to baseline via config management, then capture system.log, gc.log, and iostat snapshots for root-cause analysis against the strategy in play. A backlog that will not drain even at a safe ceiling is a structural problem — see resolving high compaction backlog without downtime rather than pushing throughput higher.

Compaction Error Categorization & Logging — the parent guide that classifies each compaction failure signature so you know whether throttling is even the right lever.
Interpreting nodetool compactionstats output — decode every column the gates above rely on and turn it into a health verdict.
Python monitoring for Cassandra compaction — feed pending-task and await metrics into Prometheus and Grafana for closed-loop tuning.
Advanced Compaction Strategy Tuning & Monitoring — the parent section on strategy selection, tuning, and observability end to end.
