Interpreting nodetool compactionstats Output: Production Automation & Threshold Validation for Cassandra 4.x/5.x

Compaction telemetry is the primary signal for Cassandra storage health, yet raw polling of nodetool compactionstats in automated pipelines frequently triggers I/O starvation, JMX timeouts, or cascading disk pressure. nodetool compactionstats emits human-readable text (a pending tasks: N line followed by a table of active compactions); there is no JSON mode, so production automation must parse that text deterministically (or read the JMX org.apache.cassandra.metrics:type=Compaction MBeans) with strict validation and explicit rollback paths. This guide details how to safely parse compaction state, map metrics to operational thresholds, and integrate telemetry with repair workflows and strategy tuning pipelines.

1. Deterministic Extraction & Idempotent Polling

Production environments demand non-destructive, state-aware polling. The following Python routine enforces node health verification, disk pressure thresholds, and atomic state tracking to prevent duplicate processing during high-load windows.

#!/usr/bin/env python3
"""
Deterministic compactionstats extractor for Cassandra 4.x/5.x.
Safety: Node state validation, disk pressure guardrails, atomic state files.
Expected Output: Parsed compaction-state dict or None on failure.
Rollback Path: Clears corrupted state, falls back to safe exit, logs diagnostic trace.
"""
import json
import re
import subprocess
import sys
import os
import hashlib
from pathlib import Path
from typing import Optional

STATE_DIR = Path("/var/lib/cassandra/automation/compaction_state")
STATE_FILE = STATE_DIR / "compactionstats_last_run.sha256"
DISK_THRESHOLD = 0.85
NODETOOL_TIMEOUT = 30
LOCAL_ADDR = "127.0.0.1"  # local node's listen/broadcast address

PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)
ACTIVE_TYPES = {"Compaction", "Validation", "Cleanup", "Scrub",
                "Upgradesstables", "Index build"}

class IdempotencyError(RuntimeError):
    """Raised when state could not be persisted (distinct from a no-op match)."""

def verify_node_state() -> bool:
    """Safety: Ensure the local node is Up/Normal (UN) before invocation."""
    try:
        proc = subprocess.run(
            ["nodetool", "status"],
            capture_output=True, text=True, timeout=15
        )
    except (subprocess.SubprocessError, OSError):
        return False
    if proc.returncode != 0:
        return False
    # Scope the UN check to the local node's row, not any node in the cluster.
    for line in proc.stdout.splitlines():
        cols = line.split()
        if len(cols) >= 2 and cols[1] == LOCAL_ADDR:
            return cols[0] == "UN"
    return False

def verify_disk_pressure() -> bool:
    """Safety: Halt polling if data directory exceeds threshold."""
    try:
        stat = os.statvfs("/var/lib/cassandra")
        used = 1.0 - (stat.f_bavail / stat.f_blocks)
        return used < DISK_THRESHOLD
    except OSError:
        return False

def parse_compactionstats(text: str) -> dict:
    """Parse the real `nodetool compactionstats` text columns:
    id  compaction type  keyspace  table  completed  total  unit  progress."""
    match = PENDING_RE.search(text)
    pending = int(match.group(1)) if match else 0

    active = []
    for line in text.splitlines():
        cols = line.split()
        # Skip header/blank/non-row lines; the type token follows the id.
        if len(cols) >= 8 and cols[1] in ACTIVE_TYPES:
            try:
                completed = int(cols[-4])
                total = int(cols[-3])
            except ValueError:
                continue
            active.append({
                "id": cols[0],
                "compaction_type": cols[1],
                "keyspace": cols[2],
                "table": cols[3],
                "completed": completed,
                "total": total,
                "unit": cols[-2],
                "progress": cols[-1],
            })
    return {"pending": pending, "active": active}

def fetch_compactionstats() -> Optional[dict]:
    """Idempotent fetch of the text output with structural validation."""
    if not verify_node_state() or not verify_disk_pressure():
        return None

    try:
        proc = subprocess.run(
            ["nodetool", "compactionstats"],
            capture_output=True, text=True, timeout=NODETOOL_TIMEOUT
        )
    except subprocess.TimeoutExpired:
        return None
    except (subprocess.SubprocessError, OSError):
        return None
    if proc.returncode != 0:
        return None
    if "pending tasks:" not in proc.stdout:
        return None
    return parse_compactionstats(proc.stdout)

def enforce_idempotency(payload: dict) -> bool:
    """Persist a snapshot digest. Returns True if this snapshot was already
    seen (no-op), False if it is new and was recorded. Raises IdempotencyError
    if the state could not be written, so callers do not confuse a write
    failure with a successful no-op."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if STATE_FILE.exists() and STATE_FILE.read_text().strip() == digest:
        return True
    # Atomic write with rollback on failure
    tmp = STATE_FILE.with_suffix(".tmp")
    try:
        tmp.write_text(digest)
        tmp.replace(STATE_FILE)
    except OSError as exc:
        # Rollback: remove partial state to force fresh evaluation next cycle
        if tmp.exists():
            tmp.unlink()
        raise IdempotencyError("failed to persist compaction state") from exc
    return False

if __name__ == "__main__":
    data = fetch_compactionstats()
    if data is None:
        sys.exit(1)
    try:
        already_seen = enforce_idempotency(data)
    except IdempotencyError:
        sys.exit(1)
    if already_seen:
        sys.exit(0)
    print(json.dumps(data, indent=2))

Safety Checks: Local-node UN status verification, os.statvfs disk fraction calculation, subprocess timeout enforcement, and atomic file replacement via tmp.replace(). Expected Output: Parsed compaction-state dict printed as JSON to stdout, or silent exit (0 for idempotent match, 1 for failure/skip). Rollback Path: On atomic write failure, the .tmp file is purged and IdempotencyError is raised so the caller exits non-zero rather than treating the failure as a successful no-op.

2. Output Decoding & Operational Mapping

The text output begins with a pending tasks: N line, then a table whose columns are id compaction type keyspace table completed total unit progress. Each field maps directly to storage subsystem behavior.

Field Operational Meaning Alert Threshold
pending tasks Queue depth awaiting the compaction executor > 20 indicates backpressure
compaction type Compaction, Cleanup, Scrub, Upgradesstables, Validation, Index build Index build spikes during schema changes
total Total work to process (in unit) for this compaction Baseline for progress calc
completed Work already merged (in unit) progress = completed / total
progress Percent string emitted by nodetool (e.g. 42.10%) Stalled value across polls signals a stuck compaction

Safety Checks: Always validate total > 0 before calculating progress. Guard against division-by-zero on newly queued tasks. Throughput is not reported per-row by compactionstats; derive it from BytesCompacted JMX deltas or nodetool getcompactionthroughput for the configured limit. Expected Output: Normalized progress percentage and queue depth metrics ready for Prometheus/Grafana ingestion. Rollback Path: If progress stalls at identical completed values across three consecutive polling cycles, trigger emergency compaction suspension via nodetool disableautocompaction and escalate to on-call.

The following tree maps a compactionstats reading to an operational decision:

flowchart TD READ["Read nodetool compactionstats"] --> PEND{"Pending tasks high (over 20)"} PEND -->|"yes"| THR["Throttle throughput"] PEND -->|"no"| PROG{"Progress rate low or stalled"} PROG -->|"stalled with pending"| INV["Investigate disk I/O"] PROG -->|"low but moving"| BOOST["Boost concurrent compactors"] PROG -->|"healthy"| OK["No action needed"]
Mapping nodetool compactionstats output to a decision

3. Threshold Validation & Async Tracking Integration

Modern Cassandra deployments rely on asynchronous compaction tracking rather than synchronous JMX polling. Tight loops exhaust thread pools and mask true I/O bottlenecks. Instead, implement exponential backoff paired with state validation.

def validate_compaction_health(payload: dict) -> dict:
    """Returns health verdict with actionable thresholds.
    Consumes the parsed dict from parse_compactionstats(): a `pending`
    integer and an `active` list of compaction rows."""
    pending = payload.get("pending", 0)
    active = payload.get("active", [])

    # Lowest progress among active compactions; compactionstats does not
    # report per-task throughput, so use progress to spot stalled work.
    min_progress = min(
        (row["completed"] / row["total"]
         for row in active if row.get("total", 0) > 0),
        default=1.0,
    )

    verdict = {"status": "healthy", "actions": []}
    if pending > 20:
        verdict["status"] = "backpressure"
        verdict["actions"].append("throttle_throughput")
    if active and min_progress < 0.05 and pending > 5:
        verdict["status"] = "possible_stall"
        verdict["actions"].append("investigate_disk_io")
    return verdict

Integrating this validation with Async Compaction Tracking & Metrics allows pipelines to decouple polling frequency from I/O pressure. Instead of fixed intervals, use the JMX CompletedTasks counter to trigger event-driven evaluations.

Safety Checks: Minimum throughput floor validation, pending queue depth capping, and non-blocking verdict generation. Expected Output: Structured health dictionary with explicit remediation directives. Rollback Path: If suspend_autocompaction is triggered, schedule a delayed nodetool enableautocompaction via systemd timer or cron to prevent permanent compaction starvation.

4. Repair Overlap & Strategy Coordination

Repair operations generate tombstone-heavy SSTables that immediately feed into the compaction queue. Without coordination, nodetool repair and background compaction compete for the same I/O bandwidth, causing latency spikes and read timeouts.

When parsing compactionstats during repair windows, filter active rows by compaction_type == "Compaction" and cross-reference keyspace/tables targeted by the running repair job. If pending queue depth exceeds 30 during repair, throttle compaction throughput:

# Safety: Throttle only during active repair windows.
# setcompactionthroughput is silent on success; verify via getcompactionthroughput.
nodetool setcompactionthroughput 50
nodetool getcompactionthroughput
# Rollback: nodetool setcompactionthroughput 0 (unlimited)

For deployments leveraging Advanced Compaction Strategy Tuning & Monitoring, align strategy parameters with observed queue behavior. LeveledCompactionStrategy (LCS) requires strict throughput caps to prevent L0-L1 merge storms, while SizeTieredCompactionStrategy (STCS) tolerates higher burst rates but demands aggressive tombstone purging.

Safety Checks: Verify repair job state via nodetool netstats (active streams) or nodetool repair_admin list / system_distributed.repair_history before throttling (nodetool compactionhistory reports completed compactions, not repair state). Confirm strategy type via nodetool describecluster or schema introspection. Expected Output: Stabilized queue depth, predictable repair completion times, and reduced p95 read latency. Rollback Path: If throttling causes repair timeouts, immediately restore unlimited throughput (nodetool setcompactionthroughput 0), clear pending state, and reschedule repair during off-peak windows.

5. Emergency Recovery & State Rollback Procedures

Automation failures around compaction telemetry require deterministic recovery paths. Corrupted state files, stuck JMX connections, or disk pressure events must be handled without manual intervention.

  1. State File Corruption: If STATE_FILE contains malformed hashes, purge the automation directory:
  rm -rf /var/lib/cassandra/automation/compaction_state/*
  # Safety: Only remove automation artifacts, never data directories
  # Expected Output: Clean slate for next polling cycle
  # Rollback: N/A (state is ephemeral)
  1. Stuck Compaction Detection: If the completed value remains static for > 15 minutes:
  nodetool stop COMPACTION
  # Safety: Halts active compaction operations of this type
  # Output: none on success (silent); check the exit code
  # Rollback: nodetool enableautocompaction && nodetool repair -pr
  1. JMX Timeout Recovery: If nodetool consistently times out:
  systemctl restart cassandra
  # Safety: Verify node drains connections before restart
  # Expected Output: Clean JMX reinitialization
  # Rollback: Restore from snapshot if data directory corruption detected

All automation pipelines should wrap these procedures in try/except blocks with structured logging. Never allow unhandled exceptions to leave the node in a partially throttled or suspended state.