Async Compaction Tracking & Metrics: Operational Guide

Asynchronous compaction in Apache Cassandra decouples SSTable merging from the synchronous write path, allowing ingestion to proceed while background threads consolidate data. While this architectural separation improves write availability, untracked async work silently degrades read latency, inflates disk utilization, and blocks anti-entropy repair streams. Production-grade tracking requires correlating JMX thread-pool metrics with nodetool compactionstats output to distinguish between healthy operational backlog and compaction starvation. For broader context on aligning compaction behavior with workload patterns, foundational principles are documented in Advanced Compaction Strategy Tuning & Monitoring.

Core Tracking Architecture & JMX Exposure

The async compaction subsystem exposes telemetry through the Java Management Extensions (JMX) interface. In Cassandra v4.x and v5.x, compaction metrics are published under org.apache.cassandra.metrics:type=Compaction, and the compaction executor's thread-pool metrics under org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=CompactionExecutor. Operators must track three primary compaction metrics: PendingTasks, TotalCompactionsCompleted, and BytesCompacted. More critically, thread pool saturation is revealed through the executor's ActiveCount, TaskCount, and CompletedTaskCount values, which the metrics registry publishes as ActiveTasks, PendingTasks (queued work), and CompletedTasks.

The async metrics flow from the node through scraping into alert routing as shown below:

flowchart LR
    CAS["Cassandra (JMX or Prometheus endpoint)"] --> SCRAPE["Scrape interval (15 to 30s)"]
    SCRAPE --> PROM["Prometheus store"]
    PROM --> RULES["Alert rules"]
    RULES --> PD["PagerDuty"]
    RULES --> OG["Opsgenie"]

Figure: Async compaction metrics pipeline to alert routing

When PendingTasks consistently exceeds the effective concurrent_compactors limit, work is queuing faster than the executor drains it, and read amplification increases due to unmerged tombstones and overlapping SSTables. Note that when concurrent_compactors is left unset, Cassandra derives it at startup from the number of CPU cores and data directories (the smaller of the two, bounded between 2 and 8), and it can be changed at runtime with nodetool setconcurrentcompactors, so static thresholds are less reliable than trend-based analysis. Parsing the text output of nodetool compactionstats programmatically (or reading the JMX metrics directly) remains the most reliable method for extracting pending tasks per table and the list of active compactions. For detailed field mapping, text output parsing, and threshold interpretation, consult Interpreting nodetool compactionstats Output.
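To make the trend-based analysis above concrete, here is a minimal sketch that fits a least-squares slope to recent PendingTasks samples. The function names and the default of one task per minute are illustrative assumptions, not Cassandra settings; the samples are assumed to come from whatever poller you already run.

```python
from typing import Sequence, Tuple


def pending_trend(samples: Sequence[Tuple[float, int]]) -> float:
    """Least-squares slope of pending tasks, in tasks per second.

    `samples` holds (unix_timestamp, pending_tasks) pairs. A positive
    slope means the queue grew over the sampled window.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        # All samples share one timestamp; no slope can be fitted.
        return 0.0
    cov = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    return cov / var_t


def is_backlog_growing(samples: Sequence[Tuple[float, int]],
                       min_slope_per_min: float = 1.0) -> bool:
    """True when the queue grows by at least `min_slope_per_min` tasks/minute.

    The default threshold is a hypothetical starting point, not a
    recommendation from the Cassandra project.
    """
    return pending_trend(samples) * 60.0 >= min_slope_per_min
```

A slope tolerates a high but stable queue (for example during a window rollover) and reacts only when the queue keeps growing, which is the distinction a fixed threshold cannot make.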

Compaction velocity must be benchmarked against write ingestion rates. Calculate bytes_compacted_per_second over rolling 60-second windows. If velocity drops below 30% of the configured compaction_throughput_mb_per_sec, investigate disk I/O saturation, page cache pressure, or file descriptor exhaustion. For time-series workloads, window expiration triggers compaction bursts that temporarily spike pending tasks; this is expected and should not trigger false alarms. Align your tracking logic with Strategy Selection for Time-Series Workloads to differentiate between normal window rollover and pathological backlog growth.
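The rolling-window velocity check described above could be sketched as follows. `VelocityWindow` and `is_velocity_low` are hypothetical helper names, and the sketch assumes the configured throughput is expressed in MiB per second and that the caller supplies cumulative BytesCompacted readings.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class VelocityWindow:
    """Rolling estimate of bytes compacted per second from a cumulative counter."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, int]] = deque()

    def add(self, timestamp: float, bytes_compacted: int) -> None:
        self._samples.append((timestamp, bytes_compacted))
        # Drop the oldest sample while the next one is still old enough
        # to anchor a full window.
        while (len(self._samples) > 2
               and timestamp - self._samples[1][0] >= self.window_seconds):
            self._samples.popleft()

    def bytes_per_second(self) -> Optional[float]:
        if len(self._samples) < 2:
            return None
        (t0, b0), (t1, b1) = self._samples[0], self._samples[-1]
        if t1 <= t0 or b1 < b0:
            # A shrinking counter usually means the node restarted.
            return None
        return (b1 - b0) / (t1 - t0)


def is_velocity_low(bytes_per_second: float, throughput_mib_per_sec: int,
                    floor_ratio: float = 0.30) -> bool:
    """True when velocity is under `floor_ratio` of the configured limit."""
    if throughput_mib_per_sec <= 0:
        # 0 means unlimited, so there is no ceiling to compare against.
        return False
    limit = throughput_mib_per_sec * 1024 * 1024
    return bytes_per_second < floor_ratio * limit
```

Low velocity is only meaningful while work is pending; an idle node with an empty queue will also report a near-zero rate, so pair this check with PendingTasks before alerting.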

Automated Polling & Threshold Enforcement

Manual polling is insufficient for distributed environments. Implement a lightweight Python daemon that scrapes the HTTP endpoint exposed by jmx_exporter or parses nodetool compactionstats text output. If your environment restricts shell execution, read JMX through a JVM bridge such as py4j; the Python cassandra-driver speaks CQL only and cannot read JMX, although on v4.0+ it can query the system_views.sstable_tasks virtual table for active compactions. The following Python snippet demonstrates safe metric polling with exponential backoff, subprocess isolation, and structured logging aligned with modern observability stacks:

import subprocess
import re
import time
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
    handlers=[logging.StreamHandler(sys.stdout)]
)
logger = logging.getLogger("cassandra_compaction_tracker")

PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)

def poll_compaction_metrics(node_ip: str, timeout: int = 15) -> dict:
    """
    Safely execute `nodetool compactionstats` and parse its text output.
    nodetool compactionstats emits a `pending tasks: N` line followed by a
    table of active compactions (id, compaction type, keyspace, table,
    completed, total, unit, progress); there is no JSON mode. Compatible
    with Cassandra v4.0+ and v5.x.
    """
    cmd = ["nodetool", "-h", node_ip, "compactionstats"]
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True
        )
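    except FileNotFoundError:
        # nodetool is not installed or not on PATH for this user
        logger.error("nodetool executable not found on PATH")
        return {}
    except subprocess.CalledProcessError as e:
        logger.error(f"nodetool failed on {node_ip}: {(e.stderr or '').strip()}")
        return {}
    except subprocess.TimeoutExpired:
        logger.warning(f"Poll timeout exceeded for {node_ip}")
        return {}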

    return parse_compactionstats(result.stdout)

def parse_compactionstats(text: str) -> dict:
    """Extract pending count and active-compaction rows from text output."""
    match = PENDING_RE.search(text)
    if match is None:
        logger.warning("No 'pending tasks' line found; reporting 0")
    pending = int(match.group(1)) if match else 0

    active = []
    for line in text.splitlines():
        cols = line.split()
        # Active rows look like:
        # <id> <compaction type> <keyspace> <table> <completed> <total> <unit> <progress>
        # The type can span several words (e.g. "Anticompaction after repair"),
        # so index from the right. Header and summary lines fail int() below.
        # Assumes raw byte counts, i.e. nodetool was run without -H.
        if len(cols) < 8:
            continue
        try:
            completed = int(cols[-4])
            total = int(cols[-3])
        except ValueError:
            continue
        active.append({
            "type": " ".join(cols[1:-6]),
            "keyspace": cols[-6],
            "table": cols[-5],
            "completed": completed,
            "total": total,
            "unit": cols[-2],
            "progress": cols[-1],
        })

    return {"pending": pending, "active": active}

def evaluate_backlog(metrics: dict) -> bool:
    """
    Evaluate pending tasks. Returns True if backlog is within acceptable limits.
    Compaction velocity should be derived from BytesCompacted JMX deltas across
    polls rather than a single compactionstats snapshot.
    """
    pending = metrics.get("pending", 0)

    if pending > 10:
        logger.warning(f"High pending compactions detected: {pending}")
        return False
    return True

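if __name__ == "__main__":
    target_node = "127.0.0.1"
    backoff = 2
    max_attempts = 5
    for attempt in range(1, max_attempts + 1):
        data = poll_compaction_metrics(target_node)
        if data:
            evaluate_backlog(data)
            break
        if attempt < max_attempts:
            # Exponential backoff, capped at 30 seconds
            time.sleep(backoff)
            backoff = min(backoff * 2, 30)
    else:
        # Runs only when no attempt succeeded (the loop never hit `break`)
        logger.error(f"Giving up on {target_node} after {max_attempts} attempts")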

For comprehensive guidance on integrating this polling logic into Prometheus exporters, Datadog agents, or custom telemetry pipelines, refer to Python Monitoring for Cassandra Compaction.
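As one possible starting point for the exporter integration, the sketch below renders the dict returned by parse_compactionstats in the Prometheus text exposition format. The metric names cassandra_compaction_pending_tasks and cassandra_compaction_remaining_bytes are hypothetical, chosen only for illustration; label values are not escaped, which is acceptable only if keyspace and table names contain no quotes or backslashes.

```python
from collections import defaultdict


def render_prometheus(metrics: dict, node: str) -> str:
    """Render polled compaction metrics as Prometheus exposition text."""
    # Sum remaining bytes per (keyspace, table) so that two concurrent
    # compactions on one table do not produce duplicate label sets.
    remaining = defaultdict(int)
    for row in metrics.get("active", []):
        if row.get("unit") != "bytes":
            continue  # skip rows measured in keys or ranges
        key = (row["keyspace"], row["table"])
        remaining[key] += max(row["total"] - row["completed"], 0)

    lines = [
        "# TYPE cassandra_compaction_pending_tasks gauge",
        f'cassandra_compaction_pending_tasks{{node="{node}"}} {metrics.get("pending", 0)}',
        "# TYPE cassandra_compaction_remaining_bytes gauge",
    ]
    for (keyspace, table), value in sorted(remaining.items()):
        labels = f'node="{node}",keyspace="{keyspace}",table="{table}"'
        lines.append(f"cassandra_compaction_remaining_bytes{{{labels}}} {value}")
    return "\n".join(lines) + "\n"
```

The returned string can be served from any HTTP handler that Prometheus scrapes, or written to a file for a textfile-style collector.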

Backlog Analysis, Velocity & Alert Thresholds

Backlog analysis requires differentiating between transient spikes and sustained starvation. Implement sliding window calculations for bytes_compacted_per_second and compare against compaction_throughput_mb_per_sec. When velocity consistently falls below 30% of the configured limit, the node is likely I/O bound or experiencing thread pool contention. Alerting should trigger on:

  1. PendingTasks > concurrent_compactors * 2 for > 5 minutes
  2. BytesCompacted delta approaching zero while PendingTasks grows
  3. ActiveCount consistently at 0 while tasks remain queued (PendingTasks above 0)
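The three conditions above can be sketched as a single evaluator over recent samples. The `Sample` type, the alert names, and the `stall_bytes` parameter are hypothetical. The sketch applies the five-minute window to all three rules and reads the third as "no active threads while work is queued"; both are interpretations rather than something the list spells out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    timestamp: float       # seconds since epoch
    pending_tasks: int
    bytes_compacted: int   # cumulative counter
    active_count: int


def evaluate_alerts(history: List[Sample], concurrent_compactors: int,
                    sustain_seconds: float = 300.0,
                    stall_bytes: int = 0) -> List[str]:
    """Return the names of alert conditions met over the trailing window.

    `history` must be ordered by ascending timestamp.
    """
    if len(history) < 2:
        return []
    latest = history[-1]
    # Only judge sustained conditions once history spans the full window.
    if latest.timestamp - history[0].timestamp < sustain_seconds:
        return []
    window = [s for s in history
              if latest.timestamp - s.timestamp <= sustain_seconds]
    first = window[0]
    alerts = []

    # Rule 1: backlog above twice the compactor count for the whole window.
    if all(s.pending_tasks > concurrent_compactors * 2 for s in window):
        alerts.append("pending_backlog_sustained")

    # Rule 2: almost no bytes compacted while the queue grew.
    byte_delta = latest.bytes_compacted - first.bytes_compacted
    if 0 <= byte_delta <= stall_bytes and latest.pending_tasks > first.pending_tasks:
        alerts.append("compaction_stalled")

    # Rule 3: executor idle for the whole window despite queued work.
    if all(s.active_count == 0 and s.pending_tasks > 0 for s in window):
        alerts.append("executor_idle_with_backlog")

    return alerts
```

Returning alert names rather than sending notifications keeps routing decisions in the alerting layer, where severity and escalation are usually configured.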

For detailed threshold calibration, alert routing, and integration with PagerDuty/Opsgenie, see Compaction Backlog Analysis & Alerting.

Repair Synchronization & Read Path Implications

Async compaction tracking directly impacts anti-entropy repair scheduling. Repair operations in Cassandra v4.x/v5.x build Merkle trees through validation compactions and then stream the mismatched ranges; if compaction queues are saturated, validation and streaming compete with regular compaction for the same disk I/O, causing repair timeouts and failed sessions. Schedule incremental repairs (nodetool repair -pr; incremental is the default) during low-compaction windows, or defer new repair sessions while PendingTasks exceeds a safety threshold.
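One way to gate repair on compaction backlog is a small hysteresis switch, sketched below. `RepairGate` and both thresholds are hypothetical; the sketch only decides whether a new repair session should be started and does not interact with nodetool or any repair scheduler.

```python
class RepairGate:
    """Decide whether new repair sessions may start, with hysteresis.

    Repairs are held once pending compactions exceed `pause_above` and
    are released only after the backlog drops below `resume_below`, so
    the decision does not flap around a single threshold.
    """

    def __init__(self, pause_above: int = 20, resume_below: int = 5) -> None:
        if resume_below >= pause_above:
            raise ValueError("resume_below must be lower than pause_above")
        self.pause_above = pause_above
        self.resume_below = resume_below
        self.paused = False

    def update(self, pending_tasks: int) -> bool:
        """Record the latest PendingTasks value; return True if repair may start."""
        if pending_tasks > self.pause_above:
            self.paused = True
        elif pending_tasks < self.resume_below:
            self.paused = False
        # Between the two thresholds the previous state is kept.
        return not self.paused
```

A scheduler would call `update` with each poll result and start the next repair session only when it returns True.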

When compaction lags, tombstone density increases across SSTables, forcing each replica to scan and discard more tombstones per read. This directly influences Speculative Retry & Read Repair Tuning parameters, as slow reads trigger speculative requests that further saturate the network. If nodes become I/O bound, Fallback Routing & Read Path Optimization strategies should be engaged to route queries to replicas with healthier compaction states. Additionally, unhandled merge failures must be captured through Compaction Error Categorization & Logging workflows. Monitor org.apache.cassandra.db.compaction.CompactionManager logs for CorruptSSTableException, OutOfMemoryError, or IOException during merge operations, and route these to dedicated error-handling runbooks rather than generic alert channels.
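The log routing described above might start from a classifier like this sketch. The category names are hypothetical labels for runbooks. CorruptSSTableException is checked before the generic IOException so that a corruption report mentioning an underlying I/O error is still classified as corruption.

```python
import re
from typing import Optional

# Order matters: the most specific exception is checked first.
_RULES = [
    ("corruption", re.compile(r"CorruptSSTableException")),
    ("memory", re.compile(r"OutOfMemoryError")),
    ("io", re.compile(r"IOException")),
]


def categorize_log_line(line: str) -> Optional[str]:
    """Return a runbook category for a log line, or None if nothing matches."""
    for category, pattern in _RULES:
        if pattern.search(line):
            return category
    return None
```

The sketch does not check that a line actually came from a compaction thread; in practice you would likely filter on the logger or thread name first so that unrelated I/O errors are not routed to compaction runbooks.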

Production Validation & Capacity Alignment

Validate all automation steps against Cassandra v4.x/v5.x behavioral standards. The compaction_throughput_mb_per_sec setting defaults to 64; in 4.1+ it was renamed compaction_throughput and takes a unit suffix (for example 64MiB/s). A value of 0 means unlimited. Because the throttle is shared across all concurrent compactors on the node, derive velocity from actual BytesCompacted JMX deltas rather than static config assumptions.
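Because the setting changed name and format between releases, tracking code benefits from normalizing it once. The sketch below assumes the configuration has already been loaded into a dict, that the 4.1+ value looks like `64MiB/s`, and that the accepted unit suffixes are B/s, KiB/s, and MiB/s; verify those against the cassandra.yaml shipped with your version. `throughput_limit_bytes` is a hypothetical helper name.

```python
import re
from typing import Optional

_RATE_RE = re.compile(r"^\s*(\d+)\s*(B/s|KiB/s|MiB/s)\s*$")
_UNIT_BYTES = {"B/s": 1, "KiB/s": 1024, "MiB/s": 1024 ** 2}


def throughput_limit_bytes(config: dict) -> Optional[int]:
    """Return the compaction throttle in bytes/second, or None if unlimited."""
    if "compaction_throughput" in config:
        # 4.1+ style: a number followed by a unit suffix.
        match = _RATE_RE.match(str(config["compaction_throughput"]))
        if match is None:
            raise ValueError(
                f"unrecognized rate: {config['compaction_throughput']!r}")
        limit = int(match.group(1)) * _UNIT_BYTES[match.group(2)]
    else:
        # Pre-4.1 style: a bare integer, falling back to the default of 64.
        limit = int(config.get("compaction_throughput_mb_per_sec", 64)) * 1024 ** 2
    # A configured value of 0 means no throttling.
    return limit or None
```

Returning None for the unlimited case forces callers to handle it explicitly instead of dividing by zero when computing a velocity ratio.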

Capacity planning must incorporate compaction overhead. A healthy cluster typically maintains compaction I/O at 20-35% of total disk throughput during peak ingestion. Use historical BytesCompacted trends alongside write throughput to forecast disk expansion or strategy migration. For rigorous throughput validation and hardware sizing methodologies, align your tracking framework with Performance Benchmarking & Capacity Planning best practices. Regularly audit nodetool info for Load and Uptime alongside compaction metrics to prevent silent degradation from masking underlying hardware failures.
