Python Monitoring for Cassandra Compaction: Operational Guide
Compaction remains the dominant I/O consumer in Apache Cassandra and the primary driver of P99 latency degradation during maintenance windows, node decommissioning, or repair cycles. In distributed environments running v4.x and v5.x, static thresholds and manual nodetool compactionstats polling fail to capture the dynamic nature of SSTable merging, anti-entropy synchronization, and background thread scheduling. A deterministic Python-driven monitoring pipeline bridges this gap by providing structured telemetry ingestion, automated backlog computation, and safe orchestration of repair and lifecycle workflows. This guide details how to instrument compaction observability, synchronize maintenance scheduling, and enforce operational safeguards without disrupting client read/write paths.
At a high level, the monitoring pipeline collects metrics, evaluates compaction state, and dispatches a safe action.
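The three stages can be sketched as one small function per pass. This is only an illustration of the control flow: `Decision`, `evaluate`, and `run_cycle` are hypothetical names, the collector and dispatcher are passed in as plain callables, and the threshold of 20 is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Decision:
    """Outcome of one evaluation pass (hypothetical structure)."""
    action: str  # "proceed", "defer", or "no_data"
    reason: str


def evaluate(metrics: Dict[str, float], max_pending: int = 20) -> Decision:
    """Map scraped metrics to an action. The threshold is a placeholder."""
    if not metrics:
        return Decision("no_data", "scrape returned nothing")
    pending = metrics.get("cassandra_compaction_pendingtasks", 0.0)
    if pending > max_pending:
        return Decision("defer", f"{pending:.0f} pending compaction tasks")
    return Decision("proceed", "backlog within threshold")


def run_cycle(
    collect: Callable[[], Dict[str, float]],
    dispatch: Callable[[Decision], None],
) -> Decision:
    """Run one collect -> evaluate -> dispatch pass and return the decision."""
    decision = evaluate(collect())
    dispatch(decision)
    return decision
```

Keeping the collector and dispatcher as injected callables means the evaluation step can be exercised with canned metrics before it is pointed at a live node.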
Telemetry Architecture & Metric Collection
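Cassandra exposes compaction state through JMX, and an exporter agent (for example the Prometheus JMX exporter) can republish it on a Prometheus-format endpoint (/metrics). Python monitoring should bypass CLI text parsing and consume structured time-series data directly. The metric names below follow one common exporter naming scheme and will differ if your exporter uses other mapping rules. The critical metrics required for operational visibility include: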
cassandra_compaction_pendingtasks
cassandra_compaction_completedtasks
cassandra_compaction_bytescompacted
Polling at 5–10 second intervals using requests with connection pooling allows for rolling window calculations that smooth transient spikes from memtable flushes or garbage collection pauses. When aligning collection cadence with cluster topology, reference Advanced Compaction Strategy Tuning & Monitoring to ensure metric granularity matches strategy-specific merge windows. For example, LeveledCompactionStrategy (LCS) requires tighter polling intervals than SizeTieredCompactionStrategy (STCS) due to its aggressive, multi-level merge behavior.
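The rolling-window smoothing mentioned above can be sketched with a bounded deque. `RollingMean` is a hypothetical helper, and the window of 12 assumes a 5-second polling cadence (roughly one minute of samples).

```python
from collections import deque
from typing import Deque


class RollingMean:
    """Mean over the most recent `window` samples."""

    def __init__(self, window: int = 12):
        if window < 1:
            raise ValueError("window must be >= 1")
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Record a sample and return the mean of the current window."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


# A single spike in pending tasks is diluted instead of driving a decision.
pending = RollingMean(window=3)
for sample in (4, 4, 40, 4):
    smoothed = pending.add(sample)
```

A plain mean reacts more slowly than the raw series, so a shorter window may suit strategies that need tighter polling.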
import requests
import time
import logging
from typing import Dict, Optional
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s"
)


class CassandraMetricsCollector:
    def __init__(self, node_ip: str, prom_port: int = 9103):
        self.base_url = f"http://{node_ip}:{prom_port}/metrics"
        # Reuse one session so connections are pooled across scrapes.
        self.session = requests.Session()
        retry_strategy = Retry(
            total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503, 504]
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("http://", adapter)

    def scrape_compaction_metrics(self) -> Dict[str, float]:
        """Parse Prometheus text format for compaction-specific metrics."""
        try:
            resp = self.session.get(self.base_url, timeout=5)
            resp.raise_for_status()
            metrics = {}
            for line in resp.text.splitlines():
                if line.startswith("cassandra_compaction_"):
                    # Handle metric format: metric_name{labels} value
                    parts = line.split()
                    if len(parts) >= 2:
                        # Labels are dropped, so series that differ only by
                        # label overwrite each other and the last one wins.
                        key = parts[0].split("{")[0]
                        try:
                            metrics[key] = float(parts[1])
                        except ValueError:
                            continue
            return metrics
        except requests.RequestException as e:
            logging.error(f"Failed to scrape metrics from {self.base_url}: {e}")
            return {}

Backlog Computation & Thresholding
Compaction velocity must be continuously evaluated against write ingestion rates to prevent backlog accumulation. The compaction deficit is calculated by comparing the delta of cassandra_compaction_bytescompacted over a fixed interval against the cluster’s sustained write throughput. When pending tasks exceed concurrent_compactors or backlog growth outpaces throughput, automated alerts should trigger. This logic directly feeds into Compaction Backlog Analysis & Alerting, where dynamic thresholding prevents false positives during peak ingestion or GC-induced stalls.
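One way to express the alert rule above is a small debounced check. This is a sketch under assumptions: `BacklogAlerter` is a hypothetical name, and requiring the condition to hold for several consecutive samples is one simple reading of "dynamic thresholding", not a method the guide prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class BacklogAlerter:
    """Alert only after the backlog condition holds for consecutive samples."""
    concurrent_compactors: int
    sustain_samples: int = 3  # placeholder; tune to the polling interval
    _streak: int = field(default=0, init=False, repr=False)

    def check(self, pending_tasks: int, compaction_bps: float, write_bps: float) -> bool:
        """Return True when an alert should fire for this sample."""
        breached = (
            pending_tasks > self.concurrent_compactors
            or write_bps > compaction_bps
        )
        # Any healthy sample resets the streak, which filters short stalls.
        self._streak = self._streak + 1 if breached else 0
        return self._streak >= self.sustain_samples
```

With 5-second polling and `sustain_samples=3`, a stall shorter than about 15 seconds would not raise an alert.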
The Python implementation should keep a sliding window of recent samples and apply an exponential moving average (EMA) to the per-interval throughput to filter out noise. A backlog evaluator along these lines is shown below:
from collections import deque
from typing import Optional


class CompactionBacklogEvaluator:
    def __init__(self, window_size: int = 12, ema_alpha: float = 0.3):
        self.window_size = window_size
        self.ema_alpha = ema_alpha
        self.bytes_history = deque(maxlen=window_size)
        self.time_history = deque(maxlen=window_size)
        self.ema_throughput: Optional[float] = None

    def update(self, bytes_compacted: float, timestamp: float) -> Optional[float]:
        """Update sliding window and return EMA-smoothed throughput (bytes/sec)."""
        if not self.bytes_history:
            self.bytes_history.append(bytes_compacted)
            self.time_history.append(timestamp)
            return None
        delta_bytes = bytes_compacted - self.bytes_history[-1]
        delta_time = timestamp - self.time_history[-1]
        if delta_time <= 0:
            return self.ema_throughput
        self.bytes_history.append(bytes_compacted)
        self.time_history.append(timestamp)
        if delta_bytes < 0:
            # Counter went backwards (e.g. after a node restart): keep the new
            # baseline but skip the sample instead of recording a negative rate.
            return self.ema_throughput
        current_throughput = delta_bytes / delta_time
        if self.ema_throughput is None:
            # Seed with the first real sample so the EMA is not biased toward zero.
            self.ema_throughput = current_throughput
        else:
            self.ema_throughput = (self.ema_alpha * current_throughput) + \
                ((1 - self.ema_alpha) * self.ema_throughput)
        return self.ema_throughput

    def compute_backlog_deficit(self, write_throughput: float) -> float:
        """Return deficit in bytes/sec. Positive = backlog growing."""
        return max(0.0, write_throughput - (self.ema_throughput or 0.0))

Repair Synchronization & Node Lifecycle Automation
Compaction state dictates safe execution windows for nodetool repair and node decommissioning. In Cassandra 4.x/5.x, incremental repair is the default, but full anti-entropy sweeps remain necessary for data consistency after topology changes. Python automation must query pending compaction tasks before initiating repair streams. Because a healthy node almost always has at least one pending task, gate on a sensible backlog threshold (e.g. MAX_PENDING around 20) rather than PendingTasks > 0. If pending tasks exceed that threshold, or measured compaction throughput exceeds 70% of the configured compaction_throughput (compaction_throughput_mb_per_sec prior to 4.1), the script should defer repair execution. This prevents I/O contention that could trigger ReadTimeout exceptions or stall streaming operations.
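The 70% comparison needs the configured limit in the same unit as the measured rate. The helper below is a sketch: `throughput_limit_bytes` and `exceeds_share` are hypothetical names, and it assumes the legacy setting is an integer in MiB/s while the newer setting is a string such as "64MiB/s" with a B/s, KiB/s, or MiB/s suffix. A value of 0 is treated as unthrottled.

```python
import re
from typing import Optional, Union

_UNITS = {"B/s": 1, "KiB/s": 1024, "MiB/s": 1024 ** 2}


def throughput_limit_bytes(value: Union[int, str]) -> Optional[float]:
    """Convert a configured compaction throughput to bytes/sec.

    Returns None when the value is 0 (assumed to mean unthrottled).
    """
    if isinstance(value, int):
        # Legacy form: a bare integer, interpreted as MiB/s.
        return None if value == 0 else float(value * 1024 ** 2)
    match = re.fullmatch(r"\s*(\d+)\s*(B/s|KiB/s|MiB/s)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised throughput value: {value!r}")
    quantity = int(match.group(1))
    return None if quantity == 0 else float(quantity * _UNITS[match.group(2)])


def exceeds_share(measured_bps: float, limit_bps: Optional[float], share: float = 0.7) -> bool:
    """True when measured throughput is above `share` of the configured limit."""
    if limit_bps is None:
        # No configured ceiling to compare against.
        return False
    return measured_bps > share * limit_bps
```

For a 64 MiB/s limit, the 70% mark is about 44.8 MiB/s, so a measured 50 MiB/s would defer repair.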
The orchestration logic should validate nodetool status and nodetool gossipinfo to ensure the target node is fully joined and streaming is idle. When integrating with broader maintenance pipelines, align scheduling with Async Compaction Tracking & Metrics to account for background compaction threads that operate independently of user-facing queries.
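The "fully joined" check can be approximated by reading the two-letter state code that nodetool status prints at the start of each node row (UN means Up/Normal). The parser below is a sketch: the function names are hypothetical, the sample text is illustrative rather than captured from a real cluster, and the column layout should be verified against your Cassandra version.

```python
from typing import Dict


def parse_node_states(status_output: str) -> Dict[str, str]:
    """Map node address -> two-letter state code from `nodetool status` text."""
    states: Dict[str, str] = {}
    for line in status_output.splitlines():
        parts = line.split()
        # Node rows start with a code such as UN, UJ, UL, DN; header rows do not.
        if (
            len(parts) >= 2
            and len(parts[0]) == 2
            and parts[0][0] in "UD"
            and parts[0][1] in "NLJM"
        ):
            states[parts[1]] = parts[0]
    return states


def is_fully_joined(status_output: str, node_ip: str) -> bool:
    """True only when the node is reported as Up and Normal."""
    return parse_node_states(status_output).get(node_ip) == "UN"


# Illustrative input only.
SAMPLE = """Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns  Host ID  Rack
UN  10.0.0.1  1.21 GiB  16      ?     aaaa     rack1
UJ  10.0.0.2  0.40 GiB  16      ?     bbbb     rack1
"""
```

A node missing from the output is treated the same as a node that is not UN, which keeps the gate conservative.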
import logging
import subprocess


def is_node_safe_for_repair(node_ip: str, pending_tasks: int, throughput: float,
                            max_throughput: float, max_pending: int = 20) -> bool:
    """Validate node state before triggering repair."""
    if pending_tasks > max_pending:
        logging.warning(f"Deferring repair on {node_ip}: {pending_tasks} pending compaction tasks (> {max_pending}).")
        return False
    if throughput > (max_throughput * 0.7):
        logging.warning(f"Deferring repair on {node_ip}: Compaction throughput at {throughput:.0f} bytes/sec.")
        return False
    # Verify streaming status via nodetool (v4.x/5.x compatible)
    try:
        result = subprocess.run(
            ["nodetool", "-h", node_ip, "netstats"],
            capture_output=True, text=True, timeout=10
        )
    except (subprocess.SubprocessError, OSError) as e:
        # OSError also covers a missing nodetool binary.
        logging.error(f"Failed to check netstats on {node_ip}: {e}")
        return False
    if result.returncode != 0:
        logging.error(f"netstats on {node_ip} exited with code {result.returncode}.")
        return False
    # netstats prints "Not sending any streams." when no stream sessions exist;
    # anything else is treated as active streaming.
    if "Not sending any streams" not in result.stdout:
        logging.warning(f"Deferring repair on {node_ip}: Active streaming detected.")
        return False
    return True


def schedule_repair(node_ip: str, full_repair: bool = False):
    """Execute repair with v4.x/5.x compatible flags."""
    cmd = ["nodetool", "-h", node_ip, "repair"]
    if full_repair:
        cmd.append("--full")
    # Parallel repair is the default; --sequential or --dc-parallel can be added for specific topologies
    logging.info(f"Executing: {' '.join(cmd)}")
    subprocess.run(cmd, check=True)

Error Handling, Logging & Read Path Safeguards
Compaction failures often manifest as CorruptSSTableException, OutOfMemoryError, or disk space exhaustion. Python monitors must implement structured Compaction Error Categorization & Logging to route failures to centralized observability stacks (e.g., JSON on stdout for Fluentd/Vector ingestion). When compaction stalls, the read path may degrade because reads must merge more SSTables and scan tombstones that have not yet been purged. To mitigate this, automation should validate each table's speculative_retry setting and account for the fact that the probabilistic read_repair_chance and dclocal_read_repair_chance options were removed entirely in Cassandra 4.0. They were replaced by the table-level read_repair option ('BLOCKING' by default, or 'NONE'); see Speculative Retry & Read Repair Tuning.
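A minimal sketch of categorized, JSON-formatted error events follows. The category names, the `categorize` and `emit_event` helpers, and the event fields are all hypothetical, and the disk-space substrings are illustrative guesses that should be checked against your own logs.

```python
import json
import re
import sys
from datetime import datetime, timezone
from typing import Dict

# Ordered (category, pattern) pairs; the first match wins.
CATEGORY_PATTERNS = [
    ("sstable_corruption", re.compile(r"CorruptSSTableException")),
    ("out_of_memory", re.compile(r"OutOfMemoryError")),
    # Assumed substrings for disk exhaustion; adjust to what your logs contain.
    ("disk_space", re.compile(r"No space left on device|[Nn]ot enough space")),
]


def categorize(message: str) -> str:
    """Return the first matching category, or "unknown"."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(message):
            return category
    return "unknown"


def emit_event(node: str, message: str, stream=sys.stdout) -> Dict[str, str]:
    """Write one JSON object per line so log shippers can parse it directly."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "node": node,
        "category": categorize(message),
        "message": message,
    }
    stream.write(json.dumps(event) + "\n")
    return event
```

One JSON object per line keeps the output compatible with line-oriented collectors without extra parsing rules.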
Fallback Routing & Read Path Optimization should be triggered when compaction backlogs exceed safe thresholds. The monitoring agent can temporarily route traffic away from degraded nodes by updating service discovery tags or adjusting client-side load balancing policies. For implementation details on latency-aware routing and real-time metric correlation, see Python Script for Real-Time Compaction Latency Tracking.
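The decision to route traffic away can be kept separate from the mechanism that applies it. The sketch below covers only the decision and uses hysteresis (two thresholds) so a node hovering near the limit does not flap in and out of rotation. `DrainController` and both threshold values are hypothetical; the service discovery or load balancer update is left to the caller.

```python
class DrainController:
    """Track whether a node should be routed around, with hysteresis."""

    def __init__(self, drain_above: int = 50, restore_below: int = 20):
        if restore_below >= drain_above:
            raise ValueError("restore_below must be lower than drain_above")
        self.drain_above = drain_above
        self.restore_below = restore_below
        self.drained = False

    def update(self, pending_tasks: int) -> bool:
        """Feed the latest pending-task count; return True while drained."""
        if not self.drained and pending_tasks > self.drain_above:
            self.drained = True
        elif self.drained and pending_tasks < self.restore_below:
            self.drained = False
        # Between the two thresholds the previous state is kept.
        return self.drained
```

The caller would update the service discovery tag only when the returned value changes.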
Production Deployment & Capacity Planning
Deploying this pipeline requires careful resource allocation. The monitoring agent should run as a lightweight systemd service or Kubernetes sidecar, with a target footprint of under 2% CPU and under 50 MB of RAM. Capacity planning must account for compaction I/O amplification, including in time-series workloads where TimeWindowCompactionStrategy (TWCS) compacts SSTables within each time window. Validate throughput projections using Performance Benchmarking & Capacity Planning methodologies, confirming that disk IOPS and network bandwidth keep pace with compaction velocity.
Additionally, consult Strategy Selection for Time-Series Workloads to avoid over-provisioning compaction threads. In Cassandra 5.x, UnifiedCompactionStrategy (UCS) shards compaction work so that more of it can run in parallel, so the Python monitor should review concurrent_compactors against real-time backlog metrics rather than treating it as a static setting. Always test automation workflows in staging environments with production-like data distributions before enabling auto-remediation in critical clusters.
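A backlog-driven adjustment can be sketched as a bounded, one-step-at-a-time recommendation. Everything here is an assumption for illustration: the function names, the 4x and 1x trigger ratios, and the bounds are placeholders, and the command builder assumes `nodetool setconcurrentcompactors` is available on your version. The command is only constructed, not executed.

```python
from typing import List


def recommend_compactors(pending_tasks: int, current: int,
                         min_compactors: int = 2, max_compactors: int = 8) -> int:
    """Suggest a concurrent_compactors value, moving at most one step."""
    if pending_tasks > 4 * current:
        target = current + 1  # backlog is well above capacity
    elif pending_tasks < current:
        target = current - 1  # compactors are mostly idle
    else:
        target = current
    # Clamp so the value never leaves the operator-approved range.
    return max(min_compactors, min(max_compactors, target))


def build_set_compactors_command(node_ip: str, value: int) -> List[str]:
    """Build (but do not run) the nodetool command for review or dry runs."""
    return ["nodetool", "-h", node_ip, "setconcurrentcompactors", str(value)]
```

Changing one step per evaluation keeps the adjustment easy to observe and to roll back while the workflow is still being validated in staging.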