Strategy Selection for Time-Series Workloads in Cassandra 4.x/5.x

Time-series ingestion in Apache Cassandra demands a compaction architecture that aligns strictly with data lifecycle boundaries, TTL expiration curves, and query access patterns. When the strategy matches the workload, SSTables age out cleanly, tombstones are dropped by dropping whole files rather than scanning them, and read amplification stays flat. When it does not, the same append-only telemetry that Cassandra handles effortlessly turns into compaction storms, repair drift, and disk exhaustion. This guide is for DBAs and SREs who need to pick a strategy for temporal data, size its windows against retention, and keep repair from colliding with the resulting compaction schedule. It sits under Advanced Compaction Strategy Tuning & Monitoring; read that first if you have not yet mapped your table-level strategy to your workload. Every command and threshold below is validated against Cassandra 4.0, 4.1, and 5.0, with version drift called out inline.

Use this page when your data is append-mostly, timestamp-ordered, and TTL-bounded — infrastructure metrics, IoT sensor streams, event logs, financial ticks. If your workload is update-in-place or read-dominated point lookups, the trade-offs between STCS, LCS, and TWCS point elsewhere and this guide is the wrong tool.

Concept: how TWCS window boundaries govern lifecycle

TimeWindowCompactionStrategy (TWCS) groups SSTables into discrete temporal buckets defined by compaction_window_size and compaction_window_unit. Each SSTable is assigned to the window that contains the maximum timestamp of the data it holds. Within the current (newest) window, TWCS behaves like SizeTieredCompactionStrategy (STCS), merging similarly sized SSTables as they flush. Once a window closes (no more writes land in its time range), TWCS performs a single major compaction to collapse that window into one SSTable and then leaves it untouched until its data expires. This is the property that makes TWCS efficient for temporal data: old data is never rewritten, so write amplification approaches its theoretical floor and disk I/O is spent almost entirely on the hot window.
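The bucketing rule can be sketched in a few lines. This is an illustrative approximation, assuming windows are aligned to the Unix epoch and measured in whole seconds; the function name window_bounds is hypothetical and is not part of any Cassandra API.

```python
from datetime import datetime, timezone
from typing import Tuple

# Seconds per compaction_window_unit value accepted by TWCS.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_bounds(max_timestamp_s: int, unit: str, size: int) -> Tuple[int, int]:
    """Return (lower, upper) epoch seconds of the window holding the timestamp.

    Assumption: windows are aligned to the Unix epoch, so a 6-hour window
    starts at 00:00, 06:00, 12:00 or 18:00 UTC.
    """
    width = UNIT_SECONDS[unit] * size
    lower = max_timestamp_s - (max_timestamp_s % width)
    return lower, lower + width - 1


if __name__ == "__main__":
    # An SSTable whose newest cell was written at 14:25 UTC on 2024-03-01.
    ts = int(datetime(2024, 3, 1, 14, 25, tzinfo=timezone.utc).timestamp())
    lo, hi = window_bounds(ts, "HOURS", 6)
    print(datetime.fromtimestamp(lo, timezone.utc).isoformat())
    print(datetime.fromtimestamp(hi, timezone.utc).isoformat())
```

With 6-hour windows, that SSTable belongs to the 12:00 to 17:59:59 UTC bucket, regardless of how old its oldest cell is.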

The design has one hard rule and one hard failure mode. The rule: window boundaries must mirror both your query granularity and your data retention policy. The failure mode: any write whose timestamp lands in an already-closed window — late-arriving data, a backfill, or a client clock skew — forces Cassandra to merge cold data back into a sealed bucket, defeating the whole model. Because TWCS drops an entire expired SSTable in one operation rather than scanning row-by-row for tombstones, correct sizing means expired data leaves via file deletion, which is orders of magnitude cheaper than the tombstone management and garbage collection path that STCS and LCS depend on for the same job.

Sizing follows directly from TTL. Aim for roughly 20–30 active windows over the lifetime of the data: compaction_window_size ≈ TTL / 25. With fewer windows, each SSTable grows large enough that the closing major compaction spikes I/O; with more, per-SSTable overhead (bloom filters, index summaries, file handles) multiplies. For infrastructure telemetry, a 1-day window paired with a 90-day TTL yields ~90 windows, which is above the target but often acceptable; a 3- or 4-day window brings the count back into the 20–30 range. High-frequency IoT streams with a 7-day TTL fit a 6-hour window (~28 windows); a 1-hour window (~168 windows) keeps each closing compaction cheaper at the cost of more SSTables on disk. The full window-to-TTL mapping and partition-key constraints are worked through in Configuring TWCS for IoT Sensor Data Streams.
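The TTL / 25 rule of thumb is easy to mechanize. The sketch below is a hedged illustration: the CANDIDATES shortlist of clean window sizes and the function names are assumptions made for this example, not values Cassandra prescribes.

```python
# Seconds per compaction_window_unit value accepted by TWCS.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}

# Hypothetical shortlist of "clean" window sizes an operator might consider.
CANDIDATES = [
    ("MINUTES", 15), ("MINUTES", 30),
    ("HOURS", 1), ("HOURS", 2), ("HOURS", 3), ("HOURS", 4),
    ("HOURS", 6), ("HOURS", 8), ("HOURS", 12),
    ("DAYS", 1), ("DAYS", 2), ("DAYS", 3), ("DAYS", 4),
    ("DAYS", 5), ("DAYS", 7), ("DAYS", 10), ("DAYS", 14), ("DAYS", 30),
]


def window_count(ttl_s: int, unit: str, size: int) -> int:
    """Windows live at once: ceil(TTL / window width), in integer math."""
    width = UNIT_SECONDS[unit] * size
    return -(-ttl_s // width)


def suggest_window(ttl_s: int, target: int = 25):
    """Pick the candidate whose window count is closest to the target."""
    return min(CANDIDATES, key=lambda c: abs(window_count(ttl_s, *c) - target))


if __name__ == "__main__":
    for days in (7, 90):
        ttl = days * 86400
        unit, size = suggest_window(ttl)
        print(f"{days}-day TTL -> {size} {unit} ({window_count(ttl, unit, size)} windows)")
```

For a 7-day TTL this picks 6-hour windows (28 windows), and for a 90-day TTL it picks 4-day windows (23 windows). Treat the result as a starting point to check against flush volume, not as a final answer.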

The decision of which strategy to run at all is upstream of any window math. The tree below maps workload profile to strategy before a single parameter is touched.
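As a rough sketch of that mapping, the function below encodes only the selection criteria this guide states in prose (append-mostly and TTL-bounded data favours TWCS, in-place updates with latency-sensitive point lookups favour LCS, uncontrolled write ordering falls back to STCS, and UCS is an option on 5.0 for mixed temporal/update tables). The Workload fields and the ordering of the checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile; field names are illustrative only."""
    append_mostly: bool
    ttl_bounded: bool
    out_of_order_writes: bool          # backfills, late arrivals, clock skew
    in_place_updates: bool
    latency_sensitive_point_reads: bool
    cassandra_major: int = 4


def choose_strategy(w: Workload) -> str:
    # Pure time-series: ordered, append-mostly, every row expires.
    if (w.append_mostly and w.ttl_bounded
            and not w.out_of_order_writes and not w.in_place_updates):
        return "TimeWindowCompactionStrategy"
    # Update-heavy tables serving latency-sensitive point lookups.
    if w.in_place_updates and w.latency_sensitive_point_reads:
        return "LeveledCompactionStrategy"
    # Mixed temporal/update patterns on 5.0 can be served by UCS.
    if w.cassandra_major >= 5 and w.in_place_updates and w.ttl_bounded:
        return "UnifiedCompactionStrategy"
    # Fallback when write ordering cannot be controlled.
    return "SizeTieredCompactionStrategy"


if __name__ == "__main__":
    metrics = Workload(True, True, False, False, False)
    print(choose_strategy(metrics))
```

A real decision also weighs disk headroom and read patterns, so use this as a checklist rather than an automated selector.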

Configuration reference

TWCS behaviour is set entirely through the per-table compaction map; there is no cluster-wide TWCS toggle. The keys below are the ones that determine lifecycle correctness. compaction_throughput and concurrent_compactors from cassandra.yaml still bound the aggregate work, but they are node-scoped and shared with every other table.

| Key | Default | Valid range | Impact on compaction / repair / throughput |
| --- | --- | --- | --- |
| `compaction_window_unit` | `DAYS` | `MINUTES`, `HOURS`, `DAYS` | Time unit for the bucket; pair with size so `size × unit` divides the TTL into ~20–30 windows |
| `compaction_window_size` | `1` | `1` – tens | Number of units per window; too large spikes the closing major compaction, too small multiplies per-SSTable overhead |
| `unsafe_aggressive_sstable_expiration` | `false` | `true` / `false` | Drops fully expired SSTables without checking overlap; big I/O win but risks resurrecting data if repair is behind, so use only with a disciplined repair cadence |
| `unchecked_tombstone_compaction` | `false` | `true` / `false` | Allows single-SSTable tombstone compaction; usually leave off under TWCS since whole-file drop handles expiry |
| `max_threshold` / `min_threshold` | `32` / `4` | `2` – `64` | STCS thresholds applied within the active window only; rarely tuned for TWCS |
| `default_time_to_live` (table option) | `0` | `0` – `630720000` (20y) | Table-level TTL; must be consistent so windows expire predictably |

A concrete schema for a 7-day IoT retention with 6-hour windows:

-- Cassandra 4.x / 5.x: TWCS with 6-hour windows for a 7-day TTL (~28 windows)
ALTER TABLE telemetry.readings
WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'HOURS',
  'compaction_window_size': '6',
  'unsafe_aggressive_sstable_expiration': 'false'
}
AND default_time_to_live = 604800
AND gc_grace_seconds = 10800;

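Two version notes. First, Cassandra 5.0 introduces UnifiedCompactionStrategy (UCS). The stock cassandra.yaml still defaults new tables to STCS, while the cassandra_latest.yaml shipped alongside it sets default_compaction to UCS, so check which default your cluster actually runs. TWCS is still the recommended explicit choice for pure time-series, but UCS with a tiered scaling_parameters value can approximate it for mixed temporal/update workloads. Second, keep gc_grace_seconds shorter than or equal to a single window when you rely on whole-SSTable expiration, but never shorter than your repair cadence, or deletes can resurrect. The 10800 s (3-hour) value in the schema above therefore assumes repair completes at least every 3 hours; raise it if yours does not. The interaction of gc_grace_seconds with repair is covered under anti-entropy repair.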
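The two gc_grace_seconds constraints pull in opposite directions, so it helps to check them together before applying a schema. The helper below is a minimal sketch; check_gc_grace is a hypothetical name, and the repair interval is a value you supply from your own scheduler.

```python
from typing import List


def check_gc_grace(gc_grace_s: int, window_s: int, repair_interval_s: int) -> List[str]:
    """Return a list of violated guidelines (empty means both hold)."""
    problems = []
    if gc_grace_s < repair_interval_s:
        problems.append(
            "gc_grace_seconds is shorter than the repair interval: deletes can resurrect"
        )
    if gc_grace_s > window_s:
        problems.append(
            "gc_grace_seconds exceeds one window: outside this guide's "
            "whole-SSTable expiration guideline"
        )
    return problems


if __name__ == "__main__":
    # Schema values from above: gc_grace 3 h, window 6 h; repair interval assumed daily.
    for issue in check_gc_grace(10800, 6 * 3600, 86400):
        print("WARN:", issue)
```

With a daily repair, the 3-hour gc_grace_seconds trips the first check, which shows that the schema only holds if repair runs at least as often as the grace period.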

Step-by-step: migrating a time-series table to sized TWCS

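Run each step against a canary first, confirm the window behaviour, then roll forward. ALTER TABLE is a cluster-wide schema change, so a table is the natural canary; to trial the strategy on a single node, the usual route is the per-node JMX override (the CompactionParametersJson attribute on the table's MBean), which lasts only until that node restarts or the table's compaction schema changes. Altering compaction triggers a strategy rebuild that reprocesses existing SSTables, so treat it as an I/O event, not a metadata flip.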

1. Gate on cluster health before touching schema. A strategy change during an existing backlog compounds it. Confirm headroom first:

# Cassandra 4.x/5.x: all nodes Up/Normal, queue shallow, throughput ceiling known
nodetool status | grep -E "^UN" | wc -l      # expect = node count
nodetool compactionstats -H                  # pending tasks should be low/idle
nodetool getcompactionthroughput

Expected output for a healthy node:

pending tasks: 0
Current compaction throughput: 64 MB/s

If pending tasks is already high, drain the backlog before proceeding; the procedure in Compaction Backlog Analysis & Alerting applies here as a pre-flight gate.

2. Compute the window from the TTL. Divide the retention window into ~20–30 buckets. For a 7-day (604800 s) TTL: 604800 / 25 ≈ 24192 s ≈ 6.7 h, so a 6-hour window (~28 windows) is a clean fit. Record the number; you verify against it later.
3. Apply the strategy on the canary table. Use the ALTER TABLE from the configuration reference above. On a large existing table, expect the node to reprocess SSTables into the new windows; monitor that it does not saturate I/O:
```
watch -n 5 'nodetool compactionstats -H | head -20'
```
4. Verify the window bucketing took effect. The system_views virtual tables (available since 4.0) expose in-flight compaction tasks without JMX:
```
SELECT keyspace_name, table_name, kind, progress, total, unit
FROM system_views.sstable_tasks;
```
For on-disk grouping and per-table stats, use nodetool tablestats telemetry.readings (the current form; cfstats is a deprecated alias).
5. Schedule repair around the compaction schedule, not against it. Incremental repair is the nodetool default and was reworked in 4.0 for reliability; scope it to primary ranges and run it during a compaction lull:
```
# Only after step 1 confirms pending tasks are low
nodetool repair -pr telemetry
```
Gate this in automation: if pending compactions exceed a ceiling (a common rule is 2 × concurrent_compactors) or a repair session is already active, defer rather than overlap. Streaming and compaction share the same disk budget, so forcing both to peak together is the fastest route to node eviction. The driver-based scheduler that enforces this gate is documented in Python Monitoring for Cassandra Compaction.
6. Roll forward node by node, repeating steps 3–5, and only after every node reports stable windows should you consider nodetool cleanup, never during an active compaction.
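The repair gate from the repair step above can be expressed as a small pure function. This is a sketch under stated assumptions: it parses the "pending tasks: N" line shown in the expected output earlier, and it takes repair_active as an input because how you detect a running repair session depends on your tooling. The function names are hypothetical.

```python
import re


def pending_tasks(compactionstats_output: str) -> int:
    """Extract the first 'pending tasks: N' value from nodetool compactionstats output."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def may_start_repair(pending: int, concurrent_compactors: int,
                     repair_active: bool, multiplier: int = 2) -> bool:
    """Defer when a repair is running or pending exceeds multiplier x compactors."""
    ceiling = multiplier * concurrent_compactors
    return (not repair_active) and pending <= ceiling


if __name__ == "__main__":
    sample = "pending tasks: 3\n"
    pending = pending_tasks(sample)
    # concurrent_compactors=2 is an assumed value; read yours from cassandra.yaml.
    print(may_start_repair(pending, concurrent_compactors=2, repair_active=False))
```

In a scheduler, the caller would capture the nodetool output itself (for example with subprocess) and retry later whenever the gate returns False.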

Verification & observability

Confirm the strategy is behaving, not just that the ALTER succeeded. Three signals tell the whole story.

Watch the queue drain and stay flat after the rebuild settles:

nodetool compactionstats -H

Prove that old windows are retiring by whole-file deletion rather than tombstone scans. Correlate completions against system.compaction_history, which is available on both 4.x and 5.0:

SELECT keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out
FROM system.compaction_history
WHERE keyspace_name = 'telemetry' ALLOW FILTERING;

A healthy TWCS signature shows each closed window collapsing to a single SSTable at rollover, with bytes_out at or slightly below bytes_in (append-only data shrinks little when merged; a large drop means overwrites or expired cells were purged), then expired windows disappearing from nodetool tablestats with no matching tombstone-scan spike. The single most diagnostic read-amplification signal is the SSTables column of nodetool tablehistograms (the SSTablesPerReadHistogram metric): under correctly sized TWCS a typical read should touch only one or two SSTables. Grep the log to confirm windows are closing on schedule and no late writes are reopening them:

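# -i because message capitalization varies; expiry messages may be logged at DEBUG, in which case check debug.log too
grep -iE "TimeWindowCompactionStrategy|Major compaction|dropping expired SSTable" /var/log/cassandra/system.log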

If you scrape JMX, the same org.apache.cassandra.metrics:type=Compaction PendingTasks and BytesCompacted counters used for backlog velocity apply here; the sampling math is shared with Async Compaction Tracking & Metrics, so one scrape feeds both dashboards.

Failure modes & rollback

Late-arriving data reopens sealed windows. A backfill job or a client with a skewed clock writes rows whose timestamp falls in a closed window. TWCS is forced to merge new data into a sealed SSTable, read amplification climbs, and expiry stops being a clean file-drop. Detection: SSTablesPerReadHistogram rising on an otherwise-stable table, plus compaction_history showing repeated compactions on old windows. Rollback: stop the offending writer; if backfill is a permanent requirement, TWCS is the wrong fit — evaluate STCS or UCS, since the trade-offs between STCS, LCS, and TWCS favour a strategy that tolerates out-of-order writes.
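One way to spot late writes is to check how many windows each SSTable's timestamp range covers. The sketch below assumes you already have each SSTable's minimum and maximum write timestamps in microseconds (for example, collected from sstablemetadata output); the function names, the sample values, and the tolerance of two windows for a flush that straddles a boundary are all assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def windows_spanned(min_ts_us: int, max_ts_us: int, window_s: int) -> int:
    """Count epoch-aligned windows touched by [min_ts, max_ts] (microseconds)."""
    width_us = window_s * 1_000_000
    return max_ts_us // width_us - min_ts_us // width_us + 1


def late_write_suspects(sstables: Iterable[Tuple[str, int, int]],
                        window_s: int, max_span: int = 2) -> List[str]:
    """Names of SSTables covering more windows than a boundary flush explains."""
    return [name for name, lo, hi in sstables
            if windows_spanned(lo, hi, window_s) > max_span]


if __name__ == "__main__":
    window_s = 6 * 3600
    start = 1709294400 * 1_000_000  # a 6-hour boundary, in microseconds
    sstables = [
        ("a", start + 5_000_000, start + 3_600_000_000),        # inside one window
        ("b", start - 3 * window_s * 1_000_000, start + 10),    # reaches 3 windows back
        ("c", start - 10_000_000, start + 10_000_000),          # straddles one boundary
    ]
    print(late_write_suspects(sstables, window_s))
```

Only the SSTable reaching several windows back is flagged; the one that merely straddles a boundary is treated as a normal flush.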

Window too large: closing compaction storm. A window sized so that each bucket accumulates tens of GB turns the closing major compaction into an I/O spike that starves reads and repair. Detection: periodic PendingTasks and iowait spikes aligned to the window boundary cadence. Rollback: re-ALTER to a smaller compaction_window_size. TWCS does not split SSTables that are already written, so existing large buckets keep their size until they expire, and only newly flushed data lands in the finer windows. Because ALTER reprocesses SSTables, apply it during off-peak and one node at a time.

Repair behind expiry resurrects deleted rows. With gc_grace_seconds set shorter than the repair cadence (or unsafe_aggressive_sstable_expiration enabled while repair lags), a whole SSTable is dropped before all replicas have converged, and a stale replica later resurfaces deleted data. Detection: deleted rows reappearing on reads after a node was down longer than gc_grace_seconds. Rollback: raise gc_grace_seconds back above your repair interval, disable aggressive expiration, and run a full repair to reconverge before re-tightening. During heavy compaction windows, keep reads off saturated replicas with the speculative_retry and read_repair tuning in Fallback Routing & Read Path Optimization.

FAQ

When should I use TWCS instead of STCS or LCS for time-series data?

Use TWCS when data is append-mostly, timestamp-ordered, and every row has a TTL. TWCS drops whole expired SSTables instead of scanning tombstones, so it wins decisively on write amplification and cleanup cost. Reach for LCS only if the same table also serves latency-sensitive point lookups with in-place updates, and for STCS only if you cannot control write ordering. On Cassandra 5.0, UCS with a tiered scaling_parameters can substitute when a table mixes temporal and update patterns.

How do I size compaction_window_size against my TTL?

Target roughly 20–30 active windows over the data lifetime: compaction_window_size ≈ TTL / 25, then round to a clean unit. A 7-day TTL suggests ~6-hour windows; a 90-day TTL suggests ~3-4 day windows. Too few windows makes each closing compaction an I/O spike; too many multiplies bloom-filter and file-handle overhead. Verify after the ALTER with nodetool tablestats: once the rebuild settles, the SSTable count should sit near the expected window count plus a few SSTables in the active window.

What happens if gc_grace_seconds is shorter than my repair cadence?

Deleted data can resurrect. If a whole SSTable expires and is dropped before every replica has repaired the delete, a replica that was down longer than gc_grace_seconds will reintroduce the row on the next read or repair. Keep gc_grace_seconds at or above your repair interval, and only enable unsafe_aggressive_sstable_expiration when repair is reliably keeping pace.

Why is read latency climbing even though I switched to TWCS?

The most common cause is late-arriving writes landing in sealed windows, which forces old SSTables to recompact and inflates SSTablesPerReadHistogram. Check for clock skew on clients and for backfill jobs writing historical timestamps. A secondary cause is a window sized so large that reads still span many SSTables inside the active bucket — shrink compaction_window_size and re-verify.

Does changing the compaction strategy require downtime?

No, but it is not free. ALTER TABLE ... WITH compaction triggers a strategy rebuild that reprocesses existing SSTables into the new layout, consuming disk I/O. Apply it during off-peak, one node at a time, gate it behind low pending compactions, and watch nodetool compactionstats until the rebuild settles before moving to the next node.

Related guides

Advanced Compaction Strategy Tuning & Monitoring: the parent guide covering strategy selection, tuning, and observability end to end.
Configuring TWCS for IoT Sensor Data Streams: the copy-paste deployment path for hourly-window TWCS with repair orchestration.
Understanding STCS vs LCS vs TWCS: the strategy trade-offs that decide whether TWCS is the right fit at all.
Compaction Backlog Analysis & Alerting: the pre-flight backlog gate before any strategy change or repair.
Python Monitoring for Cassandra Compaction: the driver-based scheduler that gates repair on compaction pressure.
Fallback Routing & Read Path Optimization: keeping reads off I/O-saturated replicas during window rollover.