Skip to content

Raft Clustering

LoomCache is fundamentally a distributed system. A single node dying should never result in lost data. To govern this distributed state, LoomCache implements the Raft Consensus Algorithm — a widely adopted consensus protocol for distributed systems.

Raft Consensus Flow

Waiting for client request...

Client
Leader Node
WAL Term
Follower 1
Follower 2

At any given instance, one node in the LoomCache cluster is the Leader. The Leader is responsible for receiving write requests (e.g., PUT, DELETE). It writes the request to its local log and immediately broadcasts an AppendEntries RPC to the Followers.

Clients automatically discover the leader via LeaderTracker. If a write hits a follower, the client receives a RESPONSE_REDIRECT and transparently reroutes to the current leader.

A write is not considered “committed” until the Leader receives acknowledging responses from a strict majority of the cluster (the Quorum).

[!CAUTION] If a 5-node cluster suffers a 2-node failure, the remaining 3 nodes maintain quorum and continue serving writes. If 3 nodes fail, the remaining 2 lose quorum and will block writes to prevent “Split-Brain” data corruption.

The cluster currently runs a single Raft group covering every node. Every node holds every partition; all writes flow through the same log. Three-node clusters are the recommended minimum (quorum = 2). The sharding/ package ships multi-group primitives but production sharding is unsupported/fail-closed until per-group WAL, Raft metadata, snapshot, install-snapshot, and restart recovery are proven.

LoomCache hardens standard Raft with several safety mechanisms:

  • Pre-vote phase: Before starting a real election, candidates run a pre-vote round. This prevents partitioned nodes from incrementing the global term number and disrupting stable clusters.
  • Randomized timeouts: Election timeouts are randomized between 300–600ms (configurable) to prevent simultaneous elections across the cluster.
  • Leader lease: The leader maintains a lease based on recent heartbeat acknowledgments from a majority. This enables fast linearizable reads without quorum round-trips on every GET.
  • TimeoutNow RPC: Explicit leadership transfer for graceful shutdowns and rolling upgrades.
ParameterDefaultPurpose
Heartbeat interval100msLeader → Follower keepalive
Election timeout (min)300msMinimum wait before election
Election timeout (max)600msMaximum wait (randomized)
Replication interval50msLog replication check frequency
Max entries per append100Batch size for AppendEntries RPC

Every committed entry is persisted to disk before acknowledgment:

  1. WalWriter appends the entry: 4-byte length + 8-byte term + 8-byte index + command + 4-byte CRC32
  2. fsync() is enforced after each commit (configurable)
  3. Snapshots are created every 10,000 entries (Kryo-serialized state of all data structures)
  4. On restart: load snapshot → replay WAL → rejoin cluster

[!NOTE] Place WAL on SSD/NVMe storage for optimal write latency. fsync typically adds 1–10ms on SSD.

LoomCache provides strong consistency (linearizability). This means that once a write is acknowledged to a client, all subsequent reads across the cluster — regardless of leader elections or network partitions — are guaranteed to reflect that write.

Read consistency options:

  • Linearizable: Route to leader with valid lease — fast, no quorum round-trip
  • Eventual: Route to any follower — stale by ≤100ms, enables read scaling

Linearizability is verified with a Wing-Gong checker under the Java Jepsen-style harness — see Jepsen testing for the scenario catalogue.

LoomCache supports live cluster scaling via Raft configuration changes:

  • AddServer: New node joins, receives snapshot + WAL catch-up, then participates in quorum
  • RemoveServer: Node is removed from voting membership; partitions are migrated first
  • No downtime required — the cluster continues serving reads and writes during membership changes