
Raft Clustering

LoomCache is fundamentally a distributed system. A single node dying should never result in lost data. To govern this distributed state, LoomCache implements the Raft Consensus Algorithm — the same protocol used by etcd, CockroachDB, and TiKV.

Raft Consensus Flow

[Interactive diagram: a Client sends a write to the Leader Node, which appends it to its WAL (with the current term) and replicates to Follower 1 and Follower 2.]

At any given time, exactly one node in the LoomCache cluster is the Leader. The Leader is responsible for receiving write requests (e.g., PUT, DELETE). It appends each request to its local log and immediately broadcasts an AppendEntries RPC to the Followers.
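On the Follower side, Raft's log-matching check decides whether an AppendEntries batch can be accepted. A minimal sketch of that check, modeling the log as an array of per-index terms (class and method names are illustrative, not LoomCache types):

```java
class FollowerLog {
    // A follower accepts an AppendEntries batch only if its log contains an
    // entry at prevLogIndex whose term equals prevLogTerm (the consistency
    // check from the Raft paper). termsByIndex[i] is the term of entry i.
    static boolean accepts(long[] termsByIndex, long prevLogIndex, long prevLogTerm) {
        if (prevLogIndex < 0) return true;                     // batch starts at the log head
        if (prevLogIndex >= termsByIndex.length) return false; // gap: follower is behind
        return termsByIndex[(int) prevLogIndex] == prevLogTerm;
    }
}
```

When the check fails, the leader retries with an earlier `prevLogIndex` until the logs agree, then overwrites the follower's divergent suffix.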

Clients automatically discover the leader via LeaderTracker. If a write hits a follower, the client receives a RESPONSE_REDIRECT and transparently reroutes to the current leader.
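The redirect handling can be pictured as a small hint-following loop. This is a hypothetical simplification of what LeaderTracker does: each node reports who it believes the leader is, and the client hops until a node claims leadership or the hop budget runs out.

```java
class LeaderResolver {
    // leaderHint[i] is node i's current belief about the leader's id.
    // Follows at most maxHops redirects; returns -1 if hints never converge
    // (e.g., mid-election), in which case a real client would retry later.
    static int resolveLeader(int[] leaderHint, int start, int maxHops) {
        int node = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            if (leaderHint[node] == node) return node; // node says: "I am the leader"
            node = leaderHint[node];                   // follow the redirect
        }
        return -1;
    }
}
```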

A write is not considered “committed” until the Leader receives acknowledging responses from a strict majority of the cluster (the Quorum).

[!CAUTION] If a 5-node cluster suffers a 2-node failure, the remaining 3 nodes maintain quorum and continue serving writes. If 3 nodes fail, the remaining 2 lose quorum and will block writes to prevent “Split-Brain” data corruption.
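The arithmetic behind both examples is simply "strict majority": quorum for an n-node cluster is ⌊n/2⌋ + 1. A sketch:

```java
class Quorum {
    // Strict majority of the full voting membership.
    static int size(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Writes proceed only while a majority of voters is reachable.
    static boolean hasQuorum(int clusterSize, int aliveNodes) {
        return aliveNodes >= size(clusterSize);
    }
}
```

Note that quorum is computed against the configured cluster size, not the number of live nodes; this is what prevents two minority partitions from each electing a leader.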

LoomCache uses a single Raft group where every node holds all data — the same model as etcd and ZooKeeper. All 7 data structures (Map, Queue, Set, SortedSet, Topic, Lock, Counter) replicate through the same Raft log. Three-node clusters are the recommended minimum (quorum = 2).

LoomCache hardens standard Raft with several safety mechanisms:

  • Pre-vote phase: Before starting a real election, candidates run a pre-vote round. This prevents partitioned nodes from incrementing the global term number and disrupting stable clusters.
  • Randomized timeouts: Election timeouts are randomized between 300–600ms (configurable) to prevent simultaneous elections across the cluster.
  • Leader lease: The leader maintains a lease based on recent heartbeat acknowledgments from a majority. This enables fast linearizable reads without quorum round-trips on every GET.
  • TimeoutNow RPC: Explicit leadership transfer for graceful shutdowns and rolling upgrades.
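The randomized-timeout mechanism above can be sketched in a few lines; the 300–600ms bounds are the documented defaults, and the class name is illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;

class ElectionTimer {
    // Draws a fresh timeout uniformly in [minMs, maxMs) each time a follower
    // arms its election timer, so followers rarely time out simultaneously
    // and split the vote.
    static long nextTimeoutMs(long minMs, long maxMs) {
        return ThreadLocalRandom.current().nextLong(minMs, maxMs);
    }
}
```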
| Parameter | Default | Purpose |
| --- | --- | --- |
| Heartbeat interval | 100ms | Leader → Follower keepalive |
| Election timeout (min) | 300ms | Minimum wait before election |
| Election timeout (max) | 600ms | Maximum wait (randomized) |
| Replication interval | 50ms | Log replication check frequency |
| Max entries per append | 100 | Batch size for AppendEntries RPC |

Every committed entry is persisted to disk before acknowledgment:

  1. WalWriter appends the entry: 4-byte length + 8-byte term + 8-byte index + command + 4-byte CRC32
  2. fsync() is enforced after each commit (configurable)
  3. Snapshots are created every 10,000 entries (Kryo-serialized state of all data structures)
  4. On restart: load snapshot → replay WAL → rejoin cluster
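The record layout from step 1 can be sketched directly. Two details here are assumptions rather than confirmed LoomCache internals: big-endian field order, and that the CRC32 covers term + index + command (but not the length prefix).

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class WalRecord {
    // Encodes one WAL entry per the documented layout:
    // 4-byte length + 8-byte term + 8-byte index + command + 4-byte CRC32.
    static byte[] encode(long term, long index, byte[] command) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 8 + command.length + 4);
        buf.putInt(command.length);
        buf.putLong(term);
        buf.putLong(index);
        buf.put(command);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 4, 8 + 8 + command.length); // skip length prefix
        buf.putInt((int) crc.getValue());
        return buf.array();
    }
}
```

On replay, the reader recomputes the CRC over the same span and discards any trailing record that fails the check (a torn write from a crash mid-append).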

[!NOTE] Place WAL on SSD/NVMe storage for optimal write latency. fsync typically adds 1–10ms on SSD.

LoomCache provides strong consistency (linearizability). This means that once a write is acknowledged to a client, all subsequent reads across the cluster — regardless of leader elections or network partitions — are guaranteed to reflect that write.

Read consistency options:

  • Linearizable: Route to leader with valid lease — fast, no quorum round-trip
  • Eventual: Route to any follower — stale by ≤100ms, enables read scaling
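Choosing between the two read paths might look like the sketch below. The fallback when the leader's lease has expired (refusing the fast path so the caller can do a full quorum read) is an assumption about LoomCache's behavior, not documented API.

```java
enum ReadConsistency { LINEARIZABLE, EVENTUAL }

class ReadRouter {
    // Returns the node id to read from, or -1 when a linearizable read
    // cannot use the lease fast path and must fall back to a quorum check.
    static int route(ReadConsistency level, int leaderId,
                     boolean leaseValid, int anyFollowerId) {
        if (level == ReadConsistency.LINEARIZABLE) {
            return leaseValid ? leaderId : -1;
        }
        return anyFollowerId; // eventual: any replica, stale by at most one heartbeat
    }
}
```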

We actively enforce and test linearizability using custom chaos testing harnesses in our CI pipelines, including network partition injection, leader kills, and concurrent write verification across 11,600+ test methods.

LoomCache supports live cluster scaling via Raft configuration changes:

  • AddServer: New node joins, receives snapshot + WAL catch-up, then participates in quorum
  • RemoveServer: Node is removed from voting membership; partitions are migrated first
  • No downtime required — the cluster continues serving reads and writes during membership changes
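Raft keeps these changes safe by applying one membership change at a time, so any old-configuration quorum and new-configuration quorum must overlap in at least one node. A sketch of that rule as pure functions on the voter set (names illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class Membership {
    // Single-server change: add exactly one voter per configuration entry.
    static List<String> addServer(List<String> voters, String node) {
        List<String> next = new ArrayList<>(voters);
        if (!next.contains(node)) next.add(node);
        return next;
    }

    // Single-server change: remove exactly one voter per configuration entry.
    static List<String> removeServer(List<String> voters, String node) {
        List<String> next = new ArrayList<>(voters);
        next.remove(node);
        return next;
    }
}
```

Because each step changes the voter set by one, quorums of consecutive configurations always intersect, which is what rules out two disjoint majorities electing rival leaders mid-change.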