System Architecture & Design
LoomCache is six Maven modules with a strict dependency chain. Every node runs one TCP server and at least one Raft
state machine (raft-0 by default); opt-in sharding starts multiple Raft groups and routes operations to the owning
group. Production deployments must keep sharding disabled until every group has independent WAL, Raft metadata,
snapshot, install-snapshot, and restart recovery evidence; the production profile is expected to fail closed otherwise.
The client SDK learns leaders and partition ownership via redirects/table refreshes and caches them for partition-aware
routing.
Architecture Visualized
[Interactive diagram: LoomCache Architecture Stack]
Modules
- loom-common (Java 17+): wire protocol (MessageType, MessageCodec, Message), Kryo serialization, ClusterConfig, TlsConfig, AuthConfig, model DTOs, exceptions.
- loom-server (Java 25, --enable-preview): TCP server, Raft, data structures, WAL, snapshots, CP subsystem.
- loom-client (Java 17+): smart routing, connection pools, near cache, retry, dedup, CP facades.
- loom-cli (Java 17+): data-structure inspection, cluster state, Raft log export, CP admin commands.
- loom-spring-boot: Spring Boot 4.0.5 auto-configuration, REST controllers, cache/session beans.
- loom-integration-tests: multi-node IT suite with a Java Jepsen-style harness.
Write path
- Client hashes the key, picks a likely partition owner, and sends the request (e.g. MAP_PUT, opcode 0x02).
- TcpServer accepts on a virtual thread and dispatches via MessageHandler.
- Followers respond with RESPONSE_REDIRECT plus the leader’s address; the client updates LeaderTracker and retries on the leader.
- The leader wraps the message as a Raft LogEntry, appends to RaftLog, and replicates via AppendEntries.
- Once a majority acknowledges, the state-machine applier decodes and runs the command through DataOperationHandler and returns the response message.
- The client response is released after committed apply; persistent Raft logs and WAL/snapshot components provide the disk durability path.
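The redirect step above can be illustrated with a small in-memory simulation. This is a minimal sketch, not LoomCache's client code: `RedirectSketch`, `Node`, `Client`, and `Reply` are hypothetical names, the retry bound of three is an assumption, and replication is omitted entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the redirect-and-retry loop; not LoomCache's actual client. */
final class RedirectSketch {

    /** A reply is either a result or a redirect carrying the leader's address. */
    record Reply(boolean redirect, String leaderAddress, String value) {}

    /** Stand-in for a cluster node: followers redirect, the leader applies the write. */
    static final class Node {
        final String address;
        final String leaderAddress;
        final Map<String, String> store = new HashMap<>();

        Node(String address, String leaderAddress) {
            this.address = address;
            this.leaderAddress = leaderAddress;
        }

        Reply put(String key, String value) {
            if (!address.equals(leaderAddress)) {
                return new Reply(true, leaderAddress, null); // analogue of RESPONSE_REDIRECT
            }
            store.put(key, value); // a real leader replicates to a majority before applying
            return new Reply(false, null, "OK");
        }
    }

    /** Client that remembers the last leader it learned (the role the text gives LeaderTracker). */
    static final class Client {
        final Map<String, Node> nodes;
        String cachedLeader;
        int attempts;

        Client(Map<String, Node> nodes, String initialGuess) {
            this.nodes = nodes;
            this.cachedLeader = initialGuess;
        }

        String put(String key, String value) {
            for (int i = 0; i < 3; i++) { // bounded retries (assumed limit) so redirects cannot loop forever
                attempts++;
                Reply reply = nodes.get(cachedLeader).put(key, value);
                if (!reply.redirect()) {
                    return reply.value();
                }
                cachedLeader = reply.leaderAddress(); // learn the leader, then retry there
            }
            throw new IllegalStateException("no leader found after retries");
        }
    }

    public static void main(String[] args) {
        Map<String, Node> nodes = new HashMap<>();
        nodes.put("n1", new Node("n1", "n2"));
        nodes.put("n2", new Node("n2", "n2"));
        Client client = new Client(nodes, "n1");
        System.out.println(client.put("k", "v") + " after " + client.attempts + " attempt(s)");
    }
}
```

The first write costs one extra hop because the initial guess is a follower; later writes go straight to the cached leader.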
Read path
- Linearizable reads use ReadIndex: the leader captures its commit index at receive time, confirms its lease, then answers. No disk I/O, no quorum round-trip.
- Linearizable reads on followers return RESPONSE_REDIRECT, so everything routes to the leader.
- CP atomic reads (CP_ATOMIC_GET) take the same ReadIndex path via ConsistencySubsystemHandler.
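The read path above can be sketched as a single decision function. This is an illustrative sketch under stated assumptions: the class, field, and enum names are hypothetical, and the behaviour on an expired lease or a lagging state machine (returning a distinct outcome rather than waiting or falling back to a quorum check) is a simplification, not a description of LoomCache's handler.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a lease-based ReadIndex read; names are illustrative only. */
final class ReadIndexSketch {

    enum Outcome { SERVED, REDIRECT, LEASE_EXPIRED, NOT_YET_APPLIED }

    record ReadResult(Outcome outcome, String value) {}

    boolean leader;
    long commitIndex;
    long appliedIndex;
    long leaseExpiresAtNanos;
    final Map<String, String> state = new HashMap<>();

    ReadResult read(String key, long nowNanos) {
        if (!leader) {
            return new ReadResult(Outcome.REDIRECT, null); // followers never serve linearizable reads
        }
        long readIndex = commitIndex; // captured when the request arrives
        if (nowNanos >= leaseExpiresAtNanos) {
            // Lease lapsed: leadership is no longer assured without contacting a quorum.
            return new ReadResult(Outcome.LEASE_EXPIRED, null);
        }
        if (appliedIndex < readIndex) {
            // The state machine has not applied everything up to the read index yet.
            return new ReadResult(Outcome.NOT_YET_APPLIED, null);
        }
        return new ReadResult(Outcome.SERVED, state.get(key)); // in-memory read only
    }

    public static void main(String[] args) {
        ReadIndexSketch node = new ReadIndexSketch();
        node.leader = true;
        node.commitIndex = 5;
        node.appliedIndex = 5;
        node.leaseExpiresAtNanos = 1_000;
        node.state.put("k", "v");
        System.out.println(node.read("k", 500));
        System.out.println(node.read("k", 1_000));
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real node would read a monotonic clock such as `System.nanoTime()`.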
Implementation lives in loom-server/src/main/java/com/loomcache/server/consensus:
- RaftNode: FSM (FOLLOWER → CANDIDATE → LEADER) with pre-vote, randomized election timeouts, and leader lease.
- RaftLog: in-memory index backed by PersistentRaftLog.
- LeaderLease: tracks lease validity so the leader can answer linearizable reads without a quorum RTT.
- RaftInvariantChecker: runtime invariant assertions exercised by tests.
- ConfigChange: joint-consensus membership changes.
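To make the role transitions concrete, here is a minimal sketch of a FOLLOWER → CANDIDATE → LEADER state machine with a pre-vote gate and a randomized election timeout. `RaftRoleSketch` and its methods are hypothetical, vote counts are assumed to include the node's own vote, and the timeout range of [base, 2 × base) is an assumption rather than LoomCache's configured value.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of Raft role transitions with pre-vote; not the RaftNode implementation. */
final class RaftRoleSketch {

    enum Role { FOLLOWER, CANDIDATE, LEADER }

    Role role = Role.FOLLOWER;
    long term = 0;
    final int clusterSize;

    RaftRoleSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    /** Randomized timeout in [base, 2 * base) so nodes rarely time out together. */
    static long randomElectionTimeoutMillis(long baseMillis) {
        return baseMillis + ThreadLocalRandom.current().nextLong(baseMillis);
    }

    static boolean isMajority(int votes, int clusterSize) {
        return votes > clusterSize / 2;
    }

    /** Pre-vote gate: the term only advances if a majority would grant a vote. */
    void onElectionTimeout(int preVotesGranted) {
        if (role == Role.LEADER || !isMajority(preVotesGranted, clusterSize)) {
            return; // an isolated node stays a follower and does not inflate its term
        }
        term++;
        role = Role.CANDIDATE;
    }

    void onVotes(int votesGranted) {
        if (role == Role.CANDIDATE && isMajority(votesGranted, clusterSize)) {
            role = Role.LEADER;
        }
    }

    /** Any message carrying a higher term forces a step-down. */
    void onHigherTerm(long observedTerm) {
        if (observedTerm > term) {
            term = observedTerm;
            role = Role.FOLLOWER;
        }
    }

    public static void main(String[] args) {
        RaftRoleSketch node = new RaftRoleSketch(3);
        node.onElectionTimeout(2);
        node.onVotes(2);
        System.out.println(node.role + " in term " + node.term);
    }
}
```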
Default deployments run a single Raft group over the full cluster. When ClusterConfig.shardingEnabled(true) is set,
RaftGroupManager starts multiple groups and CacheNode.resolveRaftGroup(...) dispatches Raft RPCs by group name.
That multi-group path is a development/certification surface, not production support, until per-group recovery is
proven end to end.
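The fail-closed expectation can be sketched as a startup check. Everything here is hypothetical: the `Config` record, the `productionProfile` flag, the `raft-N` naming beyond raft-0, and hash-based group selection are assumptions for illustration, not the behaviour of RaftGroupManager or CacheNode.resolveRaftGroup(...).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of fail-closed sharding startup and group selection. */
final class ShardingGateSketch {

    /** Illustrative config; only the sharding flag mirrors something named in the text. */
    record Config(boolean productionProfile, boolean shardingEnabled, int groupCount) {}

    /** Returns the Raft group names to start, refusing sharding under the production profile. */
    static List<String> groupsToStart(Config config) {
        if (!config.shardingEnabled()) {
            return List.of("raft-0"); // default: a single group over the full cluster
        }
        if (config.productionProfile()) {
            // Fail closed: refuse to start rather than run an unproven multi-group path.
            throw new IllegalStateException("sharding is not supported under the production profile");
        }
        if (config.groupCount() < 1) {
            throw new IllegalArgumentException("groupCount must be at least 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < config.groupCount(); i++) {
            names.add("raft-" + i); // naming beyond raft-0 is an assumption
        }
        return names;
    }

    /** Picks an owning group for a key; floorMod keeps the index non-negative. */
    static String groupFor(String key, List<String> groups) {
        return groups.get(Math.floorMod(key.hashCode(), groups.size()));
    }

    public static void main(String[] args) {
        List<String> groups = groupsToStart(new Config(false, true, 3));
        System.out.println(groups + " -> " + groupFor("user:42", groups));
    }
}
```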
Persistence
- Write-ahead log in loom-server/.../persistence: WalWriter, WalReader, WalCompactor, with CRC32 records, sidecar .checksum validation on boot, rotation, and compression.
- Snapshots: SnapshotManager, SnapshotStore, DeltaSnapshot, SnapshotChain, SnapshotScheduler.
- RaftMetadataStore persists term / vote / commit index.
- Setting dataDir = null disables persistence (useful in tests).
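As an illustration of what a CRC32-protected record involves, the sketch below frames a payload with its length and checksum and rejects corrupted frames on read. The frame layout ([int length][long crc32][payload]) and the class name `WalRecordSketch` are assumptions for this example; the actual on-disk format used by WalWriter and WalReader is not described here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of a CRC32-framed log record; not the real WAL format. */
final class WalRecordSketch {

    private static final int HEADER_BYTES = Integer.BYTES + Long.BYTES;

    private static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Assumed layout: [int length][long crc32][payload]. */
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putInt(payload.length).putLong(checksum(payload)).put(payload);
        return buf.array();
    }

    /** Returns the payload, or throws if the frame is truncated or fails its checksum. */
    static byte[] decode(byte[] frame) {
        if (frame.length < HEADER_BYTES) {
            throw new IllegalStateException("truncated WAL record header");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame);
        int length = buf.getInt();
        long expected = buf.getLong();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalStateException("truncated WAL record payload");
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        if (checksum(payload) != expected) {
            throw new IllegalStateException("WAL record CRC mismatch");
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] frame = encode("MAP_PUT k=v".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decode(frame), StandardCharsets.UTF_8));
    }
}
```

Rejecting a bad record at read time lets recovery stop at the last intact entry instead of replaying corrupted data.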
Cluster membership
- SWIM gossip (SwimGossipProtocol), phi-accrual failure detector (PhiAccrualFailureDetector), MembershipProtocol.
- Discovery strategies composed by CompositeDiscovery: StaticDiscovery, DnsDiscovery, KubernetesApiDiscovery, MulticastDiscovery, EnvironmentDiscovery, FileBasedDiscovery.
- ConsistentHashRing + PartitionTable feed smart client routing.
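For readers unfamiliar with consistent hashing, the following sketch shows the general technique with virtual nodes on a sorted ring. It is not LoomCache's ConsistentHashRing: the class name `HashRingSketch`, the CRC32 hash, and the `node#i` virtual-node labels are assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical consistent-hash ring with virtual nodes; illustrative only. */
final class HashRingSketch {

    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** CRC32 is used here only for brevity; a production ring would likely use a stronger hash. */
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node); // several ring positions per physical node
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** The owner is the first ring position at or after the key's hash, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring is empty");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch(16);
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("user:42 -> " + ring.ownerOf("user:42"));
    }
}
```

The property that matters for client routing is that removing a node only remaps the keys that node owned; keys owned by other nodes keep their owner.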
CP and Executor
- Wire CP_ATOMIC_* operations are the supported client CP surface. Wire locks and semaphores are unsupported and must fail closed in production until session create/heartbeat/close/expiry/force-close opcodes are part of the Raft-backed contract.
- Embedded CP primitives can be used in-process for non-production or controlled embedded scenarios, but they are not a substitute for a supported wire/session lifecycle.
- Executor opcodes are not a production scheduling claim unless the release notes for the target artifact explicitly name submit/cancel/shutdown recovery semantics.
Thread model
- One virtual thread per TCP connection in TcpServer.
- A virtual-thread executor (CommandExecutorPool) dispatches commands.
- Raft replication, WAL fsync, snapshot scheduling, and partition migration run on dedicated ScheduledExecutorServices, not virtual threads, so the fan-out stays bounded.
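The split between virtual threads for request handling and bounded platform-thread schedulers for background work can be sketched with standard JDK executors. This assumes Java 21 or later; `ThreadModelSketch` and its methods are hypothetical, and the pool size of two is an arbitrary choice for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the thread split; requires Java 21+ for virtual threads. */
final class ThreadModelSketch {

    /** Runs one task per simulated connection and counts those that ran on a virtual thread. */
    static int handleConnections(int connections) {
        AtomicInteger handledOnVirtual = new AtomicInteger();
        try (ExecutorService perConnection = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < connections; i++) {
                perConnection.submit(() -> {
                    if (Thread.currentThread().isVirtual()) {
                        handledOnVirtual.incrementAndGet();
                    }
                });
            }
        } // close() blocks until every submitted task has finished
        return handledOnVirtual.get();
    }

    /** Background work goes to a fixed-size scheduler backed by platform threads. */
    static boolean backgroundTaskUsesPlatformThread() {
        ScheduledExecutorService background = Executors.newScheduledThreadPool(2);
        try {
            return background
                    .schedule(() -> !Thread.currentThread().isVirtual(), 10, TimeUnit.MILLISECONDS)
                    .get();
        } catch (Exception e) {
            throw new IllegalStateException("background task failed", e);
        } finally {
            background.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("virtual handlers: " + handleConnections(100));
        System.out.println("background on platform thread: " + backgroundTaskUsesPlatformThread());
    }
}
```

The virtual-thread executor creates a thread per task with no upper bound, which suits connection handling; the scheduled pool caps concurrency, which is the point of keeping fsync and replication off virtual threads.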