
System Architecture & Design

LoomCache is six Maven modules with a strict dependency chain. Every node runs one TCP server and at least one Raft state machine (raft-0 by default); opt-in sharding starts multiple Raft groups and routes operations to the owning group. Production deployments must keep sharding disabled until every group has independent WAL, Raft metadata, snapshot, install-snapshot, and restart recovery evidence; the production profile is expected to fail closed otherwise. The client SDK learns leaders and partition ownership via redirects/table refreshes and caches them for partition-aware routing.

[Interactive diagram: LoomCache Architecture Stack. Elements shown: LoomClient (MurmurHash3); Network Layer (VThreads, mTLS); Protocol & Auth (RBAC, 113 msgs); Raft Leader, Raft Log, Follower 1, Follower 2, WAL; State & Storage (fsync WAL + Map/Queue/Topic); Latency readout.]

  • loom-common (Java 17+): wire protocol (MessageType, MessageCodec, Message), Kryo serialization, ClusterConfig, TlsConfig, AuthConfig, model DTOs, exceptions.
  • loom-server (Java 25, --enable-preview): TCP server, Raft, data structures, WAL, snapshots, CP subsystem.
  • loom-client (Java 17+): smart routing, connection pools, near cache, retry, dedup, CP facades.
  • loom-cli (Java 17+): data-structure inspection, cluster state, Raft log export, CP admin commands.
  • loom-spring-boot: Spring Boot 4.0.5 auto-configuration, REST controllers, cache/session beans.
  • loom-integration-tests: multi-node IT suite with a Java Jepsen-style harness.
  1. Client hashes the key, picks a likely partition owner, and sends the request (e.g. MAP_PUT, opcode 0x02).
  2. TcpServer accepts on a virtual thread and dispatches via MessageHandler.
  3. Followers respond with RESPONSE_REDIRECT plus the leader’s address; the client updates LeaderTracker and retries on the leader.
  4. The leader wraps the message as a Raft LogEntry, appends to RaftLog, and replicates via AppendEntries.
  5. Once a majority acknowledges, the state-machine applier decodes and runs the command through DataOperationHandler and returns the response message.
  6. The client response is released only after the committed entry has been applied; persistent Raft logs and WAL/snapshot components provide the disk durability path.
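
The redirect handling in steps 1 and 3 can be sketched as a small retry loop. This is a minimal illustration, not the LoomCache client: `Reply`, `Node`, and `RedirectRetrySketch` are hypothetical stand-ins, the node addresses are made up, and the `cachedLeader` field only plays the role the text assigns to LeaderTracker.

```java
import java.util.Map;

// Illustrative sketch only: Reply, Node and this client are hypothetical
// stand-ins, not LoomCache classes.
final class RedirectRetrySketch {

    /** Hypothetical reply: either a result or a redirect carrying the leader's address. */
    record Reply(boolean redirect, String leaderAddress, String value) {
        static Reply ok(String value) { return new Reply(false, null, value); }
        static Reply redirectTo(String leaderAddress) { return new Reply(true, leaderAddress, null); }
    }

    /** Hypothetical node stub: a follower redirects, the leader answers. */
    interface Node {
        Reply handle(String key, String value);
    }

    private final Map<String, Node> nodesByAddress;
    private String cachedLeader; // plays the role the text assigns to LeaderTracker

    RedirectRetrySketch(Map<String, Node> nodesByAddress, String firstGuess) {
        this.nodesByAddress = nodesByAddress;
        this.cachedLeader = firstGuess;
    }

    /** Sends a put to the cached target and follows redirects up to maxRedirects times. */
    String put(String key, String value, int maxRedirects) {
        for (int attempt = 0; attempt <= maxRedirects; attempt++) {
            Node target = nodesByAddress.get(cachedLeader);
            if (target == null) {
                throw new IllegalStateException("unknown node: " + cachedLeader);
            }
            Reply reply = target.handle(key, value);
            if (!reply.redirect()) {
                return reply.value();
            }
            cachedLeader = reply.leaderAddress(); // remember the leader for later requests
        }
        throw new IllegalStateException("gave up after " + maxRedirects + " redirects");
    }

    String cachedLeader() {
        return cachedLeader;
    }
}
```

Bounding the number of redirects is a design choice of this sketch: during an election the leader hint can be stale, and an unbounded loop could bounce between nodes indefinitely.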
  • Linearizable reads use ReadIndex: the leader captures its commit index at receive time, confirms its lease, then answers. No disk I/O, no quorum round-trip.
  • Followers answer linearizable reads with RESPONSE_REDIRECT, so every such read is served by the leader.
  • CP atomic reads (CP_ATOMIC_GET) take the same ReadIndex path via ConsistencySubsystemHandler.
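
The read path above can be sketched as follows, assuming a simplified node that exposes its commit index, applied index, and lease deadline as plain fields. All names are hypothetical. Returning `LEASE_EXPIRED` or `NOT_YET_APPLIED` instead of waiting or falling back to a quorum check is a simplification of this sketch; the source does not say what the server does in those cases.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a lease-based ReadIndex check with hypothetical names.
final class ReadIndexSketch {

    enum Status { OK, REDIRECT, LEASE_EXPIRED, NOT_YET_APPLIED }

    record ReadResult(Status status, String value) {}

    final Map<String, String> state = new HashMap<>(); // applied state machine contents
    boolean leader;
    long commitIndex;          // highest log index known to be committed
    long lastApplied;          // highest log index applied to `state`
    long leaseExpiresAtNanos;  // lease deadline on the caller's monotonic clock

    ReadResult read(String key, long nowNanos) {
        if (!leader) {
            return new ReadResult(Status.REDIRECT, null); // followers never serve the read
        }
        long readIndex = commitIndex; // captured when the request is received
        if (nowNanos >= leaseExpiresAtNanos) {
            // Assumption: without a valid lease the read is refused rather than served.
            return new ReadResult(Status.LEASE_EXPIRED, null);
        }
        if (lastApplied < readIndex) {
            // Assumption: a real server would wait for the applier instead of returning.
            return new ReadResult(Status.NOT_YET_APPLIED, null);
        }
        return new ReadResult(Status.OK, state.get(key)); // in-memory read, no disk I/O
    }
}
```

The clock value is passed in as a parameter so the lease check can be exercised deterministically in tests.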

Implementation lives in loom-server/src/main/java/com/loomcache/server/consensus:

  • RaftNode — FSM (FOLLOWER → CANDIDATE → LEADER) with pre-vote, randomized election timeouts, and leader lease.
  • RaftLog — in-memory index backed by PersistentRaftLog.
  • LeaderLease — tracks lease validity so the leader can answer linearizable reads without a quorum RTT.
  • RaftInvariantChecker — runtime invariant assertions exercised by tests.
  • ConfigChange — joint-consensus membership changes.
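
As a rough illustration of the role transitions and randomized timeouts listed above, the sketch below models a node that only becomes a candidate after a successful pre-vote round. It is not RaftNode: the class, its method names, and the timeout range of base to twice the base are assumptions made for this example.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only: a toy Raft role state machine with pre-vote.
final class RaftRoleSketch {

    enum Role { FOLLOWER, CANDIDATE, LEADER }

    Role role = Role.FOLLOWER;
    long term = 0;
    final int clusterSize;

    RaftRoleSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    /** Picks a timeout in [baseMillis, 2 * baseMillis) so nodes rarely time out together. */
    static long randomizedTimeoutMillis(long baseMillis) {
        return baseMillis + ThreadLocalRandom.current().nextLong(baseMillis);
    }

    static boolean isMajority(int votes, int clusterSize) {
        return votes > clusterSize / 2;
    }

    /** Election timeout fired. The term is bumped only if the pre-vote round (self included) wins. */
    void onElectionTimeout(int preVotesGranted) {
        if (role == Role.LEADER) {
            return;
        }
        if (isMajority(preVotesGranted, clusterSize)) {
            term++;
            role = Role.CANDIDATE;
        }
    }

    /** Real vote round finished; votesGranted includes the node's own vote. */
    void onVotesCounted(int votesGranted) {
        if (role == Role.CANDIDATE && isMajority(votesGranted, clusterSize)) {
            role = Role.LEADER;
        }
    }

    /** Any message with a higher term forces a step-down to follower. */
    void onHigherTermSeen(long observedTerm) {
        if (observedTerm > term) {
            term = observedTerm;
            role = Role.FOLLOWER;
        }
    }
}
```

The point of the pre-vote gate is that a partitioned node that cannot reach a majority keeps its term unchanged, so it does not disrupt the current leader when it rejoins.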

Default deployments run a single Raft group over the full cluster. When ClusterConfig.shardingEnabled(true) is set, RaftGroupManager starts multiple groups and CacheNode.resolveRaftGroup(...) dispatches Raft RPCs by group name. That multi-group path is a development/certification surface, not production support, until per-group recovery is proven end to end.
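
A minimal sketch of dispatch by group name, under stated assumptions: only the default group name `raft-0` comes from the text. The `raft-N` naming for additional groups, the use of `String.hashCode()` as a stand-in hash, and every class and method name here are hypothetical, and the code does not reproduce RaftGroupManager or CacheNode.resolveRaftGroup(...).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Illustrative sketch only: routing RPCs to a Raft group by name.
final class GroupDispatchSketch {

    static final String DEFAULT_GROUP = "raft-0";

    private final Map<String, UnaryOperator<String>> handlersByGroup = new ConcurrentHashMap<>();

    void register(String groupName, UnaryOperator<String> handler) {
        handlersByGroup.put(groupName, handler);
    }

    /** With sharding off, every key belongs to the single default group. */
    static String ownerGroup(String key, boolean shardingEnabled, int groupCount) {
        if (!shardingEnabled || groupCount <= 1) {
            return DEFAULT_GROUP;
        }
        // Assumption: String.hashCode() stands in for the real hash function.
        return "raft-" + Math.floorMod(key.hashCode(), groupCount);
    }

    /** Fails closed: an RPC naming an unknown group is rejected, never rerouted. */
    String dispatch(String groupName, String rpc) {
        UnaryOperator<String> handler = handlersByGroup.get(groupName);
        if (handler == null) {
            throw new IllegalArgumentException("unknown Raft group: " + groupName);
        }
        return handler.apply(rpc);
    }
}
```

`Math.floorMod` is used instead of `%` so that keys with a negative hash code still map to a valid group index.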

  • Write-ahead log in loom-server/.../persistence: WalWriter, WalReader, WalCompactor, with CRC32 records, sidecar .checksum validation on boot, rotation, and compression.
  • Snapshots: SnapshotManager, SnapshotStore, DeltaSnapshot, SnapshotChain, SnapshotScheduler.
  • RaftMetadataStore persists term / vote / commit index.
  • Setting dataDir = null disables persistence (useful in tests).
  • SWIM gossip (SwimGossipProtocol), phi-accrual failure detector (PhiAccrualFailureDetector), MembershipProtocol.
  • Discovery strategies composed by CompositeDiscovery: StaticDiscovery, DnsDiscovery, KubernetesApiDiscovery, MulticastDiscovery, EnvironmentDiscovery, FileBasedDiscovery.
  • ConsistentHashRing + PartitionTable feed smart client routing.
  • Wire CP_ATOMIC_* operations are the supported client CP surface. Wire locks and semaphores are unsupported and must fail closed in production until session create/heartbeat/close/expiry/force-close opcodes are part of the Raft-backed contract.
  • Embedded CP primitives can be used in-process for non-production or controlled embedded scenarios, but they are not a substitute for a supported wire/session lifecycle.
  • Executor opcodes are not a production scheduling claim unless the release notes for the target artifact explicitly name submit/cancel/shutdown recovery semantics.
  • One virtual thread per TCP connection in TcpServer.
  • A virtual-thread executor (CommandExecutorPool) dispatches commands.
  • Raft replication, WAL fsync, snapshot scheduling, and partition migration run on dedicated ScheduledExecutorServices — not virtual threads — so the fan-out stays bounded.
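
The split between per-connection virtual threads and bounded background executors can be illustrated with plain JDK executors. This sketch needs Java 21 or later and is not taken from TcpServer or CommandExecutorPool; the class name, method names, and the pool size of 2 are assumptions made for the example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: virtual threads for connections, a fixed pool for background work.
final class ThreadingSketch {

    /** Runs one task per simulated connection; returns how many ran on a virtual thread. */
    static int handleConnections(int connections) {
        AtomicInteger onVirtualThread = new AtomicInteger();
        try (ExecutorService perConnection = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < connections; i++) {
                perConnection.execute(() -> {
                    if (Thread.currentThread().isVirtual()) {
                        onVirtualThread.incrementAndGet();
                    }
                });
            }
        } // close() waits for the submitted tasks to finish
        return onVirtualThread.get();
    }

    /** Runs one background task on a fixed-size scheduled pool; true if it ran on a platform thread. */
    static boolean backgroundRunsOnPlatformThread() {
        ScheduledExecutorService background = Executors.newScheduledThreadPool(2);
        try {
            return background
                    .schedule(() -> !Thread.currentThread().isVirtual(), 0, TimeUnit.MILLISECONDS)
                    .get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            background.shutdownNow();
        }
    }
}
```

The fixed pool caps how many background tasks can run at once regardless of connection count, which is one way to read the "bounded fan-out" remark above.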