
System Architecture & Design

LoomCache is built from the ground up as a high-performance distributed cache engine with a fully custom architecture and zero external cache dependencies. This page explains its inner mechanics.

LoomCache Architecture Stack (zero-dependency, pure-Java execution path):

  1. Client Layer
  2. Network Layer
  3. Protocol & Auth
  4. Consensus Layer
  5. State & Storage (WAL.DAT)

1. Client Layer

LoomClient routes each request to the correct partition leader using MurmurHash3.

Latency budget: < 0.5 ms

LoomCache implements a CP-by-default (Consistent and Partition-tolerant) architecture under the CAP theorem, similar to etcd and ZooKeeper.

Every single write operation goes through Raft consensus:

  1. Received from the client and routed to the Raft leader (transparent redirect via RESPONSE_REDIRECT).
  2. Appended to the leader’s Raft log via RaftNode.appendLogEntry().
  3. Encoded with MessageCodec.encode() for wire transmission.
  4. Replicated to a majority of followers via AppendEntries RPC.
  5. Once majority acknowledges, the leader advances commitIndex.
  6. The commit is persisted to WAL with fsync enforcement.
  7. Applied to the state machine by DataOperationHandler.
  8. Response sent back to client with write acknowledgment.
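The commit flow above can be sketched as a toy majority-quorum simulation. All names here are illustrative stand-ins, not the actual RaftNode API; WAL persistence and the state-machine apply step are elided:

```java
import java.util.ArrayList;
import java.util.List;

public class WritePathSketch {
    record LogEntry(long term, long index, String command) {}

    static class Node {
        final List<LogEntry> log = new ArrayList<>();
        long commitIndex = 0;

        // Follower side of AppendEntries: append and acknowledge.
        boolean appendEntries(LogEntry entry) {
            log.add(entry);
            return true; // ACK
        }
    }

    // Leader-side write: append, replicate, commit on majority.
    static long leaderWrite(Node leader, List<Node> followers, String command) {
        LogEntry entry = new LogEntry(1, leader.log.size() + 1, command);
        leader.log.add(entry);                      // step 2: append to leader log
        int acks = 1;                               // the leader counts itself
        for (Node f : followers) {
            if (f.appendEntries(entry)) acks++;     // step 4: replicate
        }
        int quorum = (followers.size() + 1) / 2 + 1;
        if (acks >= quorum) {
            leader.commitIndex = entry.index();     // step 5: advance commitIndex
            // steps 6-7: WAL fsync and state-machine apply omitted in this sketch
        }
        return leader.commitIndex;
    }

    public static void main(String[] args) {
        Node leader = new Node();
        List<Node> followers = List.of(new Node(), new Node());
        long committed = leaderWrite(leader, followers, "MapPut(user:1, Alice)");
        System.out.println("commitIndex=" + committed); // prints commitIndex=1
    }
}
```

Note how a 3-node cluster commits with 2 ACKs (leader plus one follower): losing any single node never blocks writes.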

This strict path guarantees no data loss during network partitions and maintains strict linearizability for all modifications.

LoomCache uses a single Raft group where:

  • Every node holds all data (no sharding — same model as etcd and ZooKeeper)
  • All 7 data structures go through the same Raft log
  • Three-node clusters are the recommended minimum (quorum = 2)
  • Larger clusters add redundancy but increase replication latency

Reads are served with configurable consistency:

Linearizable reads (leader with valid lease):

  • Client routes to the leader (cached address in LeaderTracker)
  • Leader checks lease expiration (recent heartbeat ACKs from majority)
  • If valid: serve directly from state machine — no quorum round-trip

Eventual consistency reads (follower):

  • Client routes to any follower for read scaling
  • Data may be stale by up to 100 ms (the default heartbeat interval)
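The lease check behind linearizable reads can be sketched as a clock comparison. The window and drift margin below are assumptions for illustration, not LoomCache's actual constants:

```java
// Illustrative leader-lease check: a leader may serve a linearizable read
// locally only while a majority acknowledged a heartbeat recently enough
// that no other node could have been elected in the meantime.
public class LeaseSketch {
    static final long LEASE_MS = 100;       // assumed: tied to heartbeat interval
    static final long CLOCK_DRIFT_MS = 10;  // safety margin for clock skew

    static boolean leaseValid(long lastQuorumHeartbeatMs, long nowMs) {
        return nowMs < lastQuorumHeartbeatMs + LEASE_MS - CLOCK_DRIFT_MS;
    }

    public static void main(String[] args) {
        long hb = 1_000;
        System.out.println(leaseValid(hb, 1_050)); // prints true  (within lease)
        System.out.println(leaseValid(hb, 1_095)); // prints false (lease expired)
    }
}
```

When the lease has expired, a strict implementation must fall back to a quorum round-trip (or redirect) rather than serve a possibly stale local read.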

Clients handle RESPONSE_REDIRECT transparently:

Client: MapPut("key", "val") → Follower
Follower: "Not the leader, try Node-2"
Client: Caches leader → Reroutes to Node-2
Node-2: Append → Replicate → Commit → Apply → ACK

During elections (no leader), the client retries every 100ms for up to 15 seconds before throwing TimeoutException.
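The redirect-and-retry loop above might look like this on the client side. `Response` and the node stubs are hypothetical; only `RESPONSE_REDIRECT` semantics, the 100 ms retry, and the 15 s budget come from the docs:

```java
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Sketch of transparent RESPONSE_REDIRECT handling in a client.
public class RedirectSketch {
    record Response(boolean redirect, String leader, String value) {}

    interface NodeStub { Response handle(String op); }

    static String cachedLeader = "node-1"; // stand-in for LeaderTracker's cache

    static String execute(Map<String, NodeStub> cluster, String op)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + 15_000; // 15 s budget
        while (System.currentTimeMillis() < deadline) {
            Response r = cluster.get(cachedLeader).handle(op);
            if (!r.redirect()) return r.value(); // committed and ACKed
            cachedLeader = r.leader();           // cache new leader, reroute
            Thread.sleep(100);                   // retry every 100 ms
        }
        throw new TimeoutException("no leader elected within 15s");
    }
}
```

A follower answering with a redirect costs one extra round-trip; every subsequent request goes straight to the cached leader.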

LoomCache relies heavily on Java 25 Virtual Threads (Project Loom). The TcpServer spins up a lightweight virtual thread per incoming connection. When the thread performs a blocking I/O operation to read the custom binary protocol bytes, the JVM unmounts the carrier OS thread, allowing a single node to effortlessly handle hundreds of thousands of concurrent connections.
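The thread-per-connection model is cheap precisely because blocked virtual threads unmount from their carriers. A minimal sketch (standard `java.util.concurrent` API, not LoomCache's TcpServer):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// One virtual thread per "connection": each task blocks, yet thousands of
// them run on a handful of carrier OS threads.
public class VirtualThreadSketch {
    static int serveAll(int connections) {
        AtomicInteger served = new AtomicInteger();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < connections; i++) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(10); // stand-in for a blocking protocol read
                    } catch (InterruptedException ignored) { }
                    served.incrementAndGet();
                });
            }
        } // close() blocks until every submitted task finishes
        return served.get();
    }

    public static void main(String[] args) {
        System.out.println("served=" + serveAll(10_000)); // prints served=10000
    }
}
```

With platform threads, 10,000 concurrent blocked connections would pin 10,000 OS threads; here the JVM multiplexes them onto roughly one carrier per core.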

The networking layer includes:

  • Per-peer CircuitBreaker (CLOSED/OPEN/HALF_OPEN) preventing cascading failures
  • BackpressureController with watermark-based flow control
  • CommandExecutorPool for pipelined I/O
  • ConnectionContext tracking per-connection state and sequence numbers
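The per-peer circuit breaker can be sketched as a small state machine over the three states listed above. Thresholds and the probe policy here are illustrative, not LoomCache's actual configuration:

```java
// Toy per-peer circuit breaker: CLOSED -> OPEN after repeated failures,
// OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on a good probe.
public class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int threshold = 3;       // assumed: failures before tripping
    private final long openMillis = 1_000; // assumed: cool-down before probing
    private long openedAt;

    public synchronized boolean allowRequest(long nowMs) {
        if (state == State.OPEN && nowMs - openedAt >= openMillis) {
            state = State.HALF_OPEN;       // let a single probe through
        }
        return state != State.OPEN;
    }

    public synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public synchronized void onFailure(long nowMs) {
        if (state == State.HALF_OPEN || ++failures >= threshold) {
            state = State.OPEN;            // trip: stop calling this peer
            openedAt = nowMs;
            failures = 0;
        }
    }

    public synchronized State state() { return state; }
}
```

Tripping the breaker for a slow peer keeps its backlog from consuming the caller's resources, which is what prevents one failing node from cascading into cluster-wide stalls.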

Data is transmitted using a highly optimized, custom Kryo-based binary protocol supporting 101 distinct message types. As bytes are parsed into Message objects via MessageCodec, the AuthenticationHandler verifies the requested operation against the user’s role-based access control (RBAC) lists, rejecting unauthorized payloads before they reach the consensus layer. LoomCache also supports 52 RESP (Redis) commands for compatibility.

Our custom RaftNode implementation hardens standard Raft with features like:

  • Pre-vote: Prevents partitioned nodes from randomly incrementing election terms and disrupting stable clusters.
  • Leader Leases: Lets the Raft leader serve strongly consistent local reads rapidly without needing a quorum validation on every GET.
  • Dynamic membership changes: AddServer/RemoveServer RPCs for live cluster scaling.
  • TimeoutNow RPC: Explicit leadership transfer for graceful shutdowns.
  • Log compaction: Snapshots every 10,000 entries to bound recovery time.

Memory alone isn’t durable. Before any cluster ACK is returned to the client, the WalWriter appends the state change to disk (fsync enabled by default), ensuring durability even through a complete cluster power loss.

WAL format: 4-byte length + 8-byte term + 8-byte index + command bytes + 4-byte CRC32
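The record layout above can be encoded with a `ByteBuffer` and `java.util.zip.CRC32`. This is a sketch of the format only; WalWriter's actual endianness and whether the length field counts the command or the whole record are assumptions here:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// One WAL record: 4-byte length + 8-byte term + 8-byte index
//                 + command bytes + 4-byte CRC32 over everything before it.
public class WalRecordSketch {
    static byte[] encode(long term, long index, byte[] command) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 8 + command.length + 4);
        buf.putInt(command.length);  // assumed: length of the command payload
        buf.putLong(term);
        buf.putLong(index);
        buf.put(command);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.position()); // checksum the record body
        buf.putInt((int) crc.getValue());
        return buf.array();
    }

    // On recovery, a record with a mismatched CRC marks a torn/partial write.
    static boolean verify(byte[] record) {
        CRC32 crc = new CRC32();
        crc.update(record, 0, record.length - 4);
        int stored = ByteBuffer.wrap(record, record.length - 4, 4).getInt();
        return (int) crc.getValue() == stored;
    }

    public static void main(String[] args) {
        byte[] rec = encode(5, 42, "MapPut(k,v)".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(rec)); // prints true
    }
}
```

The trailing CRC is what lets replay stop cleanly at the first torn record after a crash instead of applying corrupted bytes.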

Snapshots are created every 10,000 committed entries containing Kryo-serialized state of all 7 data structures. On restart: load snapshot → replay WAL entries → rejoin cluster.

The cluster layer manages topology and failure detection:

  • PartitionTable with 16,384 consistent hash slots
  • PartitionMigrationManager for live rebalancing
  • PhiAccrualFailureDetector with adaptive thresholds for WAN/cloud
  • DiscoveryHealthChecker detecting stale/unreachable peers
  • 5 discovery strategies: Static, DNS, Environment, File-based, Multicast
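Mapping a key to one of the 16,384 slots presumably combines the MurmurHash3 routing mentioned earlier with a modulo over the slot count. The sketch below uses the public-domain 32-bit MurmurHash3 (x86 variant); the seed and the exact variant LoomCache uses are assumptions:

```java
import java.nio.charset.StandardCharsets;

// Sketch of key -> slot mapping: MurmurHash3 (32-bit) mod 16,384.
public class SlotSketch {
    static final int SLOTS = 16_384;

    static int slotFor(String key) {
        int h = murmur3_32(key.getBytes(StandardCharsets.UTF_8), 0);
        return (h & 0x7fffffff) % SLOTS; // clear the sign bit, then mod
    }

    // Standard MurmurHash3 x86 32-bit.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed, i = 0;
        for (; i + 4 <= data.length; i += 4) {      // 4-byte body blocks
            int k = (data[i] & 0xff) | (data[i + 1] & 0xff) << 8
                  | (data[i + 2] & 0xff) << 16 | (data[i + 3] & 0xff) << 24;
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        switch (data.length - i) {                  // tail bytes (fall-through)
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= data[i] & 0xff;
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        h ^= data.length;                           // finalization mix
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Fixing the slot count at 16,384 means rebalancing moves whole slots between nodes rather than rehashing individual keys, which is what PartitionMigrationManager exploits for live migration.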

Every data structure operation is replicated through the same Raft log:

| Data Structure | Operations              | Example                           |
| -------------- | ----------------------- | --------------------------------- |
| LoomMap        | PUT, GET, DELETE        | MapPut("user:1", "{name:Alice}")  |
| LoomQueue      | OFFER, POLL             | QueueOffer("task-123")            |
| LoomSet        | ADD, REMOVE, CONTAINS   | SetAdd("tag:java")                |
| LoomSortedSet  | ADD, RANK, INCR         | SortedSetAdd("alice", 100.0)      |
| LoomTopic      | PUBLISH, SUBSCRIBE      | TopicPublish("events", "login")   |
| LoomLock       | TRY_LOCK, UNLOCK        | LockTryLock("critical", 10s)      |
| LoomCounter    | INCREMENT, GET          | CounterIncrement("requests")      |

Every operation appends an entry to the Raft log, ensuring strict ordering and durability. Followers apply entries in the same order, maintaining replica consistency.
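The single-log design can be illustrated with a shared, append-only command list; the `Op` record and structure names below are illustrative, not LoomCache's wire format:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative: every structure's mutation becomes one entry in a single
// shared log, so replicas that apply entries in index order converge.
public class SharedLogSketch {
    record Op(String structure, String name, Object... args) {}

    static final List<Op> raftLog = new ArrayList<>();

    static long append(Op op) {
        raftLog.add(op);
        return raftLog.size(); // the log index doubles as a total order
    }

    public static void main(String[] args) {
        append(new Op("LoomMap", "PUT", "user:1", "{name:Alice}"));
        append(new Op("LoomQueue", "OFFER", "task-123"));
        long idx = append(new Op("LoomCounter", "INCREMENT", "requests"));
        System.out.println("last index=" + idx); // prints last index=3
    }
}
```

A side effect of this design: operations on different structures are totally ordered relative to each other, so a MapPut followed by a QueueOffer is seen in that order on every replica.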

The codebase is organized into 5 Maven modules with 278 production source files:

loom-common/ Protocol, serialization, config, exceptions
loom-server/ Raft, networking, persistence, data structures
loom-client/ SDK with routing, near cache, connection pool
loom-spring-boot/ Auto-config, REST controllers, health
loom-integration-tests/ 11,600+ tests across 195 files

LoomCache provides comprehensive production observability:

  • Micrometer metrics in Prometheus format on configurable HTTP port
  • JFR custom events for CacheOperation, Eviction, NetworkConnection, RaftAppend, RaftElection
  • Hot key detection via sampling-based HotKeyDetector
  • Per-slot metrics tracking access count, key count, and memory per hash slot
  • Raft replication lag metrics per follower
  • Non-blocking audit logging with structured events
  • MDC context propagation for async operations via MdcPropagatingExecutor