
Performance Tuning & Virtual Threads

Out of the box, LoomCache handles extreme concurrency workloads without complex tuning, primarily thanks to its integration with Project Loom virtual threads on Java 25.

Java Virtual Threads (Project Loom)

Millions of lightweight virtual threads multiplexed over a few OS Carrier Threads.

Older cache servers rely on complex, non-blocking asynchronous event loops (like Netty or epoll) because operating system (OS) threads are too heavy. A standard Linux server might struggle to maintain 10,000 active OS threads without suffocating under context-switching overhead.

LoomCache’s TcpServer sidesteps this limitation by using virtual threads: it spins up a dedicated virtual thread for every client connection.

When that thread blocks to read binary data from the network socket, or waits on an fsync to the write-ahead log (WAL), the JVM automatically unmounts the virtual thread from its OS carrier thread. This lets a standard 8-core machine manage 100,000+ simultaneous open connections using readable, blocking Java code.

| Concurrency Model | Max Connections | Code Complexity |
| --- | --- | --- |
| OS Threads | ~10,000 | Low |
| Netty/epoll | ~100,000 | High (callbacks) |
| Reactive (WebFlux) | ~100,000 | Very High (reactive chains) |
| Virtual Threads | ~1,000,000+ | Low (blocking code) |
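The thread-per-connection model can be sketched in a few lines of plain Java. The socket handling is elided here; `Thread.sleep` stands in for a blocking socket read, and the 10,000 "connections" are illustrative:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // One virtual thread per "connection"; the blocking sleep stands in
        // for a blocking socket read that would unmount the virtual thread.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // "blocking I/O"
                    } catch (InterruptedException ignored) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // try-with-resources close() waits for all tasks to finish
        System.out.println(completed.get());
    }
}
```

Spawning 10,000 platform threads this way would exhaust most machines; 10,000 virtual threads are cheap because they are unmounted from their carriers while sleeping.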

Memory & GC Tuning

LoomCache makes heavy use of off-heap memory and Kryo 5 object pooling to minimize allocation pressure on the garbage collector. For ultra-low latency, configure the JVM to use ZGC (or G1GC as a fallback):

```sh
# Production JVM args: ZGC (recommended).
# Note: ZGC ignores MaxGCPauseMillis and targets sub-millisecond pauses on its own.
JAVA_OPTS="-XX:+UseZGC -Xmx8G -Xms8G"

# Alternative: G1GC, which treats MaxGCPauseMillis as a pause-time goal.
JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -Xmx8G -Xms8G"
```
| Setting | Recommendation | Why |
| --- | --- | --- |
| `-Xmx` / `-Xms` | Same value (e.g., 8G) | Avoids heap resizing pauses |
| `maxMemoryBytes` | 70-80% of container limit | Leaves room for JVM overhead |
| `maxMapEntries` | Tune with `maxMemoryBytes` | Match to available heap |
| Kryo pooling | Enabled by default | Reduces allocation churn |
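As a worked example of the 70-80% rule above (the 8 GiB container limit is illustrative, not a LoomCache default):

```java
// Worked example of the "70-80% of container limit" rule for maxMemoryBytes.
public class MemoryBudget {
    public static void main(String[] args) {
        long containerLimitBytes = 8L << 30;                  // 8 GiB container limit
        long maxMemoryBytes = containerLimitBytes * 75 / 100; // 75% for the cache
        System.out.println(maxMemoryBytes);                   // remainder covers JVM overhead
    }
}
```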

Command Pipelining

The Pipeline API batches multiple commands into a single network round-trip:

```java
Pipeline pipeline = client.newPipeline();
pipeline.mapPut("cache", "k1", "v1")
        .mapPut("cache", "k2", "v2")
        .mapGet("cache", "k1");
List<Object> results = pipeline.flush();
```

Throughput impact: Pipelining reduces per-operation latency by 10–100x by amortizing the TCP round-trip cost across many commands. Configure batch size (default 100) and flush timeout (default 10ms) via the client builder.
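The batch size and flush timeout mentioned above might be configured like this; the builder method names `pipelineBatchSize` and `pipelineFlushTimeout` are illustrative assumptions, not confirmed API:

```java
// Hypothetical builder options: the defaults (100 commands, 10ms flush) come
// from the text above, but these exact method names are assumptions.
LoomClient client = LoomClient.builder()
    .pipelineBatchSize(100)
    .pipelineFlushTimeout(Duration.ofMillis(10))
    .build();
```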

Client-Side Near Cache

The client-side near cache eliminates network round-trips for frequently read keys:

```java
LoomClient client = LoomClient.builder()
    .nearCacheEnabled(true)
    .nearCacheTtl(Duration.ofSeconds(30))
    .nearCacheMaxSize(10000)
    .build();
```
| Metric | Without Near Cache | With Near Cache |
| --- | --- | --- |
| GET latency | ~1-5ms (network) | ~0.01ms (local) |
| Network load | Every read | Only misses |
| Consistency | Linearizable | Eventual (TTL-bounded) |

Near cache entries are invalidated via server push (instant) with TTL-based polling as fallback.
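To make the TTL-bounded semantics concrete, here is a minimal, self-contained sketch of a size- and TTL-bounded near cache. It is illustrative only: LoomCache's real near cache also handles the server-push invalidation described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a size- and TTL-bounded near cache (illustrative,
// not LoomCache's implementation).
class NearCache<K, V> {
    private record Entry<T>(T value, long expiresAtNanos) {}

    private final LinkedHashMap<K, Entry<V>> map;
    private final long ttlNanos;

    NearCache(int maxSize, long ttlNanos) {
        this.ttlNanos = ttlNanos;
        // Access-order map: the eldest entry is the least recently used.
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.nanoTime() + ttlNanos));
    }

    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;                 // miss: go to the server
        if (System.nanoTime() > e.expiresAtNanos) { // TTL bounds the staleness
            map.remove(key);
            return null;
        }
        return e.value();
    }
}

public class NearCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        NearCache<String, String> cache = new NearCache<>(2, 50_000_000L); // 50ms TTL
        cache.put("k1", "v1");
        System.out.println(cache.get("k1")); // fresh: served locally
        Thread.sleep(100);                   // wait past the TTL
        System.out.println(cache.get("k1")); // expired: falls back to the server
    }
}
```

The TTL is what bounds the eventual-consistency window in the table above: a stale entry can be served for at most one TTL after the server's value changes, even if a push invalidation is lost.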

Hot-Key Detection & Near Caching

Mitigating Thundering Herd problems automatically

(Diagram: 100k users reading the hot key '/trending' are absorbed by the Loom NearCache instead of reaching the partition leader's network I/O; normal traffic remains evenly distributed.)

A classic distributed systems problem is the “Thundering Herd” or “Hot Key” issue, where 90% of traffic suddenly targets a single Map key (which lives on a single partition leader).

LoomCache combats this via the HotKeyDetector. Through non-blocking, sliding-window sampling (default 5% sampling rate), LoomCache identifies hotspots and exposes them through SlotMetrics.

```java
HotKeyConfig.builder()
    .enabled(true)
    .samplingRate(0.05)             // 5% of accesses sampled
    .threshold(100)                 // 100+ accesses/window = hot
    .window(Duration.ofSeconds(60))
    .maxTrackedKeys(50000)
    .build();
```

Applications can query hot key data and configure the LoomClient’s Near Cache to aggressively cache hot keys client-side, offloading the Raft leader entirely.
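The sampling approach can be illustrated with a self-contained sketch. This mirrors the configuration above (5% sampling, threshold 100) but is not LoomCache's HotKeyDetector source; the window logic is elided:

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sampling-based hot-key detector (window reset elided).
class SamplingHotKeyDetector {
    private final double samplingRate;
    private final long threshold;
    private final Map<String, LongAdder> samples = new ConcurrentHashMap<>();
    private final Random random = new Random(42); // fixed seed for a reproducible demo

    SamplingHotKeyDetector(double samplingRate, long threshold) {
        this.samplingRate = samplingRate;
        this.threshold = threshold;
    }

    void recordAccess(String key) {
        if (random.nextDouble() < samplingRate) { // count only ~5% of accesses
            samples.computeIfAbsent(key, k -> new LongAdder()).increment();
        }
    }

    boolean isHot(String key) {
        LongAdder c = samples.get(key);
        // Scale the sampled count back up to estimate the true access volume.
        return c != null && c.sum() / samplingRate >= threshold;
    }
}

public class HotKeyDemo {
    public static void main(String[] args) {
        var detector = new SamplingHotKeyDetector(0.05, 100);
        for (int i = 0; i < 10_000; i++) detector.recordAccess("/trending"); // hot key
        for (int i = 0; i < 3; i++) detector.recordAccess("/profile/42");    // cold key
        System.out.println(detector.isHot("/trending"));
        System.out.println(detector.isHot("/profile/42"));
    }
}
```

Sampling keeps the per-access overhead constant regardless of traffic volume, which is why a 5% rate is enough to spot a key receiving thousands of hits per window.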

JFR Profiling

LoomCache emits five custom JFR (Java Flight Recorder) events for production profiling:

| Event | Data |
| --- | --- |
| CacheOperationEvent | Key, operation, latency, result |
| EvictionEvent | Key, policy, reason |
| NetworkConnectionEvent | Peer, connect/disconnect, duration |
| RaftAppendEvent | Term, index, entry count, latency |
| RaftElectionEvent | Candidate, term, votes, outcome |

Enable JFR in production with minimal overhead (its default profile is designed for always-on use):

```sh
java -XX:StartFlightRecording=filename=loomcache.jfr,duration=60s ...
```
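Custom JFR events like those in the table are typically defined with the standard `jdk.jfr` API. The sketch below follows the CacheOperationEvent fields from the table; the actual LoomCache event classes may differ:

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Illustrative custom JFR event; field names follow the table above.
@Name("loomcache.CacheOperation")
@Label("Cache Operation")
class CacheOperationEvent extends Event {
    @Label("Key") String key;
    @Label("Operation") String operation;
    @Label("Result") String result;
}

public class JfrDemo {
    public static void main(String[] args) {
        CacheOperationEvent event = new CacheOperationEvent();
        event.begin();               // start the event's duration clock
        event.key = "k1";
        event.operation = "GET";
        event.result = "HIT";
        event.commit();              // recorded if a JFR recording is active; no-op otherwise
        System.out.println("recorded");
    }
}
```

Because `commit()` is a no-op when no recording is active, the instrumentation can stay in the hot path permanently.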
Raft Tuning

Raft consensus timing can be tuned for faster failover or lower overhead:

| Parameter | Default | Aggressive | Conservative |
| --- | --- | --- | --- |
| Heartbeat interval | 100ms | 50ms | 500ms |
| Election timeout | 300-600ms | 100-200ms | 1000-2000ms |
| Max entries/append | 100 | 200 | 50 |
| Replication interval | 50ms | 25ms | 100ms |

Trade-offs: Aggressive settings reduce failure detection latency but increase CPU and network overhead. Conservative settings reduce churn but increase time-to-recovery after failures.
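The heartbeat and election-timeout settings are linked: followers randomize their election timeout within a range, and the minimum must comfortably exceed the heartbeat interval so a healthy leader is never falsely suspected. A small sanity-check sketch (the 3x multiple is a common Raft rule of thumb, not a LoomCache requirement):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sanity-check the relationship between heartbeat interval and election timeout.
public class RaftTimeoutDemo {
    static long randomElectionTimeout(long minMs, long maxMs) {
        // Each follower picks a random timeout in [minMs, maxMs] to avoid split votes.
        return ThreadLocalRandom.current().nextLong(minMs, maxMs + 1);
    }

    public static void main(String[] args) {
        long heartbeatMs = 100;                  // default from the table above
        long minTimeoutMs = 300, maxTimeoutMs = 600;
        long timeout = randomElectionTimeout(minTimeoutMs, maxTimeoutMs);
        // Rule of thumb: minimum election timeout >= 3x heartbeat interval.
        System.out.println(minTimeoutMs >= 3 * heartbeatMs);
        System.out.println(timeout >= minTimeoutMs && timeout <= maxTimeoutMs);
    }
}
```

Note that the aggressive column above keeps the same ratio (50ms heartbeat, 100-200ms timeout is only a 2x minimum), which is part of why aggressive settings trade stability for faster failure detection.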