
Performance Tuning & Virtual Threads

Out of the box, LoomCache handles extreme concurrency workloads without complex tuning, primarily thanks to its integration with Project Loom virtual threads on Java 25.

Java Virtual Threads (Project Loom)

Millions of lightweight virtual threads multiplexed over a few OS Carrier Threads.

Older cache servers rely on complex, non-blocking asynchronous event loops (like Netty or epoll) because operating system (OS) threads are too heavy. A standard Linux server might struggle to maintain 10,000 active OS threads without suffocating under context-switching overhead.

LoomCache’s TcpServer sidesteps this limitation by using virtual threads: it spins up a dedicated virtual thread for every client connection.

When that thread blocks to read binary data from the network socket, or waits on an fsync to the write-ahead log (WAL), the JVM automatically unmounts the virtual thread from its OS carrier thread. This lets a standard 8-core machine manage 100,000+ simultaneous open connections using readable, blocking Java code.

| Concurrency Model | Max Connections | Code Complexity |
| --- | --- | --- |
| OS Threads | ~10,000 | Low |
| Netty/epoll | ~100,000 | High (callbacks) |
| Reactive (WebFlux) | ~100,000 | Very High (reactive chains) |
| Virtual Threads | ~1,000,000+ | Low (blocking code) |
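The thread-per-connection model can be sketched in a few lines of plain Java. The socket handling is elided here; `Thread.sleep` stands in for a blocking socket read, and the 10,000 "connections" are illustrative:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // One virtual thread per "connection"; the blocking sleep stands in
        // for a blocking socket read that would unmount the virtual thread.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // "blocking I/O"
                    } catch (InterruptedException ignored) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // try-with-resources close() waits for all tasks to finish
        System.out.println(completed.get());
    }
}
```

Spawning 10,000 platform threads this way would exhaust most machines; 10,000 virtual threads are cheap because they are unmounted from their carriers while sleeping.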

Memory & GC Tuning

LoomCache makes heavy use of off-heap memory and Kryo 5 object pooling to minimize allocation pressure on the garbage collector. For ultra-low latency, configure the JVM to use ZGC (or G1GC as a fallback):

```sh
# Production JVM args: ZGC (recommended).
# Note: ZGC ignores MaxGCPauseMillis and targets sub-millisecond pauses on its own.
JAVA_OPTS="-XX:+UseZGC -Xmx8G -Xms8G"

# Alternative: G1GC, which treats MaxGCPauseMillis as a pause-time goal.
JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -Xmx8G -Xms8G"
```
| Setting | Recommendation | Why |
| --- | --- | --- |
| `-Xmx` / `-Xms` | Same value (e.g., 8G) | Avoids heap resizing pauses |
| `maxMemoryBytes` | 70-80% of container limit | Leaves room for JVM overhead |
| `maxMapEntries` | Tune with `maxMemoryBytes` | Match to available heap |
| Kryo pooling | Enabled by default | Reduces allocation churn |
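As a worked example of the 70-80% rule above (the 8 GiB container limit is illustrative, not a LoomCache default):

```java
// Worked example of the "70-80% of container limit" rule for maxMemoryBytes.
public class MemoryBudget {
    public static void main(String[] args) {
        long containerLimitBytes = 8L << 30;                  // 8 GiB container limit
        long maxMemoryBytes = containerLimitBytes * 75 / 100; // 75% for the cache
        System.out.println(maxMemoryBytes);                   // remainder covers JVM overhead
    }
}
```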

Command Pipelining

The Pipeline API batches multiple commands into a single network round-trip:

```java
Pipeline pipeline = client.newPipeline();
pipeline.mapPut("cache", "k1", "v1")
        .mapPut("cache", "k2", "v2")
        .mapGet("cache", "k1");
List<Object> results = pipeline.flush();
```

Throughput impact: Pipelining reduces per-operation latency by 10–100x by amortizing the TCP round-trip cost across many commands. Configure batch size (default 100) and flush timeout (default 10ms) via the client builder.
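The batch size and flush timeout mentioned above might be configured like this; the builder method names `pipelineBatchSize` and `pipelineFlushTimeout` are illustrative assumptions, not confirmed API:

```java
// Hypothetical builder options: the defaults (100 commands, 10ms flush) come
// from the text above, but these exact method names are assumptions.
LoomClient client = LoomClient.builder()
    .pipelineBatchSize(100)
    .pipelineFlushTimeout(Duration.ofMillis(10))
    .build();
```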

Client-Side Near Cache

The client-side near cache eliminates network round-trips for frequently read keys:

```java
LoomClient client = LoomClient.builder()
    .nearCacheEnabled(true)
    .nearCacheTtl(Duration.ofSeconds(30))
    .nearCacheMaxSize(10000)
    .build();
```
| Metric | Without Near Cache | With Near Cache |
| --- | --- | --- |
| GET latency | ~1-5ms (network) | ~0.01ms (local) |
| Network load | Every read | Only misses |
| Consistency | Linearizable | Eventual (TTL-bounded) |

Near cache entries are invalidated via server push (instant) with TTL-based polling as fallback.
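To make the TTL-bounded semantics concrete, here is a minimal, self-contained sketch of a size- and TTL-bounded near cache. It is illustrative only: LoomCache's real near cache also handles the server-push invalidation described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a size- and TTL-bounded near cache (illustrative,
// not LoomCache's implementation).
class NearCache<K, V> {
    private record Entry<T>(T value, long expiresAtNanos) {}

    private final LinkedHashMap<K, Entry<V>> map;
    private final long ttlNanos;

    NearCache(int maxSize, long ttlNanos) {
        this.ttlNanos = ttlNanos;
        // Access-order map: the eldest entry is the least recently used.
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.nanoTime() + ttlNanos));
    }

    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;                 // miss: go to the server
        if (System.nanoTime() > e.expiresAtNanos) { // TTL bounds the staleness
            map.remove(key);
            return null;
        }
        return e.value();
    }
}

public class NearCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        NearCache<String, String> cache = new NearCache<>(2, 50_000_000L); // 50ms TTL
        cache.put("k1", "v1");
        System.out.println(cache.get("k1")); // fresh: served locally
        Thread.sleep(100);                   // wait past the TTL
        System.out.println(cache.get("k1")); // expired: falls back to the server
    }
}
```

The TTL is what bounds the eventual-consistency window in the table above: a stale entry can be served for at most one TTL after the server's value changes, even if a push invalidation is lost.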

Hot-Key Detection & Near Caching

Mitigating Thundering Herd problems automatically

(Diagram: 100k users reading the hot key '/trending' are absorbed by the Loom NearCache instead of reaching the partition leader's network I/O; normal traffic remains evenly distributed.)

A classic distributed systems problem is the “Thundering Herd” or “Hot Key” issue, where 90% of traffic suddenly targets a single Map key (which lives on a single partition leader).

LoomCache combats this via the HotKeyDetector. Through non-blocking, sliding-window sampling (default 5% sampling rate), LoomCache identifies hotspots and exposes them through SlotMetrics.

```java
HotKeyConfig.builder()
    .enabled(true)
    .samplingRate(0.05)             // 5% of accesses sampled
    .threshold(100)                 // 100+ accesses/window = hot
    .window(Duration.ofSeconds(60))
    .maxTrackedKeys(50000)
    .build();
```

Applications can query hot key data and configure the LoomClient’s Near Cache to aggressively cache hot keys client-side, offloading the Raft leader entirely.
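The sampling approach can be illustrated with a self-contained sketch. This mirrors the configuration above (5% sampling, threshold 100) but is not LoomCache's HotKeyDetector source; the window logic is elided:

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sampling-based hot-key detector (window reset elided).
class SamplingHotKeyDetector {
    private final double samplingRate;
    private final long threshold;
    private final Map<String, LongAdder> samples = new ConcurrentHashMap<>();
    private final Random random = new Random(42); // fixed seed for a reproducible demo

    SamplingHotKeyDetector(double samplingRate, long threshold) {
        this.samplingRate = samplingRate;
        this.threshold = threshold;
    }

    void recordAccess(String key) {
        if (random.nextDouble() < samplingRate) { // count only ~5% of accesses
            samples.computeIfAbsent(key, k -> new LongAdder()).increment();
        }
    }

    boolean isHot(String key) {
        LongAdder c = samples.get(key);
        // Scale the sampled count back up to estimate the true access volume.
        return c != null && c.sum() / samplingRate >= threshold;
    }
}

public class HotKeyDemo {
    public static void main(String[] args) {
        var detector = new SamplingHotKeyDetector(0.05, 100);
        for (int i = 0; i < 10_000; i++) detector.recordAccess("/trending"); // hot key
        for (int i = 0; i < 3; i++) detector.recordAccess("/profile/42");    // cold key
        System.out.println(detector.isHot("/trending"));
        System.out.println(detector.isHot("/profile/42"));
    }
}
```

Sampling keeps the per-access overhead constant regardless of traffic volume, which is why a 5% rate is enough to spot a key receiving thousands of hits per window.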

JFR Profiling

LoomCache emits five custom JFR (Java Flight Recorder) events for production profiling:

| Event | Data |
| --- | --- |
| CacheOperationEvent | Key, operation, latency, result |
| EvictionEvent | Key, policy, reason |
| NetworkConnectionEvent | Peer, connect/disconnect, duration |
| RaftAppendEvent | Term, index, entry count, latency |
| RaftElectionEvent | Candidate, term, votes, outcome |

Enable JFR in production with minimal overhead (its default profile is designed for always-on use):

```sh
java -XX:StartFlightRecording=filename=loomcache.jfr,duration=60s ...
```
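Custom JFR events like those in the table are typically defined with the standard `jdk.jfr` API. The sketch below follows the CacheOperationEvent fields from the table; the actual LoomCache event classes may differ:

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Illustrative custom JFR event; field names follow the table above.
@Name("loomcache.CacheOperation")
@Label("Cache Operation")
class CacheOperationEvent extends Event {
    @Label("Key") String key;
    @Label("Operation") String operation;
    @Label("Result") String result;
}

public class JfrDemo {
    public static void main(String[] args) {
        CacheOperationEvent event = new CacheOperationEvent();
        event.begin();               // start the event's duration clock
        event.key = "k1";
        event.operation = "GET";
        event.result = "HIT";
        event.commit();              // recorded if a JFR recording is active; no-op otherwise
        System.out.println("recorded");
    }
}
```

Because `commit()` is a no-op when no recording is active, the instrumentation can stay in the hot path permanently.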
Raft Tuning

Raft consensus timing can be tuned for faster failover or lower overhead:

| Parameter | Default | Aggressive | Conservative |
| --- | --- | --- | --- |
| Heartbeat interval | 100ms | 50ms | 500ms |
| Election timeout | 300-600ms | 100-200ms | 1000-2000ms |
| Max entries/append | 100 | 200 | 50 |
| Replication interval | 50ms | 25ms | 100ms |

Trade-offs: Aggressive settings reduce failure detection latency but increase CPU and network overhead. Conservative settings reduce churn but increase time-to-recovery after failures.
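The heartbeat and election-timeout settings are linked: followers randomize their election timeout within a range, and the minimum must comfortably exceed the heartbeat interval so a healthy leader is never falsely suspected. A small sanity-check sketch (the 3x multiple is a common Raft rule of thumb, not a LoomCache requirement):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sanity-check the relationship between heartbeat interval and election timeout.
public class RaftTimeoutDemo {
    static long randomElectionTimeout(long minMs, long maxMs) {
        // Each follower picks a random timeout in [minMs, maxMs] to avoid split votes.
        return ThreadLocalRandom.current().nextLong(minMs, maxMs + 1);
    }

    public static void main(String[] args) {
        long heartbeatMs = 100;                  // default from the table above
        long minTimeoutMs = 300, maxTimeoutMs = 600;
        long timeout = randomElectionTimeout(minTimeoutMs, maxTimeoutMs);
        // Rule of thumb: minimum election timeout >= 3x heartbeat interval.
        System.out.println(minTimeoutMs >= 3 * heartbeatMs);
        System.out.println(timeout >= minTimeoutMs && timeout <= maxTimeoutMs);
    }
}
```

Note that the aggressive column above keeps the same ratio (50ms heartbeat, 100-200ms timeout is only a 2x minimum), which is part of why aggressive settings trade stability for faster failure detection.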