Performance Tuning & Virtual Threads
Out of the box, LoomCache handles extreme concurrency workloads without complex tuning, primarily because it is built on Java 25's Project Loom virtual threads.
Diagram: Java Virtual Threads (Project Loom) - millions of lightweight virtual threads multiplexed over a few OS carrier threads.
The Virtual Thread Advantage
Older cache servers rely on complex, non-blocking asynchronous event loops (like Netty or epoll) because operating system (OS) threads are too heavy. A standard Linux server might struggle to maintain 10,000 active OS threads without suffocating under context-switching overhead.
LoomCache’s TcpServer sidesteps this limitation entirely by using virtual threads: for every client connection, LoomCache spins up a dedicated virtual thread.
When that thread blocks to read binary data from the network socket, or waits for an fsync down to the WAL, the JVM automatically unmounts the virtual thread from the underlying OS carrier thread. This allows a standard 8-core machine to manage 100,000+ simultaneous open connections using readable, blocking Java code.
| Concurrency Model | Max Connections | Code Complexity |
|---|---|---|
| OS Threads | ~10,000 | Low |
| Netty/epoll | ~100,000 | High (callbacks) |
| Reactive (WebFlux) | ~100,000 | Very High (reactive chains) |
| Virtual Threads | ~1,000,000+ | Low (blocking code) |
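The scaling claim in the table is easy to exercise with the JDK's own virtual-thread executor. This sketch (plain JDK 21+ code, independent of LoomCache) launches 100,000 blocking tasks, each of which sleeps, and completes quickly because the JVM unmounts sleeping virtual threads from their carriers:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();
        // One virtual thread per task; the JVM multiplexes them over a small carrier pool.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                exec.submit(() -> {
                    try {
                        // Blocking call: the virtual thread unmounts, freeing its carrier.
                        Thread.sleep(Duration.ofMillis(10));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("completed=" + completed.get());
    }
}
```

Running the same loop with `Executors.newFixedThreadPool` sized to 100,000 OS threads would exhaust most machines; with virtual threads it finishes in well under a minute on commodity hardware.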
Garbage Collection
LoomCache heavily utilizes off-heap memory and Kryo 5 object pooling to minimize object allocation pressure on the garbage collector. To ensure ultra-low latency, configure the JVM to use ZGC or G1GC:
```sh
# Production JVM Args - ZGC (recommended)
JAVA_OPTS="-XX:+UseZGC -XX:MaxGCPauseMillis=10 -Xmx8G -Xms8G"

# Alternative - G1GC
JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -Xmx8G -Xms8G"
```

Memory Tuning
| Setting | Recommendation | Why |
|---|---|---|
| -Xmx / -Xms | Same value (e.g., 8G) | Avoids heap resizing pauses |
| maxMemoryBytes | 70–80% of container limit | Leaves room for JVM overhead |
| maxMapEntries | Tune with maxMemoryBytes | Match to available heap |
| Kryo pooling | Enabled by default | Reduces allocation churn |
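As a quick sanity check on the 70–80% guidance, the sizing arithmetic for an 8 GiB container might look like this (a sketch; the variable names are ours, chosen for illustration, and the resulting value would be supplied to LoomCache's maxMemoryBytes setting):

```java
public class SizingSketch {
    public static void main(String[] args) {
        long containerLimitBytes = 8L * 1024 * 1024 * 1024;        // 8 GiB container
        long maxMemoryBytes = (long) (containerLimitBytes * 0.75); // ~75% for cache data
        long jvmHeadroom = containerLimitBytes - maxMemoryBytes;   // metaspace, stacks, GC, buffers
        System.out.println("maxMemoryBytes=" + maxMemoryBytes);
        System.out.println("headroomBytes=" + jvmHeadroom);
    }
}
```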
Request Pipelining
The Pipeline API batches multiple commands into a single network round-trip:
```java
Pipeline pipeline = client.newPipeline();
pipeline.mapPut("cache", "k1", "v1")
        .mapPut("cache", "k2", "v2")
        .mapGet("cache", "k1");
List<Object> results = pipeline.flush();
```

Throughput impact: Pipelining reduces per-operation latency by 10–100x by amortizing the TCP round-trip cost across many commands. Configure batch size (default 100) and flush timeout (default 10ms) via the client builder.
Near Cache Optimization
The client-side near cache eliminates network round-trips for frequently read keys:
```java
LoomClient client = LoomClient.builder()
    .nearCacheEnabled(true)
    .nearCacheTtl(Duration.ofSeconds(30))
    .nearCacheMaxSize(10000)
    .build();
```

| Metric | Without Near Cache | With Near Cache |
|---|---|---|
| GET latency | ~1–5ms (network) | ~0.01ms (local) |
| Network load | Every read | Only misses |
| Consistency | Linearizable | Eventual (TTL-bounded) |
Near cache entries are invalidated via server push (instant) with TTL-based polling as fallback.
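The TTL-bounded semantics can be illustrated with a toy near cache (a self-contained sketch, not LoomClient's implementation; the real near cache also enforces a size limit and handles the server-push invalidation path):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class NearCacheSketch {
    private record Entry(Object value, long expiresAtNanos) {}

    private final Map<String, Entry> local = new HashMap<>();
    private final long ttlNanos;
    private int serverFetches = 0;

    NearCacheSketch(Duration ttl) { this.ttlNanos = ttl.toNanos(); }

    Object get(String key) {
        Entry e = local.get(key);
        if (e == null || System.nanoTime() >= e.expiresAtNanos) {
            local.remove(key);
            Object v = fetchFromServer(key);  // network round-trip only on a miss
            local.put(key, new Entry(v, System.nanoTime() + ttlNanos));
            return v;
        }
        return e.value();                     // local hit: no network at all
    }

    void invalidate(String key) { local.remove(key); }  // server-push invalidation path

    private Object fetchFromServer(String key) {
        serverFetches++;                      // stand-in for the real network call
        return "value-of-" + key;
    }

    public static void main(String[] args) {
        NearCacheSketch c = new NearCacheSketch(Duration.ofSeconds(30));
        c.get("k1");
        c.get("k1");
        c.get("k1");                          // one fetch, two local hits
        System.out.println("server fetches: " + c.serverFetches);
    }
}
```

Note the consistency trade-off from the table: between invalidations, a reader may see a value up to one TTL stale.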
Hot-Key Detection
Mitigating Thundering Herd problems automatically
A classic distributed systems problem is the “Thundering Herd” or “Hot Key” issue, where 90% of traffic suddenly targets a single Map key (which lives on a single partition leader).
LoomCache combats this via the HotKeyDetector. Through non-blocking, sliding-window sampling (default 5% sampling rate), LoomCache identifies hotspots and exposes them through SlotMetrics.
Configuration
```java
HotKeyConfig.builder()
    .enabled(true)
    .samplingRate(0.05)              // 5% of accesses sampled
    .threshold(100)                  // 100+ accesses/window = hot
    .window(Duration.ofSeconds(60))
    .maxTrackedKeys(50000)
    .build();
```

Applications can query hot key data and configure the LoomClient’s Near Cache to aggressively cache hot keys client-side, offloading the Raft leader entirely.
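To make the sampling idea concrete, here is a minimal, self-contained sketch of sampled hot-key counting (illustrative only, not LoomCache's HotKeyDetector; it omits window expiry and the lock-free bookkeeping a real detector needs):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class HotKeySketch {
    private final double samplingRate;
    private final int threshold;
    private final Map<String, Integer> counts = new HashMap<>();
    private final Random rng = new Random(42); // seeded for reproducibility

    HotKeySketch(double samplingRate, int threshold) {
        this.samplingRate = samplingRate;
        this.threshold = threshold;
    }

    void recordAccess(String key) {
        // Only a fraction of accesses pay the bookkeeping cost.
        if (rng.nextDouble() < samplingRate) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    boolean isHot(String key) {
        // Scale the sampled count back up to estimate the true access volume.
        return counts.getOrDefault(key, 0) / samplingRate >= threshold;
    }

    public static void main(String[] args) {
        HotKeySketch d = new HotKeySketch(0.05, 100);
        for (int i = 0; i < 10_000; i++) d.recordAccess("hot-key");   // skewed traffic
        for (int i = 0; i < 10; i++) d.recordAccess("cold-key-" + i); // long tail
        System.out.println("hot-key hot? " + d.isHot("hot-key"));
        System.out.println("cold-key-0 hot? " + d.isHot("cold-key-0"));
    }
}
```

With a 5% sample, tracking cost stays bounded even under millions of operations per second, at the price of some statistical noise near the threshold.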
JFR Profiling
LoomCache emits 5 custom JFR (Java Flight Recorder) events for production profiling:
| Event | Data |
|---|---|
| CacheOperationEvent | Key, operation, latency, result |
| EvictionEvent | Key, policy, reason |
| NetworkConnectionEvent | Peer, connect/disconnect, duration |
| RaftAppendEvent | Term, index, entry count, latency |
| RaftElectionEvent | Candidate, term, votes, outcome |
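Custom events like these are built on the standard jdk.jfr API. The following sketch shows what the shape of CacheOperationEvent might look like (the field names and the event's @Name here are illustrative assumptions, not LoomCache's actual definitions); committing an event is a no-op when no recording is active, which is what keeps the instrumentation cheap:

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical shape of a cache-operation event; fields are illustrative.
@Name("loomcache.CacheOperation")
@Label("Cache Operation")
class CacheOperationEvent extends Event {
    @Label("Key") String key;
    @Label("Operation") String operation;
    @Label("Result") String result;
}

public class JfrDemo {
    public static void main(String[] args) {
        CacheOperationEvent e = new CacheOperationEvent();
        e.begin();                 // start the event's duration clock
        e.key = "user:42";
        e.operation = "GET";
        e.result = "HIT";
        e.commit();                // recorded only if a JFR recording is active
        System.out.println("event committed: " + e.getClass().getSimpleName());
    }
}
```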
Enable JFR in production with near-zero overhead:

```sh
java -XX:StartFlightRecording=filename=loomcache.jfr,duration=60s ...
```

Raft Consensus Tuning
| Parameter | Default | Aggressive | Conservative |
|---|---|---|---|
| Heartbeat interval | 100ms | 50ms | 500ms |
| Election timeout | 300–600ms | 100–200ms | 1000–2000ms |
| Max entries/append | 100 | 200 | 50 |
| Replication interval | 50ms | 25ms | 100ms |
Trade-offs: Aggressive settings reduce failure detection latency but increase CPU and network overhead. Conservative settings reduce churn but increase time-to-recovery after failures.
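The relationship between the heartbeat interval and the election timeout follows the standard Raft rule of thumb: the timeout must span several heartbeats, so that one delayed heartbeat never triggers a spurious election, and each follower randomizes its timeout so elections rarely collide. A small illustrative check using the default values from the table (variable names are ours, not LoomCache configuration keys):

```java
import java.util.concurrent.ThreadLocalRandom;

public class RaftTimingSketch {
    public static void main(String[] args) {
        long heartbeatMs = 100;                          // leader heartbeat interval
        long electionMinMs = 300, electionMaxMs = 600;   // default election timeout range

        // Each follower picks a random timeout in [min, max) to de-synchronize candidates.
        long timeout = ThreadLocalRandom.current().nextLong(electionMinMs, electionMaxMs);

        // Invariant: even the shortest timeout covers several heartbeat intervals.
        System.out.println("timeout in range: "
                + (timeout >= electionMinMs && timeout < electionMaxMs));
        System.out.println("min timeout >= 3x heartbeat: "
                + (electionMinMs >= 3 * heartbeatMs));
    }
}
```

The aggressive column shrinks both values proportionally, preserving the same ratio while detecting leader failure sooner.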