Performance Tuning

LoomCache has not been independently benchmarked by a third party. Treat the release performance gates below as acceptance checks for production-like validation, then measure with your workload before committing an application-facing SLO.

Figure: Java Virtual Threads (Project Loom). Many lightweight virtual threads multiplexed over a few OS carrier threads (OS Thread 1 and OS Thread 2, each backing one carrier).
Performance validation

LoomCache write latency is bound by Raft majority commit and WAL durability, not by a backup-ack shortcut. Release performance gates are same-hardware checks for the release artifact; they are not a universal latency SLO for every deployment.

For an operator-grade benchmark, define these checks before the run:

Check	What to capture
Evidence quality	Release version, workload definition, client count, value size, hardware, disks, JVM flags, network placement, WAL/durability mode, TLS/auth mode, runtime config identifiers, and raw results retained with the run.
Correctness	Zero application-visible errors, zero read mismatches, zero prepopulate read-back mismatches/errors, zero final verification mismatches/errors, and read checks against the latest value acknowledged by the harness for each key.
Throughput floor	The minimum sustained operations per second your application requires for each workload.
Same-hardware comparison	Any baseline system or previous LoomCache version measured on the same hosts, disks, network, JVM, and client placement.
Tail-latency cap	P95/P99/P99.9 limits that map to the application SLO, with steady-state and degraded-state targets tracked separately.
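
The tail-latency check in the table above can be evaluated directly from the raw client-side samples retained with the run. The following is a minimal sketch using the nearest-rank percentile method; the class and method names are hypothetical and are not part of LoomCache or its benchmark harness.

```java
import java.util.Arrays;

/** Illustrative helper (hypothetical, not a LoomCache API): nearest-rank percentiles. */
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /** Returns the nearest-rank percentile of the samples, for 0 < pct <= 100. */
    static long percentile(long[] samplesMicros, double pct) {
        if (samplesMicros.length == 0) {
            throw new IllegalArgumentException("no samples recorded");
        }
        if (pct <= 0.0 || pct > 100.0) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        long[] sorted = samplesMicros.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest rank: the smallest value with at least pct% of samples at or below it.
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the given percentile is at or below the cap chosen for the SLO. */
    static boolean withinCap(long[] samplesMicros, double pct, long capMicros) {
        return percentile(samplesMicros, pct) <= capMicros;
    }
}
```

Evaluate steady-state and degraded-state sample sets separately, each against its own cap, so that an election or snapshot transfer does not hide inside a blended P99.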

Any application-facing latency SLO is valid only while the cluster is ACTIVE, no member is partitioned, no snapshot install is in progress, WAL fsync latency is within the storage target, and command queues are not backpressured. During leader election, partition healing, full snapshot transfer, or sustained backpressure, use the degradation matrix instead of the steady-state SLO.

Track the release gates and any application-facing SLO with these metrics:

loomcache.raft.commit_latency_seconds for Raft commit latency.
WAL fsync logs plus loomcache.raft.fsync_batch_size and loomcache.raft.snapshot_save_seconds for storage tail symptoms.
loomcache.command.queue_wait_seconds and server-busy responses for saturation.
Client-side operation latency histograms from the application or load driver.

Benchmark posture

Run benchmarks against a deployed three-member cluster with the same WAL, TLS/auth, JVM heap, disks, and network placement you plan to operate. Keep WAL fsync enabled for durability-sensitive measurements. If you disable fsync for exploratory testing, label the result as trend-only and do not use it as production evidence.

Record the hardware, JVM flags, value size, client count, achieved QPS, P50/P95/P99/P99.9, error rate, read checks, acknowledged writes, read mismatches, prepopulate read-back checks/errors/mismatches, final verification checks/errors/mismatches, dropped latency samples, maximum WAL fsync latency, disk profile, network path, TLS/auth posture, and runtime config identifiers with every benchmark result.

Release evidence generated by bench/run-comparison.sh uses schema v3, and scripts/validate-performance-gates.sh rejects results that use an older schema or contain final verification mismatches.

Re-run the same workload after configuration changes, storage changes, JVM updates, and LoomCache upgrades so regressions are measured against your own baseline.

JVM

The server requires Java 25+ with --enable-preview. The shipped container launches the JVM from docker/entrypoint.sh as java --enable-preview ${JAVA_OPTS} ..., where the image JAVA_OPTS default is:

-XX:InitialRAMPercentage=25
-XX:MaxRAMPercentage=75
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
-Dcom.sun.net.ssl.checkRevocation=true
-Dcom.sun.security.enableCRLDP=true

The image sizes the heap as a percentage of the container memory limit rather than a fixed -Xmx. Override JAVA_OPTS to pin an explicit heap when you need deterministic sizing.

Tuning advice:

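Keep MaxGCPauseMillis at or below the Raft heartbeat interval (100 ms) to avoid GC-induced spurious elections (the shipped Docker default is 100). G1 treats this value as a pause-time goal rather than a hard limit, so confirm the actual pause times in GC logs.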
For larger heaps (> 8 GiB) add -XX:+AlwaysPreTouch to avoid first-touch page faults.
Virtual threads do not benefit from bigger stacks; leave stack sizes at their defaults.

Virtual threads

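The server uses a virtual-thread-per-connection model. When a virtual thread blocks on socket I/O, the JVM unmounts it from its carrier thread so the carrier can run other virtual threads. Blocking file operations such as fsync generally keep the carrier occupied for the duration of the call, and the JDK may compensate by temporarily adding carrier threads. This lets the server run straightforward blocking code without a reactive framework; validate your target connection count with release performance evidence and the configured admission limits.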
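
As an illustration of the thread-per-task style described above, the sketch below starts one virtual thread per blocking task using the standard JDK executor. It is a generic JDK example with hypothetical class and method names, not LoomCache's connection handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only (hypothetical names): one virtual thread per blocking task. */
final class VirtualThreadDemo {
    private VirtualThreadDemo() {}

    /** Runs the given number of blocking tasks and returns how many ran on a virtual thread. */
    static int runBlockingTasks(int tasks, long blockMillis) {
        List<Future<Boolean>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                results.add(executor.submit(() -> {
                    // Plain blocking call; while it sleeps, the carrier thread is free
                    // to run other virtual threads.
                    Thread.sleep(blockMillis);
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits for all submitted tasks to finish

        int virtual = 0;
        try {
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    virtual++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return virtual;
    }
}
```

Calling `VirtualThreadDemo.runBlockingTasks(1000, 10)` starts a thousand concurrent sleepers without a thousand OS threads. The same shape applies to a blocking accept loop that hands each socket to its own virtual thread.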

Raft timing

LoomCache has two distinct timing layers: the cluster-membership layer (ClusterConfig) and the internal Raft consensus engine. The public configuration keys tune membership heartbeat/failure detection; they do not change the internal Raft election timing.

Cluster membership and failure detection (`ClusterConfig`)

These control peer liveness detection at the membership level:

Property	Default	Notes
`heartbeatIntervalMs`	`5000`	Interval between membership heartbeats and peer pings.
`heartbeatTimeoutMs`	`60000`	A peer unseen this long is considered unreachable.
`idempotencyTtlMs`	`900_000`	Dedup cache retention aligned with 2PC decision retention.

Spring Boot keys: loomcache.server.raft.heartbeat-interval-ms and loomcache.server.raft.election-timeout-ms. Despite the raft prefix, embedded Spring Boot nodes map them to cluster heartbeat and failure-detection windows. Standalone server config uses loomcache.raft.heartbeat-interval-ms and loomcache.raft.election-timeout-ms for the same behavior.

Raft consensus engine defaults

These govern Raft leader election and log replication within the consensus group. They are consensus-engine defaults, not the Spring Boot loomcache.server.raft.* operator settings:

Parameter	Default	Notes
`heartbeatIntervalMs`	`100 ms`	Raft leader → follower AppendEntries / lease-refresh rate.
`electionTimeoutMinMs`	`300 ms`	Minimum randomized follower election timeout.
`electionTimeoutMaxMs`	`600 ms`	Maximum randomized follower election timeout.

Keep MaxGCPauseMillis at or below the Raft heartbeat interval (100 ms) to avoid GC-induced spurious elections. The Raft consensus engine also enables pre-vote, leader lease, and ReadIndex — all on by default.
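
To make the margin behind that advice concrete, here is a simplified model, assuming the election timeout is drawn uniformly from the documented 300 to 600 ms range and ignoring network delay and pre-vote. The class, the inclusive upper bound, and the worst-case formula are assumptions made for illustration, not the engine's actual code.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Simplified timing model (hypothetical, not the LoomCache consensus engine). */
final class RaftTimingSketch {
    static final long HEARTBEAT_INTERVAL_MS = 100;
    static final long ELECTION_TIMEOUT_MIN_MS = 300;
    static final long ELECTION_TIMEOUT_MAX_MS = 600;

    private RaftTimingSketch() {}

    /** Draws a follower election timeout uniformly from [min, max] (inclusive bound assumed). */
    static long randomElectionTimeoutMs() {
        return ThreadLocalRandom.current()
                .nextLong(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS + 1);
    }

    /**
     * Worst-case silence a follower observes when the leader stalls for pauseMs:
     * the stall can begin just before a heartbeat was due, so one full interval is added.
     */
    static long worstCaseSilenceMs(long pauseMs) {
        return pauseMs + HEARTBEAT_INTERVAL_MS;
    }

    /** True when a leader stall could outlast the shortest possible election timeout. */
    static boolean pauseCanTriggerElection(long pauseMs) {
        return worstCaseSilenceMs(pauseMs) >= ELECTION_TIMEOUT_MIN_MS;
    }
}
```

Under this model a 100 ms pause leaves a follower silent for at most 200 ms, below the 300 ms minimum timeout, while a 200 ms pause already reaches it. Real deployments add network latency on top, which is one reason to keep pauses well under the heartbeat interval.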

Client

On LoomClient.Builder:

Setting	Default	Notes
`connectionTimeout`	5 s	Socket connect timeout.
`requestTimeout`	120 s	Per-call.
`maxRetries`	3	Non-negative.
`retryBaseDelay`	100 ms	Exponential × 2 with ±25 % jitter, capped at 5 s.
`nearCacheEnabled`	`false`	Opt-in; server push + TTL when enabled.
`nearCacheTtl`	0 (off)	Fallback TTL; `0` means no time-based expiry.
`nearCacheMaxSize`	10 000	Client-local LRU cap; not server max-idle parity.

Connection management includes reconnect/backoff handling, and per-pool wait statistics are exposed through client connection-pool metrics.
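
The retryBaseDelay row describes exponential backoff with jitter. The sketch below shows one way to compute such delays from the documented numbers (doubling, ±25 % jitter, 5 s cap). The class name, the zero-based attempt index, and the choice to re-apply the cap after jitter are assumptions; the client's actual ordering of cap and jitter is not stated here.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative backoff arithmetic (hypothetical, not the LoomClient implementation). */
final class RetryBackoffSketch {
    static final long CAP_MS = 5_000;
    static final double JITTER_FRACTION = 0.25;

    private RetryBackoffSketch() {}

    /** Delay before retry number `attempt` (0-based), without jitter: base * 2^attempt, capped. */
    static long nominalDelayMs(long baseDelayMs, int attempt) {
        if (baseDelayMs <= 0 || attempt < 0) {
            throw new IllegalArgumentException("baseDelayMs must be > 0 and attempt >= 0");
        }
        if (baseDelayMs >= CAP_MS) {
            return CAP_MS;
        }
        int shift = Math.min(attempt, 30);   // bound the shift so the product cannot overflow
        return Math.min(baseDelayMs << shift, CAP_MS);
    }

    /** Nominal delay scaled by a random factor in [0.75, 1.25), then capped again (assumption). */
    static long jitteredDelayMs(long baseDelayMs, int attempt) {
        long nominal = nominalDelayMs(baseDelayMs, attempt);
        double factor = 1.0
                + ThreadLocalRandom.current().nextDouble(-JITTER_FRACTION, JITTER_FRACTION);
        return Math.min(Math.round(nominal * factor), CAP_MS);
    }
}
```

With the defaults (100 ms base, 3 retries) the nominal delays are 100, 200, and 400 ms, so the cap only matters when maxRetries or retryBaseDelay is raised.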

Near cache

The near cache is a client-side LRU with server-push invalidation and sequence tracking. Disable it for write-heavy maps — every write invalidates all subscribers. This is a local cache policy only; server-side LRU/LFU/FIFO/RANDOM, finite max-entry/max-memory eviction, and max-idle remain unsupported for production until eviction decisions are Raft-applied and proven through WAL/snapshot/restart tests.
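
For readers unfamiliar with the policy, the sketch below shows a size-capped LRU with explicit invalidation, built on the JDK's access-ordered LinkedHashMap. It illustrates the general technique only; the class is hypothetical and does not reflect how the LoomCache near cache handles server push, sequence tracking, or TTL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Generic LRU sketch (hypothetical, not the LoomCache near cache). */
final class LruSketch<K, V> {
    private final LinkedHashMap<K, V> entries;

    LruSketch(int maxSize) {
        // accessOrder = true moves an entry to the tail on every get/put,
        // so the head is always the least recently used entry.
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;   // evict the head once the cap is exceeded
            }
        };
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    /** What a push invalidation would do locally: drop the stale copy. */
    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

On a write-heavy map, nearly every local copy is invalidated before it is read again, so the cache pays the invalidation traffic without earning hits. That is the reasoning behind disabling it for such maps.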

Capacity sizing

Size by heap budget first, then entries, listener registrations, query execution metadata, and named data-structure count. The Docker image sizes the heap from the container memory limit (-XX:MaxRAMPercentage=75) rather than a fixed -Xmx; production nodes should set an explicit, deterministic heap and leave headroom for Raft, WAL buffers, serializers, metrics, TCP connections, and GC.

Use this worksheet per member:

usable_cache_heap     = Xmx * 0.60
entry_budget          = local_entry_copies * estimated_entry_bytes
listener_budget       = listener_registrations * estimated_listener_registration_bytes
query_metadata_budget = bounded_query_working_set_bytes
remaining_heap        = usable_cache_heap - entry_budget - listener_budget - query_metadata_budget
safe_instance_cap     = floor(remaining_heap / measured_empty_instance_bytes)

Keep the resulting safe_instance_cap at or below the per-type data-structure instance cap (default 10_000, exported as loomcache_datastructures_max_instances_per_type). If remaining_heap is negative, reduce entries, listeners, query metadata, or the instance count before increasing traffic. CREATE INDEX and declarative SQL indexes are unsupported/rejected in this release, so production query budgets must not assume index acceleration.
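
The worksheet translates directly into integer arithmetic. The following sketch mirrors its variable names; the class itself is hypothetical, and every input (entry bytes, listener bytes, query working set, empty-instance bytes) is a number you measure or estimate in staging, not a value LoomCache provides.

```java
/** Worksheet arithmetic as code (hypothetical helper, not a LoomCache API). */
final class HeapWorksheetSketch {
    /** Default per-type data-structure instance cap stated in the documentation. */
    static final long DEFAULT_MAX_INSTANCES_PER_TYPE = 10_000;

    private HeapWorksheetSketch() {}

    /** usable_cache_heap = Xmx * 0.60, rounded down to whole bytes. */
    static long usableCacheHeap(long xmxBytes) {
        return xmxBytes * 60 / 100;
    }

    /** remaining_heap after the entry, listener, and query metadata budgets. May be negative. */
    static long remainingHeap(long xmxBytes,
                              long localEntryCopies, long estimatedEntryBytes,
                              long listenerRegistrations, long estimatedListenerRegistrationBytes,
                              long boundedQueryWorkingSetBytes) {
        long entryBudget = localEntryCopies * estimatedEntryBytes;
        long listenerBudget = listenerRegistrations * estimatedListenerRegistrationBytes;
        return usableCacheHeap(xmxBytes) - entryBudget - listenerBudget
                - boundedQueryWorkingSetBytes;
    }

    /** safe_instance_cap, floored, never above the per-type cap, and 0 when no heap remains. */
    static long safeInstanceCap(long remainingHeapBytes, long measuredEmptyInstanceBytes) {
        if (measuredEmptyInstanceBytes <= 0) {
            throw new IllegalArgumentException("measuredEmptyInstanceBytes must be > 0");
        }
        if (remainingHeapBytes <= 0) {
            return 0;   // over budget: reduce entries, listeners, or query metadata first
        }
        return Math.min(remainingHeapBytes / measuredEmptyInstanceBytes,
                DEFAULT_MAX_INSTANCES_PER_TYPE);
    }
}
```

A result of 0 is the signal described above: the budgets already exceed the usable heap, so shrink the inputs before adding traffic or instances.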

Per-key entry model

Map instances track estimated live map memory through the runtime memory estimator and enforce maxMemoryBytesPerMap when configured. The runtime estimate for each map entry is:

Component	Bytes used by the built-in estimator
Entry/container overhead	`48`
String key or value	`16 + 3 * char_count`
Non-string key or value	`64` until measured more precisely

For cluster sizing, count stored copies, not only logical keys. The production-supported single-Raft-group path is full replication: every member keeps each committed entry. A steady-state cluster keeps about logical_entries * member_count entry copies before temporary migration headroom. Per member, start with:

local_entry_copies = logical_entries * 1.20

The 1.20 factor reserves imbalance and migration headroom. Raise it for skewed keys or long migrations. For a map with 32-character string keys and 256-character string values, the built-in estimate is 48 + (16 + 3*32) + (16 + 3*256) = 944 bytes per local copy before listener fan-out.
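
The per-entry arithmetic above can be reproduced with a few lines of code, which is handy when comparing several key and value sizes. This sketch copies the constants from the estimator table; the class and method names are hypothetical, and expressing the headroom factor as an integer percentage is a choice made here to avoid floating-point rounding.

```java
/** Built-in estimator constants as arithmetic (hypothetical helper, not a LoomCache API). */
final class EntryEstimateSketch {
    static final long ENTRY_OVERHEAD_BYTES = 48;
    static final long NON_STRING_BYTES = 64;

    private EntryEstimateSketch() {}

    /** Estimated bytes for a string key or value: 16 + 3 * char_count. */
    static long stringBytes(int charCount) {
        return 16 + 3L * charCount;
    }

    /** Estimated bytes for one local copy of an entry with a string key and string value. */
    static long stringEntryBytes(int keyChars, int valueChars) {
        return ENTRY_OVERHEAD_BYTES + stringBytes(keyChars) + stringBytes(valueChars);
    }

    /** local_entry_copies = logical_entries * headroom, with headroom given in percent (120 = 1.20). */
    static long localEntryCopies(long logicalEntries, int headroomPercent) {
        return (logicalEntries * headroomPercent + 99) / 100;   // round up to a whole entry
    }

    /** Entry budget in bytes for one member. */
    static long memberEntryBudget(long logicalEntries, int headroomPercent, long entryBytes) {
        return localEntryCopies(logicalEntries, headroomPercent) * entryBytes;
    }
}
```

For example, one million logical entries of the 944-byte shape above come to 1,200,000 local copies and roughly 1.13 GB of entry budget per member, before listeners and query metadata.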

Per-listener model

Listeners add registration state and delivery work, not per-key storage. Model them separately:

Listener type	Stored state to budget
Remote entry listener	One map-to-peer subscription plus one peer-to-map subscription in the listener delivery manager.
Predicate entry listener	Remote entry listener state plus one predicate subscription holding the peer id and predicate object.
Embedded map listener	One `CopyOnWriteArrayList` registration holding the listener reference and optional predicate.
Continuous Query Cache listener	A remote listener plus a client-side cached view of the matching entries.
Topic subscriber	One subscriber entry per subscription, plus optional filtered-subscriber state and executor state when multi-threading is enabled.

Use heap-delta measurements in staging for estimated_listener_registration_bytes; the dominant cost is usually the application listener or predicate capture, not the registry entry itself. Listener fan-out also multiplies mutation CPU and network writes, so capacity tests should include the expected listener count even when heap looks comfortable.

Data-structure count versus RAM

The default maxInstancesPerType = 10_000 is a safety ceiling, not a promise that every heap can run 10,000 active maps, queues, topics, sets, ringbuffers, and CRDTs with entries and listeners. Empty-instance allowance for that ceiling looks like this before entries/listeners/query working sets:

Heap (`Xmx`)	60% usable cache heap	Allowance per empty instance at 10,000 instances
512 MiB	307 MiB	about 31 KiB
2 GiB	1.2 GiB	about 126 KiB
8 GiB	4.8 GiB	about 503 KiB

Measure empty-instance heap delta for the specific data structures you use, then cap names to the lower of the measured safe_instance_cap and 10_000. When embedding a node, set the data-structure instance guardrail during bootstrap. When running the stock server, enforce application-level naming quotas and alert on loomcache_datastructures_max_instances_per_type together with data structure count, memory, and listener-count metrics.
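
The allowance column in the table above is the usable heap divided by the instance count. A small sketch of that calculation follows, so the same figure can be derived for other heap sizes; the class is hypothetical, and KiB values are rounded to the nearest whole number.

```java
/** Empty-instance allowance arithmetic (hypothetical helper, not a LoomCache API). */
final class InstanceAllowanceSketch {
    private InstanceAllowanceSketch() {}

    /** Bytes available per instance: (Xmx * 0.60) / instances, rounded down. */
    static long allowancePerInstanceBytes(long xmxBytes, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be > 0");
        }
        return xmxBytes * 60 / 100 / instances;
    }

    /** The same allowance in KiB, rounded to the nearest whole KiB. */
    static long allowancePerInstanceKiB(long xmxBytes, int instances) {
        return Math.round(allowancePerInstanceBytes(xmxBytes, instances) / 1024.0);
    }
}
```

Compare the result with the measured empty-instance heap delta for each data structure you use. If the measurement exceeds the allowance, the 10,000 ceiling is out of reach on that heap even before entries and listeners are counted.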

Backpressure & rate limiting

Command-queue backpressure returns a server-busy response when the queue fills; clients back off.
Spring Boot REST rate limiting can cap request rates when the HTTP surface is enabled.

Metrics worth alerting on

loomcache.raft.commit_latency_seconds — Raft health.
WAL fsync logs plus loomcache.raft.fsync_batch_size / loomcache.raft.snapshot_save_seconds — disk bottlenecks.
loomcache.command.queue_wait_seconds — backpressure proximity.
loomcache.tcp.connections.active — server-side connection load; client-side pool wait counts are available through the client stats snapshot, not a Prometheus gauge.
LoomCache does not emit a certificate-expiration gauge. The certExpirationCriticalDays / certExpirationWarningDays thresholds on TlsConfig are configuration only — track certificate notAfter lifetimes externally and rotate before the critical window.

For Spring Boot and the sample deployments, Actuator (including /actuator/prometheus) is served on the Spring Boot web port server.port (8080 in the samples), so scrape https://<node>:8080/actuator/prometheus. Direct-node deployments with the standalone metrics listener can scrape http://<node>:9090/metrics. Sample Grafana dashboards live in grafana/.

LoomCache is an independent open-source project. It is not affiliated with, endorsed by, or sponsored by Hazelcast, Inc. or by any other company whose products are named in this documentation. “Hazelcast” is a trademark of Hazelcast, Inc.; references to it are nominative and describe only migration and comparison. All other product and company names are trademarks of their respective owners and are used for identification purposes only.
