
Kubernetes & Docker Deployment

[Figure: Kubernetes StatefulSet. Pod auto-discovery via headless service: Pod-0 (10.0.1.5), Pod-1 (10.0.2.5), and Pod-2 (10.0.3.5), each with its own PVC/wal.]

LoomCache ships three deployment shapes:

  1. Embedded in an existing JVM (single-node, in-process).
  2. Standalone process — Dockerfile, docker-compose.yml.
  3. Kubernetes — k8s/deployment.yaml for a single-pod smoke test and k8s/configmap.yaml, k8s/service.yaml, k8s/statefulset.yaml for a quorum-backed cluster.

The production profile is intentionally conservative. These paths are not production-supported unless the named evidence exists on the release artifact:

  • Multi-group sharding is unsupported and must fail closed in production until every Raft group has independent WAL, Raft metadata, snapshot, install-snapshot, and restart recovery evidence. Use the default single replicated Raft group for production certification work.
  • Wire CP locks and semaphores are unsupported and fail closed in production until client-visible session lifecycle opcodes exist for create/heartbeat/close/expiry/force-close and are recovered through Raft. CP_ATOMIC_* remains the supported wire CP surface.
  • Registry-backed or local JCache is not durable production storage. JCache is production-durable only when operations are routed through the cluster client and Raft-backed map path with the same TLS/auth and persistence gates as normal LoomMap traffic.
  • MapStore, MapLoader, EntryStore, and generic MapStore are fully disabled/fail-closed in production profiles.
  • QueueStore snapshot/restart parity is not production-supported and must fail closed until restart, rollback, and duplicate/lost item failure windows are certified.
  • CREATE INDEX and declarative SQL indexes are unsupported/rejected in this release and are not Hazelcast index parity claims.
  • LoomMap<K,V> does not imply Hazelcast-style implicit object support. Arbitrary POJO map values are not production-supported in this release; use documented scalar/binary encodings until the public map path is certified.
  • Server-side LRU, LFU, finite max-entry/max-memory eviction, and max-idle semantics are not production-supported until eviction decisions are Raft-applied and proven through WAL/snapshot/restart tests.

The shipped Dockerfile uses eclipse-temurin:25-jdk to build the Spring Boot runnable jar and eclipse-temurin:25-jre to run it with --enable-preview.

docker build -t loomcache:dev .
mkdir -p tls
docker run -d \
  --name loomcache \
  -p 7654:7654 -p 8080:8080 \
  --env-file tls/loomcache.env \
  -v "$PWD/tls:/etc/loomcache/tls:ro" \
  -e LOOMCACHE_PROFILE=production \
  -e LOOMCACHE_CLUSTER_CLUSTER_ID=loomcache-docker-smoke \
  -e LOOMCACHE_CLUSTER_SEEDS=loomcache.local:7654 \
  -e LOOMCACHE_NODE_ID=node-1 \
  -e LOOMCACHE_NODE_HOST=loomcache.local \
  -e LOOMCACHE_NODE_PORT=7654 \
  -e LOOMCACHE_SERVER_ENABLED=true \
  -e LOOMCACHE_SERVER_PORT=7654 \
  -e LOOMCACHE_SERVER_BIND_ADDRESS=0.0.0.0 \
  -e SERVER_PORT=8080 \
  -e LOOMCACHE_SERVER_PERSISTENCE_ENABLED=true \
  -e LOOMCACHE_SERVER_PERSISTENCE_WAL_DIRECTORY=/var/lib/loomcache \
  -e LOOMCACHE_SERVER_EVICTION_POLICY=NONE \
  -e LOOMCACHE_SERVER_EVICTION_MAX_ENTRIES=0 \
  -e LOOMCACHE_SERVER_EVICTION_MAX_MEMORY_BYTES=9223372036854775807 \
  -e SPRING_DATASOURCE_URL=jdbc:postgresql://postgres.example.com:5432/loomcache \
  -e SPRING_DATASOURCE_USERNAME=loomcache \
  -e SPRING_JPA_HIBERNATE_DDL_AUTO=validate \
  -e SPRING_JPA_DATABASE_PLATFORM=org.hibernate.dialect.PostgreSQLDialect \
  -e LOOMCACHE_TLS_ENABLED=true \
  -e LOOMCACHE_TLS_REQUIRE_CLIENT_AUTH=true \
  -e LOOMCACHE_TLS_KEY_STORE_PATH=/etc/loomcache/tls/keystore.p12 \
  -e LOOMCACHE_TLS_TRUST_STORE_PATH=/etc/loomcache/tls/truststore.p12 \
  -e LOOMCACHE_TLS_REVOCATION_CHECKING_ENABLED=true \
  -e LOOMCACHE_TLS_REVOCATION_SOFT_FAIL=false \
  -e LOOMCACHE_AUTH_ENABLED=true \
  -e LOOMCACHE_AUTH_GATEWAY_TRUST=false \
  -e LOOMCACHE_AUTH_CERT_PERMISSIONS_LOOMCACHEADMIN=ADMIN \
  -e LOOMCACHE_AUTH_CERT_PERMISSIONS_LOOMCACHECLIENT=READ_WRITE \
  -e SERVER_SSL_ENABLED=true \
  -e SERVER_SSL_KEY_STORE=/etc/loomcache/tls/keystore.p12 \
  -e SERVER_SSL_TRUST_STORE=/etc/loomcache/tls/truststore.p12 \
  -e SERVER_SSL_CLIENT_AUTH=need \
  -e JAVA_OPTS="-Xms512m -Xmx512m -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+FlightRecorder -Dloomcache.production.allow-standalone=true -Dcom.sun.net.ssl.checkRevocation=true -Dcom.sun.security.enableCRLDP=true" \
  -e LOOMCACHE_HEALTHCHECK_HOST=loomcache.local \
  -v loomcache-data:/var/lib/loomcache \
  loomcache:dev

The image runs as an unprivileged loomcache user and checks https://loomcache.local:8080/actuator/health/readiness with the mounted CA and PKCS12 client certificate. The tls/loomcache.env file must provide LOOMCACHE_TLS_KEY_STORE_PASSWORD, LOOMCACHE_TLS_TRUST_STORE_PASSWORD, SERVER_SSL_KEY_STORE_PASSWORD, SERVER_SSL_TRUST_STORE_PASSWORD, and SPRING_DATASOURCE_PASSWORD from a secret manager or local test secret file. Client certificates must use CNs that match the configured LOOMCACHE_AUTH_CERT_PERMISSIONS_* entries. The certificate SAN must include loomcache.local, because the healthcheck validates that DNS identity while connecting to loopback. The example opts into standalone production mode for smoke testing; multi-node production deployments should use static Raft bootstrap servers instead.

GraalVM native-image support is not part of the release artifact. See Native Image Feasibility for the current AOT probe and remaining support gates.

docker-compose up -d
docker-compose ps
docker-compose logs -f loomcache-node1

| Service | Cluster → host | REST/Actuator → host | Metrics → host |
|---|---|---|---|
| loomcache-node1 | 7654:7654 | 8080:8080 | 9090:9090 |
| loomcache-node2 | 7655:7654 | 8081:8080 | 9091:9090 |
| loomcache-node3 | 7656:7654 | 8082:8080 | 9092:9090 |
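
As a small illustration of how the compose port mappings translate into addresses a client on the Docker host would dial, the sketch below splits each `host:container` pair and keeps the host side. The dictionary and the function names are illustrative helpers, not part of LoomCache, and `localhost` is an assumed host name.

```python
# Port mappings copied from the compose table ("host:container").
COMPOSE_PORTS = {
    "loomcache-node1": {"cluster": "7654:7654", "rest": "8080:8080", "metrics": "9090:9090"},
    "loomcache-node2": {"cluster": "7655:7654", "rest": "8081:8080", "metrics": "9091:9090"},
    "loomcache-node3": {"cluster": "7656:7654", "rest": "8082:8080", "metrics": "9092:9090"},
}


def host_port(mapping: str) -> int:
    """Return the host side of a Docker 'host:container' port mapping."""
    host, _container = mapping.split(":")
    return int(host)


def host_endpoints(role: str, host: str = "localhost") -> list[str]:
    """List 'host:port' endpoints reachable from the Docker host for one role."""
    return [f"{host}:{host_port(ports[role])}" for ports in COMPOSE_PORTS.values()]


if __name__ == "__main__":
    # Every container listens on 7654 internally; only the host side differs.
    print(host_endpoints("cluster"))
    print(host_endpoints("rest"))
```

Inside the compose network the members still reach each other on the container port (7654); the remapped host ports only matter for tools running outside the containers.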

Environment variables consumed by LoomProperties through Spring’s relaxed binding:

  • LOOMCACHE_NODE_PORT, LOOMCACHE_SERVER_PORT, LOOMCACHE_SERVER_BIND_ADDRESS, LOOMCACHE_SERVER_ENABLED
  • LOOMCACHE_CLUSTER_SEEDS
  • LOOMCACHE_SERVER_PERSISTENCE_ENABLED, LOOMCACHE_SERVER_PERSISTENCE_WAL_DIRECTORY, LOOMCACHE_SERVER_PERSISTENCE_SNAPSHOT_THRESHOLD
  • LOOMCACHE_SERVER_EVICTION_POLICY, LOOMCACHE_SERVER_EVICTION_MAX_ENTRIES, LOOMCACHE_SERVER_EVICTION_MAX_MEMORY_BYTES
  • LOOMCACHE_METRICS_PORT
  • LOOMCACHE_SERVER_RAFT_ELECTION_TIMEOUT_MS, LOOMCACHE_SERVER_RAFT_HEARTBEAT_INTERVAL_MS
  • SERVER_PORT for the Spring Boot HTTP listener

Manifests in k8s/:

  • k8s/namespace.yaml: loomcache namespace and base labels.
  • k8s/secret.yaml: empty TLS Secret schema placeholders for dry-run validation.
  • k8s/configmap.yaml: loomcache-config with cluster-wide env vars.
  • k8s/service.yaml: headless service for DNS + clients.
  • k8s/networkpolicy.yaml: default-deny ingress/egress policy.
  • k8s/poddisruptionbudget.yaml: quorum protection for voluntary disruption.
  • k8s/statefulset.yaml: 3-replica StatefulSet with per-pod PVCs.
  • k8s/deployment.yaml: single-pod smoke-test Deployment with one encrypted PVC.
  • k8s/README.md: apply order, usage boundaries, and StatefulSet vs Deployment guidance.

To bring up the quorum-backed StatefulSet cluster, apply the manifests in this order:

kubectl apply -f k8s/namespace.yaml
kubectl apply --server-side --dry-run=server -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/poddisruptionbudget.yaml
kubectl apply -f k8s/statefulset.yaml
kubectl rollout status statefulset/loomcache -n loomcache

The production manifests expect a pre-provisioned loomcache-kms-encrypted StorageClass that creates encrypted volumes with the production KMS key. The StatefulSet uses updateStrategy: OnDelete so server upgrades are manual homogeneous maintenance events rather than automatic rolling updates. For digest cutovers, run K8S_GATE_RESTART_CHECK=1 scripts/k8s-manifest-gate.sh so the gate deliberately replaces every OnDelete member and verifies all pods are running the stamped digest. The Kubernetes manifest gate verifies the stamped image with gh attestation verify before apply and runs live NetworkPolicy probes for explicitly allowed and unlabeled denied client pods after rollout.
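
The quorum arithmetic behind the PodDisruptionBudget and the one-pod-at-a-time OnDelete replacement can be sketched as follows. This assumes the standard Raft majority rule; the actual values set in k8s/poddisruptionbudget.yaml are not shown on this page, and the function names are illustrative.

```python
def raft_quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def max_voluntary_disruptions(members: int) -> int:
    """How many members can be down at once while a majority remains."""
    return members - raft_quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, raft_quorum(n), max_voluntary_disruptions(n))
```

For the 3-replica StatefulSet this gives a quorum of 2 and at most one pod down at a time, which is why a manual OnDelete upgrade would normally wait for each replaced pod to become ready before deleting the next.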

Do not apply k8s/secret.yaml over a namespace where the production secret manager has populated TLS material. The file is intentionally empty in git and exists for schema review and server-side dry-run validation only.

For a quick single-member smoke test:

kubectl apply -f k8s/namespace.yaml
kubectl apply --server-side --dry-run=server -f k8s/secret.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/deployment.yaml
kubectl rollout status deployment/loomcache-single -n loomcache

Do not scale the sample Deployment past one replica; it self-seeds and uses one encrypted PVC. Use the StatefulSet for any multi-member or production-like cluster. The single-pod sample is covered by loomcache-single-network-policy; keep that policy in place for any smoke-test apply path.

For in-cluster peer discovery you can enable the built-in Kubernetes API strategy and omit loomcache.cluster.seeds. The API server URL, namespace file, bearer-token file, and CA certificate default to the standard service-account mount paths:

LOOMCACHE_DISCOVERY_KUBERNETES_ENABLED=true
LOOMCACHE_DISCOVERY_KUBERNETES_SERVICE_NAME=loomcache
LOOMCACHE_DISCOVERY_KUBERNETES_PORT_NAME=member
# or resolve pods directly:
LOOMCACHE_DISCOVERY_KUBERNETES_POD_LABEL_SELECTOR=app=loomcache
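
If you keep static seeds instead of API discovery, a headless service gives each StatefulSet pod a stable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds those names. It assumes the StatefulSet, service, and namespace are all named loomcache (as the commands and settings on this page suggest), the container member port 7654, and the default `cluster.local` cluster domain. The helper is hypothetical, not a LoomCache API.

```python
def statefulset_seeds(
    name: str,
    service: str,
    namespace: str,
    replicas: int,
    port: int,
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Build 'host:port' seeds from the stable per-pod DNS names of a StatefulSet."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}:{port}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    for seed in statefulset_seeds("loomcache", "loomcache", "loomcache", replicas=3, port=7654):
        print(seed)
```

How multiple seeds are joined in LOOMCACHE_CLUSTER_SEEDS is not specified here, so the sketch returns a list rather than a single string.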

Add loom-spring-boot to your application:

loomcache:
  node:
    host: node-1.internal.example.com
    port: 5701
  server:
    enabled: true
    bind-address: 0.0.0.0
    port: 5701
    persistence:
      enabled: true
      wal-directory: /var/lib/loomcache
  cluster:
    seeds:
      - host1:5701
      - host2:5701
      - host3:5701

For Kubernetes API discovery, replace cluster.seeds with:

loomcache:
  discovery:
    kubernetes:
      enabled: true
      service-name: loomcache
      port-name: member
      auto-populate-member-attributes: true
      partition-group-label: loomcache.io/partition-group

Use pod-label-selector instead of service-name when you want to resolve pods directly. api-server, namespace-path, token-path, and ca-certificate-path default to the in-cluster service-account locations. When auto-populate-member-attributes is enabled, discovery copies Kubernetes Node labels into rack, zone, region, node, and partition-group peer attributes; grant the service account get on nodes in addition to the peer-discovery resources. Use partition-group.type: SPI when backup placement should follow the discovered partition-group hint.
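
To make the label-to-attribute copy concrete, here is a rough sketch of how Kubernetes Node labels could be projected onto peer attributes. The exact labels LoomCache reads are not stated on this page. The mapping below assumes the well-known Kubernetes labels for zone, region, and hostname, leaves out rack because Kubernetes defines no standard rack label, and uses the configured partition-group label from the example above.

```python
# Assumed mapping from peer attribute to well-known Kubernetes Node label.
WELL_KNOWN_LABELS = {
    "zone": "topology.kubernetes.io/zone",
    "region": "topology.kubernetes.io/region",
    "node": "kubernetes.io/hostname",
}


def member_attributes(
    node_labels: dict[str, str],
    partition_group_label: str = "loomcache.io/partition-group",
) -> dict[str, str]:
    """Copy the labels that are present on the Node into peer attributes."""
    attrs = {
        attribute: node_labels[label]
        for attribute, label in WELL_KNOWN_LABELS.items()
        if label in node_labels
    }
    if partition_group_label in node_labels:
        attrs["partition-group"] = node_labels[partition_group_label]
    return attrs


if __name__ == "__main__":
    labels = {
        "topology.kubernetes.io/zone": "eu-west-1a",
        "topology.kubernetes.io/region": "eu-west-1",
        "kubernetes.io/hostname": "worker-7",
        "loomcache.io/partition-group": "group-a",
    }
    print(member_attributes(labels))
```

Missing labels are simply skipped in this sketch, so a Node without the partition-group label yields no placement hint.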

The auto-configuration wires a CacheNode, a LoomClient, a LoomCacheManager, and the REST controllers listed in Client API.

LoomCache uses LoomVersion for semantic software versions and PROTOCOL_HELLO for opt-in protocol negotiation. The current build advertises:

| Field | Current value | Meaning |
|---|---|---|
| CURRENT_VERSION | 1.0.0 | Software version of this build. |
| MIN_COMPATIBLE_VERSION | 1.0.0 | Oldest peer this build declares compatible. |
| PROTOCOL_VERSION | 2 | Wire codec version; both sides must use this version today. |

Compatibility is bidirectional: the client/server or peer/peer pair is supported only when each side’s version is at least the other’s advertised minimum-compatible version, both versions have the same major version, and both sides use wire protocol version 2.

| Client / peer | Server / peer | Supported? | Policy |
|---|---|---|---|
| 1.0.x, protocol 2 | 1.0.x, protocol 2 | Yes | Default supported range for this release line. |
| 1.y.z, protocol 2 | 1.x.z, protocol 2 | Yes, if both advertise compatible minimums | Same-major minor/patch mixes are allowed by the handshake, but new features must stay disabled until every required server supports them. |
| 1.x, protocol not 2 | 1.x, protocol 2 | No | The current codec does not down-negotiate wire versions. Upgrade the older side first or keep a homogeneous window. |
| 2.x | 1.x | No | Major versions are breaking. Handshake rejects them. |
| Any version older than the peer’s MIN_COMPATIBLE_VERSION | Current peer | No | Upgrade the older side before connecting it. |
| Unknown or un-handshaken legacy client | Current server | Best effort only | Enable LoomClient.Builder.strictHandshake(true) in production so incompatible peers fail at connect time instead of later in the request path. |
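
The bidirectional rule can be written as a small predicate. This is a sketch of the rule as stated here, not the LoomVersion implementation: the `Peer` type and function names are hypothetical, and versions are assumed to be plain `major.minor.patch` strings.

```python
from dataclasses import dataclass

REQUIRED_PROTOCOL = 2  # wire protocol version both sides must use today


@dataclass(frozen=True)
class Peer:
    version: str          # advertised software version, e.g. "1.0.0"
    min_compatible: str   # oldest peer version this side accepts
    protocol: int         # advertised wire protocol version


def parse(version: str) -> tuple[int, ...]:
    """Turn 'major.minor.patch' into a tuple that compares numerically."""
    return tuple(int(part) for part in version.split("."))


def compatible(a: Peer, b: Peer) -> bool:
    """True when the pair satisfies the rule in both directions."""
    va, vb = parse(a.version), parse(b.version)
    return (
        va >= parse(b.min_compatible)      # a is new enough for b
        and vb >= parse(a.min_compatible)  # b is new enough for a
        and va[0] == vb[0]                 # same major version
        and a.protocol == REQUIRED_PROTOCOL
        and b.protocol == REQUIRED_PROTOCOL
    )


if __name__ == "__main__":
    server = Peer("1.0.0", "1.0.0", 2)
    print(compatible(Peer("1.0.3", "1.0.0", 2), server))  # same release line
    print(compatible(Peer("2.0.0", "2.0.0", 2), server))  # major mismatch
```

Because the check is symmetric, `compatible(a, b)` and `compatible(b, a)` always agree, which matches the client/server and peer/peer wording of the rule.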

Server-to-server rolling upgrades are stricter than client/server handshakes. The cluster version gate exposed by POST /api/cluster/version rejects future, too-old, and incompatible versions, but mixed server versions under production write load remain a release-notes decision. Until a release explicitly marks rolling server upgrades supported, use homogeneous server-version windows: back up the cluster, stop or drain traffic according to the degradation matrix, upgrade every member to the same server build, then return the cluster to ACTIVE.

During every upgrade:

  1. Confirm /api/cluster/status reports the expected clusterVersion, member liveness, and operational state.
  2. Enable client strictHandshake(true) for new deployments and verify loomcache.handshake.rejected stays at zero.
  3. Upgrade clients within the same major version before they use new feature-gated APIs.
  4. Keep a rollback copy of the previous server artifact and data backup until smoke tests pass.
  5. Run the SLO benchmark and persistence validation from the performance and persistence guides after the upgrade.
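
The first two checks in the list above lend themselves to a scripted gate. The sketch below works on an already-fetched status document and a counter value. Only `clusterVersion` and the `ACTIVE` state are named on this page; the `members`, `alive`, and `state` field names are assumptions about the /api/cluster/status payload and should be adjusted to the real response.

```python
def upgrade_blockers(
    status: dict,
    expected_version: str,
    expected_members: int,
    handshake_rejected: int = 0,
) -> list[str]:
    """Return human-readable reasons not to proceed; an empty list means go."""
    problems = []

    actual = status.get("clusterVersion")
    if actual != expected_version:
        problems.append(f"clusterVersion is {actual!r}, expected {expected_version!r}")

    # Hypothetical shape: a list of members, each with a boolean 'alive' flag.
    alive = sum(1 for member in status.get("members", []) if member.get("alive"))
    if alive != expected_members:
        problems.append(f"{alive} live members, expected {expected_members}")

    # Hypothetical field name for the operational state.
    if status.get("state") != "ACTIVE":
        problems.append(f"cluster state is {status.get('state')!r}, expected 'ACTIVE'")

    # Value of the loomcache.handshake.rejected metric, read elsewhere.
    if handshake_rejected != 0:
        problems.append(f"loomcache.handshake.rejected is {handshake_rejected}, expected 0")

    return problems


if __name__ == "__main__":
    sample = {
        "clusterVersion": "1.0.0",
        "state": "ACTIVE",
        "members": [{"alive": True}, {"alive": True}, {"alive": False}],
    }
    for line in upgrade_blockers(sample, "1.0.0", expected_members=3):
        print(line)
```

Returning a list of reasons rather than a boolean keeps the output useful in an upgrade log, where the operator needs to see which precondition failed.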

LoomCache treats public Java APIs, REST endpoints, configuration keys, wire opcodes, WAL records, and snapshot formats as compatibility commitments. Deprecation is the default path for changing those surfaces; silent removal is allowed only for unreleased internals or security emergencies.

| Surface | Minimum notice | Removal rule |
|---|---|---|
| Public Java client/server APIs | One minor release | Mark with @Deprecated(since = "...", forRemoval = ...), document the replacement, and keep source compatibility until the announced removal release. |
| REST endpoints and configuration keys | One minor release | Keep the old path or key as an alias, emit an operator warning, and document the new path or key in release notes. |
| Wire opcodes and protocol fields | Remainder of the current major release | Do not reuse opcode byte values. Add new behavior behind PROTOCOL_HELLO feature negotiation and keep the old request form until the next major version. |
| WAL and snapshot formats | Remainder of the current major release | Keep dual-read support for old persisted records. Write the new format only after the release notes describe rollback and restore limits. |
| Unsafe or vulnerable behavior | Security-advisory window | A CVE, credential leak, or data-corruption risk can shorten notice, but the advisory must include the mitigation and the safest upgrade path. |

Every deprecation entry must include a migration path before it ships:

  1. Replacement API, config key, endpoint, opcode, or format.
  2. Runtime warning or handshake rejection mode that lets operators find usage before removal.
  3. Release-note entry naming the first deprecated release and the earliest possible removal release.
  4. Tests that cover the old and new behavior while both are supported, including protocol negotiation for wire changes.
  5. Compatibility-matrix update when the change affects mixed-version clients, peers, snapshots, or WAL files.

Removal is a separate release decision. The removal PR must point back to the deprecation notice, delete or migrate the compatibility tests intentionally, and confirm the release notes still give a working migration or rollback story.

See Default Ports for the full allocation and the 5701 vs. 7654 split.

| Port | Role | Typical scope |
|---|---|---|
| 5701 | Default JVM/Spring Boot binary member TCP | Local JVM, bare metal, direct config |
| 7654 | Docker/Kubernetes sample binary member TCP | Container samples |
| 9090 | Prometheus scrape | Kubernetes service to Spring Actuator, or direct metrics listener |
| 8080 | Spring Boot HTTP / REST | Optional Boot app and management JSON |

See Security & mTLS. Minimum viable production configuration: tlsConfig.enabled = true, keystore and truststore paths pointing at PKCS12 files, requireClientAuth = true, revocationCheckingEnabled = true, and revocationSoftFail = false; also enable JSSE CRLDP or OCSP with JVM flags so hard-fail revocation has a source.