Kubernetes & Docker Deployment
Kubernetes StatefulSet
Pod Auto-Discovery via Headless Service
LoomCache ships three deployment shapes:
- Embedded in an existing JVM (single-node, in-process).
- Standalone process: `Dockerfile`, `docker-compose.yml`.
- Kubernetes: `k8s/deployment.yaml` for a single-pod smoke test and `k8s/configmap.yaml`, `k8s/service.yaml`, `k8s/statefulset.yaml` for a quorum-backed cluster.
Production support boundaries
The production profile is intentionally conservative. These paths are not production-supported unless the named evidence exists on the release artifact:
- Multi-group sharding is unsupported and must fail closed in production until every Raft group has independent WAL, Raft metadata, snapshot, install-snapshot, and restart recovery evidence. Use the default single replicated Raft group for production certification work.
- Wire CP locks and semaphores are unsupported and fail closed in production until client-visible session lifecycle opcodes exist for create/heartbeat/close/expiry/force-close and are recovered through Raft. `CP_ATOMIC_*` remains the supported wire CP surface.
- Registry-backed or local JCache is not durable production storage. JCache is production-durable only when operations are routed through the cluster client and Raft-backed map path with the same TLS/auth and persistence gates as normal `LoomMap` traffic.
- MapStore, MapLoader, EntryStore, and generic MapStore are fully disabled/fail-closed in production profiles.
- QueueStore snapshot/restart parity is not production-supported and must fail closed until restart, rollback, and duplicate/lost item failure windows are certified.
- `CREATE INDEX` and declarative SQL indexes are unsupported/rejected in this release and are not Hazelcast index parity claims.
- `LoomMap<K,V>` does not imply Hazelcast-style implicit object support. Arbitrary POJO map values are not production-supported in this release; use documented scalar/binary encodings until the public map path is certified.
- Server-side LRU, LFU, finite max-entry/max-memory eviction, and max-idle semantics are not production-supported until eviction decisions are Raft-applied and proven through WAL/snapshot/restart tests.
Docker single-node
The shipped Dockerfile uses `eclipse-temurin:25-jdk` to build the Spring Boot runnable jar and `25-jre` to run with `--enable-preview`.

```bash
docker build -t loomcache:dev .
mkdir -p tls
docker run -d \
  --name loomcache \
  -p 7654:7654 -p 8080:8080 \
  --env-file tls/loomcache.env \
  -v "$PWD/tls:/etc/loomcache/tls:ro" \
  -e LOOMCACHE_PROFILE=production \
  -e LOOMCACHE_CLUSTER_CLUSTER_ID=loomcache-docker-smoke \
  -e LOOMCACHE_CLUSTER_SEEDS=loomcache.local:7654 \
  -e LOOMCACHE_NODE_ID=node-1 \
  -e LOOMCACHE_NODE_HOST=loomcache.local \
  -e LOOMCACHE_NODE_PORT=7654 \
  -e LOOMCACHE_SERVER_ENABLED=true \
  -e LOOMCACHE_SERVER_PORT=7654 \
  -e LOOMCACHE_SERVER_BIND_ADDRESS=0.0.0.0 \
  -e SERVER_PORT=8080 \
  -e LOOMCACHE_SERVER_PERSISTENCE_ENABLED=true \
  -e LOOMCACHE_SERVER_PERSISTENCE_WAL_DIRECTORY=/var/lib/loomcache \
  -e LOOMCACHE_SERVER_EVICTION_POLICY=NONE \
  -e LOOMCACHE_SERVER_EVICTION_MAX_ENTRIES=0 \
  -e LOOMCACHE_SERVER_EVICTION_MAX_MEMORY_BYTES=9223372036854775807 \
  -e SPRING_DATASOURCE_URL=jdbc:postgresql://postgres.example.com:5432/loomcache \
  -e SPRING_DATASOURCE_USERNAME=loomcache \
  -e SPRING_JPA_HIBERNATE_DDL_AUTO=validate \
  -e SPRING_JPA_DATABASE_PLATFORM=org.hibernate.dialect.PostgreSQLDialect \
  -e LOOMCACHE_TLS_ENABLED=true \
  -e LOOMCACHE_TLS_REQUIRE_CLIENT_AUTH=true \
  -e LOOMCACHE_TLS_KEY_STORE_PATH=/etc/loomcache/tls/keystore.p12 \
  -e LOOMCACHE_TLS_TRUST_STORE_PATH=/etc/loomcache/tls/truststore.p12 \
  -e LOOMCACHE_TLS_REVOCATION_CHECKING_ENABLED=true \
  -e LOOMCACHE_TLS_REVOCATION_SOFT_FAIL=false \
  -e LOOMCACHE_AUTH_ENABLED=true \
  -e LOOMCACHE_AUTH_GATEWAY_TRUST=false \
  -e LOOMCACHE_AUTH_CERT_PERMISSIONS_LOOMCACHEADMIN=ADMIN \
  -e LOOMCACHE_AUTH_CERT_PERMISSIONS_LOOMCACHECLIENT=READ_WRITE \
  -e SERVER_SSL_ENABLED=true \
  -e SERVER_SSL_KEY_STORE=/etc/loomcache/tls/keystore.p12 \
  -e SERVER_SSL_TRUST_STORE=/etc/loomcache/tls/truststore.p12 \
  -e SERVER_SSL_CLIENT_AUTH=need \
  -e JAVA_OPTS="-Xms512m -Xmx512m -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+FlightRecorder -Dloomcache.production.allow-standalone=true -Dcom.sun.net.ssl.checkRevocation=true -Dcom.sun.security.enableCRLDP=true" \
  -e LOOMCACHE_HEALTHCHECK_HOST=loomcache.local \
  -v loomcache-data:/var/lib/loomcache \
  loomcache:dev
```

The image runs as an unprivileged `loomcache` user and checks
https://loomcache.local:8080/actuator/health/readiness with the mounted CA and PKCS12 client certificate. The
tls/loomcache.env file must provide LOOMCACHE_TLS_KEY_STORE_PASSWORD, LOOMCACHE_TLS_TRUST_STORE_PASSWORD,
SERVER_SSL_KEY_STORE_PASSWORD, SERVER_SSL_TRUST_STORE_PASSWORD, and SPRING_DATASOURCE_PASSWORD from a secret
manager or local test secret file. Client certificates must use CNs that match the configured
LOOMCACHE_AUTH_CERT_PERMISSIONS_* entries. The certificate SAN must include loomcache.local, because the healthcheck
validates that DNS identity while connecting to loopback. The example opts into standalone production mode for smoke
testing; multi-node production deployments should use static Raft bootstrap servers instead.
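
The readiness check described above can be approximated from outside the container. The following is a minimal sketch, not the image's actual healthcheck: the class and method names are hypothetical, and it assumes `loomcache.local` resolves to the container and that the same PKCS12 keystore and truststore mounted into the container are readable locally. It uses only JDK APIs.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: probes the readiness endpoint over mutual TLS. */
final class ReadinessProbe {

    /** Builds the readiness URL named in the text for a given host and HTTP port. */
    static URI readinessUri(String host, int port) {
        return URI.create("https://" + host + ":" + port + "/actuator/health/readiness");
    }

    /** Loads a PKCS12 store from disk. */
    static KeyStore loadPkcs12(Path path, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(path)) {
            store.load(in, password);
        }
        return store;
    }

    /** Creates an SSLContext that presents a client certificate and trusts the mounted CA. */
    static SSLContext mutualTls(Path keyStore, char[] keyPass, Path trustStore, char[] trustPass)
            throws Exception {
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(loadPkcs12(keyStore, keyPass), keyPass);
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(loadPkcs12(trustStore, trustPass));
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context;
    }

    /** Sends a GET and returns the HTTP status code; 200 is expected when the node is ready. */
    static int probe(SSLContext context, URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder().sslContext(context).build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

Because the JDK HTTP client verifies the server certificate against the host in the URL, the probe only succeeds when the certificate SAN includes `loomcache.local`, which matches the requirement stated above.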
GraalVM native-image support is not part of the release artifact. See Native Image Feasibility for the current AOT probe and remaining support gates.
Docker Compose 3-node cluster
```bash
docker-compose up -d
docker-compose ps
docker-compose logs -f loomcache-node1
```

| Service | Cluster → host | REST/Actuator → host | Metrics → host |
|---|---|---|---|
| loomcache-node1 | 7654:7654 | 8080:8080 | 9090:9090 |
| loomcache-node2 | 7655:7654 | 8081:8080 | 9091:9090 |
| loomcache-node3 | 7656:7654 | 8082:8080 | 9092:9090 |
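
The mapping in the table follows a simple pattern: each node publishes the same container ports, offset on the host by the node's position. A small sketch of that arithmetic, assuming the three-node layout above (the class and method names are hypothetical and not part of LoomCache):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring the Compose port table above. */
final class ComposePorts {
    static final int CLUSTER_PORT = 7654;
    static final int REST_PORT = 8080;
    static final int METRICS_PORT = 9090;
    static final int NODE_COUNT = 3;

    /** Host port for a container port on node 1..3: node N is offset by N - 1. */
    static int hostPort(int containerPort, int nodeNumber) {
        if (nodeNumber < 1 || nodeNumber > NODE_COUNT) {
            throw new IllegalArgumentException("node number must be 1.." + NODE_COUNT);
        }
        return containerPort + nodeNumber - 1;
    }

    /** Cluster addresses as seen from the Docker host, one per node. */
    static List<String> clusterAddresses(String host) {
        List<String> addresses = new ArrayList<>();
        for (int node = 1; node <= NODE_COUNT; node++) {
            addresses.add(host + ":" + hostPort(CLUSTER_PORT, node));
        }
        return addresses;
    }
}
```

A client running on the Docker host would therefore reach the three members at `localhost:7654`, `localhost:7655`, and `localhost:7656`.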
Environment variables consumed by LoomProperties through Spring’s relaxed binding:
- `LOOMCACHE_NODE_PORT`, `LOOMCACHE_SERVER_PORT`, `LOOMCACHE_SERVER_BIND_ADDRESS`, `LOOMCACHE_SERVER_ENABLED`
- `LOOMCACHE_CLUSTER_SEEDS`
- `LOOMCACHE_SERVER_PERSISTENCE_ENABLED`, `LOOMCACHE_SERVER_PERSISTENCE_WAL_DIRECTORY`, `LOOMCACHE_SERVER_PERSISTENCE_SNAPSHOT_THRESHOLD`
- `LOOMCACHE_SERVER_EVICTION_POLICY`, `LOOMCACHE_SERVER_EVICTION_MAX_ENTRIES`, `LOOMCACHE_SERVER_EVICTION_MAX_MEMORY_BYTES`
- `LOOMCACHE_METRICS_PORT`
- `LOOMCACHE_SERVER_RAFT_ELECTION_TIMEOUT_MS`, `LOOMCACHE_SERVER_RAFT_HEARTBEAT_INTERVAL_MS`
- `SERVER_PORT` for the Spring Boot HTTP listener
Kubernetes
Manifests in `k8s/`:

- `k8s/namespace.yaml`: `loomcache` namespace and base labels.
- `k8s/secret.yaml`: empty TLS Secret schema placeholders for dry-run validation.
- `k8s/configmap.yaml`: `loomcache-config` with cluster-wide env vars.
- `k8s/service.yaml`: headless service for DNS + clients.
- `k8s/networkpolicy.yaml`: default-deny ingress/egress policy.
- `k8s/poddisruptionbudget.yaml`: quorum protection for voluntary disruption.
- `k8s/statefulset.yaml`: 3-replica StatefulSet with per-pod PVCs.
- `k8s/deployment.yaml`: single-pod smoke-test Deployment with one encrypted PVC.
- `k8s/README.md`: apply order, usage boundaries, and StatefulSet vs Deployment guidance.

```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply --server-side --dry-run=server -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/poddisruptionbudget.yaml
kubectl apply -f k8s/statefulset.yaml
kubectl rollout status statefulset/loomcache -n loomcache
```

The production manifests expect a pre-provisioned `loomcache-kms-encrypted` StorageClass that creates encrypted volumes
with the production KMS key. The StatefulSet uses updateStrategy: OnDelete so server upgrades are manual homogeneous
maintenance events rather than automatic rolling updates.
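
Manual, one-member-at-a-time maintenance matters because a Raft group only makes progress while a majority of members is available. The sketch below shows the standard majority arithmetic behind the quorum protection that the PodDisruptionBudget manifest is described as providing. It does not reproduce the contents of `k8s/poddisruptionbudget.yaml`, and the class and method names are hypothetical.

```java
/** Hypothetical helper: majority-quorum arithmetic for a replicated Raft group. */
final class QuorumMath {

    /** Smallest number of members that forms a majority. */
    static int quorum(int members) {
        if (members < 1) {
            throw new IllegalArgumentException("members must be positive");
        }
        return members / 2 + 1;
    }

    /** How many members can be unavailable while a majority still remains. */
    static int tolerableDisruptions(int members) {
        return members - quorum(members);
    }

    /** True when taking one more member down would still leave a majority. */
    static boolean safeToDisrupt(int members, int currentlyUnavailable) {
        return currentlyUnavailable + 1 <= tolerableDisruptions(members);
    }
}
```

For the 3-replica StatefulSet this gives a quorum of 2 and at most one member down at a time, so a second pod should not be deleted until the first has rejoined.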
For digest cutovers, run K8S_GATE_RESTART_CHECK=1 scripts/k8s-manifest-gate.sh so the gate deliberately replaces every
OnDelete member and verifies all pods are running the stamped digest. The Kubernetes manifest gate verifies the stamped
image with gh attestation verify before apply and runs live NetworkPolicy probes for explicitly allowed and unlabeled
denied client pods after rollout.
Do not apply k8s/secret.yaml over a namespace where the production secret manager has populated TLS material. The file
is intentionally empty in git and exists for schema review and server-side dry-run validation only.
For a quick single-member smoke test:
```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply --server-side --dry-run=server -f k8s/secret.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/networkpolicy.yaml
kubectl apply -f k8s/deployment.yaml
kubectl rollout status deployment/loomcache-single -n loomcache
```

Do not scale the sample Deployment past one replica; it self-seeds and uses one encrypted PVC. Use the StatefulSet for
any multi-member or production-like cluster. The single-pod sample is covered by loomcache-single-network-policy;
keep that policy in place for any smoke-test apply path.
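
StatefulSet pods behind a headless service get stable per-pod DNS names of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`, which is what makes a static seed list practical. The sketch below derives such a list. It is illustrative only: the helper is hypothetical, and the StatefulSet, service, and namespace names (`loomcache`), port 7654, and the default `cluster.local` cluster domain are assumptions to be checked against the shipped manifests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives static seed addresses for StatefulSet pods. */
final class StatefulSetSeeds {

    /** Returns one "host:port" seed per ordinal, using the default cluster.local domain. */
    static List<String> seeds(String statefulSet, String service, String namespace,
                              int replicas, int port) {
        List<String> seeds = new ArrayList<>();
        for (int ordinal = 0; ordinal < replicas; ordinal++) {
            String host = statefulSet + "-" + ordinal + "." + service + "." + namespace
                    + ".svc.cluster.local";
            seeds.add(host + ":" + port);
        }
        return seeds;
    }
}
```

Joining the result with commas gives a value in the same `host:port` form that the Docker example passes to `LOOMCACHE_CLUSTER_SEEDS`.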
For in-cluster peer discovery you can enable the built-in Kubernetes API strategy and omit
loomcache.cluster.seeds. The API server URL, namespace file, bearer-token file, and CA certificate default to the
standard service-account mount paths:
```bash
LOOMCACHE_DISCOVERY_KUBERNETES_ENABLED=true
LOOMCACHE_DISCOVERY_KUBERNETES_SERVICE_NAME=loomcache
LOOMCACHE_DISCOVERY_KUBERNETES_PORT_NAME=member
# or resolve pods directly:
LOOMCACHE_DISCOVERY_KUBERNETES_POD_LABEL_SELECTOR=app=loomcache
```

Spring Boot embedded
Add `loom-spring-boot` to your application:

```yaml
loomcache:
  node:
    host: node-1.internal.example.com
    port: 5701
  server:
    enabled: true
    bind-address: 0.0.0.0
    port: 5701
    persistence:
      enabled: true
      wal-directory: /var/lib/loomcache
  cluster:
    seeds:
      - host1:5701
      - host2:5701
      - host3:5701
```

For Kubernetes API discovery, replace `cluster.seeds` with:

```yaml
loomcache:
  discovery:
    kubernetes:
      enabled: true
      service-name: loomcache
      port-name: member
      auto-populate-member-attributes: true
      partition-group-label: loomcache.io/partition-group
```

Use `pod-label-selector` instead of `service-name` when you want to resolve pods directly. `api-server`,
namespace-path, token-path, and ca-certificate-path default to the in-cluster service-account locations.
When auto-populate-member-attributes is enabled, discovery copies Kubernetes Node labels into rack, zone,
region, node, and partition-group peer attributes; grant the service account get on nodes in addition to the
peer-discovery resources. Use partition-group.type: SPI when backup placement should follow the discovered
partition-group hint.
The auto-configuration wires a CacheNode, a LoomClient, a LoomCacheManager, and the REST controllers listed in
Client API.
Version compatibility
LoomCache uses `LoomVersion` for semantic software versions and `PROTOCOL_HELLO` for opt-in protocol negotiation.
The current build advertises:
| Field | Current value | Meaning |
|---|---|---|
| `CURRENT_VERSION` | 1.0.0 | Software version of this build. |
| `MIN_COMPATIBLE_VERSION` | 1.0.0 | Oldest peer this build declares compatible. |
| `PROTOCOL_VERSION` | 2 | Wire codec version; both sides must use this version today. |
Compatibility is bidirectional: the client/server or peer/peer pair is supported only when each side’s version is at
least the other’s advertised minimum-compatible version, both versions have the same major version, and both sides use
wire protocol version 2.
| Client / peer | Server / peer | Supported? | Policy |
|---|---|---|---|
| 1.0.x, protocol 2 | 1.0.x, protocol 2 | Yes | Default supported range for this release line. |
| 1.y.z, protocol 2 | 1.x.z, protocol 2 | Yes, if both advertise compatible minimums | Same-major minor/patch mixes are allowed by the handshake, but new features must stay disabled until every required server supports them. |
| 1.x, protocol not 2 | 1.x, protocol 2 | No | The current codec does not down-negotiate wire versions. Upgrade the older side first or keep a homogeneous window. |
| 2.x | 1.x | No | Major versions are breaking. Handshake rejects them. |
| Any version older than the peer’s `MIN_COMPATIBLE_VERSION` | Current peer | No | Upgrade the older side before connecting it. |
| Unknown or un-handshaken legacy client | Current server | Best effort only | Enable LoomClient.Builder.strictHandshake(true) in production so incompatible peers fail at connect time instead of later in the request path. |
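
The bidirectional rule behind this matrix can be written as a small predicate. The following is a sketch of the rule as stated in this section, not the actual `LoomVersion` or handshake implementation; the class, method names, and the plain `major.minor.patch` parsing are assumptions for illustration.

```java
/** Hypothetical sketch of the compatibility rule described above. */
final class VersionGate {
    /** Wire protocol version both sides must use today, per the table above. */
    static final int PROTOCOL_VERSION = 2;

    /** Parses "major.minor.patch" into three integers. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    /** Orders two parsed versions by major, then minor, then patch. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /**
     * A pair is supported only when each side meets the other's advertised minimum,
     * both share a major version, and both use the current wire protocol.
     */
    static boolean compatible(String versionA, String minCompatibleA, int protocolA,
                              String versionB, String minCompatibleB, int protocolB) {
        int[] a = parse(versionA);
        int[] b = parse(versionB);
        boolean sameMajor = a[0] == b[0];
        boolean aMeetsB = compare(a, parse(minCompatibleB)) >= 0;
        boolean bMeetsA = compare(b, parse(minCompatibleA)) >= 0;
        boolean wireMatches = protocolA == PROTOCOL_VERSION && protocolB == PROTOCOL_VERSION;
        return sameMajor && aMeetsB && bMeetsA && wireMatches;
    }
}
```

For example, a 1.0.0 client and a 1.0.3 server that both advertise a 1.0.0 minimum on protocol 2 pass the check, while a 2.0.0 peer against a 1.0.0 peer fails on the major-version comparison.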
Server-to-server rolling upgrades are stricter than client/server handshakes. The cluster version gate exposed by
POST /api/cluster/version rejects future, too-old, and incompatible versions, but mixed server versions under
production write load remain a release-notes decision. Until a release explicitly marks rolling server upgrades
supported, use homogeneous server-version windows: back up the cluster, stop or drain traffic according to the
degradation matrix, upgrade every member to the same server build, then return the cluster to ACTIVE.
During every upgrade:
- Confirm `/api/cluster/status` reports the expected `clusterVersion`, member liveness, and operational state.
- Enable client `strictHandshake(true)` for new deployments and verify `loomcache.handshake.rejected` stays at zero.
- Upgrade clients within the same major version before they use new feature-gated APIs.
- Keep a rollback copy of the previous server artifact and data backup until smoke tests pass.
- Run the SLO benchmark and persistence validation from the performance and persistence guides after the upgrade.
Deprecation policy
LoomCache treats public Java APIs, REST endpoints, configuration keys, wire opcodes, WAL records, and snapshot formats as compatibility commitments. Deprecation is the default path for changing those surfaces; silent removal is allowed only for unreleased internals or security emergencies.
| Surface | Minimum notice | Removal rule |
|---|---|---|
| Public Java client/server APIs | One minor release | Mark with @Deprecated(since = "...", forRemoval = ...), document the replacement, and keep source compatibility until the announced removal release. |
| REST endpoints and configuration keys | One minor release | Keep the old path or key as an alias, emit an operator warning, and document the new path or key in release notes. |
| Wire opcodes and protocol fields | Remainder of the current major release | Do not reuse opcode byte values. Add new behavior behind PROTOCOL_HELLO feature negotiation and keep the old request form until the next major version. |
| WAL and snapshot formats | Remainder of the current major release | Keep dual-read support for old persisted records. Write the new format only after the release notes describe rollback and restore limits. |
| Unsafe or vulnerable behavior | Security-advisory window | A CVE, credential leak, or data-corruption risk can shorten notice, but the advisory must include the mitigation and the safest upgrade path. |
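
As an illustration of the first row, a deprecated public method might be annotated as follows. The class, method names, and version numbers are hypothetical and chosen only to show the shape; the `since` and `forRemoval` attributes are the standard `@Deprecated` elements available since Java 9.

```java
/** Hypothetical client facade used only to illustrate the annotation pattern. */
final class ExampleCacheClient {

    /**
     * Old entry point, kept source-compatible during the notice period.
     *
     * @deprecated since 1.1.0 (illustrative); use {@link #fetch(String)} instead.
     */
    @Deprecated(since = "1.1.0", forRemoval = true)
    static String lookup(String key) {
        // Delegates to the replacement so both paths behave identically while both exist.
        return fetch(key);
    }

    /** Replacement API named in the deprecation notice. */
    static String fetch(String key) {
        return "value-for-" + key;
    }
}
```

Delegating the old method to the new one keeps a single implementation, which makes it easier for the compatibility tests to cover both paths until the announced removal release.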
Every deprecation entry must include a migration path before it ships:
- Replacement API, config key, endpoint, opcode, or format.
- Runtime warning or handshake rejection mode that lets operators find usage before removal.
- Release-note entry naming the first deprecated release and the earliest possible removal release.
- Tests that cover the old and new behavior while both are supported, including protocol negotiation for wire changes.
- Compatibility-matrix update when the change affects mixed-version clients, peers, snapshots, or WAL files.
Removal is a separate release decision. The removal PR must point back to the deprecation notice, delete or migrate the compatibility tests intentionally, and confirm the release notes still give a working migration or rollback story.
See Default Ports for the full allocation and the 5701 vs. 7654 split.
| Port | Role | Typical scope |
|---|---|---|
| 5701 | Default JVM/Spring Boot binary member TCP | Local JVM, bare metal, direct config |
| 7654 | Docker/Kubernetes sample binary member TCP | Container samples |
| 9090 | Prometheus scrape | Kubernetes service to Spring Actuator, or direct metrics listener |
| 8080 | Spring Boot HTTP / REST | Optional Boot app and management JSON |
TLS / mTLS
See Security & mTLS. Minimum viable production configuration: tlsConfig.enabled = true,
keystore and truststore paths pointing at PKCS12 files, requireClientAuth = true,
revocationCheckingEnabled = true, and revocationSoftFail = false; also enable JSSE CRLDP or OCSP with JVM flags
so hard-fail revocation has a source.
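
The Docker example earlier passes `-Dcom.sun.net.ssl.checkRevocation=true` and `-Dcom.sun.security.enableCRLDP=true` through `JAVA_OPTS`. For embedded deployments where command-line flags are awkward, the same JDK settings can be applied programmatically. This is a minimal sketch with a hypothetical class name; it assumes it runs at startup before any TLS context is created, and command-line flags remain the more predictable option.

```java
import java.security.Security;

/** Hypothetical startup hook: gives hard-fail revocation checking a data source. */
final class RevocationSettings {

    /** Enables JSSE revocation checking with CRL Distribution Points and OCSP. */
    static void enableRevocationSources() {
        // Same effect as -Dcom.sun.net.ssl.checkRevocation=true
        System.setProperty("com.sun.net.ssl.checkRevocation", "true");
        // Same effect as -Dcom.sun.security.enableCRLDP=true (fetch CRLs named in certificates)
        System.setProperty("com.sun.security.enableCRLDP", "true");
        // OCSP is controlled by a java.security property rather than a system property
        Security.setProperty("ocsp.enable", "true");
    }
}
```

With `revocationSoftFail = false`, a certificate whose revocation status cannot be determined is rejected, so at least one of these sources must be reachable from the node.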