Kubernetes & Docker Deployment
LoomCache is fully cloud-native and designed to run inside Linux containers. Because nodes must persist state to Write-Ahead Logs (WAL) and establish stable network identities for Raft consensus, it is highly recommended to deploy using Kubernetes StatefulSets.
Docker Single-Node Quickstart
```bash
# Build the Docker image
docker build -t loomcache:1.0 .

# Run a single node
docker run -d \
  --name loomcache-node \
  -p 7654:7654 -p 9090:9090 -p 8080:8080 \
  -e LOOMCACHE_NODE_ID=single-node \
  -e LOOMCACHE_SERVER_ENABLED=true \
  -e LOOMCACHE_SERVER_PORT=7654 \
  -e LOOMCACHE_SERVER_PERSISTENCE_ENABLED=true \
  -v loomcache-wal:/var/lib/loomcache/wal \
  loomcache:1.0

# Health check
curl http://localhost:8080/health

# Prometheus metrics
curl http://localhost:9090/metrics
```

Docker Compose 3-Node Cluster
```bash
# Start the cluster
docker-compose up -d

# Verify all nodes are running
docker-compose ps

# Check health on individual nodes
curl http://localhost:7654/health
curl http://localhost:7655/health
curl http://localhost:7656/health
```

The docker-compose.yml defines three nodes with DNS-based intra-cluster communication:
| Node | Client Port | Metrics Port | Internal DNS |
|---|---|---|---|
| node-1 | 7654 | 9090 | loomcache-node1:7654 |
| node-2 | 7655 | 9091 | loomcache-node2:7654 |
| node-3 | 7656 | 9092 | loomcache-node3:7654 |
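The compose file itself is not reproduced on this page; a sketch consistent with the port table and the single-node example might look like the following (service names, volume names, and the LOOMCACHE_NODES value are assumptions, not the shipped file):

```yaml
version: "3.8"
services:
  loomcache-node1:
    image: loomcache:1.0
    environment:
      LOOMCACHE_NODE_ID: node-1
      LOOMCACHE_NODES: loomcache-node1:7654,loomcache-node2:7654,loomcache-node3:7654
    ports:
      - "7654:7654"   # client port
      - "9090:9090"   # metrics port
    volumes:
      - loomcache-wal-1:/var/lib/loomcache/wal
  loomcache-node2:
    image: loomcache:1.0
    environment:
      LOOMCACHE_NODE_ID: node-2
      LOOMCACHE_NODES: loomcache-node1:7654,loomcache-node2:7654,loomcache-node3:7654
    ports:
      - "7655:7654"   # host port remapped per the table above
      - "9091:9090"
    volumes:
      - loomcache-wal-2:/var/lib/loomcache/wal
  # loomcache-node3 follows the same pattern with ports 7656/9092
volumes:
  loomcache-wal-1:
  loomcache-wal-2:
```

Note that every container listens on 7654 internally; only the host-side port differs, which is why the Internal DNS column shows 7654 for all three nodes.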
Kubernetes Deployment
Stateful Cluster Topologies
Unlike stateless microservices, LoomCache nodes form a cohesive, consistent hash ring. They require stable network identities so that clients and peers can route requests to the correct partition leaders without constantly re-resolving DNS.
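Why stable names matter can be seen in a minimal consistent-hash sketch (illustrative only, not LoomCache's actual ring implementation): as long as the node identifiers never change, the ring layout, and therefore key routing, is identical across pod restarts.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string onto the ring with a stable hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent hash ring: a key routes to the first
    node at or after its hash position, wrapping around."""
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        positions = [h for h, _ in self._ring]
        idx = bisect.bisect(positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

# Stable StatefulSet pod names (loomcache-0, loomcache-1, ...) mean
# the ring is rebuilt identically after any restart.
ring = HashRing(["loomcache-0", "loomcache-1", "loomcache-2"])
owner = ring.node_for("user:42")
```

Because the ring is derived purely from the node names, a restarted pod that keeps its name re-occupies exactly the same ring segment.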
StatefulSet Configuration
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loomcache
spec:
  serviceName: loomcache-headless
  replicas: 3
  selector:
    matchLabels:
      app: loomcache
  template:
    metadata:
      labels:
        app: loomcache
    spec:
      containers:
        - name: loomcache
          image: loomcache:1.0
          ports:
            - containerPort: 7654
              name: cache
            - containerPort: 9090
              name: metrics
            - containerPort: 8080
              name: health
          env:
            - name: LOOMCACHE_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: LOOMCACHE_NODES
              value: "loomcache-0.loomcache-headless:7654,loomcache-1.loomcache-headless:7654,loomcache-2.loomcache-headless:7654"
          volumeMounts:
            - name: wal-data
              mountPath: /var/lib/loomcache/wal
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: wal-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Auto-Discovery via Headless Services
When deploying to Kubernetes, nodes dynamically discover each other through a headless service. The DnsDiscovery or EnvironmentDiscovery strategies allow new pods to resolve their peers via cluster DNS:
- Pod-0 starts up and checks loomcache-headless.default.svc.cluster.local.
- As Pod-1 and Pod-2 spin up, they see the seed nodes and initiate Raft Pre-Vote and Leader Election.
- Once a majority (quorum of 2/3) is achieved, the cluster turns green and begins accepting writes.
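The quorum arithmetic above is simply a strict majority of the configured replica count, which is why a 3-node cluster stays writable with one pod down:

```python
def quorum(n: int) -> int:
    """Smallest number of nodes forming a strict majority of n."""
    return n // 2 + 1

# A 3-replica StatefulSet needs 2 healthy nodes to accept writes,
# so it tolerates exactly one pod failure; 5 replicas tolerate two.
three_node = quorum(3)
five_node = quorum(5)
```

Even replica counts buy no extra fault tolerance (quorum of 4 is 3), which is why the examples on this page use 3 replicas.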
```yaml
apiVersion: v1
kind: Service
metadata:
  name: loomcache-headless
spec:
  clusterIP: None
  selector:
    app: loomcache
  ports:
    - port: 7654
      name: cache
```

Persistent Volumes (PVCs)
Every pod requires a Persistent Volume Claim (PVC) mounted at /var/lib/loomcache/wal. This directory holds the active .dat files for the Write-Ahead Log. If a pod crashes, Kubernetes remounts the same PVC to the replacement pod, which replays the WAL from the last index and rejoins the cluster without manual intervention.
Do not use emptyDir or ephemeral storage unless you are testing locally.
Health Probes & Circuit Breakers
LoomCache provides an HTTP endpoint (/health) for Kubernetes readiness and liveness probes. Intra-cluster failure detection is faster still: LoomCache natively implements Akka-style Phi-Accrual Failure Detectors via the DiscoveryHealthChecker. If a node suffers a sudden JVM crash or network partition, the remaining peers’ Circuit Breakers trip to the OPEN state and immediately re-route traffic away from it.
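The phi-accrual idea can be sketched in a few lines: suspicion grows continuously with the time since the last heartbeat, measured against the observed inter-arrival distribution. This is an illustrative simplification, not the DiscoveryHealthChecker source; the normal-distribution model and the threshold value are assumptions.

```python
import math
from statistics import mean, stdev

def phi(time_since_last: float, intervals: list[float]) -> float:
    """Phi-accrual suspicion level: -log10 of the probability that the
    heartbeat is merely late, given past inter-arrival times."""
    mu = mean(intervals)
    sigma = stdev(intervals) or 0.1  # floor to avoid division by zero
    # Probability the heartbeat arrives later than now, assuming
    # normally distributed inter-arrival times.
    z = (time_since_last - mu) / sigma
    p_later = 0.5 * math.erfc(z / math.sqrt(2))
    return -math.log10(max(p_later, 1e-12))

heartbeats = [1.0, 0.9, 1.1, 1.0, 0.95]  # seconds between heartbeats
low = phi(1.0, heartbeats)   # heartbeat on schedule: low suspicion
high = phi(5.0, heartbeats)  # 5s of silence: suspicion spikes
# A fixed threshold (e.g. phi > 8) would trip the breaker to OPEN.
```

Unlike a binary timeout, phi yields a graded confidence, so the trip threshold can be tuned to trade detection speed against false positives on a jittery network.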
Monitoring with Prometheus
Section titled “Monitoring with Prometheus”Every LoomCache node exposes Micrometer metrics in Prometheus format:
```yaml
# prometheus.yml scrape config
scrape_configs:
  - job_name: loomcache
    static_configs:
      - targets:
          - loomcache-0:9090
          - loomcache-1:9090
          - loomcache-2:9090
```

Key metrics:
| Metric | Description |
|---|---|
| loomcache_raft_term | Current Raft term |
| loomcache_raft_commit_index | Committed log index |
| loomcache_raft_replication_lag | Per-follower replication lag |
| loomcache_cache_hits_total | Cache hit count |
| loomcache_cache_misses_total | Cache miss count |
| loomcache_evictions_total | Eviction count by policy |
| loomcache_circuit_breaker_state | Per-peer circuit breaker state |
| loomcache_audit_events_total | Total audit events logged |
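From the hit and miss counters above, a dashboard hit-ratio panel can be derived with a PromQL expression like the following sketch (metric names come from the table; the 5m window and per-instance grouping are arbitrary choices):

```promql
# Cache hit ratio over the last 5 minutes, per node
sum by (instance) (rate(loomcache_cache_hits_total[5m]))
/
(  sum by (instance) (rate(loomcache_cache_hits_total[5m]))
 + sum by (instance) (rate(loomcache_cache_misses_total[5m])))
```

Using rate() over the raw counters keeps the panel meaningful across node restarts, since counters reset to zero when a pod is rescheduled.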
Backup and Restore
WAL Backup
```bash
# Backup a Docker volume
docker run --rm \
  -v loomcache-wal-1:/data \
  -v $(pwd):/backup \
  alpine tar czf /backup/wal-backup.tar.gz -C /data .

# Restore
docker volume create loomcache-wal-restored
docker run --rm \
  -v loomcache-wal-restored:/data \
  -v $(pwd):/backup \
  alpine tar xzf /backup/wal-backup.tar.gz -C /data
```

Snapshot-Based Recovery
LoomCache creates automatic snapshots every 10,000 committed entries. On restart, nodes load the latest snapshot and replay only the WAL entries after the snapshot index, reducing startup time for long-running clusters.
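The restart path described above amounts to: load the newest snapshot, then apply only WAL entries whose index exceeds the snapshot's. A minimal sketch, where the snapshot shape and (index, key, value) WAL entry format are assumptions for illustration:

```python
def recover(snapshot: dict, snapshot_index: int,
            wal: list[tuple[int, str, str]]) -> dict:
    """Rebuild in-memory state from a snapshot plus the WAL tail.

    Only entries after the snapshot index are replayed, so startup
    cost is bounded by the tail length, not the total log history.
    """
    state = dict(snapshot)
    for index, key, value in wal:
        if index > snapshot_index:
            state[key] = value
    return state

# Snapshot taken at index 2; only the entry at index 3 is replayed.
state = recover({"a": "1", "b": "2"}, 2,
                [(1, "a", "1"), (2, "b", "2"), (3, "c", "3")])
```

With a snapshot every 10,000 entries, at most 9,999 WAL entries ever need replaying on restart, regardless of how long the node has been running.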
Graceful Shutdown
LoomCache implements an 8-phase graceful shutdown with a configurable drain timeout:
- Stop accepting new connections
- Drain in-flight requests (default: 200ms)
- Transfer leadership via TimeoutNow RPC
- Flush pending WAL entries
- Close client connections
- Stop Raft replication
- Close peer connections
- Final cleanup and exit
```bash
# Graceful stop in Docker
docker stop --time=30 loomcache-node
```
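In Kubernetes the equivalent is giving the pod enough time to run all eight phases before it is force-killed. A sketch of the relevant StatefulSet field (the 30-second value mirrors the Docker example above and is an assumption, not a shipped default):

```yaml
spec:
  template:
    spec:
      # Must exceed the drain timeout plus leadership transfer time,
      # or Kubernetes will SIGKILL the pod mid-shutdown.
      terminationGracePeriodSeconds: 30
```

If this budget is too short, the TimeoutNow leadership transfer in phase 3 may not complete, forcing the remaining nodes through a full election instead.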