
Kubernetes & Docker Deployment

LoomCache is fully cloud-native and designed to run inside Linux containers. Because nodes must persist state to Write-Ahead Logs (WAL) and establish stable network identities for Raft consensus, it is highly recommended to deploy using Kubernetes StatefulSets.

Kubernetes StatefulSet

Pod Auto-Discovery via Headless Service

(Diagram: Pod-0, Pod-1, and Pod-2, each with a stable network identity and its own PVC-backed /wal volume, discovering one another through the headless service.)
# Build the Docker image
docker build -t loomcache:1.0 .
# Run a single node
docker run -d \
--name loomcache-node \
-p 7654:7654 -p 9090:9090 -p 8080:8080 \
-e LOOMCACHE_NODE_ID=single-node \
-e LOOMCACHE_SERVER_ENABLED=true \
-e LOOMCACHE_SERVER_PORT=7654 \
-e LOOMCACHE_SERVER_PERSISTENCE_ENABLED=true \
-v loomcache-wal:/var/lib/loomcache/wal \
loomcache:1.0
# Health check
curl http://localhost:8080/health
# Prometheus metrics
curl http://localhost:9090/metrics
# Start the cluster
docker-compose up -d
# Verify all nodes are running
docker-compose ps
# Check health on individual nodes
curl http://localhost:7654/health
curl http://localhost:7655/health
curl http://localhost:7656/health

The docker-compose.yml defines three nodes with DNS-based intra-cluster communication:

Node     Client Port   Metrics Port   Internal DNS
node-1   7654          9090           loomcache-node1:7654
node-2   7655          9091           loomcache-node2:7654
node-3   7656          9092           loomcache-node3:7654
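
A docker-compose.yml matching that table might look like the sketch below. The LOOMCACHE_* environment variables mirror the single-node `docker run` example; the `LOOMCACHE_NODES` peer list is an assumption modeled on the Kubernetes manifest in this page, not the shipped file:

```yaml
version: "3.8"
services:
  loomcache-node1:
    image: loomcache:1.0
    environment:
      LOOMCACHE_NODE_ID: node-1
      LOOMCACHE_SERVER_ENABLED: "true"
      LOOMCACHE_SERVER_PORT: "7654"
      # Assumed peer-list variable; nodes address each other by service name
      LOOMCACHE_NODES: "loomcache-node1:7654,loomcache-node2:7654,loomcache-node3:7654"
    ports:
      - "7654:7654"
      - "9090:9090"
    volumes:
      - loomcache-wal-1:/var/lib/loomcache/wal
  # node2 and node3 follow the same pattern with ports 7655/9091 and 7656/9092

volumes:
  loomcache-wal-1:
```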

Unlike stateless microservices, LoomCache nodes form a cohesive, consistent hash ring. They require stable network identities so that clients and peers can route requests to the correct partition leaders without repeated rediscovery or stale DNS lookups.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loomcache
spec:
  serviceName: loomcache-headless
  replicas: 3
  selector:
    matchLabels:
      app: loomcache
  template:
    metadata:
      labels:
        app: loomcache
    spec:
      containers:
        - name: loomcache
          image: loomcache:1.0
          ports:
            - containerPort: 7654
              name: cache
            - containerPort: 9090
              name: metrics
            - containerPort: 8080
              name: health
          env:
            - name: LOOMCACHE_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: LOOMCACHE_NODES
              value: "loomcache-0.loomcache-headless:7654,loomcache-1.loomcache-headless:7654,loomcache-2.loomcache-headless:7654"
          volumeMounts:
            - name: wal-data
              mountPath: /var/lib/loomcache/wal
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: wal-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi

When deploying to Kubernetes, nodes dynamically discover each other using a headless service. The DnsDiscovery or EnvironmentDiscovery strategies let new pods find their peers by resolving the headless service's DNS records or reading the configured environment:

  1. Pod-0 starts up and checks loomcache-headless.default.svc.cluster.local.
  2. As Pod-1 and Pod-2 spin up, they see the seed nodes and initiate Raft Pre-Vote and Leader Election.
  3. Once a majority (quorum of 2/3) is reached, the cluster becomes healthy and begins accepting writes.
apiVersion: v1
kind: Service
metadata:
  name: loomcache-headless
spec:
  clusterIP: None
  selector:
    app: loomcache
  ports:
    - port: 7654
      name: cache

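Because Raft requires a 2/3 quorum, a voluntary disruption (node drain, cluster upgrade) must never evict two pods at once. A PodDisruptionBudget enforces this. The manifest below is a suggested addition for this deployment, not something LoomCache ships:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: loomcache-pdb
spec:
  minAvailable: 2        # keeps Raft quorum (2 of 3) intact during drains
  selector:
    matchLabels:
      app: loomcache
```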
Every pod requires a Persistent Volume Claim (PVC) mounted at /var/lib/loomcache/wal. This directory holds the active .dat files for the Write-Ahead Log. If a pod crashes, Kubernetes remounts the same PVC to the replacement pod, which replays the WAL from the last persisted index and rejoins the cluster without losing committed state.

Do not use emptyDir or ephemeral storage unless you are testing locally.

LoomCache provides an HTTP endpoint for Kubernetes readiness and liveness probes (/health). Intra-cluster detection is faster still: LoomCache natively implements Akka-style Phi-Accrual failure detection via the DiscoveryHealthChecker. If a node suffers a sudden JVM crash or network partition, the remaining peers' circuit breakers transition to the OPEN state and traffic is re-routed away from the failed node.

Every LoomCache node exposes Micrometer metrics in Prometheus format:

# prometheus.yml scrape config
scrape_configs:
  - job_name: loomcache
    static_configs:
      - targets:
          - loomcache-0:9090
          - loomcache-1:9090
          - loomcache-2:9090

Key metrics:

Metric                            Description
loomcache_raft_term               Current Raft term
loomcache_raft_commit_index       Committed log index
loomcache_raft_replication_lag    Per-follower replication lag
loomcache_cache_hits_total        Cache hit count
loomcache_cache_misses_total      Cache miss count
loomcache_evictions_total         Eviction count by policy
loomcache_circuit_breaker_state   Per-peer circuit breaker state
loomcache_audit_events_total      Total audit events logged
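
These metrics lend themselves to alerting. The Prometheus alerting rules below are a sketch built on the metric names above; the thresholds and alert names are illustrative assumptions, not shipped defaults:

```yaml
groups:
  - name: loomcache
    rules:
      - alert: LoomCacheReplicationLagHigh
        # Assumed threshold: 1000 entries of follower lag for 5 minutes
        expr: loomcache_raft_replication_lag > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Follower {{ $labels.instance }} is lagging the leader"
      - alert: LoomCacheLowHitRate
        # Hit ratio derived from the hit/miss counters in the table above
        expr: |
          rate(loomcache_cache_hits_total[5m])
            / (rate(loomcache_cache_hits_total[5m]) + rate(loomcache_cache_misses_total[5m]))
          < 0.5
        for: 15m
        labels:
          severity: info
```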
# Backup a Docker volume
docker run --rm \
-v loomcache-wal-1:/data \
-v $(pwd):/backup \
alpine tar czf /backup/wal-backup.tar.gz -C /data .
# Restore
docker volume create loomcache-wal-restored
docker run --rm \
-v loomcache-wal-restored:/data \
-v $(pwd):/backup \
alpine tar xzf /backup/wal-backup.tar.gz -C /data

LoomCache creates automatic snapshots every 10,000 committed entries. On restart, nodes load the latest snapshot and replay only the WAL entries after the snapshot index — reducing startup time for long-running clusters.
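
As a worked example (assuming snapshots land exactly on multiples of the 10,000-entry interval), a node restarting at commit index 25,300 loads the snapshot taken at index 20,000 and replays only the remainder:

```shell
# Entries replayed = commit index modulo the 10,000-entry snapshot interval
echo $(( 25300 % 10000 ))   # prints 5300
```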

LoomCache implements an 8-phase graceful shutdown with configurable drain timeout:

  1. Stop accepting new connections
  2. Drain in-flight requests (default: 200ms)
  3. Transfer leadership via TimeoutNow RPC
  4. Flush pending WAL entries
  5. Close client connections
  6. Stop Raft replication
  7. Close peer connections
  8. Final cleanup and exit
# Graceful stop in Docker
docker stop --time=30 loomcache-node
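
On Kubernetes, give the same shutdown sequence room to finish by extending the pod's termination grace period. The fragment below (merged into the StatefulSet's pod template) uses 30 seconds to match the `docker stop` example; tune it to exceed your drain timeout plus WAL flush time:

```yaml
spec:
  template:
    spec:
      # Must exceed the configured drain timeout + WAL flush + leadership transfer
      terminationGracePeriodSeconds: 30
```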