Kubernetes & Docker Deployment
LoomCache is fully cloud-native and designed to run inside Linux containers. Because nodes must persist state to Write-Ahead Logs (WAL) and establish stable network identities for Raft consensus, it is highly recommended to deploy using Kubernetes StatefulSets.
Docker Single-Node Quickstart
```bash
# Build the Docker image
docker build -t loomcache:1.0 .

# Run a single node
docker run -d \
  --name loomcache-node \
  -p 7654:7654 -p 9090:9090 -p 8080:8080 \
  -e LOOMCACHE_NODE_ID=single-node \
  -e LOOMCACHE_SERVER_ENABLED=true \
  -e LOOMCACHE_SERVER_PORT=7654 \
  -e LOOMCACHE_SERVER_PERSISTENCE_ENABLED=true \
  -v loomcache-wal:/var/lib/loomcache/wal \
  loomcache:1.0

# Health check
curl http://localhost:8080/health

# Prometheus metrics
curl http://localhost:9090/metrics
```

Docker Compose 3-Node Cluster
```bash
# Start the cluster
docker-compose up -d

# Verify all nodes are running
docker-compose ps

# Check health on individual nodes
curl http://localhost:7654/health
curl http://localhost:7655/health
curl http://localhost:7656/health
```

The docker-compose.yml defines three nodes with DNS-based intra-cluster communication:
| Node | Client Port | Metrics Port | Internal DNS |
|---|---|---|---|
| node-1 | 7654 | 9090 | loomcache-node1:7654 |
| node-2 | 7655 | 9091 | loomcache-node2:7654 |
| node-3 | 7656 | 9092 | loomcache-node3:7654 |
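The compose file itself is not reproduced on this page; a sketch consistent with the port table and the single-node example might look like the following (service names, volume names, and the LOOMCACHE_NODES value are assumptions, not the shipped file):

```yaml
version: "3.8"
services:
  loomcache-node1:
    image: loomcache:1.0
    environment:
      LOOMCACHE_NODE_ID: node-1
      LOOMCACHE_NODES: loomcache-node1:7654,loomcache-node2:7654,loomcache-node3:7654
    ports:
      - "7654:7654"   # client port
      - "9090:9090"   # metrics port
    volumes:
      - loomcache-wal-1:/var/lib/loomcache/wal
  loomcache-node2:
    image: loomcache:1.0
    environment:
      LOOMCACHE_NODE_ID: node-2
      LOOMCACHE_NODES: loomcache-node1:7654,loomcache-node2:7654,loomcache-node3:7654
    ports:
      - "7655:7654"   # host port remapped per the table above
      - "9091:9090"
    volumes:
      - loomcache-wal-2:/var/lib/loomcache/wal
  # loomcache-node3 follows the same pattern with ports 7656/9092
volumes:
  loomcache-wal-1:
  loomcache-wal-2:
```

Note that every container listens on 7654 internally; only the host-side port differs, which is why the Internal DNS column shows 7654 for all three nodes.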
Kubernetes Deployment
Stateful Cluster Topologies
Unlike stateless microservices, LoomCache nodes form a cohesive, consistent hash ring. They require stable network identities so that clients and peers can route requests to the correct partition leaders without constantly re-resolving DNS.
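Why stable names matter can be seen in a minimal consistent-hash sketch (illustrative only, not LoomCache's actual ring implementation): as long as the node identifiers never change, the ring layout, and therefore key routing, is identical across pod restarts.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string onto the ring with a stable hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent hash ring: a key routes to the first
    node at or after its hash position, wrapping around."""
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        positions = [h for h, _ in self._ring]
        idx = bisect.bisect(positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

# Stable StatefulSet pod names (loomcache-0, loomcache-1, ...) mean
# the ring is rebuilt identically after any restart.
ring = HashRing(["loomcache-0", "loomcache-1", "loomcache-2"])
owner = ring.node_for("user:42")
```

Because the ring is derived purely from the node names, a restarted pod that keeps its name re-occupies exactly the same ring segment.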
StatefulSet Configuration
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loomcache
spec:
  serviceName: loomcache-headless
  replicas: 3
  selector:
    matchLabels:
      app: loomcache
  template:
    metadata:
      labels:
        app: loomcache
    spec:
      containers:
        - name: loomcache
          image: loomcache:1.0
          ports:
            - containerPort: 7654
              name: cache
            - containerPort: 9090
              name: metrics
            - containerPort: 8080
              name: health
          env:
            - name: LOOMCACHE_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: LOOMCACHE_NODES
              value: "loomcache-0.loomcache-headless:7654,loomcache-1.loomcache-headless:7654,loomcache-2.loomcache-headless:7654"
          volumeMounts:
            - name: wal-data
              mountPath: /var/lib/loomcache/wal
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: wal-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Auto-Discovery via Headless Services
When deploying to Kubernetes, nodes dynamically discover each other through a headless service. The DnsDiscovery or EnvironmentDiscovery strategies allow new pods to resolve their peers via cluster DNS:
- Pod-0 starts up and checks loomcache-headless.default.svc.cluster.local.
- As Pod-1 and Pod-2 spin up, they see the seed nodes and initiate Raft Pre-Vote and Leader Election.
- Once a majority (quorum of 2/3) is achieved, the cluster turns green and begins accepting writes.
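The quorum arithmetic above is simply a strict majority of the configured replica count, which is why a 3-node cluster stays writable with one pod down:

```python
def quorum(n: int) -> int:
    """Smallest number of nodes forming a strict majority of n."""
    return n // 2 + 1

# A 3-replica StatefulSet needs 2 healthy nodes to accept writes,
# so it tolerates exactly one pod failure; 5 replicas tolerate two.
three_node = quorum(3)
five_node = quorum(5)
```

Even replica counts buy no extra fault tolerance (quorum of 4 is 3), which is why the examples on this page use 3 replicas.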
```yaml
apiVersion: v1
kind: Service
metadata:
  name: loomcache-headless
spec:
  clusterIP: None
  selector:
    app: loomcache
  ports:
    - port: 7654
      name: cache
```

Persistent Volumes (PVCs)
Every pod requires a Persistent Volume Claim (PVC) mounted at /var/lib/loomcache/wal. This directory holds the active .dat files for the Write-Ahead Log. If a pod crashes, Kubernetes remounts the same PVC to the replacement pod, which replays the WAL from the last index and rejoins the cluster without manual intervention.
Do not use emptyDir or ephemeral storage unless you are testing locally.
Health Probes & Circuit Breakers
LoomCache provides an HTTP endpoint (/health) for Kubernetes readiness and liveness probes. Intra-cluster failure detection is faster still: LoomCache natively implements Akka-style Phi-Accrual Failure Detectors via the DiscoveryHealthChecker. If a node suffers a sudden JVM crash or network partition, the remaining peers’ Circuit Breakers trip to the OPEN state and immediately re-route traffic away from it.
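The phi-accrual idea can be sketched in a few lines: suspicion grows continuously with the time since the last heartbeat, measured against the observed inter-arrival distribution. This is an illustrative simplification, not the DiscoveryHealthChecker source; the normal-distribution model and the threshold value are assumptions.

```python
import math
from statistics import mean, stdev

def phi(time_since_last: float, intervals: list[float]) -> float:
    """Phi-accrual suspicion level: -log10 of the probability that the
    heartbeat is merely late, given past inter-arrival times."""
    mu = mean(intervals)
    sigma = stdev(intervals) or 0.1  # floor to avoid division by zero
    # Probability the heartbeat arrives later than now, assuming
    # normally distributed inter-arrival times.
    z = (time_since_last - mu) / sigma
    p_later = 0.5 * math.erfc(z / math.sqrt(2))
    return -math.log10(max(p_later, 1e-12))

heartbeats = [1.0, 0.9, 1.1, 1.0, 0.95]  # seconds between heartbeats
low = phi(1.0, heartbeats)   # heartbeat on schedule: low suspicion
high = phi(5.0, heartbeats)  # 5s of silence: suspicion spikes
# A fixed threshold (e.g. phi > 8) would trip the breaker to OPEN.
```

Unlike a binary timeout, phi yields a graded confidence, so the trip threshold can be tuned to trade detection speed against false positives on a jittery network.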
Monitoring with Prometheus
Section titled “Monitoring with Prometheus”Every LoomCache node exposes Micrometer metrics in Prometheus format:
```yaml
# prometheus.yml scrape config
scrape_configs:
  - job_name: loomcache
    static_configs:
      - targets:
          - loomcache-0:9090
          - loomcache-1:9090
          - loomcache-2:9090
```

Key metrics:
| Metric | Description |
|---|---|
| loomcache_raft_term | Current Raft term |
| loomcache_raft_commit_index | Committed log index |
| loomcache_raft_replication_lag | Per-follower replication lag |
| loomcache_cache_hits_total | Cache hit count |
| loomcache_cache_misses_total | Cache miss count |
| loomcache_evictions_total | Eviction count by policy |
| loomcache_circuit_breaker_state | Per-peer circuit breaker state |
| loomcache_audit_events_total | Total audit events logged |
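From the hit and miss counters above, a dashboard hit-ratio panel can be derived with a PromQL expression like the following sketch (metric names come from the table; the 5m window and per-instance grouping are arbitrary choices):

```promql
# Cache hit ratio over the last 5 minutes, per node
sum by (instance) (rate(loomcache_cache_hits_total[5m]))
/
(  sum by (instance) (rate(loomcache_cache_hits_total[5m]))
 + sum by (instance) (rate(loomcache_cache_misses_total[5m])))
```

Using rate() over the raw counters keeps the panel meaningful across node restarts, since counters reset to zero when a pod is rescheduled.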
Backup and Restore
WAL Backup
```bash
# Backup a Docker volume
docker run --rm \
  -v loomcache-wal-1:/data \
  -v $(pwd):/backup \
  alpine tar czf /backup/wal-backup.tar.gz -C /data .

# Restore
docker volume create loomcache-wal-restored
docker run --rm \
  -v loomcache-wal-restored:/data \
  -v $(pwd):/backup \
  alpine tar xzf /backup/wal-backup.tar.gz -C /data
```

Snapshot-Based Recovery
LoomCache creates automatic snapshots every 10,000 committed entries. On restart, nodes load the latest snapshot and replay only the WAL entries after the snapshot index, reducing startup time for long-running clusters.
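The restart path described above amounts to: load the newest snapshot, then apply only WAL entries whose index exceeds the snapshot's. A minimal sketch, where the snapshot shape and (index, key, value) WAL entry format are assumptions for illustration:

```python
def recover(snapshot: dict, snapshot_index: int,
            wal: list[tuple[int, str, str]]) -> dict:
    """Rebuild in-memory state from a snapshot plus the WAL tail.

    Only entries after the snapshot index are replayed, so startup
    cost is bounded by the tail length, not the total log history.
    """
    state = dict(snapshot)
    for index, key, value in wal:
        if index > snapshot_index:
            state[key] = value
    return state

# Snapshot taken at index 2; only the entry at index 3 is replayed.
state = recover({"a": "1", "b": "2"}, 2,
                [(1, "a", "1"), (2, "b", "2"), (3, "c", "3")])
```

With a snapshot every 10,000 entries, at most 9,999 WAL entries ever need replaying on restart, regardless of how long the node has been running.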
Graceful Shutdown
LoomCache implements an 8-phase graceful shutdown with a configurable drain timeout:
- Stop accepting new connections
- Drain in-flight requests (default: 200ms)
- Transfer leadership via TimeoutNow RPC
- Flush pending WAL entries
- Close client connections
- Stop Raft replication
- Close peer connections
- Final cleanup and exit
```bash
# Graceful stop in Docker
docker stop --time=30 loomcache-node
```
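In Kubernetes the equivalent is giving the pod enough time to run all eight phases before it is force-killed. A sketch of the relevant StatefulSet field (the 30-second value mirrors the Docker example above and is an assumption, not a shipped default):

```yaml
spec:
  template:
    spec:
      # Must exceed the drain timeout plus leadership transfer time,
      # or Kubernetes will SIGKILL the pod mid-shutdown.
      terminationGracePeriodSeconds: 30
```

If this budget is too short, the TimeoutNow leadership transfer in phase 3 may not complete, forcing the remaining nodes through a full election instead.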