com.loomcache.server.CacheNode

All Implemented Interfaces:: MessageHandler

public class CacheNode extends Object implements MessageHandler

A single cache cluster node — Consistency-by-default architecture.

Replication Model (v1): Full Replication

All nodes participate in a SINGLE Raft group. Every node holds ALL data. This is similar to etcd and ZooKeeper — simple, strong consistency, at the cost of storage efficiency for very large datasets.

Writes: go through Raft consensus on the leader, replicated to all followers. Reads: served locally on any node (leader uses lease-based quorum reads).

Cluster Isolation

Every node is bound to a ClusterConfig that carries a cluster name/id. Nodes with different cluster ids ignore each other's JOIN requests, which enables parallel integration tests with isolated clusters.

Lifecycle

1. start() — binds TCP port, starts accept loop 2. connectToSeeds — connects to seeds, exchanges cluster ID 3. heartbeat — pings peers every 2s 4. failure detect — marks peers dead after 6s silence 5. stop() — graceful shutdown

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static final record

CacheNode.CpGroupSummary

static final record

CacheNode.ParsedSeed
Field Summary

Fields

Modifier and Type

Field

Description

static final String

PRODUCTION_ALLOW_STANDALONE_PROPERTY

static final String

WAL_IN_MEMORY_FALLBACK_PROPERTY
Constructor Summary

Constructors

Constructor

Description

CacheNode(ClusterConfig config)

Primary constructor — uses ClusterConfig.

CacheNode(ClusterConfig config, @Nullable io.micrometer.core.instrument.MeterRegistry meterRegistry)

Primary constructor with custom MeterRegistry for metrics collection.

CacheNode(String nodeId, String host, int port, int instanceNumber, List<String> seedAddresses)

Legacy constructor — wraps into ClusterConfig with a random cluster UUID.
Method Summary

Modifier and Type

Method

Description

void

addMigrationListener(MigrationListener listener)

Register an operator listener for partition migration lifecycle events.

void

addSecurityInterceptor(SecurityInterceptor interceptor)

Register a global security interceptor invoked around external client operations.

void

addSocketInterceptor(SocketInterceptor interceptor)

Register a layer-4 socket interceptor for accepted inbound TCP connections.

Set<String>

aliveMemberIds()

boolean

awaitStarted(long timeout, TimeUnit unit)

LoomVersion

changeClusterVersion(LoomVersion newVersion)

Change the cluster version gate and return the previous version.

int

connectionCount()

Returns the number of active TCP connections to peer nodes.

boolean

forceCloseCpSession(String sessionId)

Force-closes a CP session and releases its tracked resources.

boolean

forceLeaderStepDown()

Force the local Raft leader to step down for operator-driven failover or maintenance.

LoomVersion

getClusterVersion()

Return the cluster version gate for rolling-upgrade operations.

ClusterState.OperationalState

getOperationalState()

Return the current cluster operational state.

RaftNodeApi

getRaftNodeApi()

Narrow view of raftNode for callers that only need the query/submit RaftNodeApi.

@Nullable Message

handleMessage(Message message, ConnectionContext sender)

Handle an incoming message and optionally return a response.

boolean

isRunning()

List<CacheNode.CpGroupSummary>

listCpGroups()

Lists instantiated CP/Raft groups for operator inspection.

List<ConsistencySubsystem.CpSessionInfo>

listCpSessions()

Lists active CP sessions for operator inspection.

void

onPeerDisconnect(String peerId)

Called by TcpServer when a peer connection is closed.

static CacheNode.ParsedSeed

parseSeed(String raw)

Parse a seed address supporting IPv4 (host:port) and bracketed IPv6 ([2001:db8::1]:port) forms.

RaftNodeApi

raftGroupForKey(String key)

Resolve the RaftNode that should propose mutations for the given key.

void

reconnectToSeeds()

Re-attempt connections to configured seed nodes.

boolean

removeMigrationListener(MigrationListener listener)

Remove a previously registered partition migration listener.

boolean

removeSecurityInterceptor(SecurityInterceptor interceptor)

Remove a previously registered security interceptor.

boolean

removeSocketInterceptor(SocketInterceptor interceptor)

Remove a previously registered layer-4 socket interceptor.

ConsistencySubsystem.ResetResult

resetCpSubsystem()

Reset local CP subsystem state without touching AP data structures.

void

shutdownForTest()

Stop a test-owned node without broadcasting a cluster leave.

void

shutdownGracefully(Duration drainTimeout)

Gracefully shut down the cache node with a configurable drain timeout.

void

start()

void

startRaft()

Start the Raft subsystem after the node has already bootstrapped its networking stack.

void

startWithoutRaft()

Start networking, discovery, and background services without arming the Raft election timer yet.

void

stop()

Gracefully shut down the cache node with a default drain timeout of 5 seconds.

void

transitionOperationalState(ClusterState.OperationalState newState)

Transition this node's view of the cluster operational state.

HotBackupManager.HotBackupMetadata

triggerHotBackup()

Trigger an operator Hot Backup into ClusterConfig.persistenceBackupDir().

int

verifiedClusterConnectionCount()

Returns the number of verified cluster-peer connections.

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface MessageHandler
onPeerDisconnect

Field Details
- PRODUCTION_ALLOW_STANDALONE_PROPERTY
  public static final String PRODUCTION_ALLOW_STANDALONE_PROPERTY
  
  See Also:
  
  Constant Field Values
- WAL_IN_MEMORY_FALLBACK_PROPERTY
  public static final String WAL_IN_MEMORY_FALLBACK_PROPERTY
  
  See Also:
  
  Constant Field Values
Constructor Details
- CacheNode
  
  public CacheNode(ClusterConfig config)
  
  Primary constructor — uses ClusterConfig.
- CacheNode
  
  public CacheNode(ClusterConfig config, @Nullable io.micrometer.core.instrument.MeterRegistry meterRegistry)
  
  Primary constructor with custom MeterRegistry for metrics collection.
  
  Parameters:
  
  config - the cluster configuration
  
  meterRegistry - the MeterRegistry for metrics (or null to use SimpleMeterRegistry)
- CacheNode
  
  public CacheNode(String nodeId, String host, int port, int instanceNumber, List<String> seedAddresses)
  
  Legacy constructor — wraps into ClusterConfig with a random cluster UUID.
Method Details
- getRaftNodeApi
  public RaftNodeApi getRaftNodeApi()
  
  Narrow view of raftNode for callers that only need the query/submit RaftNodeApi. Used primarily so tests can depend on the interface (and supply mock(RaftNodeApi.class)) instead of the concrete Raft implementation. Prefer this over
  
  invalid reference
  
  #getRaftNode()
  
  when concrete lifecycle methods are not needed.
- forceLeaderStepDown
  
  public boolean forceLeaderStepDown()
  
  Force the local Raft leader to step down for operator-driven failover or maintenance.
  
  Returns:
  
  true when this node was the leader and step-down was requested
- triggerHotBackup
  
  public HotBackupManager.HotBackupMetadata triggerHotBackup() throws IOException
  
  Trigger an operator Hot Backup into ClusterConfig.persistenceBackupDir(). The live WAL and snapshot stores are not modified or compacted.
  
  Returns:
  
  metadata describing the backup manifest and group snapshot files
  
  Throws:
  
  IOException - if writing the backup fails
  
  IllegalStateException - if no backup directory is configured
- start
  
  public void start() throws IOException
  
  Throws:
  
  IOException
- startWithoutRaft
  
  public void startWithoutRaft() throws IOException
  
  Start networking, discovery, and background services without arming the Raft election timer yet. Intended for tests that need to form the TCP mesh before starting consensus.
  
  Throws:
  
  IOException
- startRaft
  
  public void startRaft()
  
  Start the Raft subsystem after the node has already bootstrapped its networking stack. Safe to call multiple times.
- shutdownGracefully
  public void shutdownGracefully(Duration drainTimeout)
  
  Gracefully shut down the cache node with a configurable drain timeout.
  Shutdown sequence:
  
  Set DRAINING state to reject new connections/requests
  
  Wait for in-flight requests to complete (up to drain timeout)
  
  Stop health checker and discovery mechanisms
  
  Step down from Raft leadership (if leader)
  
  Stop Raft consensus engine
  
  Stop other components (metrics, audit logger)
  
  Broadcast LEAVE message to cluster peers
  
  Close all connections and TCP server
  
  Parameters:
  
  drainTimeout - the timeout to wait for in-flight requests to complete
- shutdownForTest
  
  public void shutdownForTest()
  
  Stop a test-owned node without broadcasting a cluster leave.
  Integration tests create many short-lived clusters in parallel. Treating every teardown as a production graceful leave makes surviving peers run membership-change and Raft-removal paths after the test already ended, which creates background churn that interferes with neighboring test clusters. Test shutdown is intentionally local: stop accepting work, stop Raft and background components, then close sockets.
- stop
  
  public void stop()
  
  Gracefully shut down the cache node with a default drain timeout of 5 seconds.
  This is a convenience method that calls shutdownGracefully(Duration) with a 5-second drain timeout. It's suitable for most rolling restart scenarios.
- awaitStarted
  
  public boolean awaitStarted(long timeout, TimeUnit unit) throws InterruptedException
  
  Throws:
  
  InterruptedException
- raftGroupForKey
  public RaftNodeApi raftGroupForKey(String key)
  
  Resolve the RaftNode that should propose mutations for the given key.
  Routing semantics:
  
  When sharding is disabled (the default, ClusterConfig.shardingEnabled()==false), returns raftNode — the single v1.x consensus group. Callers that route through this helper see byte-for-byte identical behavior to raw raftNode.propose(...).
  
  When sharding is enabled, delegates to PartitionRouter.getRaftNodeForKey(String) which hashes the key into one of numGroups partitions and returns the owning RaftGroupManager group (lazily creating it on first access).
  
  This is the single chokepoint for key-based Raft-group routing. Handler code that needs to propose mutations keyed by a data-structure key should route through this method instead of touching raftNode directly. Reads remain local to the registry and do not go through this helper.
  Per-partition snapshots: snapshots remain at the group level — each Raft group snapshots its own partitions as a single blob via DataStructureRegistry.snapshotAllData(). Per-partition snapshots within a group (which would enable faster rebalancing of individual partitions) are deferred future work.
  
  Parameters:
  
  key - the data-structure key to route (must not be null)
  
  Returns:
  
  the RaftNode that owns this key — never null
  
  Since:
  
  2.0
- addSecurityInterceptor
  
  public void addSecurityInterceptor(SecurityInterceptor interceptor)
  
  Register a global security interceptor invoked around external client operations.
- removeSecurityInterceptor
  
  public boolean removeSecurityInterceptor(SecurityInterceptor interceptor)
  
  Remove a previously registered security interceptor.
- addSocketInterceptor
  
  public void addSocketInterceptor(SocketInterceptor interceptor)
  
  Register a layer-4 socket interceptor for accepted inbound TCP connections.
- removeSocketInterceptor
  
  public boolean removeSocketInterceptor(SocketInterceptor interceptor)
  
  Remove a previously registered layer-4 socket interceptor.
- addMigrationListener
  
  public void addMigrationListener(MigrationListener listener)
  
  Register an operator listener for partition migration lifecycle events.
- removeMigrationListener
  
  public boolean removeMigrationListener(MigrationListener listener)
  
  Remove a previously registered partition migration listener.
- getOperationalState
  
  public ClusterState.OperationalState getOperationalState()
  
  Return the current cluster operational state.
- getClusterVersion
  
  public LoomVersion getClusterVersion()
  
  Return the cluster version gate for rolling-upgrade operations.
- changeClusterVersion
  
  public LoomVersion changeClusterVersion(LoomVersion newVersion)
  
  Change the cluster version gate and return the previous version.
- resetCpSubsystem
  
  public ConsistencySubsystem.ResetResult resetCpSubsystem()
  
  Reset local CP subsystem state without touching AP data structures.
- listCpGroups
  
  public List<CacheNode.CpGroupSummary> listCpGroups()
  
  Lists instantiated CP/Raft groups for operator inspection.
- listCpSessions
  
  public List<ConsistencySubsystem.CpSessionInfo> listCpSessions()
  
  Lists active CP sessions for operator inspection.
- forceCloseCpSession
  
  public boolean forceCloseCpSession(String sessionId)
  
  Force-closes a CP session and releases its tracked resources.
- transitionOperationalState
  
  public void transitionOperationalState(ClusterState.OperationalState newState)
  
  Transition this node's view of the cluster operational state.
- handleMessage
  
  public @Nullable Message handleMessage(Message message, ConnectionContext sender)
  
  Description copied from interface: MessageHandler
  
  Handle an incoming message and optionally return a response.
  
  Specified by:
  
  handleMessage in interface MessageHandler
  
  Parameters:
  
  message - the incoming message
  
  sender - the connection context of the sender
  
  Returns:
  
  response message, or null if no response needed
- onPeerDisconnect
  
  public void onPeerDisconnect(String peerId)
  
  Called by TcpServer when a peer connection is closed. Cleans up subscriptions and listener registrations for the disconnected peer.
  
  Specified by:
  
  onPeerDisconnect in interface MessageHandler
  
  Parameters:
  
  peerId - the peer ID that disconnected
- parseSeed
  
  public static CacheNode.ParsedSeed parseSeed(String raw)
  
  Parse a seed address supporting IPv4 (host:port) and bracketed IPv6 ([2001:db8::1]:port) forms. Raw colon-bearing hosts are rejected to avoid misinterpreting IPv6 literals as host:port.
- reconnectToSeeds
  
  public void reconnectToSeeds()
  
  Re-attempt connections to configured seed nodes. Useful for tests that start nodes sequentially — earlier nodes will have failed to connect to later nodes that were not yet running.
- isRunning
  
  public boolean isRunning()
- connectionCount
  
  public int connectionCount()
  
  Returns the number of active TCP connections to peer nodes. Useful for test harnesses that need to verify mesh connectivity before starting Raft consensus.
- verifiedClusterConnectionCount
  
  public int verifiedClusterConnectionCount()
  
  Returns the number of verified cluster-peer connections. Excludes temporary seed/pending identities that are still converging.
- aliveMemberIds
  
  public Set<String> aliveMemberIds()

Class CacheNode

Replication Model (v1): Full Replication

Cluster Isolation

Lifecycle

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class Object

Methods inherited from interface MessageHandler

Field Details

PRODUCTION_ALLOW_STANDALONE_PROPERTY

WAL_IN_MEMORY_FALLBACK_PROPERTY

Constructor Details

CacheNode

CacheNode

CacheNode

Method Details

getRaftNodeApi

forceLeaderStepDown

triggerHotBackup

start

startWithoutRaft

startRaft

shutdownGracefully

shutdownForTest

stop

awaitStarted

raftGroupForKey

addSecurityInterceptor

removeSecurityInterceptor

addSocketInterceptor

removeSocketInterceptor

addMigrationListener

removeMigrationListener

getOperationalState

getClusterVersion

changeClusterVersion

resetCpSubsystem

listCpGroups

listCpSessions

forceCloseCpSession

transitionOperationalState

handleMessage

onPeerDisconnect

parseSeed

reconnectToSeeds

isRunning

connectionCount

verifiedClusterConnectionCount

aliveMemberIds