Class ClusterHealthMonitor
java.lang.Object
com.loomcache.server.cluster.ClusterHealthMonitor
- All Implemented Interfaces:
AutoCloseable
Cluster-wide health monitor that aggregates health status from all subsystems.
Monitors:
- Local node memory usage (heap)
- CPU load
- Raft consensus health
- Network heartbeat status
- Quorum reachability
- Custom health checks registered by subsystems
Uses virtual threads for background health checks and ReentrantReadWriteLock for thread safety.
Health degradation rules:
- Memory >80% = Degraded
- Memory >95% = Unhealthy
- Heartbeat missing >5s = Degraded
- Heartbeat missing >30s = Unhealthy
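The degradation rules above can be read as simple threshold checks. The following is a minimal sketch of that reading, not the LoomCache implementation: the class HealthRulesSketch, the Level enum, and the "worst level wins" combination rule are all assumptions made for illustration.

```java
import java.time.Duration;

/** Hypothetical sketch of the documented thresholds; not the actual monitor code. */
final class HealthRulesSketch {
    /** Ordered from best to worst so that ordinal() can be compared. */
    enum Level { HEALTHY, DEGRADED, UNHEALTHY }

    /** Memory rule: above 95% of heap is UNHEALTHY, above 80% is DEGRADED. */
    static Level fromMemory(double heapUsedFraction) {
        if (heapUsedFraction > 0.95) return Level.UNHEALTHY;
        if (heapUsedFraction > 0.80) return Level.DEGRADED;
        return Level.HEALTHY;
    }

    /** Heartbeat rule: missing for more than 30s is UNHEALTHY, more than 5s is DEGRADED. */
    static Level fromHeartbeatGap(Duration sinceLastHeartbeat) {
        if (sinceLastHeartbeat.compareTo(Duration.ofSeconds(30)) > 0) return Level.UNHEALTHY;
        if (sinceLastHeartbeat.compareTo(Duration.ofSeconds(5)) > 0) return Level.DEGRADED;
        return Level.HEALTHY;
    }

    /** Assumption: the overall level is the worse of the two inputs. */
    static Level worst(Level a, Level b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    /** Current heap usage of this JVM as a fraction of the maximum heap. */
    static double currentHeapFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }
}
```

Under these assumptions, a node at 85% heap with a heartbeat last seen 2 seconds ago would be reported as DEGRADED, because the memory rule is the worse of the two.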
-
Nested Class Summary
Nested Classes
- static enum: Alert severity levels.
- static final record: Cluster-wide health aggregation.
- static final record: Health alert record.
- static final record: Health snapshot for historical tracking.
- static interface: Listener for health state changes.
- static interface ClusterHealthMonitor.HealthStatus: Sealed interface for health status.
- static final record: Health information for a single node.
Constructor Summary
Constructors
- ClusterHealthMonitor(String nodeId, int instanceNumber, RaftNode raftNode, HealthChecker healthChecker, long healthCheckIntervalMs): Create a new ClusterHealthMonitor.
Method Summary
Methods
- void addHealthCheck(String name, Supplier<ClusterHealthMonitor.HealthStatus> check): Register a custom health check.
- void addStateChangeListener: Add a listener for health state changes.
- void clearAlerts(): Clear all active alerts.
- void close(): Shutdown hook.
- computeAndUpdateNodeHealth: Recompute and return this node's health (for testing/immediate updates).
- getAlerts: Get all active alerts.
- getClusterHealth: Get cluster-wide health by aggregating known nodes.
- getHealthChecks: Get results of all health checks (including custom).
- getHealthHistory(int maxEntries): Get the last N health snapshots from history.
- getNodeHealth: Get this node's current health status.
- long getReplicationLag(): Get the replication lag in commit indices between leader and followers.
- long getUptimeMillis(): Get the monitor's uptime in milliseconds.
- boolean isQuorumReached(): Check if a quorum of nodes is reachable.
- boolean isRunning(): Check if running.
- void registerAlertListener: Register a listener for alert events.
- void removeAlertListener: Remove an alert listener.
- void removeStateChangeListener: Remove a listener.
- void setAlertThreshold(String metric, double threshold): Set the alert threshold for a specific metric.
- void start(): Start the background health check thread (virtual thread).
- void stop(): Stop the background health check thread gracefully.
- void updateRemoteNodeHealth: Update health information for a remote node.
-
Constructor Details
-
ClusterHealthMonitor
public ClusterHealthMonitor(String nodeId, int instanceNumber, RaftNode raftNode, HealthChecker healthChecker, long healthCheckIntervalMs)
Create a new ClusterHealthMonitor.
- Parameters:
nodeId - unique node identifier
instanceNumber - instance number for logging
raftNode - the Raft node for consensus state
healthChecker - the network health checker
healthCheckIntervalMs - interval between health checks (milliseconds)
-
-
Method Details
-
start
public void start()
Start the background health check thread (virtual thread).
stop
public void stop()
Stop the background health check thread gracefully.
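As an illustration of the start/stop lifecycle described above, the following sketch runs a periodic check on a virtual thread and stops it by interrupting and joining. It assumes Java 21 or later. The class BackgroundCheckLoopSketch and its fields are hypothetical; the actual monitor may structure its loop differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a start/stop loop on a virtual thread (Java 21+). */
final class BackgroundCheckLoopSketch {
    private final Runnable check;
    private final long intervalMs;
    private final AtomicBoolean running = new AtomicBoolean(false);
    private volatile Thread worker;

    BackgroundCheckLoopSketch(Runnable check, long intervalMs) {
        this.check = check;
        this.intervalMs = intervalMs;
    }

    /** Starts the loop once; repeated calls while running are ignored. */
    void start() {
        if (!running.compareAndSet(false, true)) return;
        worker = Thread.ofVirtual().name("health-check").start(this::loop);
    }

    /** Signals the loop to exit, interrupts any sleep, and waits for the thread to finish. */
    void stop() {
        if (!running.compareAndSet(true, false)) return;
        Thread t = worker;
        if (t == null) return;
        t.interrupt();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }

    boolean isRunning() {
        return running.get();
    }

    private void loop() {
        while (running.get()) {
            try {
                check.run();
            } catch (RuntimeException e) {
                // Assumption: a failing check should not kill the loop.
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return; // stop() interrupted the sleep
            }
        }
    }
}
```

A virtual thread suits this pattern because the loop spends most of its time sleeping, and a sleeping virtual thread does not hold a platform thread.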
getNodeHealth
Get this node's current health status. -
computeAndUpdateNodeHealth
Recompute and return this node's health (for testing/immediate updates). -
getClusterHealth
Get cluster-wide health by aggregating known nodes. -
addHealthCheck
Register a custom health check.
- Parameters:
name - unique name for this check
check - supplier that returns the health status
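The sketch below shows one way a name-to-supplier registry like this could behave. It is self-contained and does not use the real ClusterHealthMonitor: the HealthStatus stand-in and its Healthy, Degraded, and Unhealthy records are hypothetical, since this page does not list the permitted subtypes of the sealed interface. Treating a throwing check as unhealthy and replacing an existing check with the same name are also assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical registry of named health checks; not the LoomCache implementation. */
final class HealthCheckRegistrySketch {
    /** Stand-in for ClusterHealthMonitor.HealthStatus; the real subtypes are not shown here. */
    sealed interface HealthStatus permits Healthy, Degraded, Unhealthy {}
    record Healthy() implements HealthStatus {}
    record Degraded(String reason) implements HealthStatus {}
    record Unhealthy(String reason) implements HealthStatus {}

    private final Map<String, Supplier<HealthStatus>> checks = new ConcurrentHashMap<>();

    /** Assumption: registering an existing name replaces the earlier check. */
    void addHealthCheck(String name, Supplier<HealthStatus> check) {
        checks.put(name, check);
    }

    /** Runs every registered check and collects the results by name. */
    Map<String, HealthStatus> runAll() {
        Map<String, HealthStatus> results = new LinkedHashMap<>();
        checks.forEach((name, check) -> {
            HealthStatus status;
            try {
                status = check.get();
            } catch (RuntimeException e) {
                // Assumption: a check that throws is reported as unhealthy.
                status = new Unhealthy("check threw: " + e.getMessage());
            }
            results.put(name, status);
        });
        return results;
    }
}
```

A subsystem would then register a check as a lambda, for example `registry.addHealthCheck("disk", () -> new HealthCheckRegistrySketch.Degraded("90% full"))`.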
-
getHealthChecks
Get results of all health checks (including custom). -
isQuorumReached
public boolean isQuorumReached()
Check if a quorum of nodes is reachable. A quorum is defined as a majority of known nodes (including this node).
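The majority rule can be sketched as a strict "more than half" comparison. The class and parameter names below are hypothetical, and how the monitor counts reachable nodes is not specified on this page.

```java
/** Hypothetical sketch of a strict-majority quorum test. */
final class QuorumSketch {
    /**
     * Returns true when the reachable nodes (this node included) form a strict
     * majority of all known nodes (this node included).
     */
    static boolean hasQuorum(int reachableIncludingSelf, int knownIncludingSelf) {
        if (knownIncludingSelf <= 0) return false;
        return reachableIncludingSelf > knownIncludingSelf / 2;
    }
}
```

With 4 known nodes, 2 reachable nodes are not a majority, so at least 3 are needed. With 5 known nodes, 3 suffice.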
getReplicationLag
public long getReplicationLag()
Get the replication lag in commit indices between leader and followers. Returns the maximum difference found.
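A minimal sketch of "maximum difference in commit indices" follows. The class name and inputs are hypothetical. Clamping negative differences to zero and returning zero when there are no followers are assumptions, not documented behavior.

```java
import java.util.Collection;

/** Hypothetical sketch of a maximum replication lag computation. */
final class ReplicationLagSketch {
    /** Largest gap between the leader's commit index and any follower's commit index. */
    static long maxLag(long leaderCommitIndex, Collection<Long> followerCommitIndices) {
        long max = 0L;
        for (long follower : followerCommitIndices) {
            // Assumption: a follower that appears ahead of the leader counts as zero lag.
            long lag = Math.max(0L, leaderCommitIndex - follower);
            max = Math.max(max, lag);
        }
        return max;
    }
}
```

For a leader at commit index 100 and followers at 100, 97, and 90, this sketch returns 10.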
addStateChangeListener
Add a listener for health state changes. -
removeStateChangeListener
Remove a previously added health state change listener.
updateRemoteNodeHealth
Update health information for a remote node. Call this when health information is received from other nodes.
isRunning
public boolean isRunning()
Check if the background health check thread is running.
getHealthHistory
Get the last N health snapshots from history.
- Parameters:
maxEntries - maximum number of entries to return
- Returns:
list of recent health snapshots (newest first)
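The newest-first contract can be sketched with a bounded deque that inserts at the head and evicts from the tail. The class HealthHistorySketch, its capacity limit, and the append method are hypothetical. This page does not say how the monitor stores history or whether it bounds it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical bounded history that returns the newest entries first. */
final class HealthHistorySketch<T> {
    private final int capacity;
    private final Deque<T> snapshots = new ArrayDeque<>();

    HealthHistorySketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Adds a snapshot at the head, evicting the oldest one when full. */
    synchronized void append(T snapshot) {
        if (snapshots.size() == capacity) snapshots.removeLast();
        snapshots.addFirst(snapshot);
    }

    /** Returns up to maxEntries snapshots, newest first. */
    synchronized List<T> getHealthHistory(int maxEntries) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = snapshots.iterator(); // iterates from head (newest) to tail (oldest)
        while (it.hasNext() && out.size() < maxEntries) {
            out.add(it.next());
        }
        return out;
    }
}
```

After appending a, b, c, and d to a history of capacity 3, a request for 2 entries yields d followed by c.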
-
setAlertThreshold
Set the alert threshold for a specific metric.
- Parameters:
metric - the metric name (memory, cpu, heartbeat)
threshold - the threshold value
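The sketch below illustrates per-metric thresholds keyed by the three documented metric names. The class, the shouldAlert helper, the rejection of unknown metric names, and the "strictly greater than" comparison are all assumptions. This page does not state the units of each threshold.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-metric threshold table; not the LoomCache implementation. */
final class AlertThresholdSketch {
    /** Metric names taken from the parameter description above. */
    private static final Set<String> KNOWN_METRICS = Set.of("memory", "cpu", "heartbeat");

    private final Map<String, Double> thresholds = new ConcurrentHashMap<>();

    /** Assumption: unknown metric names are rejected rather than silently stored. */
    void setAlertThreshold(String metric, double threshold) {
        if (!KNOWN_METRICS.contains(metric)) {
            throw new IllegalArgumentException("unknown metric: " + metric);
        }
        thresholds.put(metric, threshold);
    }

    /** True when a threshold is set for the metric and the observed value exceeds it. */
    boolean shouldAlert(String metric, double observed) {
        Double threshold = thresholds.get(metric);
        return threshold != null && observed > threshold;
    }
}
```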
-
getAlerts
Get all active alerts.
- Returns:
list of currently active alerts
-
clearAlerts
public void clearAlerts()
Clear all active alerts.
registerAlertListener
Register a listener for alert events.
- Parameters:
listener - callback that receives new alerts
-
removeAlertListener
Remove an alert listener.
- Parameters:
listener - the listener to remove
-
getUptimeMillis
public long getUptimeMillis()
Get the monitor's uptime in milliseconds.
- Returns:
uptime in milliseconds
-
close
-