Class ClusterHealthMonitor

java.lang.Object
com.loomcache.server.cluster.ClusterHealthMonitor
All Implemented Interfaces:
AutoCloseable

public class ClusterHealthMonitor extends Object implements AutoCloseable
Cluster-wide health monitor that aggregates health status from all subsystems.

Monitors:
- Local node memory usage (heap)
- CPU load
- Raft consensus health
- Network heartbeat status
- Quorum reachability
- Custom health checks registered by subsystems

Uses virtual threads for background health checks and ReentrantReadWriteLock for thread safety.

Health degradation rules:
- Memory >80% = Degraded
- Memory >95% = Unhealthy
- Heartbeat missing >5s = Degraded
- Heartbeat missing >30s = Unhealthy
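
The degradation rules above can be read as a pair of threshold checks. The following is a minimal sketch under stated assumptions, not the monitor's actual code: the class HealthRulesSketch, its Status enum, and all method names are hypothetical, and combining results by taking the worst status is an assumption this page does not confirm.

```java
import java.time.Duration;

// Hypothetical sketch of the documented thresholds; not the library's implementation.
final class HealthRulesSketch {
    enum Status { HEALTHY, DEGRADED, UNHEALTHY }

    // Memory rule: more than 95% heap used is UNHEALTHY, more than 80% is DEGRADED.
    static Status fromHeapUsage(double usedFraction) {
        if (usedFraction > 0.95) return Status.UNHEALTHY;
        if (usedFraction > 0.80) return Status.DEGRADED;
        return Status.HEALTHY;
    }

    // Heartbeat rule: silent for more than 30s is UNHEALTHY, more than 5s is DEGRADED.
    static Status fromHeartbeatAge(Duration sinceLastHeartbeat) {
        if (sinceLastHeartbeat.compareTo(Duration.ofSeconds(30)) > 0) return Status.UNHEALTHY;
        if (sinceLastHeartbeat.compareTo(Duration.ofSeconds(5)) > 0) return Status.DEGRADED;
        return Status.HEALTHY;
    }

    // Assumption: the overall status is the worst of the individual results.
    static Status worst(Status a, Status b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // Current heap usage as a fraction of the maximum heap, via the standard Runtime API.
    static double currentHeapFraction() {
        Runtime rt = Runtime.getRuntime();
        return (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
    }
}
```

The comparisons are strict, matching the ">" in the rules: exactly 80% heap usage or exactly 5 seconds without a heartbeat still counts as healthy in this sketch.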

  • Constructor Details

    • ClusterHealthMonitor

      public ClusterHealthMonitor(String nodeId, int instanceNumber, RaftNode raftNode, HealthChecker healthChecker, long healthCheckIntervalMs)
      Create a new ClusterHealthMonitor.
      Parameters:
      nodeId - unique node identifier
      instanceNumber - instance number for logging
      raftNode - the Raft node for consensus state
      healthChecker - the network health checker
      healthCheckIntervalMs - interval between health checks (milliseconds)
  • Method Details

    • start

      public void start()
      Start the background health check thread (virtual thread).
    • stop

      public void stop()
      Stop the background health check thread gracefully.
    • getNodeHealth

      public ClusterHealthMonitor.NodeHealth getNodeHealth()
      Get this node's current health status.
    • computeAndUpdateNodeHealth

      public ClusterHealthMonitor.NodeHealth computeAndUpdateNodeHealth()
      Recompute and return this node's health (for testing/immediate updates).
    • getClusterHealth

      public ClusterHealthMonitor.ClusterHealth getClusterHealth()
      Get cluster-wide health by aggregating known nodes.
    • addHealthCheck

      public void addHealthCheck(String name, Supplier<ClusterHealthMonitor.HealthStatus> check)
      Register a custom health check.
      Parameters:
      name - unique name for this check
      check - supplier that returns the health status
    • getHealthChecks

      public Map<String, ClusterHealthMonitor.HealthStatus> getHealthChecks()
      Get results of all health checks (including custom).
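
To illustrate how supplier-based checks might be registered and evaluated, here is a small self-contained sketch. The class HealthCheckRegistrySketch and its Status enum are hypothetical stand-ins for the real types, and treating a check that throws as UNHEALTHY is an assumption, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry sketch; the real monitor's internals are not shown on this page.
final class HealthCheckRegistrySketch {
    enum Status { HEALTHY, DEGRADED, UNHEALTHY }

    private final Map<String, Supplier<Status>> checks = new ConcurrentHashMap<>();

    // Register (or replace) a named check.
    void addHealthCheck(String name, Supplier<Status> check) {
        checks.put(name, check);
    }

    // Evaluate every registered supplier and collect the results by name.
    Map<String, Status> getHealthChecks() {
        Map<String, Status> results = new LinkedHashMap<>();
        checks.forEach((name, check) -> {
            Status status;
            try {
                status = check.get();
            } catch (RuntimeException e) {
                // Assumption: a failing check is reported as UNHEALTHY.
                status = Status.UNHEALTHY;
            }
            results.put(name, status);
        });
        return results;
    }
}
```

Because each check is a Supplier, it is re-evaluated on every call to getHealthChecks in this sketch, so checks should be cheap and non-blocking.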
    • isQuorumReached

      public boolean isQuorumReached()
      Check if a quorum of nodes is reachable. A quorum is defined as a majority of known nodes (including this node).
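
The majority rule described above reduces to one integer comparison. This is a minimal sketch; QuorumSketch and its parameter names are hypothetical, and both counts are assumed to include the local node.

```java
// Hypothetical sketch of the majority rule; not the library's implementation.
final class QuorumSketch {
    // reachableNodes and knownNodes both count the local node.
    static boolean isQuorumReached(int reachableNodes, int knownNodes) {
        if (knownNodes <= 0) return false;
        // Strict majority: more than half of the known nodes must be reachable.
        return reachableNodes > knownNodes / 2;
    }
}
```

With 4 known nodes, 2 reachable nodes are not a quorum (a tie is not a majority), while 3 are.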
    • getReplicationLag

      public long getReplicationLag()
      Get the replication lag in commit indices between leader and followers. Returns the maximum difference found.
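
As a rough illustration of "maximum difference", the sketch below computes the largest gap between a leader's commit index and each follower's. ReplicationLagSketch and its parameters are hypothetical; returning 0 when there are no followers, or when no follower is behind, is an assumption.

```java
import java.util.Collection;

// Hypothetical sketch; how the real monitor obtains commit indices is not documented here.
final class ReplicationLagSketch {
    static long maxLag(long leaderCommitIndex, Collection<Long> followerCommitIndices) {
        long max = 0; // assumption: lag is never reported as negative
        for (long follower : followerCommitIndices) {
            max = Math.max(max, leaderCommitIndex - follower);
        }
        return max;
    }
}
```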
    • addStateChangeListener

      public void addStateChangeListener(ClusterHealthMonitor.HealthStateChangeListener listener)
      Add a listener for health state changes.
    • removeStateChangeListener

      public void removeStateChangeListener(ClusterHealthMonitor.HealthStateChangeListener listener)
      Remove a listener.
    • updateRemoteNodeHealth

      public void updateRemoteNodeHealth(ClusterHealthMonitor.NodeHealth nodeHealth)
      Update health information for a remote node. Call this when health information is received from another node.
    • isRunning

      public boolean isRunning()
      Check whether this monitor is currently running.
    • getHealthHistory

      public List<ClusterHealthMonitor.HealthSnapshot> getHealthHistory(int maxEntries)
      Get up to maxEntries of the most recent health snapshots from history.
      Parameters:
      maxEntries - maximum number of entries to return
      Returns:
      list of recent health snapshots (newest first)
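
One way to keep a bounded history that is read newest first is a deque that inserts at the head and evicts from the tail. The sketch below assumes this design; HealthHistorySketch, its capacity parameter, and record are hypothetical names that do not appear in the documented API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded history; the real snapshot type and retention policy are not shown here.
final class HealthHistorySketch<T> {
    private final Deque<T> history = new ArrayDeque<>();
    private final int capacity;

    HealthHistorySketch(int capacity) {
        this.capacity = capacity;
    }

    // Newest snapshot goes to the head; the oldest is evicted once capacity is exceeded.
    synchronized void record(T snapshot) {
        history.addFirst(snapshot);
        while (history.size() > capacity) {
            history.removeLast();
        }
    }

    // Returns a copy of at most maxEntries snapshots, newest first.
    synchronized List<T> getHealthHistory(int maxEntries) {
        List<T> out = new ArrayList<>();
        for (T snapshot : history) {
            if (out.size() >= maxEntries) break;
            out.add(snapshot);
        }
        return out;
    }
}
```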
    • setAlertThreshold

      public void setAlertThreshold(String metric, double threshold)
      Set the alert threshold for a specific metric.
      Parameters:
      metric - the metric name (memory, cpu, heartbeat)
      threshold - the threshold value
    • getAlerts

      Get all active alerts.
      Returns:
      list of currently active alerts
    • clearAlerts

      public void clearAlerts()
      Clear all active alerts.
    • registerAlertListener

      public void registerAlertListener(Consumer<ClusterHealthMonitor.HealthAlert> listener)
      Register a listener for alert events.
      Parameters:
      listener - callback that receives new alerts
    • removeAlertListener

      public void removeAlertListener(Consumer<ClusterHealthMonitor.HealthAlert> listener)
      Remove an alert listener.
      Parameters:
      listener - the listener to remove
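
The interplay between thresholds and alert listeners might look like the sketch below. It is illustrative only: AlertSketch, its nested Alert class, and the observe method are hypothetical, and the rule that an alert fires when a sample exceeds its metric's threshold is an assumption this page does not state.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of threshold-driven alerts; not the library's implementation.
final class AlertSketch {
    static final class Alert {
        final String metric;
        final double value;
        final double threshold;

        Alert(String metric, double value, double threshold) {
            this.metric = metric;
            this.value = value;
            this.threshold = threshold;
        }
    }

    private final Map<String, Double> thresholds = new ConcurrentHashMap<>();
    private final List<Consumer<Alert>> listeners = new CopyOnWriteArrayList<>();

    void setAlertThreshold(String metric, double threshold) {
        thresholds.put(metric, threshold);
    }

    void registerAlertListener(Consumer<Alert> listener) {
        listeners.add(listener);
    }

    void removeAlertListener(Consumer<Alert> listener) {
        listeners.remove(listener);
    }

    // Assumption: an alert fires when a sample exceeds the threshold set for its metric.
    void observe(String metric, double value) {
        Double threshold = thresholds.get(metric);
        if (threshold == null || value <= threshold) return;
        Alert alert = new Alert(metric, value, threshold);
        for (Consumer<Alert> listener : listeners) {
            listener.accept(alert);
        }
    }
}
```

Removal relies on object identity for lambdas, so in this sketch a caller keeps a reference to the listener it registered in order to remove it later.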
    • getUptimeMillis

      public long getUptimeMillis()
      Get the monitor's uptime in milliseconds.
      Returns:
      uptime in milliseconds
    • close

      public void close()
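      Shut down this monitor. Because the class implements AutoCloseable, it can be used in a try-with-resources statement.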
      Specified by:
      close in interface AutoCloseable