Class HealthChecker

java.lang.Object
com.loomcache.server.network.HealthChecker

public class HealthChecker extends Object
Health-checking loop for a distributed cache cluster.

Runs in a virtual thread and:
- Sends PING to all connected peers every heartbeat interval (default 2s)
- Expects PONG responses within timeout (default 6s)
- Integrates with PhiAccrualFailureDetector for adaptive failure detection
- Logs peer health status (INFO when suspected, DEBUG otherwise)

Non-blocking: uses virtual threads for concurrent health checks across many peers. Graceful shutdown: respects a volatile running flag.
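The loop-plus-flag pattern described above can be sketched as follows. This is a minimal illustration, not the actual HealthChecker source: the class name HeartbeatLoopSketch and the pingAllPeers callback are hypothetical, and a daemon platform thread is used so the sketch also runs on JDKs without virtual threads.

```java
// Hypothetical stand-in for the health-check loop; not the real HealthChecker.
final class HeartbeatLoopSketch {
    private volatile boolean running;      // written by start()/stop(), read by the loop thread
    private final long intervalMs;
    private final Runnable pingAllPeers;   // assumed callback that sends PING to every peer
    private Thread worker;

    HeartbeatLoopSketch(long intervalMs, Runnable pingAllPeers) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        this.intervalMs = intervalMs;
        this.pingAllPeers = pingAllPeers;
    }

    // Returns immediately; the loop runs on its own thread.
    synchronized void start() {
        if (running) {
            return;
        }
        running = true;
        // On Java 21+, Thread.ofVirtual().start(this::loop) could replace this daemon thread.
        worker = new Thread(this::loop, "health-check-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    // Clears the flag and interrupts the worker so a sleeping loop wakes up promptly.
    synchronized void stop() {
        running = false;
        if (worker != null) {
            worker.interrupt();
        }
    }

    boolean isRunning() {
        return running;
    }

    private void loop() {
        while (running) {
            pingAllPeers.run();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and exit
                break;
            }
        }
    }
}
```

Because the flag is volatile, a stop() call from any thread becomes visible to the loop on its next iteration; the interrupt only shortens the wait when the loop is asleep.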

  • Constructor Details

    • HealthChecker

      public HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections)
      Creates a new health checker for monitoring peer connectivity.
      Parameters:
      nodeId - the local node ID (non-null)
      instanceNumber - the instance number (non-negative)
      heartbeatIntervalMs - PING interval in milliseconds (positive)
      heartbeatTimeoutMs - PONG timeout in milliseconds (positive, should be >= interval)
      connections - map of peer connections to monitor (non-null)
      Throws:
      NullPointerException - if nodeId or connections is null
      IllegalArgumentException - if heartbeatIntervalMs or heartbeatTimeoutMs is non-positive
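The documented argument contract can be illustrated with a small stand-alone sketch. HealthCheckArgs is a hypothetical holder class, not part of the library, and the order of the checks is an assumption; only the exception types come from the documentation above.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical holder mirroring the documented constructor contract; not the real HealthChecker.
final class HealthCheckArgs {
    final String nodeId;
    final int instanceNumber;
    final long heartbeatIntervalMs;
    final long heartbeatTimeoutMs;
    final Map<String, ?> connections;

    HealthCheckArgs(String nodeId, int instanceNumber, long heartbeatIntervalMs,
                    long heartbeatTimeoutMs, Map<String, ?> connections) {
        // Null arguments are rejected with NullPointerException, as documented.
        this.nodeId = Objects.requireNonNull(nodeId, "nodeId");
        this.connections = Objects.requireNonNull(connections, "connections");
        // Non-positive durations are rejected with IllegalArgumentException.
        if (heartbeatIntervalMs <= 0 || heartbeatTimeoutMs <= 0) {
            throw new IllegalArgumentException("heartbeat interval and timeout must be positive");
        }
        // The docs call instanceNumber "non-negative" but list no exception for it,
        // so this sketch stores it unchecked.
        this.instanceNumber = instanceNumber;
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.heartbeatTimeoutMs = heartbeatTimeoutMs;
    }

    // Advisory only: the docs say the timeout "should be" >= the interval.
    boolean timeoutCoversInterval() {
        return heartbeatTimeoutMs >= heartbeatIntervalMs;
    }
}
```

With the documented defaults (2000 ms interval, 6000 ms timeout), timeoutCoversInterval() returns true; a caller could log a warning when it returns false.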
    • HealthChecker

      public HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections, IcmpFailureDetectorConfig icmpConfig)
    • HealthChecker

      public HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections, IcmpFailureDetectorConfig icmpConfig, HealthChecker.IcmpProbe icmpProbe)
  • Method Details

    • start

      public void start()
      Start the health check loop on a scheduled virtual thread. Non-blocking: returns immediately.
    • stop

      public void stop()
      Stop the health check loop gracefully.
    • recordPong

      public void recordPong(String peerId)
      Record a PONG response from a peer. Calls heartbeat() on the phi-accrual failure detector to update suspicion level.
      Parameters:
      peerId - the peer that sent the PONG (non-null)
      Throws:
      NullPointerException - if peerId is null
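To show how a heartbeat() call can feed a suspicion level, here is a deliberately simplified phi-accrual sketch. It is not the project's PhiAccrualFailureDetector: the class name, the injected clock, and the exponential inter-arrival model are all assumptions made for illustration (the original phi-accrual formulation models inter-arrival times with a normal distribution).

```java
// Simplified phi-accrual sketch using an exponential inter-arrival model.
// Not the project's PhiAccrualFailureDetector; names and the model are assumptions.
final class SimplePhiDetector {
    private long lastHeartbeatMs = -1;  // -1 means no heartbeat seen yet
    private double meanIntervalMs;      // running mean of observed inter-arrival times
    private long samples;               // number of intervals observed so far

    // Called when a PONG arrives; nowMs is injected so the logic is easy to test.
    synchronized void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            long interval = nowMs - lastHeartbeatMs;
            samples++;
            meanIntervalMs += (interval - meanIntervalMs) / samples; // incremental mean
        }
        lastHeartbeatMs = nowMs;
    }

    // phi = -log10(P(next heartbeat arrives later than elapsed)),
    // with P = exp(-elapsed / mean), which simplifies to elapsed / (mean * ln 10).
    synchronized double phi(long nowMs) {
        if (samples == 0 || meanIntervalMs <= 0) {
            return 0.0; // not enough data to suspect anything
        }
        double elapsed = nowMs - lastHeartbeatMs;
        return elapsed / (meanIntervalMs * Math.log(10.0));
    }
}
```

Under this model, a peer with a 2 s mean heartbeat interval reaches phi = 1 after roughly 4.6 s of silence, and phi keeps growing linearly until the next heartbeat resets the elapsed time.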
    • removePeer

      public void removePeer(String peerId)
      Remove a peer from health checking (e.g., when it disconnects).
      Parameters:
      peerId - the peer to remove (non-null)
      Throws:
      NullPointerException - if peerId is null
    • registerPeer

      public void registerPeer(String peerId)
      Register a peer for health checking. Call this when a peer joins or reconnects.
      Parameters:
      peerId - the peer to register (non-null)
      Throws:
      NullPointerException - if peerId is null
    • getDetector

      public @Nullable PhiAccrualFailureDetector getDetector(String peerId)
      Get the detector for a specific peer (for testing/metrics).
      Parameters:
      peerId - the peer ID (non-null)
      Returns:
      the phi-accrual detector, or null if unknown
      Throws:
      NullPointerException - if peerId is null
    • getTrackedPeers

      public String[] getTrackedPeers()
      Get the IDs of all tracked peers.
      Returns:
      array of peer IDs being health-checked
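The register/remove/list lifecycle can be sketched with a concurrent set. PeerRegistrySketch is a hypothetical class; how the real HealthChecker stores peers is not documented here, and sorting the returned array is an assumption made only to keep the output deterministic.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical peer registry; the real HealthChecker may track peers differently.
final class PeerRegistrySketch {
    private final Set<String> peers = ConcurrentHashMap.newKeySet();

    void registerPeer(String peerId) {
        // Registering an already-tracked peer leaves the set unchanged.
        peers.add(Objects.requireNonNull(peerId, "peerId"));
    }

    void removePeer(String peerId) {
        peers.remove(Objects.requireNonNull(peerId, "peerId"));
    }

    // Returns a sorted snapshot copy; later registry changes do not affect the array.
    String[] getTrackedPeers() {
        String[] snapshot = peers.toArray(new String[0]);
        Arrays.sort(snapshot);
        return snapshot;
    }
}
```

Returning a copy rather than a live view means callers can iterate the array without worrying about peers joining or leaving concurrently.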
    • getPhi

      public double getPhi(String peerId)
      Get current phi value for a peer.
      Parameters:
      peerId - the peer ID
      Returns:
      phi value, or -1 if unknown
    • isSuspected

      public boolean isSuspected(String peerId)
      Check if a peer is currently suspected (phi >= threshold).
      Parameters:
      peerId - the peer ID
      Returns:
      true if suspected, false if healthy or unknown
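The return conventions of getPhi and isSuspected (-1 for an unknown peer, suspicion when phi >= threshold) can be sketched as below. PhiQuerySketch, its update method, and the threshold value used in the usage note are hypothetical; the documentation above does not state the actual threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lookup layer illustrating the documented return conventions.
final class PhiQuerySketch {
    private final Map<String, Double> latestPhi = new ConcurrentHashMap<>();
    private final double threshold; // assumed to be configurable; real value not documented

    PhiQuerySketch(double threshold) {
        this.threshold = threshold;
    }

    // Stand-in for whatever mechanism refreshes a peer's phi value.
    void update(String peerId, double phi) {
        latestPhi.put(peerId, phi);
    }

    // Returns the current phi, or -1 when the peer is not tracked.
    double getPhi(String peerId) {
        Double phi = latestPhi.get(peerId);
        return phi == null ? -1.0 : phi;
    }

    // Unknown peers are reported as not suspected; known peers are compared to the threshold.
    boolean isSuspected(String peerId) {
        Double phi = latestPhi.get(peerId);
        return phi != null && phi >= threshold;
    }
}
```

For example, with an assumed threshold of 8.0, a peer whose phi is exactly 8.0 counts as suspected, while an unregistered peer yields getPhi of -1 and isSuspected of false.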
    • probeIcmp

      public boolean probeIcmp(String peerId)
    • lastIcmpReachable

      public boolean lastIcmpReachable(String peerId)
    • isRunning

      public boolean isRunning()