Class HealthChecker
java.lang.Object
com.loomcache.server.network.HealthChecker
Health checking loop for distributed cache cluster.
Runs in a virtual thread and:
- Sends PING to all connected peers every heartbeat interval (default 2s)
- Expects PONG responses within timeout (default 6s)
- Integrates with PhiAccrualFailureDetector for adaptive failure detection
- Logs peer health status (INFO when suspected, DEBUG otherwise)
Non-blocking: uses virtual threads for concurrent health checks across many peers. Graceful shutdown: respects a volatile running flag.
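To make the phi-accrual idea concrete, here is a minimal sketch of how a suspicion level can be derived from heartbeat arrival times. It is not the PhiAccrualFailureDetector used by this class: the class name PhiSketch, the sliding window, and the exponential inter-arrival model are all assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified phi-accrual estimator (illustration only). */
final class PhiSketch {
    private final Deque<Long> intervalsMs = new ArrayDeque<>();
    private final int windowSize;
    private long lastHeartbeatMs = -1;

    PhiSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Record a heartbeat arrival (for example a PONG) at the given timestamp. */
    void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            intervalsMs.addLast(nowMs - lastHeartbeatMs);
            if (intervalsMs.size() > windowSize) {
                intervalsMs.removeFirst(); // keep only the most recent intervals
            }
        }
        lastHeartbeatMs = nowMs;
    }

    /**
     * phi = -log10(P(silence of this length)), assuming exponentially
     * distributed inter-arrival times: P = exp(-elapsed / mean).
     * Returns 0.0 until at least two heartbeats have been seen.
     */
    double phi(long nowMs) {
        if (intervalsMs.isEmpty()) {
            return 0.0;
        }
        double mean = intervalsMs.stream().mapToLong(Long::longValue).average().orElse(1.0);
        double elapsed = nowMs - lastHeartbeatMs;
        return elapsed / (Math.max(mean, 1.0) * Math.log(10.0));
    }
}
```

Under this model phi grows linearly with the silence since the last heartbeat, scaled by the observed mean interval, so a peer that normally answers quickly becomes suspect sooner than one with a slow but steady cadence.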
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
Constructor / Description
HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections)
    Creates a new health checker for monitoring peer connectivity.
HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections, IcmpFailureDetectorConfig icmpConfig)
HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections, IcmpFailureDetectorConfig icmpConfig, HealthChecker.IcmpProbe icmpProbe)
-
Method Summary
Modifier and Type                      Method                              Description
@Nullable PhiAccrualFailureDetector    getDetector(String peerId)          Get the detector for a specific peer (for testing/metrics).
double                                 getPhi(String peerId)               Get current phi value for a peer.
String[]                               getTrackedPeers()                   Get list of tracked peers.
boolean                                isRunning()
boolean                                isSuspected(String peerId)          Check if a peer is currently suspected (phi >= threshold).
boolean                                lastIcmpReachable(String peerId)
boolean                                probeIcmp
void                                   recordPong(String peerId)           Record a PONG response from a peer.
void                                   registerPeer(String peerId)         Register a peer for health checking.
void                                   removePeer(String peerId)           Remove a peer from health checking (e.g., when it disconnects).
void                                   start()                             Start the health check loop on a scheduled virtual thread.
void                                   stop()                              Stop the health check loop gracefully.
-
Constructor Details
-
HealthChecker
public HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections)
Creates a new health checker for monitoring peer connectivity.
Parameters:
nodeId - the local node ID (non-null)
instanceNumber - the instance number (non-negative)
heartbeatIntervalMs - PING interval in milliseconds (positive)
heartbeatTimeoutMs - PONG timeout in milliseconds (positive, should be >= interval)
connections - map of peer connections to monitor (non-null)
Throws:
NullPointerException - if nodeId or connections is null
IllegalArgumentException - if heartbeatIntervalMs or heartbeatTimeoutMs is non-positive
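The documented contract (null checks first, then positive intervals) can be sketched as follows. This is an illustrative stand-in, not the HealthChecker source: the class name HealthCheckerArgsSketch is hypothetical, and only the checks that the documentation says throw are enforced.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical holder that mirrors the documented constructor checks. */
final class HealthCheckerArgsSketch {
    final String nodeId;
    final int instanceNumber;
    final long heartbeatIntervalMs;
    final long heartbeatTimeoutMs;
    final Map<String, ?> connections;

    HealthCheckerArgsSketch(String nodeId, int instanceNumber,
                            long heartbeatIntervalMs, long heartbeatTimeoutMs,
                            Map<String, ?> connections) {
        // Documented: NullPointerException if nodeId or connections is null.
        this.nodeId = Objects.requireNonNull(nodeId, "nodeId");
        this.connections = Objects.requireNonNull(connections, "connections");
        // Documented: IllegalArgumentException if an interval is non-positive.
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("heartbeatIntervalMs must be positive");
        }
        if (heartbeatTimeoutMs <= 0) {
            throw new IllegalArgumentException("heartbeatTimeoutMs must be positive");
        }
        // "timeout should be >= interval" is only a recommendation in the
        // documentation, so this sketch does not enforce it.
        this.instanceNumber = instanceNumber;
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.heartbeatTimeoutMs = heartbeatTimeoutMs;
    }
}
```

With the documented defaults, a caller would pass 2000 for the interval and 6000 for the timeout, which leaves room for three missed PINGs before the timeout elapses.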
-
HealthChecker
public HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections, IcmpFailureDetectorConfig icmpConfig) -
HealthChecker
public HealthChecker(String nodeId, int instanceNumber, long heartbeatIntervalMs, long heartbeatTimeoutMs, Map<String, ConnectionContext> connections, IcmpFailureDetectorConfig icmpConfig, HealthChecker.IcmpProbe icmpProbe)
-
-
Method Details
-
start
public void start()
Start the health check loop on a scheduled virtual thread. Non-blocking: returns immediately.
-
stop
public void stop()
Stop the health check loop gracefully.
-
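The start/stop behaviour described above (a background loop guarded by a volatile running flag) might look roughly like the sketch below. The class HeartbeatLoopSketch, its pingAll callback, and the injected ThreadFactory are assumptions made for illustration; the actual scheduling inside HealthChecker is not shown in this documentation.

```java
import java.util.concurrent.ThreadFactory;

/** Hypothetical periodic loop controlled by a volatile running flag. */
final class HeartbeatLoopSketch {
    private final long intervalMs;
    private final Runnable pingAll;      // stand-in for "send PING to all peers"
    private final ThreadFactory threads; // injected so the thread kind is configurable
    private volatile boolean running;
    private Thread worker;

    HeartbeatLoopSketch(long intervalMs, Runnable pingAll, ThreadFactory threads) {
        this.intervalMs = intervalMs;
        this.pingAll = pingAll;
        this.threads = threads;
    }

    /** Starts the loop on a new thread and returns immediately. */
    synchronized void start() {
        if (running) {
            return; // already started
        }
        running = true;
        worker = threads.newThread(this::loop);
        worker.start();
    }

    /** Clears the flag and interrupts the sleep so the loop exits promptly. */
    synchronized void stop() {
        running = false;
        if (worker != null) {
            worker.interrupt();
        }
    }

    boolean isRunning() {
        return running;
    }

    private void loop() {
        while (running) {
            pingAll.run();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // interrupted, typically by stop()
            }
        }
    }
}
```

On Java 21 or later, passing `Thread.ofVirtual().factory()` as the ThreadFactory runs the loop on a virtual thread; passing `Thread::new` uses a platform thread instead.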
recordPong
Record a PONG response from a peer. Calls heartbeat() on the phi-accrual failure detector to update suspicion level.
Parameters:
peerId - the peer that sent the PONG (non-null)
Throws:
NullPointerException - if peerId is null
-
removePeer
Remove a peer from health checking (e.g., when it disconnects).
Parameters:
peerId - the peer to remove (non-null)
Throws:
NullPointerException - if peerId is null
-
registerPeer
Register a peer for health checking. Call this when a peer joins or reconnects.
Parameters:
peerId - the peer to register (non-null)
Throws:
NullPointerException - if peerId is null
-
getDetector
Get the detector for a specific peer (for testing/metrics).
Parameters:
peerId - the peer ID (non-null)
Returns:
the phi-accrual detector, or null if unknown
Throws:
NullPointerException - if peerId is null
-
getTrackedPeers
Get list of tracked peers.
Returns:
array of peer IDs being health-checked
-
getPhi
Get current phi value for a peer.
Parameters:
peerId - the peer ID
Returns:
phi value, or -1 if unknown
-
isSuspected
Check if a peer is currently suspected (phi >= threshold).
Parameters:
peerId - the peer ID
Returns:
true if suspected, false if healthy or unknown
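The "unknown peer" conventions (phi of -1, never suspected) can be illustrated with a small per-peer registry. Everything here is a hypothetical sketch: the class PeerPhiRegistrySketch, the explicit nowMs parameters, the fixed mean interval, and the threshold value used in the usage note are assumptions, not part of the HealthChecker API.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical registry showing the -1 / false results for unknown peers. */
final class PeerPhiRegistrySketch {
    private final ConcurrentMap<String, Long> lastPongMs = new ConcurrentHashMap<>();
    private final double meanIntervalMs; // assumed fixed instead of being learned
    private final double threshold;      // assumed positive

    PeerPhiRegistrySketch(double meanIntervalMs, double threshold) {
        if (meanIntervalMs <= 0 || threshold <= 0) {
            throw new IllegalArgumentException("meanIntervalMs and threshold must be positive");
        }
        this.meanIntervalMs = meanIntervalMs;
        this.threshold = threshold;
    }

    void registerPeer(String peerId, long nowMs) {
        lastPongMs.putIfAbsent(Objects.requireNonNull(peerId, "peerId"), nowMs);
    }

    void recordPong(String peerId, long nowMs) {
        // Ignores PONGs from peers that are not registered.
        lastPongMs.computeIfPresent(Objects.requireNonNull(peerId, "peerId"), (k, v) -> nowMs);
    }

    void removePeer(String peerId) {
        lastPongMs.remove(Objects.requireNonNull(peerId, "peerId"));
    }

    /** Returns -1 for an unknown peer, otherwise a phi that grows with silence. */
    double getPhi(String peerId, long nowMs) {
        Long last = lastPongMs.get(peerId);
        if (last == null) {
            return -1.0;
        }
        return (nowMs - last) / (meanIntervalMs * Math.log(10.0));
    }

    /** Unknown peers yield phi = -1, which is below any positive threshold. */
    boolean isSuspected(String peerId, long nowMs) {
        return getPhi(peerId, nowMs) >= threshold;
    }

    String[] getTrackedPeers() {
        return lastPongMs.keySet().toArray(new String[0]);
    }
}
```

For example, with a mean interval of 2000 ms and an assumed threshold of 8.0, a registered peer that has been silent for 60 seconds is suspected, while a peer that was never registered reports phi = -1 and is not.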
-
probeIcmp
-
lastIcmpReachable
-
isRunning
public boolean isRunning()
-