Class PartitionDetector
java.lang.Object
com.loomcache.server.cluster.PartitionDetector
Network partition detector for cluster health monitoring.
Classifies cluster connectivity as one of three partition states:
- NO_PARTITION: All nodes are reachable and healthy
- PARTIAL: Asymmetric partitions where some nodes can reach others but not vice versa
- FULL_PARTITION: Complete network split (majority vs minority partitions)
Uses phi-accrual failure detection (based on inter-arrival times of heartbeats) to adaptively detect node failures without fixed timeouts.
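The phi calculation described above can be sketched as follows. This is a minimal illustration, not the LoomCache implementation: the class name `PhiAccrualSketch` and its methods are hypothetical, and it assumes an exponential model of inter-arrival times (one common simplification of phi-accrual). The source does not say which distribution the detector uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of phi-accrual scoring; not the LoomCache implementation. */
final class PhiAccrualSketch {
    private final int windowSize;                          // max inter-arrival samples kept
    private final Deque<Long> intervalsMs = new ArrayDeque<>();
    private long sumMs = 0;                                // running sum of the window
    private long lastHeartbeatMs = -1;                     // -1 means no heartbeat seen yet

    PhiAccrualSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Record a heartbeat that arrived at the given timestamp. */
    void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            long interval = nowMs - lastHeartbeatMs;
            intervalsMs.addLast(interval);
            sumMs += interval;
            if (intervalsMs.size() > windowSize) {
                sumMs -= intervalsMs.removeFirst();        // slide the window
            }
        }
        lastHeartbeatMs = nowMs;
    }

    /**
     * Assumed model: P(silence >= t) = exp(-t / mean), so
     * phi = -log10(P) = t / (mean * ln 10).
     */
    double phi(long nowMs) {
        if (intervalsMs.isEmpty()) {
            return 0.0;                                    // no history, no suspicion
        }
        double meanMs = (double) sumMs / intervalsMs.size();
        if (meanMs <= 0.0) {
            return 0.0;
        }
        double elapsedMs = Math.max(0, nowMs - lastHeartbeatMs);
        return elapsedMs / (meanMs * Math.log(10.0));
    }
}
```

Under this assumed model, a node whose heartbeats arrive 1000 ms apart on average would cross a phi threshold of 8.0 after roughly 18.4 seconds of silence, and that cutoff scales with the observed mean rather than being a fixed timeout.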
Configuration
Configured via the PartitionDetector.DetectorConfig record, with an optional history retention override:
- phiThreshold: phi value above which a node is suspected dead (default: 8.0)
- heartbeatIntervalMs: expected interval between heartbeats (used for startup grace and metrics)
- windowSize: sliding window size for inter-arrival times (default: 100)
- partitionHistoryLimit: max partition events retained in memory (default: 1024)
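The settings listed above could be modeled as a record along these lines. This is a hypothetical mirror for illustration only: the name `DetectorConfigSketch`, the component types, the validation rules, and the `withDefaults` factory are assumptions, not the actual DetectorConfig API. Only the 8.0 and 100 defaults come from this page; no default is documented for heartbeatIntervalMs, so the caller supplies it.

```java
/** Hypothetical mirror of the documented settings; component types are assumptions. */
record DetectorConfigSketch(double phiThreshold, long heartbeatIntervalMs, int windowSize) {
    static final double DEFAULT_PHI_THRESHOLD = 8.0;   // documented default
    static final int DEFAULT_WINDOW_SIZE = 100;        // documented default

    // Compact constructor: reject values that would make the detector meaningless.
    DetectorConfigSketch {
        if (phiThreshold <= 0.0) {
            throw new IllegalArgumentException("phiThreshold must be positive");
        }
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("heartbeatIntervalMs must be positive");
        }
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
    }

    /** Uses the documented defaults; the heartbeat interval must be given explicitly. */
    static DetectorConfigSketch withDefaults(long heartbeatIntervalMs) {
        return new DetectorConfigSketch(
                DEFAULT_PHI_THRESHOLD, heartbeatIntervalMs, DEFAULT_WINDOW_SIZE);
    }
}
```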
Thread Safety
Uses ReentrantReadWriteLock for concurrent access to detector state. All operations are virtual-thread friendly.
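The locking pattern named above typically looks like the sketch below. The class `GuardedStateSketch` and its fields are hypothetical and far simpler than real detector state. It only illustrates the usual discipline: writes take the write lock, reads take the read lock, and both release in a finally block.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of read/write-locked state; not the LoomCache implementation. */
final class GuardedStateSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    /** Mutation: exclusive access via the write lock. */
    void recordHeartbeat(String nodeId, long nowMs) {
        lock.writeLock().lock();
        try {
            lastHeartbeatMs.put(nodeId, nowMs);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Read: shared access; returns an immutable copy so callers never see later writes. */
    Map<String, Long> snapshot() {
        lock.readLock().lock();
        try {
            return Map.copyOf(lastHeartbeatMs);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

One likely reason for preferring java.util.concurrent locks over synchronized blocks here is that a virtual thread blocked on such a lock can be parked without pinning its carrier thread on JDK 21.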
-
Nested Class Summary
Nested Classes
- static final record PartitionDetector.DetectorConfig: Configuration record for the partition detector.
- static final record PartitionDetector.DetectorStats: Detector statistics for monitoring.
- static final record PartitionDetector.PartitionDiagnostics: Diagnostics snapshot for partition status.
- static final record PartitionDetector.PartitionEvent: Event representing a partition change (detected or healed).
- static final record PartitionDetector.PartitionRecoveryPlan: Recovery plan for an asymmetric partial network partition.
- static interface PartitionDetector.PartitionStatus: Sealed interface for partition status.
-
Constructor Summary
Constructors
- PartitionDetector(String nodeId): Create a detector with default configuration.
- PartitionDetector(String nodeId, PartitionDetector.DetectorConfig config)
- PartitionDetector(String nodeId, PartitionDetector.DetectorConfig config, int partitionHistoryLimit): Create a detector with explicit partition history retention.
-
Method Summary
Methods
- getDetectorStats: Get detector statistics.
- getPartialPartitionRecoveryPlan: Build a max-clique recovery plan from known heartbeat and peer reachability data.
- getPartitionDiagnostics: Get current partition diagnostics.
- getPartitionHistory(int maxEntries): Get recent partition history events.
- getPartitionStatus: Get the current partition status.
- getPhiSnapshot: Get a snapshot of all detector states for monitoring.
- getSuspectedNodes: Get the set of suspected nodes (nodes with high phi values).
- getTrackedNodes: Get all tracked nodes.
- boolean isAvailable(String nodeId): Check if a node is available (not suspected dead).
- double phi: Get the current phi value for a node.
- planPartialPartitionRecovery(Set<String> clusterMembers): Build a max-clique recovery plan over an explicit cluster membership set.
- void recordHeartbeat(String nodeId): Record a heartbeat arrival from a node.
- void recordReachability(String observerNodeId, String targetNodeId, boolean reachable): Record one directed reachability observation.
- void recordReachabilityReport(String observerNodeId, Map<String, Boolean> reachableByNode): Record a directed reachability report from one peer.
- void registerHealingCallback: Register a callback to be invoked when a partition heals.
- void reset(): Reset all detector state.
- toString()
-
Constructor Details
-
PartitionDetector
-
PartitionDetector
public PartitionDetector(String nodeId, PartitionDetector.DetectorConfig config, int partitionHistoryLimit)
Create a detector with explicit partition history retention.
Parameters:
nodeId - the ID of the local node
config - detector threshold and heartbeat settings
partitionHistoryLimit - max partition events retained in memory
-
PartitionDetector
Create a detector with default configuration.
Parameters:
nodeId - the ID of the local node
-
-
Method Details
-
recordHeartbeat
Record a heartbeat arrival from a node. Updates the phi-accrual detector for that node. If the node is not yet tracked, it is automatically registered.
Parameters:
nodeId - the ID of the node sending the heartbeat
-
recordReachability
-
recordReachabilityReport
-
phi
Get the current phi value for a node. Higher phi values indicate greater suspicion that the node has failed. If the node is not yet tracked, returns 0.0 (no suspicion).
Parameters:
nodeId - the node ID
Returns:
the phi value (typically 0-15), or 0.0 if the node is not yet tracked
-
isAvailable
Check if a node is available (not suspected dead).
Parameters:
nodeId - the node ID
Returns:
true if the node is available, false if suspected dead
-
getPartitionStatus
Get the current partition status. Analyzes all tracked nodes to determine whether:
- All nodes are healthy (NO_PARTITION)
- Some nodes are suspected (PARTIAL or FULL_PARTITION)
Also tracks partition state changes for history and healing callbacks.
Returns:
the current PartitionStatus
-
getPartialPartitionRecoveryPlan
Build a max-clique recovery plan from known heartbeat and peer reachability data.
Returns:
recovery plan over all known nodes
-
planPartialPartitionRecovery
public PartitionDetector.PartitionRecoveryPlan planPartialPartitionRecovery(Set<String> clusterMembers)
Build a max-clique recovery plan over an explicit cluster membership set.
Parameters:
clusterMembers - cluster members that must be considered by the plan
Returns:
recovery plan over the supplied members plus the local node
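The max-clique idea can be illustrated with a small brute-force search. This is a sketch under stated assumptions, not the detector's actual algorithm: `MaxCliqueSketch`, `largestMutualClique`, and the `reachableFrom` map shape are hypothetical, an edge is assumed to exist only when two nodes reach each other in both directions, and the enumeration is only practical for small clusters. It also does not model adding the local node or the fields of PartitionRecoveryPlan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: largest group of members that can all reach each other. */
final class MaxCliqueSketch {
    static Set<String> largestMutualClique(Set<String> members,
                                           Map<String, Set<String>> reachableFrom) {
        List<String> nodes = new ArrayList<>(new TreeSet<>(members)); // sorted: deterministic ties
        int n = nodes.size();
        if (n > 20) {
            throw new IllegalArgumentException("brute force is only meant for small clusters");
        }
        // adjacency[i] has bit j set when nodes i and j reach each other in both directions.
        int[] adjacency = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j
                        && reaches(reachableFrom, nodes.get(i), nodes.get(j))
                        && reaches(reachableFrom, nodes.get(j), nodes.get(i))) {
                    adjacency[i] |= 1 << j;
                }
            }
        }
        // Enumerate every non-empty subset and keep the first largest one that is a clique.
        int best = 0;
        for (int subset = 1; subset < (1 << n); subset++) {
            if (Integer.bitCount(subset) > Integer.bitCount(best) && isClique(subset, adjacency)) {
                best = subset;
            }
        }
        Set<String> clique = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            if ((best & (1 << i)) != 0) {
                clique.add(nodes.get(i));
            }
        }
        return clique;
    }

    private static boolean reaches(Map<String, Set<String>> reachableFrom, String from, String to) {
        return reachableFrom.getOrDefault(from, Set.of()).contains(to);
    }

    /** A subset is a clique when each member is adjacent to every other member. */
    private static boolean isClique(int subset, int[] adjacency) {
        for (int i = 0; i < adjacency.length; i++) {
            if ((subset & (1 << i)) != 0) {
                int others = subset & ~(1 << i);
                if ((adjacency[i] & others) != others) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In this sketch a node that can reach others but is not reachable in return (the asymmetric case) is excluded from the clique, which is one plausible reading of a recovery plan for a partial partition.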
-
getSuspectedNodes
-
getTrackedNodes
-
getPartitionHistory
Get recent partition history events.
Parameters:
maxEntries - maximum number of entries to return (most recent first)
Returns:
list of recent PartitionEvent records
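A bounded, most-recent-first history like the one described here can be kept in a deque. The sketch below is hypothetical (`BoundedHistorySketch` is not part of the API) and is not thread-safe on its own. It shows how a retention limit such as partitionHistoryLimit and a maxEntries argument might interact.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of bounded event retention; not the LoomCache implementation. */
final class BoundedHistorySketch<E> {
    private final int limit;                              // max events retained in memory
    private final Deque<E> events = new ArrayDeque<>();   // newest event at the head

    BoundedHistorySketch(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    /** Add an event, evicting the oldest one once the limit is exceeded. */
    void add(E event) {
        events.addFirst(event);
        if (events.size() > limit) {
            events.removeLast();
        }
    }

    /** Return up to maxEntries events, most recent first. */
    List<E> recent(int maxEntries) {
        List<E> out = new ArrayList<>();
        Iterator<E> it = events.iterator();               // iterates head (newest) to tail (oldest)
        while (it.hasNext() && out.size() < maxEntries) {
            out.add(it.next());
        }
        return out;
    }
}
```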
-
getPartitionDiagnostics
Get current partition diagnostics.
Returns:
PartitionDiagnostics with current state
-
registerHealingCallback
Register a callback to be invoked when a partition heals.
Parameters:
callback - consumer function to invoke with PartitionEvent
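One way a healing callback could be wired is sketched below. Everything here is an assumption for illustration: `HealingNotifierSketch`, the `HealedEvent` record, and the `observe` method are hypothetical, the callback is assumed to be a `java.util.function.Consumer`, and the real PartitionEvent fields are not documented on this page. The point is only that callbacks fire on the transition from partitioned to healthy, not on every healthy observation.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical sketch of heal notification; not the LoomCache implementation. */
final class HealingNotifierSketch {
    /** Hypothetical event shape standing in for PartitionEvent. */
    record HealedEvent(long healedAtMs) {}

    // Copy-on-write list: callbacks can be registered while another thread iterates.
    private final List<Consumer<HealedEvent>> callbacks = new CopyOnWriteArrayList<>();
    private boolean partitioned = false;   // last observed state; observe() is assumed single-threaded

    void registerHealingCallback(Consumer<HealedEvent> callback) {
        callbacks.add(Objects.requireNonNull(callback, "callback"));
    }

    /** Feed the latest observation; callbacks fire only when a partition has just healed. */
    void observe(boolean nowPartitioned, long nowMs) {
        boolean healed = partitioned && !nowPartitioned;
        partitioned = nowPartitioned;
        if (healed) {
            HealedEvent event = new HealedEvent(nowMs);
            for (Consumer<HealedEvent> callback : callbacks) {
                callback.accept(event);
            }
        }
    }
}
```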
-
getDetectorStats
Get detector statistics.
Returns:
DetectorStats with cumulative statistics
-
reset
public void reset()
Reset all detector state. Useful when reinitializing a network or recovering from a partition.
-
getPhiSnapshot
-
toString
-