Class TwoPhaseCoordinator
Responsibilities:
- Split a
ReplicatedTransactionCommandinto per-participant-group slices viaCrossGroupTransactionContext. - Scatter
TwoPhaseCommands.PrepareGroupto each participant group's leader (resolved throughraftGroupManager) and collect the votes. - Broadcast the resulting
TwoPhaseCommands.DecideGroupto each participant and complete the client-visible future once every participant has either acknowledged the decision or the decide-timeout has fired. - Serve
TwoPhaseCommands.DecideQueryreplies from a recovering participant (coordinator looks the txId up inpendingand replies with the recorded decision, defaulting to ABORT if unknown).
Durability model
Coordinator-side intent and decision records are replicated through raft-0 as TX_COORD_PREPARE/TX_COORD_DECIDE entries. The coordinator waits for each control-plane entry to commit and apply before it scatters participant PREPARE or DECIDE messages, so a raft-0 leader crash cannot leave participants ahead of the durable coordinator state.If a new raft-0 leader replays a prepared-but-undecided transaction, it completes recovery by broadcasting ABORT. If it replays a decided record, it re-broadcasts the recorded decision to let participants finish idempotently.
Thread-safety
All public methods are safe for concurrent invocation. Inbound ACKs arrive from the protocol handler thread; timeouts fire on the scheduled-executor thread. Per-txId state is protected with a dedicated object monitor to serialize vote collection, decision derivation, and state cleanup.-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final DurationDefault decide-phase ack timeout.static final DurationHow long to retain a completed coordinator record so that a straggling participant'sTwoPhaseCommands.DecideQuerycan still see the correct decision.static final intMaximum dispatch attempts per (txId, groupId, phase).static final DurationDelay between coordinator dispatch retry attempts when a participant group has no known leader (freshly-elected raft-N where the coordinator node has not yet processed raft-N's first AppendEntries) or when the TCP send fails (transient NOT_LEADER bounce / connection not-yet-ready).static final DurationDefault vote-collection timeout — bounded by the client TX timeout in practice.static final DurationBrief grace window used during raft-0 leader recovery before an undecided prepared transaction is defaulted to ABORT. -
Constructor Summary
ConstructorsConstructorDescriptionTwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler) TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention) TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention, @NonNull Duration dispatchRetryDelay, int dispatchMaxAttempts) -
Method Summary
Modifier and TypeMethodDescriptionvoidapplyCoordDecide(@NonNull TwoPhaseCommands.CoordDecide cmd) Apply a TX_COORD_DECIDE entry from raft-0's state machine.voidapplyCoordPrepare(@NonNull TwoPhaseCommands.CoordPrepare cmd) Apply a TX_COORD_PREPARE entry from raft-0's state machine.booleanvoidCalled fromCacheNode's raft-0 leader-change listener when THIS node becomes raft-0 leader.voidonDecideAck(@NonNull TwoPhaseCommands.DecideAck ack) Handle an inboundTwoPhaseCommands.DecideAckfrom a participant.voidonDecideQuery(@NonNull TwoPhaseCommands.DecideQuery query, @NonNull String sourceNodeId) Respond to a recovering participant'sTwoPhaseCommands.DecideQueryby resending the durable decision.voidonPrepareAck(@NonNull TwoPhaseCommands.PrepareAck ack) Handle an inboundTwoPhaseCommands.PrepareAckfrom a participant.intvoidsetMetrics(@Nullable LoomMetrics metrics) Install the Micrometer metrics facade.@NonNull CompletableFuture<TransactionResult> start(@NonNull ReplicatedTransactionCommand command) Start a new cross-group transaction.
-
Field Details
-
DEFAULT_PREPARE_TIMEOUT
Default vote-collection timeout — bounded by the client TX timeout in practice. -
DEFAULT_DECIDE_TIMEOUT
Default decide-phase ack timeout. ABORT can finish when this fires; COMMIT with missing participant ACKs completes fail-closed for the client while retaining the durable decision for recovery queries. -
DEFAULT_DECIDED_RETENTION
How long to retain a completed coordinator record so that a straggling participant'sTwoPhaseCommands.DecideQuerycan still see the correct decision. A shorter TTL saves memory at the cost of potential divergence if a participant is offline longer than this window; the default mirrors the orphaned-intent timeout inTwoPhaseParticipant. -
DEFAULT_DISPATCH_RETRY_DELAY
Delay between coordinator dispatch retry attempts when a participant group has no known leader (freshly-elected raft-N where the coordinator node has not yet processed raft-N's first AppendEntries) or when the TCP send fails (transient NOT_LEADER bounce / connection not-yet-ready). The retry gives in-flight leadership propagation a chance to complete before the phase timeout fires — without it, raft-1's first election race would deterministically abort every cross-group TX until leader heartbeats stabilise (BLK-2026-04-22-001 Phase C S4b). -
DEFAULT_RECOVERY_DECISION_REPLAY_GRACE
Brief grace window used during raft-0 leader recovery before an undecided prepared transaction is defaulted to ABORT. A newly elected leader can observe commitIndex caught up just before a previously committed TX_COORD_DECIDE entry is applied locally; this window lets that durable decision replay win instead of racing an incorrect recovered ABORT. -
DEFAULT_DISPATCH_MAX_ATTEMPTS
public static final int DEFAULT_DISPATCH_MAX_ATTEMPTSMaximum dispatch attempts per (txId, groupId, phase). At the default delay this bounds total retry wall time at 5s — well under the 30s phase timeout, so a retry exhaustion still lets the phase timeout fire with the same semantics as before.- See Also:
-
-
Constructor Details
-
TwoPhaseCoordinator
public TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler) -
TwoPhaseCoordinator
public TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention) -
TwoPhaseCoordinator
public TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention, @NonNull Duration dispatchRetryDelay, int dispatchMaxAttempts)
-
-
Method Details
-
setMetrics
Install the Micrometer metrics facade. Called fromCacheNode.initializeMetrics()onceLoomMetricsis ready; prior to that, recording sites are no-ops. Session 9 ESC-9. -
start
public @NonNull CompletableFuture<TransactionResult> start(@NonNull ReplicatedTransactionCommand command) Start a new cross-group transaction. Returns a future that completes with the commit outcome once every participant has acknowledged — or the decide timeout fires — whichever comes first.- Parameters:
command- the replicated transaction payload (non-null). The command'sReplicatedTransactionCommand.mapNamesByGroup()must include an entry for every group the transaction touches.- Returns:
- a future that never completes exceptionally; failures are encoded
as
TransactionResult.failure(List).
-
applyCoordPrepare
Apply a TX_COORD_PREPARE entry from raft-0's state machine. Runs on every raft-0 replica (leader AND followers) so each replica's coord has the in-flight TX roster available for recovery if the current leader dies.HARD SAFETY RAIL 5 invariant: this method MUST be idempotent across snapshot install + log replay. Re-applying a COORD_PREPARE for a txId that already has pending state is a no-op. Re-applying before local
start(ReplicatedTransactionCommand)added the state creates it from the record's contents. -
applyCoordDecide
Apply a TX_COORD_DECIDE entry from raft-0's state machine. Records the coordinator's decision durably so a new leader can re-broadcast DECIDE_GROUP honouring the original decision instead of defaulting to ABORT.HARD SAFETY RAIL 5 invariant: idempotent. Re-applying a COORD_DECIDE with the same decision leaves state unchanged; if the in-memory state is missing (e.g. completed-and-cleaned window) we just log and return.
-
onBecomeLeader
public void onBecomeLeader()Called fromCacheNode's raft-0 leader-change listener when THIS node becomes raft-0 leader. Iterates any pending CoordinatorState carried over from the prior leader's log-replicated TX_COORD_PREPARE entries and (a) for states WITH a decision (TX_COORD_DECIDE already replicated): re-broadcasts TX_DECIDE_GROUP to all participants — idempotent on the participant side, (b) for states WITHOUT a decision (old coord died in PREPARE phase): records an ABORT decision via TX_COORD_DECIDE + broadcasts TX_DECIDE_GROUP(ABORT). The ABORT default is safe because we cannot tell under the failed coord whether participants voted, and an ABORT preserves correctness (participants with prepared intent apply the ABORT and release locks). -
onPrepareAck
Handle an inboundTwoPhaseCommands.PrepareAckfrom a participant. No-ops if the transaction has already been decided or cleaned up. -
onDecideAck
Handle an inboundTwoPhaseCommands.DecideAckfrom a participant. -
onDecideQuery
public void onDecideQuery(@NonNull TwoPhaseCommands.DecideQuery query, @NonNull String sourceNodeId) Respond to a recovering participant'sTwoPhaseCommands.DecideQueryby resending the durable decision. If the coordinator still has an undecided prepared record, the query itself is treated as a recovery signal: the coordinator durably decides ABORT and broadcasts it through the normal decision path. Unknown transactions are deliberately not answered because a stale participant can outlive the coordinator's retention window; replying ABORT there could conflict with a previously committed decision.- Parameters:
query- the query from a participantsourceNodeId- the node that sent the query (reply target)
-
pendingCount
public int pendingCount()- Returns:
- the number of in-flight or pending-cleanup transactions. Primarily for metrics and tests.
-
hasUnresolvedVisibilityBarrier
public boolean hasUnresolvedVisibilityBarrier()
-