com.loomcache.server.transaction.twopc.TwoPhaseCoordinator

public final class TwoPhaseCoordinator extends Object

Coordinator side of the cross-group 2-phase commit protocol introduced in BLK-2026-04-22-001 Phase C.

Responsibilities:

Split a ReplicatedTransactionCommand into per-participant-group slices via CrossGroupTransactionContext.
Scatter TwoPhaseCommands.PrepareGroup to each participant group's leader (resolved through raftGroupManager) and collect the votes.
Broadcast the resulting TwoPhaseCommands.DecideGroup to each participant and complete the client-visible future once every participant has either acknowledged the decision or the decide-timeout has fired.
Serve TwoPhaseCommands.DecideQuery replies from a recovering participant (coordinator looks the txId up in pending and replies with the recorded decision, defaulting to ABORT if unknown).

Durability model

Coordinator-side intent and decision records are replicated through raft-0 as TX_COORD_PREPARE/TX_COORD_DECIDE entries. The coordinator waits for each control-plane entry to commit and apply before it scatters participant PREPARE or DECIDE messages, so a raft-0 leader crash cannot leave participants ahead of the durable coordinator state.

If a new raft-0 leader replays a prepared-but-undecided transaction, it completes recovery by broadcasting ABORT. If it replays a decided record, it re-broadcasts the recorded decision to let participants finish idempotently.

Thread-safety

All public methods are safe for concurrent invocation. Inbound ACKs arrive from the protocol handler thread; timeouts fire on the scheduled-executor thread. Per-txId state is protected with a dedicated object monitor to serialize vote collection, decision derivation, and state cleanup.

Field Summary

Fields

Modifier and Type

Field

Description

static final Duration

DEFAULT_DECIDE_TIMEOUT

Default decide-phase ack timeout.

static final Duration

DEFAULT_DECIDED_RETENTION

How long to retain a completed coordinator record so that a straggling participant's TwoPhaseCommands.DecideQuery can still see the correct decision.

static final int

DEFAULT_DISPATCH_MAX_ATTEMPTS

Maximum dispatch attempts per (txId, groupId, phase).

static final Duration

DEFAULT_DISPATCH_RETRY_DELAY

Delay between coordinator dispatch retry attempts when a participant group has no known leader (freshly-elected raft-N where the coordinator node has not yet processed raft-N's first AppendEntries) or when the TCP send fails (transient NOT_LEADER bounce / connection not-yet-ready).

static final Duration

DEFAULT_PREPARE_TIMEOUT

Default vote-collection timeout — bounded by the client TX timeout in practice.

static final Duration

DEFAULT_RECOVERY_DECISION_REPLAY_GRACE

Brief grace window used during raft-0 leader recovery before an undecided prepared transaction is defaulted to ABORT.
Constructor Summary

Constructors

Constructor

Description

TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler)

TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention)

TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention, @NonNull Duration dispatchRetryDelay, int dispatchMaxAttempts)
Method Summary

Modifier and Type

Method

Description

void

applyCoordDecide(@NonNull TwoPhaseCommands.CoordDecide cmd)

Apply a TX_COORD_DECIDE entry from raft-0's state machine.

void

applyCoordPrepare(@NonNull TwoPhaseCommands.CoordPrepare cmd)

Apply a TX_COORD_PREPARE entry from raft-0's state machine.

boolean

hasUnresolvedVisibilityBarrier()

void

onBecomeLeader()

Called from CacheNode's raft-0 leader-change listener when THIS node becomes raft-0 leader.

void

onDecideAck(@NonNull TwoPhaseCommands.DecideAck ack)

Handle an inbound TwoPhaseCommands.DecideAck from a participant.

void

onDecideQuery(@NonNull TwoPhaseCommands.DecideQuery query, @NonNull String sourceNodeId)

Respond to a recovering participant's TwoPhaseCommands.DecideQuery by resending the durable decision.

void

onPrepareAck(@NonNull TwoPhaseCommands.PrepareAck ack)

Handle an inbound TwoPhaseCommands.PrepareAck from a participant.

int

pendingCount()

void

setMetrics(@Nullable LoomMetrics metrics)

Install the Micrometer metrics facade.

@NonNull CompletableFuture<TransactionResult>

start(@NonNull ReplicatedTransactionCommand command)

Start a new cross-group transaction.

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_PREPARE_TIMEOUT
  
  public static final Duration DEFAULT_PREPARE_TIMEOUT
  
  Default vote-collection timeout — bounded by the client TX timeout in practice.
- DEFAULT_DECIDE_TIMEOUT
  
  public static final Duration DEFAULT_DECIDE_TIMEOUT
  
  Default decide-phase ack timeout. ABORT can finish when this fires; COMMIT with missing participant ACKs completes fail-closed for the client while retaining the durable decision for recovery queries.
- DEFAULT_DECIDED_RETENTION
  
  public static final Duration DEFAULT_DECIDED_RETENTION
  
  How long to retain a completed coordinator record so that a straggling participant's TwoPhaseCommands.DecideQuery can still see the correct decision. A shorter TTL saves memory at the cost of potential divergence if a participant is offline longer than this window; the default mirrors the orphaned-intent timeout in TwoPhaseParticipant.
- DEFAULT_DISPATCH_RETRY_DELAY
  
  public static final Duration DEFAULT_DISPATCH_RETRY_DELAY
  
  Delay between coordinator dispatch retry attempts when a participant group has no known leader (freshly-elected raft-N where the coordinator node has not yet processed raft-N's first AppendEntries) or when the TCP send fails (transient NOT_LEADER bounce / connection not-yet-ready). The retry gives in-flight leadership propagation a chance to complete before the phase timeout fires — without it, raft-1's first election race would deterministically abort every cross-group TX until leader heartbeats stabilise (BLK-2026-04-22-001 Phase C S4b).
- DEFAULT_RECOVERY_DECISION_REPLAY_GRACE
  
  public static final Duration DEFAULT_RECOVERY_DECISION_REPLAY_GRACE
  
  Brief grace window used during raft-0 leader recovery before an undecided prepared transaction is defaulted to ABORT. A newly elected leader can observe commitIndex caught up just before a previously committed TX_COORD_DECIDE entry is applied locally; this window lets that durable decision replay win instead of racing an incorrect recovered ABORT.
- DEFAULT_DISPATCH_MAX_ATTEMPTS
  public static final int DEFAULT_DISPATCH_MAX_ATTEMPTS
  
  Maximum dispatch attempts per (txId, groupId, phase). At the default delay this bounds total retry wall time at 5s — well under the 30s phase timeout, so a retry exhaustion still lets the phase timeout fire with the same semantics as before.
  
  See Also:
  
  Constant Field Values
Constructor Details
- TwoPhaseCoordinator
  
  public TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler)
- TwoPhaseCoordinator
  
  public TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention)
- TwoPhaseCoordinator
  
  public TwoPhaseCoordinator(@NonNull String nodeId, @NonNull RaftGroupManagerApi raftGroupManager, @NonNull PartitionRouter partitionRouter, @NonNull PeerMessageDispatch dispatch, @NonNull ScheduledExecutorService scheduler, @NonNull Duration prepareTimeout, @NonNull Duration decideTimeout, @NonNull Duration decidedRetention, @NonNull Duration dispatchRetryDelay, int dispatchMaxAttempts)
Method Details
- setMetrics
  
  public void setMetrics(@Nullable LoomMetrics metrics)
  
  Install the Micrometer metrics facade. Called from CacheNode.initializeMetrics() once LoomMetrics is ready; prior to that, recording sites are no-ops. Session 9 ESC-9.
- start
  
  public @NonNull CompletableFuture<TransactionResult> start(@NonNull ReplicatedTransactionCommand command)
  
  Start a new cross-group transaction. Returns a future that completes with the commit outcome once every participant has acknowledged — or the decide timeout fires — whichever comes first.
  
  Parameters:
  
  command - the replicated transaction payload (non-null). The command's ReplicatedTransactionCommand.mapNamesByGroup() must include an entry for every group the transaction touches.
  
  Returns:
  
  a future that never completes exceptionally; failures are encoded as TransactionResult.failure(List).
- applyCoordPrepare
  
  public void applyCoordPrepare(@NonNull TwoPhaseCommands.CoordPrepare cmd)
  
  Apply a TX_COORD_PREPARE entry from raft-0's state machine. Runs on every raft-0 replica (leader AND followers) so each replica's coord has the in-flight TX roster available for recovery if the current leader dies.
  HARD SAFETY RAIL 5 invariant: this method MUST be idempotent across snapshot install + log replay. Re-applying a COORD_PREPARE for a txId that already has pending state is a no-op. Re-applying before local start(ReplicatedTransactionCommand) added the state creates it from the record's contents.
- applyCoordDecide
  
  public void applyCoordDecide(@NonNull TwoPhaseCommands.CoordDecide cmd)
  
  Apply a TX_COORD_DECIDE entry from raft-0's state machine. Records the coordinator's decision durably so a new leader can re-broadcast DECIDE_GROUP honouring the original decision instead of defaulting to ABORT.
  HARD SAFETY RAIL 5 invariant: idempotent. Re-applying a COORD_DECIDE with the same decision leaves state unchanged; if the in-memory state is missing (e.g. completed-and-cleaned window) we just log and return.
- onBecomeLeader
  
  public void onBecomeLeader()
  
  Called from CacheNode's raft-0 leader-change listener when THIS node becomes raft-0 leader. Iterates any pending CoordinatorState carried over from the prior leader's log-replicated TX_COORD_PREPARE entries and (a) for states WITH a decision (TX_COORD_DECIDE already replicated): re-broadcasts TX_DECIDE_GROUP to all participants — idempotent on the participant side, (b) for states WITHOUT a decision (old coord died in PREPARE phase): records an ABORT decision via TX_COORD_DECIDE + broadcasts TX_DECIDE_GROUP(ABORT). The ABORT default is safe because we cannot tell under the failed coord whether participants voted, and an ABORT preserves correctness (participants with prepared intent apply the ABORT and release locks).
- onPrepareAck
  
  public void onPrepareAck(@NonNull TwoPhaseCommands.PrepareAck ack)
  
  Handle an inbound TwoPhaseCommands.PrepareAck from a participant. No-ops if the transaction has already been decided or cleaned up.
- onDecideAck
  
  public void onDecideAck(@NonNull TwoPhaseCommands.DecideAck ack)
  
  Handle an inbound TwoPhaseCommands.DecideAck from a participant.
- onDecideQuery
  
  public void onDecideQuery(@NonNull TwoPhaseCommands.DecideQuery query, @NonNull String sourceNodeId)
  
  Respond to a recovering participant's TwoPhaseCommands.DecideQuery by resending the durable decision. If the coordinator still has an undecided prepared record, the query itself is treated as a recovery signal: the coordinator durably decides ABORT and broadcasts it through the normal decision path. Unknown transactions are deliberately not answered because a stale participant can outlive the coordinator's retention window; replying ABORT there could conflict with a previously committed decision.
  
  Parameters:
  
  query - the query from a participant
  
  sourceNodeId - the node that sent the query (reply target)
- pendingCount
  
  public int pendingCount()
  
  Returns:
  
  the number of in-flight or pending-cleanup transactions. Primarily for metrics and tests.
- hasUnresolvedVisibilityBarrier
  
  public boolean hasUnresolvedVisibilityBarrier()

Class TwoPhaseCoordinator

Durability model

Thread-safety

Field Summary

Constructor Summary

Method Summary

Methods inherited from class Object

Field Details

DEFAULT_PREPARE_TIMEOUT

DEFAULT_DECIDE_TIMEOUT

DEFAULT_DECIDED_RETENTION

DEFAULT_DISPATCH_RETRY_DELAY

DEFAULT_RECOVERY_DECISION_REPLAY_GRACE

DEFAULT_DISPATCH_MAX_ATTEMPTS

Constructor Details

TwoPhaseCoordinator

TwoPhaseCoordinator

TwoPhaseCoordinator

Method Details

setMetrics

start

applyCoordPrepare

applyCoordDecide

onBecomeLeader

onPrepareAck

onDecideAck

onDecideQuery

pendingCount

hasUnresolvedVisibilityBarrier