Class TwoPhaseCoordinator

java.lang.Object
com.loomcache.server.transaction.twopc.TwoPhaseCoordinator

public final class TwoPhaseCoordinator extends Object
Coordinator side of the cross-group 2-phase commit protocol introduced in BLK-2026-04-22-001 Phase C.

Responsibilities:

Durability model

Coordinator-side intent and decision records are replicated through raft-0 as TX_COORD_PREPARE/TX_COORD_DECIDE entries. The coordinator waits for each control-plane entry to commit and apply before it scatters participant PREPARE or DECIDE messages, so a raft-0 leader crash cannot leave participants ahead of the durable coordinator state.

If a new raft-0 leader replays a prepared-but-undecided transaction, it completes recovery by broadcasting ABORT. If it replays a decided record, it re-broadcasts the recorded decision to let participants finish idempotently.

Thread-safety

All public methods are safe for concurrent invocation. Inbound ACKs arrive from the protocol handler thread; timeouts fire on the scheduled-executor thread. Per-txId state is protected with a dedicated object monitor to serialize vote collection, decision derivation, and state cleanup.
  • Field Details

    • DEFAULT_PREPARE_TIMEOUT

      public static final Duration DEFAULT_PREPARE_TIMEOUT
      Default vote-collection timeout — bounded by the client TX timeout in practice.
    • DEFAULT_DECIDE_TIMEOUT

      public static final Duration DEFAULT_DECIDE_TIMEOUT
      Default decide-phase ack timeout. ABORT can finish when this fires; COMMIT with missing participant ACKs completes fail-closed for the client while retaining the durable decision for recovery queries.
    • DEFAULT_DECIDED_RETENTION

      public static final Duration DEFAULT_DECIDED_RETENTION
      How long to retain a completed coordinator record so that a straggling participant's TwoPhaseCommands.DecideQuery can still see the correct decision. A shorter TTL saves memory at the cost of potential divergence if a participant is offline longer than this window; the default mirrors the orphaned-intent timeout in TwoPhaseParticipant.
    • DEFAULT_DISPATCH_RETRY_DELAY

      public static final Duration DEFAULT_DISPATCH_RETRY_DELAY
      Delay between coordinator dispatch retry attempts when a participant group has no known leader (freshly-elected raft-N where the coordinator node has not yet processed raft-N's first AppendEntries) or when the TCP send fails (transient NOT_LEADER bounce / connection not-yet-ready). The retry gives in-flight leadership propagation a chance to complete before the phase timeout fires — without it, raft-1's first election race would deterministically abort every cross-group TX until leader heartbeats stabilise (BLK-2026-04-22-001 Phase C S4b).
    • DEFAULT_RECOVERY_DECISION_REPLAY_GRACE

      public static final Duration DEFAULT_RECOVERY_DECISION_REPLAY_GRACE
      Brief grace window used during raft-0 leader recovery before an undecided prepared transaction is defaulted to ABORT. A newly elected leader can observe commitIndex caught up just before a previously committed TX_COORD_DECIDE entry is applied locally; this window lets that durable decision replay win instead of racing an incorrect recovered ABORT.
    • DEFAULT_DISPATCH_MAX_ATTEMPTS

      public static final int DEFAULT_DISPATCH_MAX_ATTEMPTS
      Maximum dispatch attempts per (txId, groupId, phase). At the default delay this bounds total retry wall time at 5s — well under the 30s phase timeout, so a retry exhaustion still lets the phase timeout fire with the same semantics as before.
      See Also:
  • Constructor Details

  • Method Details

    • setMetrics

      public void setMetrics(@Nullable LoomMetrics metrics)
      Install the Micrometer metrics facade. Called from CacheNode.initializeMetrics() once LoomMetrics is ready; prior to that, recording sites are no-ops. Session 9 ESC-9.
    • start

      public @NonNull CompletableFuture<TransactionResult> start(@NonNull ReplicatedTransactionCommand command)
      Start a new cross-group transaction. Returns a future that completes with the commit outcome once every participant has acknowledged — or the decide timeout fires — whichever comes first.
      Parameters:
      command - the replicated transaction payload (non-null). The command's ReplicatedTransactionCommand.mapNamesByGroup() must include an entry for every group the transaction touches.
      Returns:
      a future that never completes exceptionally; failures are encoded as TransactionResult.failure(List).
    • applyCoordPrepare

      public void applyCoordPrepare(@NonNull TwoPhaseCommands.CoordPrepare cmd)
      Apply a TX_COORD_PREPARE entry from raft-0's state machine. Runs on every raft-0 replica (leader AND followers) so each replica's coord has the in-flight TX roster available for recovery if the current leader dies.

      HARD SAFETY RAIL 5 invariant: this method MUST be idempotent across snapshot install + log replay. Re-applying a COORD_PREPARE for a txId that already has pending state is a no-op. Re-applying before local start(ReplicatedTransactionCommand) added the state creates it from the record's contents.

    • applyCoordDecide

      public void applyCoordDecide(@NonNull TwoPhaseCommands.CoordDecide cmd)
      Apply a TX_COORD_DECIDE entry from raft-0's state machine. Records the coordinator's decision durably so a new leader can re-broadcast DECIDE_GROUP honouring the original decision instead of defaulting to ABORT.

      HARD SAFETY RAIL 5 invariant: idempotent. Re-applying a COORD_DECIDE with the same decision leaves state unchanged; if the in-memory state is missing (e.g. completed-and-cleaned window) we just log and return.

    • onBecomeLeader

      public void onBecomeLeader()
      Called from CacheNode's raft-0 leader-change listener when THIS node becomes raft-0 leader. Iterates any pending CoordinatorState carried over from the prior leader's log-replicated TX_COORD_PREPARE entries and (a) for states WITH a decision (TX_COORD_DECIDE already replicated): re-broadcasts TX_DECIDE_GROUP to all participants — idempotent on the participant side, (b) for states WITHOUT a decision (old coord died in PREPARE phase): records an ABORT decision via TX_COORD_DECIDE + broadcasts TX_DECIDE_GROUP(ABORT). The ABORT default is safe because we cannot tell under the failed coord whether participants voted, and an ABORT preserves correctness (participants with prepared intent apply the ABORT and release locks).
    • onPrepareAck

      public void onPrepareAck(@NonNull TwoPhaseCommands.PrepareAck ack)
      Handle an inbound TwoPhaseCommands.PrepareAck from a participant. No-ops if the transaction has already been decided or cleaned up.
    • onDecideAck

      public void onDecideAck(@NonNull TwoPhaseCommands.DecideAck ack)
      Handle an inbound TwoPhaseCommands.DecideAck from a participant.
    • onDecideQuery

      public void onDecideQuery(@NonNull TwoPhaseCommands.DecideQuery query, @NonNull String sourceNodeId)
      Respond to a recovering participant's TwoPhaseCommands.DecideQuery by resending the durable decision. If the coordinator still has an undecided prepared record, the query itself is treated as a recovery signal: the coordinator durably decides ABORT and broadcasts it through the normal decision path. Unknown transactions are deliberately not answered because a stale participant can outlive the coordinator's retention window; replying ABORT there could conflict with a previously committed decision.
      Parameters:
      query - the query from a participant
      sourceNodeId - the node that sent the query (reply target)
    • pendingCount

      public int pendingCount()
      Returns:
      the number of in-flight or pending-cleanup transactions. Primarily for metrics and tests.
    • hasUnresolvedVisibilityBarrier

      public boolean hasUnresolvedVisibilityBarrier()