Persistence Design
Persistence makes committed Raft and state-machine data survive crashes, node replacement, and controlled
backup/restore operations. It is local to each member and depends on a stable nodeId and durable dataDir.
Components
- WalWriter, WalReader, and WalCompactor own append-only records, CRC32 checksums, rotation, fsync, and compaction.
- PersistentRaftLog stores Raft log entries when persistence is enabled.
- RaftMetadataStore persists term, voted-for, and commit-index metadata.
- SnapshotManager, SnapshotStore, SnapshotScheduler, DeltaSnapshot, and SnapshotChain manage snapshots.
- StateMachineSnapshotManager captures registered data-structure state.
- HotBackupManager and HotBackupScheduler produce group snapshots and manifests under the backup directory.
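To make the CRC32-checked, append-only record idea concrete, here is a minimal sketch of record framing. The frame layout (4-byte length, 4-byte CRC32, payload) and the names `encode_record` and `decode_records` are assumptions for illustration only; the actual WalWriter and WalReader format is not described here.

```python
import struct
import zlib

# Hypothetical frame layout, not the project's on-disk format:
# 4-byte big-endian payload length, 4-byte CRC32 of the payload, payload bytes.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Frame one payload with its length and CRC32 checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_records(buf: bytes) -> list[bytes]:
    """Decode every record in buf, raising ValueError on a torn or corrupt record."""
    records = []
    offset = 0
    while offset < len(buf):
        if offset + HEADER.size > len(buf):
            raise ValueError(f"truncated header at offset {offset}")
        length, expected_crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buf):
            raise ValueError(f"truncated payload at offset {offset}")
        payload = buf[start:end]
        if zlib.crc32(payload) != expected_crc:
            raise ValueError(f"CRC mismatch at offset {offset}")
        records.append(payload)
        offset = end
    return records
```

Raising on the first bad record, rather than silently skipping it, leaves the decision about partial recovery to the caller.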
Write path
- The Raft leader appends a command to its log.
- Durable log and metadata updates use the configured fsync mode.
- The committed command applies to the in-memory state machine.
- WAL compaction waits until snapshots cover the compacted range.
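The append, fsync, and apply steps above can be sketched for a single node as follows. This is an illustration under stated assumptions, not the project's implementation: `FsyncMode`, its member names, `DurableLog`, and `apply_committed` are hypothetical, and replication to a quorum (which Raft requires before a command counts as committed) is omitted.

```python
import os
from enum import Enum


class FsyncMode(Enum):
    # Hypothetical mode names; the real configuration values are not given here.
    ALWAYS = "always"          # fsync after every append
    OS_DEFAULT = "os_default"  # leave flushing to the operating system


class DurableLog:
    """Append-only log file that honors a configured fsync mode."""

    def __init__(self, path: str, fsync_mode: FsyncMode) -> None:
        self._file = open(path, "ab")
        self._mode = fsync_mode
        self.last_index = 0

    def append(self, command: bytes) -> int:
        # Length-prefix the command so a reader can find record boundaries.
        self._file.write(len(command).to_bytes(4, "big") + command)
        self._file.flush()  # move data from Python's buffer to the OS
        if self._mode is FsyncMode.ALWAYS:
            os.fsync(self._file.fileno())  # ask the OS to persist to disk
        self.last_index += 1
        return self.last_index

    def close(self) -> None:
        self._file.close()


def apply_committed(state: dict, command: bytes) -> None:
    """Apply a committed 'key=value' command to the in-memory state machine."""
    key, _, value = command.partition(b"=")
    state[key] = value
```

The ordering is the point of the sketch: the command reaches the log (and, depending on the mode, the disk) before it changes in-memory state.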
Recovery path
- Validate metadata, WAL segments, sidecar checksums, and snapshot metadata.
- Load the newest valid snapshot chain.
- Replay records newer than the snapshot index.
- Reject startup when the selected recovery policy forbids the local data shape.
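The recovery steps above can be sketched as one function. Everything named here is hypothetical (`RecoveryPolicy` and its members, `Snapshot`, `RecoveryError`, `recover`), and the sketch simplifies in two ways: it treats each snapshot as a full snapshot rather than a chain of deltas, and it assumes checksum validation has already run and recorded its result in `valid`.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryPolicy(Enum):
    # Hypothetical policy names for illustration.
    STRICT = "strict"                # any invalid local file rejects startup
    ALLOW_PARTIAL = "allow_partial"  # fall back to older valid data


@dataclass
class Snapshot:
    last_index: int  # last log index covered by this snapshot
    state: dict      # state-machine contents at last_index
    valid: bool      # outcome of earlier checksum validation


class RecoveryError(Exception):
    """Raised when startup must be rejected."""


def recover(snapshots, log, policy):
    """Rebuild state from the newest valid snapshot plus newer log records.

    log is a list of (index, key, value) tuples sorted by index.
    Returns (state, last_applied_index).
    """
    if policy is RecoveryPolicy.STRICT and any(not s.valid for s in snapshots):
        raise RecoveryError("invalid snapshot present under strict policy")

    valid = [s for s in snapshots if s.valid]
    base = max(valid, key=lambda s: s.last_index, default=None)
    state = dict(base.state) if base else {}
    snapshot_index = base.last_index if base else 0

    expected = snapshot_index + 1
    for index, key, value in log:
        if index <= snapshot_index:
            continue  # already reflected in the snapshot
        if index != expected:
            raise RecoveryError(f"gap in log: expected index {expected}, found {index}")
        state[key] = value
        expected += 1
    return state, expected - 1
```

A gap between the snapshot index and the first replayable record is treated as fatal under either policy, because replaying past it would produce state that never existed on the cluster.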
Invariants
- CRC and sidecar checksum validation runs before replay.
- Snapshot install cannot silently skip registered data structures.
- WAL compaction cannot remove records not covered by a durable snapshot.
- Quorum-loss restore must be explicit.
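The compaction invariant can be expressed as a small guard: a segment is eligible for removal only when every record in it is at or below the index of a durable snapshot. `Segment` and `compactable_segments` are hypothetical names used for this sketch; how WalCompactor tracks segment bounds is not specified here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    first_index: int  # first log index stored in the segment
    last_index: int   # last log index stored in the segment


def compactable_segments(segments, durable_snapshot_index):
    """Return segments whose records are all covered by the durable snapshot."""
    return [s for s in segments if s.last_index <= durable_snapshot_index]
```

A segment that straddles the snapshot index is kept whole, so no uncovered record is ever removed.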
Failure behavior
A single failed machine can be replaced from its durable dataDir or documented backup path if quorum survived.
Corrupt local files fail closed unless the selected recovery policy allows partial local recovery. Hot Backup is
point-in-time; it does not replace per-node WAL durability.
Verification
WAL durability, crash recovery, CRC validation, compaction, disk-fault, graceful restart, snapshot store, and
Hot Backup tests cover this layer. Operators watch fsync latency, segment age, snapshot duration, validation
errors, backup age, and startup recovery logs.