Chaos Testing
LoomCache ships a self-contained Java chaos-testing framework under loom-server/src/test/java/com/loomcache/server/chaos.
There is no external dependency or separate runtime — histories, checkers, and nemeses are all pure Java.
Methodology
A run:
- Generates operations: concurrent clients drive reads/writes/CAS against the cluster (ChaosWorkload produces Register / Counter / Queue / Set / Mixed workloads; ChaosClient / ChaosRealClient drive them).
- Injects faults: ChaosNemesis is a sealed fault family covering symmetric partition, node isolation, message reorder, kill node / kill leader / pause node, clock skew, slow disk (message-send delay), memory pressure, and CPU contention, composable via ChaosNemesis.Combined.
- Records a history: ChaosHistory captures invocation and completion timestamps.
- Verifies consistency: WglLinearizabilityChecker (WGL = Wing-Gong-Lowe search) or the per-model ChaosChecker checkers validate the history.
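The "records a history" step is what makes the later consistency check possible: each operation carries an invocation and a completion timestamp, and two operations are ordered in real time only when one completed before the other was invoked. A minimal sketch of that idea follows. HistorySketch, Entry, and every method name here are hypothetical and do not reflect the actual ChaosHistory API; the sketch also assumes a logical counter in place of wall-clock timestamps so the ordering is deterministic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical history recorder; not LoomCache's ChaosHistory. */
final class HistorySketch {
    /** One completed operation with its invocation/completion window. */
    record Entry(int clientId, String op, String result, long invokedAt, long completedAt) {
        /** True when this entry finished before the other one started (real-time order). */
        boolean precedes(Entry other) {
            return completedAt < other.invokedAt;
        }
    }

    // Assumption: a shared logical clock stands in for real timestamps.
    private final AtomicLong clock = new AtomicLong();
    // Thread-safe list so concurrent clients can append completions.
    private final List<Entry> entries = new CopyOnWriteArrayList<>();

    /** Called just before a client sends an operation; returns the invocation tick. */
    long invoke() {
        return clock.incrementAndGet();
    }

    /** Called when the response arrives; stamps the completion tick and stores the entry. */
    void complete(int clientId, String op, String result, long invokedAt) {
        entries.add(new Entry(clientId, op, result, invokedAt, clock.incrementAndGet()));
    }

    /** Immutable snapshot of everything recorded so far. */
    List<Entry> entries() {
        return List.copyOf(entries);
    }
}
```

Entries whose windows overlap are concurrent, so a checker is free to order them either way; only pairs for which precedes holds constrain the linearization.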
Framework
- ChaosTestHarness: wires workload + nemesis + cluster into a single executable test.
- ChaosCluster / ChaosRealCluster: both start real CacheNode clusters (Raft + TCP) with fault hooks.
- ChaosClient / ChaosRealClient: clients that drive the generated operations against the cluster.
- ChaosWorkload: op generators for Register / Counter / Queue / Set / Mixed workloads.
- ChaosNemesis: sealed fault family (see Methodology for the fault list).
- ChaosHistory / ChaosReport: history and reporting.
- ChaosChecker / WglLinearizabilityChecker: ChaosChecker exposes the per-model checkers; WglLinearizabilityChecker is the WGL (Wing-Gong-Lowe) search over register / counter / queue / set.
- model/: Operation, CounterModel, QueueModel, LockModel, SetModel.
Test coverage
There is no per-scenario file tree under tests/. The harness is exercised by exactly two test classes, both in loom-server test sources:
- ChaosFrameworkEnhancedTest (no chaos tag): drives the framework primitives: WGL register/counter/queue/set checks (positive, violation, and malformed-input cases), the per-model ChaosChecker checkers, every ChaosWorkload generator, ChaosHistory recording/concurrency, ChaosReport summary/latency/fault-timeline output, the ChaosNemesis fault types against a recording cluster double, and one end-to-end ChaosTestHarness run that starts a real 3-node CacheNode cluster.
- tests/RealClusterLinearizabilityTest (@Tag("chaos")): starts a real 3-node ChaosRealCluster and asserts register linearizability via WglLinearizabilityChecker over concurrent leader-local / per-node map operations.
Fault types
ChaosNemesis is a sealed interface; all faults compose via ChaosNemesis.Combined:
- Network: SymmetricPartition, IsolateNode, MessageReorder.
- Process: KillNode, KillLeader, PauseNode.
- Resource / timing: ClockSkew (simulates clock-skew effects by calling cluster.pauseNode(); the system clock is not actually manipulated), SlowDisk (message-send delay), MemoryPressure, CpuContention.
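To illustrate how a sealed fault family can compose through a combined variant, here is a small sketch. It is not the real ChaosNemesis: the names NemesisSketch, FaultTarget, and RecordingTarget, along with the inject, partition, and pause signatures, are assumptions made for this example, and only two fault kinds are modeled. It requires Java 17 or later for sealed interfaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cluster hook surface; the real cluster API may look different. */
interface FaultTarget {
    void partition(List<String> sideA, List<String> sideB);
    void pause(String nodeId);
}

/** Illustrative sealed fault family; permitted subtypes are the nested records below. */
sealed interface NemesisSketch {
    void inject(FaultTarget target);

    /** Splits the cluster into two groups that cannot talk to each other. */
    record Partition(List<String> sideA, List<String> sideB) implements NemesisSketch {
        public void inject(FaultTarget target) { target.partition(sideA, sideB); }
    }

    /** Pauses a single node. */
    record Pause(String nodeId) implements NemesisSketch {
        public void inject(FaultTarget target) { target.pause(nodeId); }
    }

    /** Composite fault: injects each part in list order. */
    record Combined(List<NemesisSketch> parts) implements NemesisSketch {
        public void inject(FaultTarget target) { parts.forEach(p -> p.inject(target)); }
    }
}

/** Recording double that logs calls instead of touching a cluster. */
final class RecordingTarget implements FaultTarget {
    final List<String> calls = new ArrayList<>();
    public void partition(List<String> sideA, List<String> sideB) {
        calls.add("partition " + sideA + " | " + sideB);
    }
    public void pause(String nodeId) { calls.add("pause " + nodeId); }
}
```

Because the interface is sealed, a switch or visitor over the fault kinds can be checked for exhaustiveness by the compiler, and a recording double like the one above lets fault logic be tested without starting any nodes.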
Scope of guarantees
- The real-cluster runs (ChaosTestHarness / ChaosRealCluster) exercise actual Raft and TCP via real CacheNode instances, which run the WAL and snapshot machinery as part of normal startup.
- The framework-primitive tests feed hand-built histories straight into the checkers; they verify checker/history logic without touching the network stack.
- WglLinearizabilityChecker is a Wing-Gong-Lowe linearization search supporting the register, counter, queue, and set data types only (it rejects any other type) and carries its own internal model state. The search is bounded by MAX_SEARCH_STATES = 500_000; when that limit is reached the checker conservatively reports a violation rather than returning a heuristic pass.
- The per-model ChaosChecker checkers cover the rest: LinearizableRegister, LinearizableCounter (CounterModel), LinearizableQueue (QueueModel), LinearizableSet (SetModel), and MutualExclusion (LockModel, fence-token + double-lock checks).
- RealClusterLinearizabilityTest lives in loom-server test sources (not loom-integration-tests); it boots three real CacheNode instances per test, reserving fork-scoped TCP ports to stay parallel-safe under forkCount=64 / threadCount=4.
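The bounded-search behavior described above can be sketched for a single integer register. This is a simplified backtracking search in the spirit of a Wing-Gong-Lowe checker, not the WglLinearizabilityChecker implementation: RegisterCheckSketch, Op, and isLinearizable are hypothetical names, the budget is passed as a parameter rather than read from MAX_SEARCH_STATES, and the sketch assumes at most 62 completed operations so they fit in a bit mask.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical budget-bounded linearizability search for one integer register. */
final class RegisterCheckSketch {
    /** A completed operation: a write of value, or a read that returned value. */
    record Op(boolean isWrite, int value, long invokedAt, long completedAt) {}

    private final List<Op> ops;
    private final int maxStates;                          // search budget
    private final Set<String> visited = new HashSet<>();  // (linearized set, register value) pairs
    private int explored;

    private RegisterCheckSketch(List<Op> ops, int maxStates) {
        this.ops = ops;
        this.maxStates = maxStates;
    }

    /** Returns true only if a valid linearization is found within the budget. */
    static boolean isLinearizable(List<Op> ops, int initialValue, int maxStates) {
        if (ops.size() > 62) throw new IllegalArgumentException("sketch tracks ops in a 64-bit mask");
        return new RegisterCheckSketch(ops, maxStates).search(0L, initialValue);
    }

    private boolean search(long done, int value) {
        if (done == (1L << ops.size()) - 1) return true;   // every op has been linearized
        if (explored >= maxStates) return false;           // budget spent: conservatively fail
        explored++;
        if (!visited.add(done + ":" + value)) return false; // this state already failed earlier
        for (int i = 0; i < ops.size(); i++) {
            if ((done & (1L << i)) != 0 || !mayGoNext(i, done)) continue;
            Op op = ops.get(i);
            boolean legal = op.isWrite() || op.value() == value; // reads must see the current value
            int next = op.isWrite() ? op.value() : value;
            if (legal && search(done | (1L << i), next)) return true;
        }
        return false;
    }

    /** Op i may be linearized next only if no pending op completed before i was invoked. */
    private boolean mayGoNext(int i, long done) {
        for (int j = 0; j < ops.size(); j++) {
            boolean pending = (done & (1L << j)) == 0;
            if (j != i && pending && ops.get(j).completedAt() < ops.get(i).invokedAt()) return false;
        }
        return true;
    }
}
```

Treating an exhausted budget as a failure trades false alarms on very large histories for the guarantee that a pass always reflects a linearization that was actually found.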
Running
    # Framework-primitive + one end-to-end harness run (no chaos tag, runs by default):
    mvn -pl loom-server test -Dut.forkCount=64 -Dut.threadCount=4 -Dit.forkCount=64 -Dit.threadCount=4 -Dtest=ChaosFrameworkEnhancedTest

    # Real 3-node cluster register-linearizability run. RealClusterLinearizabilityTest is @Tag("chaos"),
    # which the default unit lane excludes (ut.excludedGroups=benchmark,chaos), so opt in with -Dgroups=chaos
    # and override excludedGroups to keep only benchmark excluded:
    mvn -pl loom-server test -Dut.forkCount=64 -Dut.threadCount=4 -Dit.forkCount=64 -Dit.threadCount=4 -Dgroups=chaos -Dut.excludedGroups=benchmark -Dtest=RealClusterLinearizabilityTest

The framework-primitive checks complete in seconds. The real-cluster runs (ChaosTestHarness inside ChaosFrameworkEnhancedTest, and RealClusterLinearizabilityTest) boot real CacheNode instances and elect a Raft leader, so they take longer; per-client op joins use a 30 s ceiling.
Reporting
ChaosReport emits a human-readable summary plus a full history trace for failing runs. Replay the trace through WglLinearizabilityChecker to reproduce and debug locally.
For the broader correctness story (Raft invariants, durability, near-cache coherence), see the architecture overview.