Cassian Gate v79 — Operator Cheat Sheet
(Operator reference — supporting surface; execution and artifacts remain authoritative)
This document describes the user-facing execution model for Cassian Gate.
It reflects implemented CLI behavior only and does not replace deterministic execution or authoritative artifacts.
Cassian Gate is a deterministic, artifact-authoritative network change-validation gate.
It is for:
- network engineers validating planned changes before production
- platform and infrastructure engineers using a CI-safe network gate
- operators who need explicit pass/fail artifacts and deterministic execution
It is not yet for:
- users seeking a broad network automation platform
- users expecting generic multi-vendor feature parity
- users wanting exploratory labs or AI output to act as deployment authority
Cassian Gate is a:
Deterministic Network Change Validation Gate
Execution is:
- deterministic
- reproducible
- artifact-backed
- CI-safe
- non-heuristic
1️⃣ What Cassian Gate Is (and Is Not)
Cassian Gate IS
- a network change validation gate
- a deterministic execution engine
- a CI pipeline safety check
- a behavior validation system
Cassian Gate IS NOT
- a general network lab builder
- a chaos framework
- a retry system
- a configuration merge engine
- an AI decision system
2️⃣ Command Index
Environment
Execution (Validation)
Inspection
DevOps Integration
AI Assistance (optional / advisory only)
AI never affects execution or verdicts.
3️⃣ Two Execution Modes (CRITICAL)
Understanding this distinction is mandatory.
🔷 Gate Mode (Authoritative Validation)
Command: cassian test <topology.yaml>
Gate mode automatically performs:
- Clean-state destroy (if needed)
- Deploy
- Provision
- Execute tests
- Collect artifacts
- Destroy lab
Returns deterministic exit codes.
Gate mode is used for:
- production validation
- CI pipelines
- change validation
- baseline vs candidate comparison
You do NOT run cassian up first.
Gate mode owns the lifecycle.
Important summary boundary
The human-readable results.summary.txt file is not the verdict authority.
Use:
- results.json for authoritative verdict sharing in CI, tickets, and PRs
- results.summary.txt for human-readable explanation only
The summary now explicitly states:
- what PASS means
- what PASS does not mean
- what FAIL means
- which artifact to share
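Because results.json is the artifact to share, a CI job that wants to surface failures can read it directly. The sketch below assumes results.json is a JSON object whose "tests" list holds records with "name", "verdict", and "observed" keys; the helper names (failed_tests, report) and the 1/0 return convention are hypothetical and do not replace the gate's own exit code.

```python
import json
from pathlib import Path


def failed_tests(results: dict) -> list:
    """Return every record in results["tests"] whose verdict is "fail".

    Blocked items are recorded with verdict "fail", so they are included
    here instead of being mistaken for passes.
    """
    return [t for t in results.get("tests", []) if t.get("verdict") == "fail"]


def report(path: str = "results.json") -> int:
    """Print failing records and return a hypothetical CI status (1 = failures)."""
    # Assumption: results.json is a JSON object whose "tests" records carry
    # "name", "verdict", and "observed" keys.
    results = json.loads(Path(path).read_text(encoding="utf-8"))
    failures = failed_tests(results)
    for record in failures:
        print(f"FAIL {record.get('name', '<unnamed>')}: {record.get('observed', '')}")
    return 1 if failures else 0
```

A helper like this only makes the recorded verdicts easier to read in CI logs; the gate's deterministic exit code remains the signal to branch on.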
Zero-assertion gate runs are rejected
If the topology contains:
- no tests
- no scenarios
then the gate run is rejected.
Meaning:
- cassian test <topology.yaml> requires at least one declared assertion
- a zero-assertion topology is not a valid validation gate
- no PASS or FAIL verdict is produced
- no lifecycle execution begins
- no lab/artifacts are created
- exit code is 2 (usage / contract error)
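The zero-assertion rule can be pictured as a pre-flight check that runs before any lifecycle step. This is a minimal sketch, assuming the topology has already been loaded into a dict and that tests and scenarios are lists; preflight_gate and count_assertions are hypothetical names, not Cassian Gate internals.

```python
from typing import Optional

USAGE_ERROR = 2  # documented exit code for usage / contract errors


def count_assertions(topology: dict) -> int:
    """Count declared assertions: entries under 'tests' plus 'scenarios'."""
    tests = topology.get("tests") or []
    scenarios = topology.get("scenarios") or []
    return len(tests) + len(scenarios)


def preflight_gate(topology: dict) -> Optional[int]:
    """Return USAGE_ERROR for a zero-assertion topology, else None.

    Runs before deploy, so a rejected topology never creates a lab or artifacts.
    """
    if count_assertions(topology) == 0:
        return USAGE_ERROR
    return None
```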
Important boundary:
- this rule applies to authoritative gate execution with cassian test <topology.yaml>
- it does not block cassian run <topology.yaml>
- it does not change replay behavior
cassian replay — Deterministic replay of prior artifacts
Replay re-executes a previous Cassian Gate run from previously generated artifacts.
Replay is a reproduction/analysis surface, not a new authority path.
Authority is preserved from the replayed source context.
Inputs
Replay consumes artifacts from a previous run:
These are generated replay inputs.
Important boundary:
- artifact reuse for replay does not make replay a new source of authority
- shared artifact shape does not imply shared authority
- authority still depends on the replay mode and source context
Gate replay (authoritative context preserved)
Replay a prior authoritative gate run:
This preserves gate / authoritative context.
Meaning:
- authoritative validation path
- clean-state lifecycle context is preserved from the source gate run
- CI-safe verdict semantics remain tied to the original authoritative context
You can also verify deterministic result equivalence:
With --verify-results, replay checks deterministic result equivalence against the source artifacts and fails on mismatch.
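One way to picture deterministic result equivalence is a byte comparison of canonically serialized JSON. The sketch assumes canonical form means sorted keys and compact separators, and that only the "tests" section is compared (run-specific fields would otherwise differ); the real --verify-results comparison may cover a different set of fields, and the function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def results_equivalent(source: dict, replayed: dict, sections=("tests",)) -> bool:
    """True when every selected section serializes to identical bytes."""
    return all(canonical_bytes(source.get(s)) == canonical_bytes(replayed.get(s))
               for s in sections)


def digest(obj) -> str:
    """Stable SHA-256 fingerprint, handy for logging what was compared."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()
```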
Non-gate replay (non-authoritative context preserved)
Replay without --gate keeps replay in a non-authoritative exploration context.
Example:
Meaning:
- replayed exploration context remains non-authoritative
- useful for inspection/debugging only
This path is useful for:
- inspection
- investigation
- iterative debugging
- bringing replayed runtime up for manual follow-up commands
This does not upgrade exploration artifacts into gate proof.
When to use replay
Use replay when you want deterministic reproduction of a prior run.
Typical uses:
- reproducing a prior authoritative gate result
- replaying a prior exploration run for investigation
- checking deterministic stability
- debugging unexpected behavior from existing artifacts
Replay summary boundary
Replay preserves the same authority boundary messaging in results.summary.txt.
Meaning:
- replay does not create a new authority model
- results.json remains authoritative
- results.summary.txt remains explanatory only
Important boundary
Replay:
- preserves prior context
- does not create a parallel authority model
- does not make exploration authoritative
- does not change verdict/exit semantics by itself
🔷 Exploration Mode (Non-Authoritative)
Used for interactive debugging and inspection.
Two approaches exist.
Option A — run
Meaning:
- exploration only
- non-authoritative
- useful for debugging, not for proof
Typical workflow shape:
By default the lab is destroyed.
Keep the lab running:
Exploration summary boundary
Even when run mode produces results artifacts, results.summary.txt remains explanatory only.
Use results.json as the authoritative verdict artifact when you need the exact recorded result.
Run mode itself remains non-authoritative as a workflow mode.
Option B — Explicit Lifecycle
Use this when you want:
- a persistent exploratory lab
- manual inspection
- iterative debugging
Important boundary:
- this path is for exploration and inspection
- authoritative validation still runs through cassian test <topology.yaml>
Lifecycle Comparison
| Feature | Gate Mode | Exploration |
|---|---|---|
| Clean-state enforced | Yes | Optional |
| Auto destroy | Yes | Optional |
| CI-safe | Yes | No |
| Interactive inspection | No | Yes |
| Authoritative verdict | Yes | No |
4️⃣ Topology vs Lab Name
Commands take one of two input types: a topology file or a lab name.
Commands That Use a Topology File
Commands That Use a Lab Name
Where does lab name come from?
Defined inside topology:
Displayed during execution:
5️⃣ Topology Authoring
Cassian Gate consumes YAML topology definitions.
Minimal Example
Required Keys
Required:
- name
- nodes
- links
Optional:
- tests
- scenarios
- packs
- fabric
- candidate_changes
- vlans
Invariant Packs (Loaded and Expanded During Resolve)
Cassian Gate supports declarative invariant packs that are loaded from the supported local pack surface, compatibility-checked, and then expanded into explicit invariant declarations during Resolve.
Packs are optional authoring shortcuts. The authoritative validation still comes later from the expanded invariant verdicts.
Packs are:
- declarative only
- loaded locally and deterministically
- compatibility-checked before expansion
- expanded deterministically during Resolve
- written as explicit tests in topology.resolved.yaml
- non-authoritative by themselves
Packs do not:
- execute code
- change lifecycle behavior
- introduce runtime-only semantics
- change authority boundaries
- load from remote registries
- use fallback or best-match lookup
Later validation still comes from the resulting invariant verdicts.
Pack Declaration
Example:
Rules:
- packs must be a list
- each pack entry must be a non-empty string
- pack lookup is deterministic and local only
- unknown pack names fail fast with exit code 2
- incompatible pack contents fail fast with exit code 2
- pack expansion must be deterministic
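The pack declaration rules translate into a small validator plus an order-preserving expansion. This is a sketch under stated assumptions: the local registry is modeled as a plain dict mapping pack names to lists of invariant declarations, the function names are hypothetical, and the compatibility check is omitted because its criteria are not specified here.

```python
from typing import Optional

USAGE_ERROR = 2  # documented exit code for unknown or incompatible packs


def check_pack_declaration(packs: object, registry: dict) -> Optional[str]:
    """Return an error message for an invalid 'packs' value, or None if valid."""
    if not isinstance(packs, list):
        return "packs must be a list"
    for entry in packs:
        if not isinstance(entry, str) or not entry.strip():
            return "each pack entry must be a non-empty string"
        # Exact-name lookup only: no fallback, no best-match, no remote fetch.
        if entry not in registry:
            return f"unknown pack: {entry}"
    return None


def expand_packs(packs: list, registry: dict) -> list:
    """Expand packs into explicit invariant dicts, in declaration order."""
    expanded = []
    for name in packs:
        for invariant in registry[name]:
            expanded.append(dict(invariant))  # copy so the registry is never mutated
    return expanded
```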
Current Supported Pack
Typical behavior for supported pack usage:
- loads from the supported local pack surface
- undergoes compatibility checks before expansion
- expands during Resolve into explicit invariant tests
- later phases consume the expanded invariants
Example
Operator Commands
Validate local pack loading and compatibility enforcement:
Run authoritative gate execution of the accepted expanded invariants:
Negative misuse proofs:
Typical outcomes:
- valid local pack topology is accepted
- unknown pack references are rejected
- incompatible pack contents are rejected
Artifact Note
After Resolve, the expanded invariant list appears explicitly in topology.resolved.yaml.
These expanded tests are generated inputs for later execution only.
Authority still comes from the later invariant verdicts in results.json.
6️⃣ Nodes
Supported node types:
| Type | Description |
|---|---|
| frr | FRR router |
| host | Linux host |
| nft-fw | nftables firewall |
| sonic-vm | SONiC VM runtime |
7️⃣ Links
Example:
If ipv4 is omitted, /31 addresses are auto-assigned.
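Automatic /31 assignment amounts to carving consecutive point-to-point subnets from an address pool in declaration order. The sketch uses Python's ipaddress module; the pool 10.255.0.0/24, the endpoints key, the two-element ipv4 list on each link, and the function name are hypothetical, and it does not check explicit addresses for overlap with the pool.

```python
import ipaddress


def assign_p2p_31(links: list, pool: str = "10.255.0.0/24") -> list:
    """Return links with a /31 pair filled in wherever 'ipv4' is omitted."""
    free = ipaddress.ip_network(pool).subnets(new_prefix=31)  # in address order
    result = []
    for link in links:
        if "ipv4" in link:
            result.append(link)  # explicit addressing is left untouched
            continue
        net = next(free)  # raises StopIteration if the pool is exhausted
        result.append({**link, "ipv4": [f"{net[0]}/31", f"{net[1]}/31"]})
    return result
```

Allocating in declaration order keeps the result deterministic: the same topology always yields the same addresses.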
View assigned addresses:
8️⃣ EVPN Runtime Substrate (Generation Support)
Cassian Gate supports a deterministic EVPN topology/config generation substrate for a limited, explicit proof shape.
This support exists to produce runtime EVPN control-plane state for later validation work.
It does not make EVPN generation itself authoritative.
Generated EVPN state is supporting runtime substrate only.
Truth still comes from:
- tests
- invariants
Supported EVPN Intent Surface
Declare EVPN only under fabric.evpn.
Required EVPN fields:
- fabric.evpn.enabled
- fabric.evpn.mode
- fabric.evpn.asn
Supported mode:
vlan-aware
Supported Node Shape
EVPN participants currently use frr nodes with explicit roles.
Example:
Rules:
- EVPN participant nodes must use type: frr
- spine nodes must declare evpn_rr: true
- leaf nodes must not declare evpn_rr: true
- EVPN participant nodes require router_id
- leaves must have an explicit direct link to at least one RR spine
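The node rules above can be checked mechanically. The sketch assumes a hypothetical node shape ('name', 'type', 'role' of spine or leaf, optional 'evpn_rr' and 'router_id') and models links as (node_a, node_b) name pairs; the function name and error strings are illustrative, not Cassian Gate's actual messages.

```python
def evpn_node_errors(nodes: list, links: list) -> list:
    """Return rule violations for EVPN participants; an empty list means valid."""
    errors = []
    participants = [n for n in nodes if n.get("role") in ("spine", "leaf")]
    rr_spines = {
        n["name"] for n in participants
        if n["role"] == "spine" and n.get("evpn_rr") is True
    }
    # Undirected adjacency built only from explicitly declared links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if participants and not rr_spines:
        errors.append("no RR spine declared")
    for n in participants:
        name = n["name"]
        if n.get("type") != "frr":
            errors.append(f"{name}: EVPN participant must use type: frr")
        if not n.get("router_id"):
            errors.append(f"{name}: EVPN participant requires router_id")
        if n["role"] == "spine" and n.get("evpn_rr") is not True:
            errors.append(f"{name}: spine must declare evpn_rr: true")
        if n["role"] == "leaf":
            if n.get("evpn_rr") is True:
                errors.append(f"{name}: leaf must not declare evpn_rr: true")
            if not neighbors.get(name, set()) & rr_spines:
                errors.append(f"{name}: leaf has no direct link to an RR spine")
    return errors
```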
VLAN ↔ VNI Mapping
EVPN requires a top-level vlans mapping.
Example:
Rules:
- each VLAN must map to exactly one VNI
- duplicate VNI reuse is rejected
- invalid or missing VNI fails fast
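The VLAN to VNI rules reduce to two maps that must stay one-to-one. This sketch assumes each vlans entry is a dict with 'id' and 'vni' keys and treats a valid VNI as an integer in the 24-bit VXLAN range 1 to 16777215; the entry shape, function name, and error text are hypothetical.

```python
VNI_MAX = (1 << 24) - 1  # VXLAN network identifiers are 24-bit


def vlan_vni_errors(vlans: list) -> list:
    """Return mapping violations; an empty list means the map is one-to-one."""
    errors = []
    vni_by_vlan = {}
    vlan_by_vni = {}
    for entry in vlans:
        vlan, vni = entry.get("id"), entry.get("vni")
        valid = isinstance(vni, int) and not isinstance(vni, bool) and 1 <= vni <= VNI_MAX
        if not valid:
            errors.append(f"vlan {vlan}: missing or invalid vni {vni!r}")
            continue
        if vlan in vni_by_vlan:
            errors.append(f"vlan {vlan}: mapped more than once")
            continue
        if vni in vlan_by_vni:
            errors.append(f"vni {vni}: reused by vlan {vlan} (already vlan {vlan_by_vni[vni]})")
            continue
        vni_by_vlan[vlan] = vni
        vlan_by_vni[vni] = vlan
    return errors
```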
Host Attachment Requirements
Host attachment must be explicit.
Example:
Required host fields for EVPN proof substrate:
- attach
- vlan
- ip
- mac
Rules:
- attached host must connect explicitly to an EVPN leaf
- host VLAN must exist in the declared VLAN/VNI map
- host MAC must be explicit
- host must have exactly one explicit link to its attached leaf
Minimal Supported Proof Shape
Supported proof shape is intentionally narrow:
- leaf/spine only
- explicit RR spine
- explicit host attachment
- one VLAN is sufficient
- deterministic MAC/IP declarations required
This support is intended to produce:
- EVPN BGP control-plane sessions
- deterministic VLAN/VNI configuration
- deterministic host attachment semantics
- deterministic runtime substrate for later MAC-route observation
Unsupported / Rejected Shapes
Cassian Gate fails fast on unsupported EVPN topology intent.
Examples include:
- EVPN declared outside fabric.evpn
- ambiguous EVPN participant selection
- unsupported node role combinations
- missing RR spine
- missing or invalid VNI
- missing explicit host attachment semantics
- shapes requiring out-of-band configuration
- heuristic peer inference
These are misuse / invalid-topology errors.
Example EVPN Runtime Generation Topology
Operator Commands
Validate the EVPN topology:
Bring up EVPN runtime substrate:
Run authoritative gate proof:
Replay deterministically:
Negative misuse proofs:
Artifact Note
topology.resolved.yaml may include additive EVPN-resolved fields for the generated proof substrate.
These fields remain generated and non-authoritative.
They support deterministic execution only.
Important Boundary
EVPN topology/config generation support:
- configures deterministic EVPN runtime substrate
- does not prove EVPN correctness by itself
- does not validate dataplane forwarding
- does not validate EVPN invariants by itself
- does not change authority semantics
Use later tests/invariants to establish truth.
9️⃣ Tests and Invariants
Cassian Gate supports both:
- active behavior tests
- deterministic invariant checks
Both produce standard authoritative results in gate mode.
Standard test kinds
Supported kinds:
- ping
- tcp
- invariant (see "Invariant tests" below for the supported invariant type values)
Ping Example
Required fields:
- name
- kind
- src
- dst
TCP Example
Required fields:
- name
- kind
- src
- dst
Invariant tests
Invariant tests use kind: invariant together with a type field that selects the invariant.
They validate declared truth conditions and return authoritative pass/fail results like any other test.
Blocked declared validation items
If a declared test or selected scenario reaches authoritative execution scope but cannot execute normally because execution is blocked later in the gate path, Cassian Gate records that item explicitly in results.json.
This prevents omission from being misread as success.
Typical blocked representation:
- observed: blocked
- verdict: fail
- error: blocked before execution
Example meaning:
- the declared validation item existed
- it was in authoritative scope
- it did not run normally
- the result was recorded explicitly rather than omitted
Routing Invariants
Routing invariants validate specific routing truth on a named node.
They are useful when you need to prove policy outcome, path preference, route advertisement boundaries, or route attributes.
BGP Local Preference Invariant
Invariant type: bgp_localpref_equals
Purpose:
Verify that a BGP route installed on a node has the expected LOCAL_PREF value.
This is useful for validating routing policy behavior such as:
- inbound route-maps
- outbound policy manipulation
- policy-based path preference
- iBGP policy consistency
Typical required fields:
| Field | Description |
|---|---|
| node | Node where the route must be observed |
| prefix | Prefix being validated |
| expected | Expected BGP local preference value |
Example:
Behavior:
- The invariant inspects the routing information on the specified node.
- The route must exist and contain the declared LOCAL_PREF value.
- If the route is present but the LOCAL_PREF differs from the expected value, the invariant fails.
- If the invariant definition itself is invalid, the run fails with misuse exit code 2.
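Attribute-equality invariants compare one observed value against the declared one. The sketch assumes the route data has already been normalized into a hypothetical list of path dicts with a 'best' flag and attribute keys such as 'local_pref'; comparing the best path, and failing when no path or no best path exists, are assumptions about how "the route" is chosen. The function name is hypothetical.

```python
def check_attribute(paths: list, attribute: str, expected: int) -> tuple:
    """Return (verdict, actual) for an attribute-equality check on the best path."""
    if not paths:
        return "fail", None  # route not present on the node
    best = next((p for p in paths if p.get("best")), None)
    if best is None:
        return "fail", None  # no selected path to compare against
    actual = best.get(attribute)
    return ("pass" if actual == expected else "fail"), actual
```

The same helper covers the MED case by passing 'med' as the attribute name.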
Typical exit semantics follow the standard Cassian Gate model:
- satisfied invariant → passing gate outcome
- invariant mismatch → validation failure
- invalid invariant declaration → usage / contract error
Artifacts produced:
The invariant result is recorded in the standard artifacts:
Example result entry:
Determinism notes:
- invariant evaluation occurs during the TEST phase
- replay is intended to preserve the same authority semantics as the source gate context
Route Advertised To Invariant
Invariant type:
Purpose:
Verify that a specific route is being advertised from the specified node to the specified peer.
This is useful for validating routing advertisement boundaries such as:
- expected route export to a peer
- intended prefix propagation across a boundary
- prevention of missing outbound advertisements
- verification that a route is actually being sent to a named neighbor
Required fields:
| Field | Description |
|---|---|
| node | Node where the route advertisement is checked |
| peer | Named peer that must receive the route |
| prefix | Prefix being validated |
Example:
Behavior:
- The invariant inspects supported structured advertisement evidence on the specified node.
- It passes when the specified prefix is observed as advertised to the named peer.
- It fails when the prefix is not observed as advertised to that peer.
- If the invariant definition itself is invalid, the run fails with misuse exit code 2.
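Both advertisement invariants come down to set membership with opposite polarity. This sketch assumes the evidence is JSON with an "advertisedRoutes" object keyed by prefix, as in FRR's advertised-routes JSON output; the function names and the must_be_advertised switch are hypothetical. Calling it with must_be_advertised=False gives the not-advertised verdict.

```python
import json


def advertised_prefixes(evidence_json: str) -> set:
    """Extract advertised prefixes from peer-scoped advertisement JSON.

    Assumption: an object with an "advertisedRoutes" map keyed by prefix.
    """
    data = json.loads(evidence_json)
    if not isinstance(data, dict):
        return set()
    return set((data.get("advertisedRoutes") or {}).keys())


def advertisement_verdict(evidence_json: str, prefix: str, must_be_advertised: bool) -> str:
    """'pass' when observed advertisement matches the declared expectation."""
    observed = prefix in advertised_prefixes(evidence_json)
    return "pass" if observed == must_be_advertised else "fail"
```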
Typical exit semantics follow the standard Cassian Gate model:
- satisfied invariant → passing gate outcome
- invariant mismatch → validation failure
- invalid invariant declaration → usage / contract error
Artifacts produced:
The invariant result is recorded in the standard artifacts:
Replay:
This invariant can be checked again using standard gate replay workflows.
Scope boundary:
This invariant validates only peer-scoped route advertisement presence.
It does not by itself prove:
- generic routing policy correctness
- attribute correctness
- community / AS-path behavior
- broader route-map intent
Route Not Advertised To Invariant
Invariant type:
Purpose:
Verify that a specific route is not being advertised from the specified node to the specified peer.
This is useful for validating routing advertisement boundaries such as:
- expected suppression of a prefix to a peer
- prevention of route leaks
- verification that a route is withheld from a named neighbor
- confirming that local route presence does not imply outbound advertisement
Required fields:
| Field | Description |
|---|---|
| node | Node where the route advertisement is checked |
| peer | Named peer that must not receive the route |
| prefix | Prefix being validated |
Example:
Behavior:
- The invariant inspects supported structured advertisement evidence on the specified node.
- It passes when the specified prefix is not observed as advertised to the named peer.
- It fails when the prefix is observed as advertised to that peer.
- If the invariant definition itself is invalid, the run fails with misuse exit code 2.
Typical exit semantics follow the standard Cassian Gate model:
- satisfied invariant → passing gate outcome
- invariant mismatch → validation failure
- invalid invariant declaration → usage / contract error
Artifacts produced:
The invariant result is recorded in the standard artifacts:
Replay:
This invariant can be checked again using standard gate replay workflows.
Scope boundary:
This invariant validates only peer-scoped route advertisement absence.
It does not by itself prove:
- generic routing policy correctness
- attribute correctness
- community / AS-path behavior
- broader route-map intent
BGP Session Up Invariant
Invariant type:
Purpose:
Verify that an IPv4-AFI BGP session from the specified node to a declared neighbor IPv4 address is in the FRR Established state.
This is useful for validating BGP session establishment such as:
- iBGP session presence to a known neighbor
- eBGP session presence to a known neighbor
- post-change BGP session re-establishment
- guarded assertion of session up before further routing-policy invariants
Required fields:
| Field | Description |
|---|---|
| node | Node where the BGP session is checked (FRR-typed) |
| neighbor | IPv4 literal of the BGP neighbor on that node (canonical alias dst accepted) |
Example:
Behavior:
- The invariant runs vtysh -c 'show bgp summary json' on the specified node and parses the structured output.
- It passes when the queried neighbor is present in FRR's BGP summary and its session state is Established.
- It fails when the session is in any other FRR FSM state (Idle, Active, Connect, OpenSent, OpenConfirm), when the neighbor is not configured (engine-synthesized state literal NotConfigured), or when vtysh fails or its output is not parseable as JSON (engine-synthesized state literal Unknown).
- If the invariant definition itself is invalid (missing or malformed dst IPv4 literal), the run fails with misuse exit code 2.
- The retry policy mirrors the existing bgp_neighbor test surface: retries are bounded by the test's timeout_s (default 15 seconds) and retry_interval_s (default 1.0 seconds). The loop stops at the first vtysh invocation that returns successfully, and the parsed state is evaluated after the loop exits.
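The behavior above, including a retry loop that stops on the first successful vtysh call rather than on Established, can be sketched as follows. The runner callable (command in, (return code, stdout) out) and the function names are hypothetical, and the JSON path data["ipv4Unicast"]["peers"][neighbor]["state"] is an assumption about FRR's summary output.

```python
import json
import time
from typing import Callable, Tuple

# Hypothetical runner: executes a command on the node and returns (rc, stdout).
Runner = Callable[[str], Tuple[int, str]]

CMD = "vtysh -c 'show bgp summary json'"


def parse_session_state(stdout: str, neighbor: str) -> str:
    """Map summary JSON to a state literal for one IPv4 neighbor."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        return "Unknown"  # unparseable output
    if not isinstance(data, dict):
        return "Unknown"
    peers = (data.get("ipv4Unicast") or {}).get("peers") or {}
    peer = peers.get(neighbor)
    if peer is None:
        return "NotConfigured"
    return str(peer.get("state", "Unknown"))


def bgp_session_up(run: Runner, neighbor: str, timeout_s: float = 15.0,
                   retry_interval_s: float = 1.0) -> tuple:
    """Return (verdict, state); retries only while vtysh itself fails."""
    deadline = time.monotonic() + timeout_s
    rc, stdout = run(CMD)
    while rc != 0 and time.monotonic() < deadline:
        time.sleep(retry_interval_s)
        rc, stdout = run(CMD)
    state = parse_session_state(stdout, neighbor) if rc == 0 else "Unknown"
    return ("pass" if state == "Established" else "fail"), state
```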
Typical exit semantics follow the standard Cassian Gate model:
- satisfied invariant → passing gate outcome
- invariant mismatch → validation failure
- invalid invariant declaration → usage / contract error
Artifacts produced:
The invariant result is recorded in the standard artifacts:
When verdict: fail, the test record carries a structured observed_state payload with deterministic keys (type, peer, state, last_error, source_node). See "Failed-Invariant Observed State" below and docs/topology-schema-v1.5.md §4.1 for the full per-type schema.
Positive proof example:
Replay:
This invariant can be checked again using standard gate replay workflows.
Scope boundary:
This invariant validates only IPv4-AFI BGP session-Established truth from one node to one neighbor IPv4.
It does not by itself prove:
- EVPN-AFI session state (use evpn_bgp_session_up)
- route presence or attribute correctness
- route advertisement boundaries
- generic routing policy correctness
Route Present Invariant
Invariant type: route_present
Purpose:
Verify that a specific route is present on the specified node's IPv4 routing table.
This is useful for validating route installation such as:
- expected RIB presence after policy or session establishment
- verification that a prefix is actually installed on the node
- guarded assertion of route presence before further routing-policy invariants
Required fields:
| Field | Description |
|---|---|
| node | Node where the route presence is checked |
| prefix | Prefix being validated (canonical IPv4 CIDR) |
Example:
Behavior:
- The invariant inspects the IPv4 routing table on the specified node.
- It passes when the queried prefix is observed in the routing table.
- It fails when the queried prefix is not observed.
- If the invariant definition itself is invalid, the run fails with misuse exit code 2.
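Route presence and absence share one lookup; only the expected polarity differs. The sketch rejects non-canonical prefixes the way a misuse error would (ipaddress with strict=True refuses set host bits) and assumes the routing table is a JSON object keyed by prefix, as FRR's show ip route json output is; the function names are hypothetical.

```python
import ipaddress
import json


def canonical_prefix(prefix: str) -> str:
    """Return the prefix if it is canonical IPv4 CIDR, else raise ValueError.

    A ValueError here corresponds to a misuse / contract error (exit code 2).
    """
    net = ipaddress.IPv4Network(prefix, strict=True)  # rejects e.g. 10.0.0.1/24
    if str(net) != prefix:  # rejects netmask notation and other spellings
        raise ValueError(f"non-canonical prefix: {prefix}")
    return prefix


def route_verdict(route_table_json: str, prefix: str, want_present: bool = True) -> str:
    """'pass' when the prefix's presence matches the declared expectation."""
    table = json.loads(route_table_json)
    present = canonical_prefix(prefix) in table
    return "pass" if present == want_present else "fail"
```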
Artifacts produced:
When verdict: fail, the test record carries a structured observed_state payload (type, prefix, routes, source_node). See docs/topology-schema-v1.5.md §4.3 for the full per-type schema.
Replay:
Route Absent Invariant
Invariant type:
Purpose:
Verify that a specific route is not present on the specified node's IPv4 routing table.
This is useful for validating intentional route absence such as:
- prefix-blackhole effectiveness
- expected suppression after withdrawal
- verification that a route is genuinely not installed
- negative complement of route_present
Required fields:
| Field | Description |
|---|---|
| node | Node where the route absence is checked |
| prefix | Prefix being validated (canonical IPv4 CIDR) |
Example:
Behavior:
- The invariant inspects the IPv4 routing table on the specified node.
- It passes when the queried prefix is not observed in the routing table.
- It fails when the queried prefix is observed.
- If the invariant definition itself is invalid, the run fails with misuse exit code 2.
Artifacts produced:
When verdict: fail, the test record carries a structured observed_state payload (type, prefix, routes, source_node). See docs/topology-schema-v1.5.md §4.3 for the full per-type schema.
BGP MED Equals Invariant
Invariant type:
Purpose:
Verify that a BGP route installed on a node has the expected MED (Multi-Exit Discriminator) value.
This is useful for validating routing policy behavior such as:
- inbound MED-rewriting policy
- expected MED preservation across boundaries
- iBGP MED propagation consistency
- companion to bgp_localpref_equals for full attribute coverage
Required fields:
| Field | Description |
|---|---|
| node | Node where the route must be observed |
| prefix | Prefix being validated |
| expected | Expected BGP MED value (integer) |
Example:
Behavior:
- The invariant inspects the BGP route entry on the specified node.
- The route must exist and contain the declared MED value.
- If the route is present but the MED differs from the expected value, the invariant fails.
- If the invariant definition itself is invalid, the run fails with misuse exit code 2.
Artifacts produced:
When verdict: fail, the test record carries a structured observed_state payload (type, prefix, peer, actual, expected, source_node). See docs/topology-schema-v1.5.md §4.5 for the full per-type schema.
OSPF Neighbor Up Invariant
Invariant type: ospf_neighbor_up
Purpose:
Verify that an OSPF neighbor adjacency from the specified node to a declared peer router-ID has reached the expected FSM state (default Full).
This is useful for validating OSPF adjacency establishment such as:
- backbone-area neighbor convergence
- post-change OSPF re-adjacency
- guarded assertion of OSPF Full adjacency before further routing-policy invariants
This invariant is FRR-only; declaring ospf_neighbor_up against a non-FRR src node is rejected at validation with exit code 2.
Required fields:
| Field | Description |
|---|---|
| src | Node where the OSPF neighbor table is checked (must be type: frr) |
| neighbor | IPv4 literal of the peer's OSPF router-ID (NOT a node name) |
Optional fields:
| Field | Description |
|---|---|
| state | Expected FSM state literal; one of Down, Attempt, Init, 2-Way, ExStart, Exchange, Loading, Full. Default Full materialised at Resolve. |
The companion node-level ospf: block (declared on FRR nodes) carries area (int ≥ 0, required) and networks (non-empty list of canonical IPv4 CIDRs, required); declaring ospf: requires the node to also declare top-level router_id. Timer customization (hello-interval, dead-interval, spf-delay) and passive-interface posture are out of scope; FRR defaults govern. See docs/topology-schema-v1.md §3.1 (Optional ospf: block) for the topology-side schema.
Example:
Behavior:
- The invariant runs vtysh -c 'show ip ospf neighbor json' on the specified src node and parses the structured output.
- It passes when the queried neighbor's router-ID is present in FRR's neighbor table and its FSM state matches state (default Full).
- It fails when the FSM state differs (engine-synthesized state literals NotConfigured and Unknown may also appear on the FAIL path).
- If the invariant definition is invalid (non-FRR src, non-IPv4 neighbor, or a state literal outside the declarable set), the run fails with misuse exit code 2.
- Retry policy: bounded by the test's timeout_s (default 60 seconds, sized to accommodate OSPF dead-interval timing) and retry_interval_s (default 1.0 seconds) when expect: pass. Single attempt for expect: fail.
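The check combines declaration validation against the eight declarable FSM literals with a lookup in the neighbor table. This sketch assumes FRR-style JSON where data["neighbors"][router_id] is a list of entries whose "nbrState" (or, in some versions, "state") reads like "Full/DR"; the role suffix after "/" is dropped before comparison. Retries are omitted, and the function names are hypothetical.

```python
import json

DECLARABLE_STATES = ("Down", "Attempt", "Init", "2-Way", "ExStart",
                     "Exchange", "Loading", "Full")


def validate_state(state: str = "Full") -> str:
    """Reject undeclarable literals (a misuse error, exit code 2)."""
    if state not in DECLARABLE_STATES:
        raise ValueError(f"invalid OSPF state literal: {state!r}")
    return state


def observed_ospf_state(stdout: str, neighbor: str) -> str:
    """Return the neighbor's FSM state, or NotConfigured / Unknown."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        return "Unknown"
    if not isinstance(data, dict):
        return "Unknown"
    entries = (data.get("neighbors") or {}).get(neighbor)
    if isinstance(entries, dict):
        entries = [entries]  # tolerate a single-object shape
    if not entries:
        return "NotConfigured"
    raw = entries[0].get("nbrState") or entries[0].get("state") or ""
    return raw.split("/", 1)[0] or "Unknown"  # "Full/DR" -> "Full"


def ospf_neighbor_up(stdout: str, neighbor: str, state: str = "Full") -> tuple:
    """Return (verdict, observed_state) for one neighbor router-ID."""
    expected = validate_state(state)
    observed = observed_ospf_state(stdout, neighbor)
    return ("pass" if observed == expected else "fail"), observed
```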
Artifacts produced:
When verdict: fail, the test record carries a structured six-key observed_state payload (type, neighbor, state, expected_state, last_error, source_node). See docs/topology-schema-v1.5.md §4.8 for the full per-type schema, including the comprehensive 10-FSM-literal closed-set documentation (8 declarable + 2 observed-only).
Positive proof example:
Replay:
Scope boundary:
This invariant validates only OSPFv2 single-area FRR adjacency from one node to one neighbor router-ID.
It does not by itself prove:
- OSPFv3 / IPv6 OSPF adjacency
- multi-area OSPF design correctness
- OSPF LSA-level inspection
- non-FRR (SONiC, Arista) OSPF (src must be type: frr)
- area mismatch as an invariant (the negative proof topology demonstrates the FAIL pathology, but no ospf_area_match invariant exists in v1.5)
- routing policy or attribute correctness
Interface State Invariant
Invariant type: interface_state
Purpose:
Verify that an interface declared by a links: endpoint has the expected administrative/operational state inside its node's network namespace.
This is useful for validating interface posture such as:
- post-deploy confirmation that all topology interfaces came up
- post-fault confirmation that a fault: interface_down step actually brought the interface down
- pre-test posture gate before subsequent reachability invariants
This invariant is NOS-agnostic: it uses the Linux primitive ip -j link show <iface> and works on any node type with a Linux network namespace (frr, host, nft-fw).
Required fields:
| Field | Description |
|---|---|
| node | Node whose namespace is probed |
| interface | Interface name as seen inside the node namespace (e.g. eth1) |
Optional fields:
| Field | Description |
|---|---|
| state | Expected state literal; one of up, down. Default up materialised at Resolve. |
The verdict predicate is asymmetric:
- state: up requires admin_state == "up" AND operstate == "UP" (conjunction; both must hold).
- state: down requires admin_state == "down" OR operstate != "UP" (disjunction; either suffices).
Carrier (link-layer signal) is reported in observed_state for diagnostic clarity but does NOT participate in the verdict.
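The asymmetric predicate is small enough to state directly in code. The sketch also shows one plausible way to derive the three fields from ip -j link show output: admin_state from the UP flag, carrier from LOWER_UP, operstate copied verbatim. That derivation, the boolean carrier value, and the function names are assumptions; carrier is returned but never used by the verdict.

```python
import json


def interface_verdict(expected: str, admin_state: str, operstate: str) -> str:
    """Asymmetric predicate: 'up' needs both signals, 'down' needs either."""
    if expected == "up":
        ok = admin_state == "up" and operstate == "UP"
    elif expected == "down":
        ok = admin_state == "down" or operstate != "UP"
    else:
        raise ValueError(f"invalid state literal: {expected!r}")  # misuse, exit 2
    return "pass" if ok else "fail"


def parse_ip_link(stdout: str) -> dict:
    """Derive admin_state / operstate / carrier from `ip -j link show <iface>`."""
    entry = json.loads(stdout)[0]  # ip -j returns a one-element list per interface
    flags = entry.get("flags", [])
    return {
        "admin_state": "up" if "UP" in flags else "down",
        "operstate": entry.get("operstate", "UNKNOWN"),
        "carrier": "LOWER_UP" in flags,  # diagnostic only
    }
```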
iproute2 capability dependency: the probe requires an ip binary supporting the -j JSON flag. BusyBox ip (the default in alpine:latest, the engine default for host and nft-fw) does NOT support -j. Topologies exercising interface_state on host or nft-fw nodes MUST pin a compatible image (e.g. nicolaka/netshoot:v0.15) explicitly in the node declaration. FRR's default image already includes full iproute2.
Example:
Behavior:
- The invariant runs ip -j link show <iface> on the specified node and parses the JSON output.
- It passes when the kernel-reported admin_state and operstate together satisfy the asymmetric predicate above.
- It fails when the predicate does not hold OR when the probe itself fails (the closed-set last_error literal indicates which path: capability-probe failure, interface-not-present, ip-command-failure, JSON parse failure, structural surprise, missing field).
- A per-(lab, node) capability probe runs at most once per gate run, on first use of interface_state against that node; capability-probe failures short-circuit with last_error: "ip -j flag not supported by node's iproute2".
- If the invariant definition is invalid (missing node, missing interface, unknown node reference, invalid state literal, unknown key), the run fails with misuse exit code 2.
- Retry policy: bounded by the test's timeout_s (default 10 seconds) and retry_interval_s (default 0.5 seconds) when expect: pass. Single attempt for expect: fail.
Artifacts produced:
When verdict: fail, the test record carries a structured eight-key observed_state payload (type, interface, expected_state, admin_state, operstate, carrier, last_error, source_node). See docs/topology-schema-v1.5.md §4.9 for the full per-type schema, including the closed-set documentation for all four state-axis fields (admin_state, operstate, carrier, last_error).
Positive proof example:
Replay:
Scope boundary:
This invariant validates only kernel-reported interface administrative/operational state inside a node's Linux network namespace.
It does not by itself prove:
- L2 reachability across the link (use ping for that)
- L3 reachability or routing-table correctness (use ping, route_present, or BGP invariants)
- MTU, speed, duplex, error counters, or other interface-level metrics
- vendor NOS-specific interface state (the probe is a Linux primitive; SONiC/Arista VM nodes are out of scope)
- carrier-level signal (carrier is reported in observed_state for diagnostic clarity but is NOT part of the verdict predicate)
EVPN Invariants
Cassian Gate supports deterministic EVPN invariant checks as standard authoritative test results.
EVPN MAC Route Present
Validates that a specific MAC route is present for the specified VNI on the specified node.
Example:
Required fields:
- kind: invariant
- type: evpn_mac_route_present
- node
- mac
- vni
EVPN MAC Route Absent
Validates that a specific MAC route is absent for the specified VNI on the specified node.
Example:
Required fields:
- kind: invariant
- type: evpn_mac_route_absent
- node
- mac
- vni
EVPN VNI Route Present
Validates that EVPN control-plane route presence exists for the specified VNI on the specified node.
Example:
Required fields:
- kind: invariant
- type: evpn_vni_route_present
- node
- vni
EVPN BGP Session Up
Validates that the EVPN BGP session to the specified peer is up on the specified node.
Example:
Required fields:
- kind: invariant
- type: evpn_bgp_session_up
- node
- peer
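The required-field lists for the four EVPN invariants fit in a lookup table. This sketch treats a missing or empty field as a usage / contract error candidate; the function name and error strings are hypothetical, and value checks (MAC format, VNI range) are left out.

```python
REQUIRED_FIELDS = {
    "evpn_mac_route_present": ("node", "mac", "vni"),
    "evpn_mac_route_absent": ("node", "mac", "vni"),
    "evpn_vni_route_present": ("node", "vni"),
    "evpn_bgp_session_up": ("node", "peer"),
}


def evpn_invariant_errors(test: dict) -> list:
    """Return declaration errors; a non-empty list maps to exit code 2."""
    if test.get("kind") != "invariant":
        return ["kind must be invariant"]
    itype = test.get("type")
    if itype not in REQUIRED_FIELDS:
        return [f"unsupported EVPN invariant type: {itype!r}"]
    return [f"{itype}: missing required field {field!r}"
            for field in REQUIRED_FIELDS[itype]
            if test.get(field) in (None, "")]
```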
Expected outcomes
These invariants behave like other authoritative test results:
- expect: pass means the declared invariant should be observed as true
- mismatch leads to a validation failure
Evidence and authority
For EVPN invariants:
- runtime EVPN route/session data is supporting evidence
- the invariant verdict in results.json is authoritative
The check is intended to preserve deterministic authority semantics and replay consistency.
Positive proof examples
Negative validation example
Negative misuse example
Replay
These invariants can be checked again using standard gate replay workflows:
Scope boundary
EVPN invariants validate only the declared EVPN truth being tested.
They do not by themselves prove:
- full dataplane forwarding
- broader EVPN feature correctness
- non-EVPN control-plane behavior
Failed-Invariant Observed State
When an invariant test produces verdict: fail, the test record in results.json carries a structured observed_state payload alongside the existing observed string. This is the authoritative deterministic failure-reason artifact.
Where it appears:
- on records in `results["tests"]` whose `kind == "invariant"` AND `verdict == "fail"`
- on records in `results["events"]` whose `type == "scenario_test_run"` AND `kind == "invariant"` AND `verdict == "fail"`
Where it does NOT appear:
- on passing-invariant records
- on non-invariant test kinds (`ping`, `tcp`)
- on `prereq` failure paths (those surface as `hard_failure:` in the summary)
- on records with `observed: blocked`, `verdict: fail`, `error: blocked before execution` (those are recorded explicitly per the Blocked declared validation items rules above)
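The appears / does-not-appear rules above translate directly into a filter over a loaded `results.json`. This is a minimal sketch, assuming `tests` and `events` are top-level lists of dict records; only the field names and values quoted in this section are used, and `load_results` is a hypothetical convenience wrapper.

```python
import json
from pathlib import Path


def load_results(path: str) -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_invariant_records(results: dict) -> list[dict]:
    """Records that should carry observed_state, per the rules above."""
    tests = [
        r for r in results.get("tests", [])
        if r.get("kind") == "invariant" and r.get("verdict") == "fail"
    ]
    events = [
        e for e in results.get("events", [])
        if e.get("type") == "scenario_test_run"
        and e.get("kind") == "invariant"
        and e.get("verdict") == "fail"
    ]
    # Blocked items never executed, so they carry no observed_state payload.
    return [r for r in tests + events if r.get("observed") != "blocked"]
```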
Determinism contract:
- every value in `observed_state` is derived from declared topology / test inputs or from deterministically-computable scalars in parsed `vtysh` JSON
- environmental nondeterminism (host clocks, container IDs, runtime PIDs, hostnames-of-the-runner, containerlab-allocated veth MAC addresses) MUST NOT enter `observed_state`
- two clean runs of the same topology produce byte-identical `observed_state` payloads
Per-record byte ceiling:
- a single record's `observed_state` is bounded at 8192 bytes of canonical JSON
- when a payload would exceed the ceiling, the engine deterministically suffix-drops trailing entries from the longest list field until it fits and sets `observed_state_truncated: true` on the record
- the supporting `evidence` field still carries the full pre-truncation list
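The ceiling rule can be illustrated with a small sketch. Several details here are assumptions, not the engine's documented behavior: "canonical JSON" is taken to mean sorted keys, compact separators, and UTF-8; "longest list field" is measured by entry count; and ties go to the lexicographically smallest key. It shows why the result is deterministic: the same input always drops the same entries.

```python
import json

OBSERVED_STATE_CEILING = 8192  # bytes of canonical JSON, per the rule above


def canonical_bytes(obj) -> bytes:
    # Assumption: canonical JSON = sorted keys, compact separators, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def bound_observed_state(state: dict, ceiling: int = OBSERVED_STATE_CEILING):
    """Return (bounded_state, truncated); the input dict is not modified."""
    bounded = {k: list(v) if isinstance(v, list) else v for k, v in state.items()}
    truncated = False
    while len(canonical_bytes(bounded)) > ceiling:
        candidates = [k for k, v in bounded.items() if isinstance(v, list) and v]
        if not candidates:
            break  # nothing left to drop; this sketch leaves the payload as-is
        # Longest list by entry count; ties broken by smallest key (assumption).
        longest = min(candidates, key=lambda k: (-len(bounded[k]), k))
        bounded[longest].pop()  # suffix-drop: remove the trailing entry
        truncated = True
    return bounded, truncated
```

In this sketch the caller would set `observed_state_truncated: true` when `truncated` is true, while keeping the untouched list in `evidence`.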
Example failed-invariant record shape in results.json:
Summary rendering in results.summary.txt:
Each failed-invariant line in the failed_tests: block is followed by an indented observed: block. Indentation is fixed: header at 4-space, key/value lines at 6-space, list entries at 8-space. List values are capped at 5 entries with a trailing (+<N> more) over-cap line. When the record carries observed_state_truncated: true, the renderer emits a literal trailing line (observed_state truncated; full payload in results.json) at 6-space indent.
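The indentation and capping rules above can be sketched as a small renderer. The 4/6/8-space indents, the 5-entry cap, the `(+<N> more)` line, and the truncation line come from this section; the key ordering (sorted), the absence of a list-entry prefix, and placing the over-cap line at the 8-space list indent are assumptions. Like `results.summary.txt` itself, this output is explanatory only.

```python
def render_observed_block(state: dict, truncated: bool = False,
                          cap: int = 5) -> list[str]:
    """Render an observed: block following the indentation rules above."""
    lines = ["    observed:"]  # header at 4-space indent
    for key in sorted(state):  # assumption: keys rendered in sorted order
        value = state[key]
        if isinstance(value, list):
            lines.append(f"      {key}:")  # key line at 6-space indent
            for item in value[:cap]:
                lines.append(f"        {item}")  # list entries at 8-space indent
            if len(value) > cap:
                lines.append(f"        (+{len(value) - cap} more)")
        else:
            lines.append(f"      {key}: {value}")  # key/value at 6-space indent
    if truncated:
        lines.append("      (observed_state truncated; full payload in results.json)")
    return lines
```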
Example summary block:
Authority boundary unchanged:
- `results.json` `observed_state` field = authoritative structured failure reason
- `results.summary.txt` `observed:` block = explanatory rendering only
For the per-type observed_state schema (which keys are required for each invariant type) see docs/topology-schema-v1.5.md §4.
🔟 Scenarios (Failure Choreography)
Scenarios define ordered fault injection sequences.
Example:
Step Types
Currently implemented scenario step types:
- `run`
- `fault`
- `wait`
- `wait_for`
- `wait_for_bgp`
- `pcap_start`
- `pcap_stop`
wait (explicit elapsed-time pause)
Canonical form:
Rules:
- payload must be a mapping
- payload must contain exactly one field: `seconds`
- `seconds` must be a positive integer
- scalar form such as `- wait: 5` is invalid
- extra keys are invalid
- `wait` executes only as an explicit elapsed-time pause
- `wait` does not prove readiness, BGP convergence, reachability, or service health
Use wait_for or wait_for_bgp when you want condition-based convergence checks.
No implicit retries. Timeout = failure.
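The `wait` payload rules above are strict enough to check mechanically. The sketch below is a hypothetical local linter, not the engine's validator; it assumes a scenario step has been parsed into a dict such as `{"wait": {"seconds": 5}}`, and its error strings are illustrative.

```python
def validate_wait_step(step: dict) -> list[str]:
    """Return rule violations for one parsed `wait` step (empty list = valid)."""
    payload = step.get("wait")
    if not isinstance(payload, dict):
        return ["wait payload must be a mapping (scalar form such as 'wait: 5' is invalid)"]
    errors = []
    extra = sorted(set(payload) - {"seconds"})
    if extra:
        errors.append(f"extra keys are invalid: {extra}")
    if "seconds" not in payload:
        errors.append("missing required field: seconds")
    else:
        seconds = payload["seconds"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
            errors.append("seconds must be a positive integer")
    return errors
```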
wait_for (condition-based convergence)
Polls a deterministic predicate until satisfied or timeout. Records a scenario step verdict (no test verdict).
Required keys (every wait_for step):
- `type`: one of the nine condition types listed below
- `from`: source node name
- `expect`: `pass` or `fail`
- `timeout`: int (seconds)
- `interval_s`: number (polling cadence)
Optional: per_attempt_timeout_s.
Accepted condition types:
- `ping`: ICMP from `from` to `to` (node name or IPv4 literal)
- `tcp`: TCP from `from` to `to`:`port`
- `route_prefix`: `prefix` (CIDR) present in RIB on `from`
- `bgp_session_up`: BGP session to `dst` (IPv4 neighbor) reaches Established
- `route_present`: `prefix` (CIDR) present in BGP RIB on `from`
- `route_advertised_to`: `prefix` (CIDR) advertised toward `peer` (node name)
- `evpn_bgp_session_up`: EVPN BGP session to `peer` (node name) reaches Established
- `evpn_vni_route_present`: EVPN type-2/3 route present for `vni` (integer)
- `evpn_mac_route_present`: EVPN type-2 route for `mac` + `vni` is present
Per-type parameter requirements: see docs/topology-schema-v1.md §6 (### wait_for) and docs/topology-schema-v1.5.md §2.
Notes:
- A successful `wait_for` step does not count as a passing test. Verdicts come only from items in `tests:`.
- `expect: fail` inverts the convergence semantics (the step succeeds if the condition does not become satisfied within `timeout`).
- `wait_for: bgp_session_up` is a single-neighbor check (explicit `dst`); `wait_for_bgp` is a coarse all-neighbors-of-one-node readiness check. Both remain available.
Prefer wait_for with an invariant condition over fixed wait: { seconds: N } for convergence purposes.
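The `expect` / `timeout` / `interval_s` semantics can be modeled as a bounded polling loop. This is a sketch under stated assumptions, not Cassian Gate's implementation: the predicate is a plain callable, the time budget is tracked in nominal polling seconds rather than wall-clock time, and `per_attempt_timeout_s` is not modeled. The injectable `sleep` keeps the sketch deterministic in tests.

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], expect: str, timeout: float,
             interval_s: float,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the step succeeds under the expect semantics above."""
    if expect not in ("pass", "fail"):
        raise ValueError("expect must be 'pass' or 'fail'")
    if timeout < 0 or interval_s <= 0:
        raise ValueError("timeout must be >= 0 and interval_s must be > 0")
    elapsed = 0.0
    while True:
        if predicate():
            # expect: pass -> converged, step succeeds.
            # expect: fail -> condition became satisfied, step fails.
            return expect == "pass"
        if elapsed + interval_s > timeout:
            break  # the next poll would land past the timeout
        sleep(interval_s)
        elapsed += interval_s  # nominal seconds, not wall clock
    # Timeout reached without the condition ever being satisfied.
    return expect == "fail"
```

Either way, the result is only a scenario step verdict; it never becomes a test verdict.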
Grey Failures (Deterministic Degradation)
Grey failures are scenario-only capabilities, not standalone CLI commands.
Scenarios can model partial network degradation, not only full outages.
Supported grey-failure actions:
- `packet_loss`
- `latency`
- `bandwidth_cap`
- `prefix_blackhole`
These actions are:
- deterministic
- explicit
- replay-stable
- recorded in `results.json`
Grey failures affect the network condition, not the verdict logic.
Verdicts still come from the test results that run after the fault step.
Example: Packet Loss
Meaning:
Apply 5% packet loss on `h1:eth1`, then run the declared test.
Example: Latency
Example: Bandwidth Cap
Example: Prefix Blackhole
Target Forms
Grey failures support two target styles.
Interface target
Link target
Useful when you want to degrade both ends of a declared link.
If multiple links exist between the same nodes, explicit interfaces are required.
Parameter Rules
packet_loss
- `loss` or `loss_percent`
- integer
- valid range: `0..100`

latency
- `latency_ms`
- integer
- must be `>= 0`

bandwidth_cap
- `bandwidth_mbps`
- integer
- must be `>= 1`

prefix_blackhole
- `node`
- `prefix`
Invalid values fail fast with exit code 2.
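The parameter rules above can be expressed as a small fail-fast check that mirrors the documented exit code. This is a hypothetical sketch, not the CLI's validator. It assumes the action parameters are already parsed into a dict and treats supplying both `loss` and `loss_percent` as invalid, which this sheet does not state. A return of `0` here only means "parameters accepted", not a gate PASS.

```python
USAGE_ERROR = 2  # documented exit code for invalid grey-failure values


def _is_int(value) -> bool:
    # bool is a subclass of int in Python; exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def grey_failure_exit_code(action: str, params: dict) -> int:
    """Return 0 if params satisfy the rules above, else 2."""
    if action == "packet_loss":
        keys = [k for k in ("loss", "loss_percent") if k in params]
        if len(keys) != 1:  # assumption: exactly one spelling may be given
            return USAGE_ERROR
        value = params[keys[0]]
        ok = _is_int(value) and 0 <= value <= 100
    elif action == "latency":
        value = params.get("latency_ms")
        ok = _is_int(value) and value >= 0
    elif action == "bandwidth_cap":
        value = params.get("bandwidth_mbps")
        ok = _is_int(value) and value >= 1
    elif action == "prefix_blackhole":
        ok = bool(params.get("node")) and bool(params.get("prefix"))
    else:
        ok = False  # unknown grey-failure action
    return 0 if ok else USAGE_ERROR
```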
How to Run
Replay deterministically:
Artifact Evidence
Grey failures are recorded in results.json as scenario_fault events.
Example shape:
This provides deterministic evidence that the degradation was applied before the test step ran.
1️⃣1️⃣ Candidate Configuration (Gate Only)
Apply candidate changes during validation.
Directory layout:
Currently proven supported examples:
Rules:
- full replacement
- no merge
- atomic apply
- failure aborts gate
- candidate config is non-authoritative input only
- verdicts still come only from tests / scenarios / invariants
Important current boundary for vendor NOS VM nodes:
- candidate-config for the supported `sonic-vm` / NOS VM node types is not currently a supported candidate-config surface
- unsupported or undefined NOS VM candidate-config input is rejected explicitly
- current truthful behavior for unsupported NOS VM candidate-config input is:
  - misuse / invalid candidate-config surface
  - exit code `2`
Example of current unsupported behavior:
Expected outcome:
Meaning: this candidate-config surface is unsupported or malformed for the current command/topology.
Support boundary:
- supported current surfaces: generated FRR and nft-fw candidate files only
- unsupported current surfaces: vendor NOS / sonic-vm candidate-config input
Scope boundary:
- candidate config support is currently proven only for the existing supported candidate-apply surfaces
- this does not currently establish candidate-config support for `sonic-vm` or other vendor NOS VM node types
- any future NOS VM candidate-config support requires an explicit contract surface and proof
1️⃣2️⃣ Status Command
Inspect running labs.
Useful options:
- `--summary`
- `--interfaces`
- `--bgp`
- `--bgp-verbose`
- `--routes`
- `--routes-verbose`
- `--json`
- `--strict`
Example:
1️⃣3️⃣ Cleanup & Lab Management
Destroy a running lab:
Clean up abandoned labs:
Dry-run occurs unless --yes is provided.
Example cleanup flow:
Meaning:
- `cassian down <lab>` tears down the named lab
- `cassian cleanup --all --yes` removes any remaining Cassian Gate-owned labs discovered by the cleanup plan
- cleanup stays explicit because dry-run remains the default without `--yes`
1️⃣4️⃣ DevOps Integration
Generate adapter artifacts.
Terraform
Input:
Ansible
Adapters are advisory only.
1️⃣5️⃣ AI Assistance (Optional)
AI is assistive only.
It never affects:
- execution
- verdicts
- exit codes
AI Advisory (Optional, Non-Authoritative)
Purpose:
- Provides advisory explanations and guidance based on artifacts
- Helps interpret:
- failures
- coverage gaps
- missing tests/scenarios
- control-plane intent
Authority:
- advisory only
- does not affect:
- verdicts
- exit codes
- execution
- artifacts
Input:
- `topology.resolved.yaml`
- `results.json`
Unified AI Assistance
Use the same conversational entrypoint for failure explanation, coverage review, topology review, scenario interpretation, invariant explanation, and blast-radius explanation.
Common human path
Uses the most recent valid artifact context when available.
Explicit lab path
Uses the specified lab when it contains the required artifacts.
Explicit artifacts path
This is the most explicit override and is useful for proof/debug workflows.
Optional online-enriched rendering
Enable online-enriched advisory rendering explicitly:
Rules:
- online-enriched rendering is explicit opt-in only
- local advisory rendering remains the baseline behavior
- online rendering does not change authority, verdicts, or execution behavior
- unavailable online rendering should be treated as a non-authoritative advisory-path failure, not a change in execution authority
Rendering modes
cassian ai may indicate whether local or online-enriched advisory rendering was used.
Both remain advisory-only.
Context selection
When possible, prefer explicit artifact or lab selection for clarity.
Required artifacts include:
If the required artifacts are missing, the advisory path should not be treated as available.
Important boundary
cassian ai:
- reads artifacts only
- does not execute lifecycle actions
- does not modify topology, tests, scenarios, or configs
- does not affect verdicts
- remains advisory-only
AI Output Structure
cassian ai is intended to present grounded, advisory explanations based on artifacts.
Typical output includes:
Treat the exact wording and formatting as supporting guidance rather than release-surface authority.
Draft Format (Copy-Paste Ready)
Drafts are structured and labeled:
Common draft types:
- topology guidance
- test block
- scenario block
- firewall-side fix
- test-side fix
Example:
Notes:
- drafts are safe to copy/paste
- drafts are non-authoritative
- drafts require human review
Supported Question Styles (Flexible)
cassian ai supports multiple phrasings for the same intent.
Scenario Questions
- "what scenario am I missing"
- "what scenario should I add"
- "how would you test failover here"
Invariant Questions
- "what invariant would help here"
- "what invariant should I add first"
Coverage / Validation
- "what tests should I add next"
- "give me a concrete validation plan"
Failure Analysis
- "why did this fail"
- "what should I change first"
- "what should I prove first"
Topology Improvement
- "how would you improve this topology"
- "provide an improved topology"
Behavior:
- different phrasings can still target the same advisory intent
- AI remains advisory-only regardless of phrasing
Local vs Online Rendering
Local (default)
- deterministic, built-in reasoning
- no external dependency
- always available
Online (optional)
Requirements depend on the configured online AI path.
Behavior:
- online rendering is optional
- it may provide richer explanations or phrasing
- it does not change verdicts, artifacts, or execution
How to Use AI Effectively
Best practice flow:
- Run deterministic gate:
- If failure:
- Improve coverage:
- Expand validation:
Key Insight
- passing tests ≠ proven design
- AI helps identify:
- missing positive proofs
- missing failure scenarios
- missing control-plane invariants
AI Guardrails
- AI is never authoritative
- AI cannot:
- run commands
- modify topology
- change configs
- alter results
- AI output must always be:
- human-reviewed
- explicitly applied
Verification Behavior
AI verification details belong to the implementation and verification surfaces. For operator use, keep the important boundary clear: AI remains optional and advisory-only.
Example: AI Identifies Missing Invariant
AI may suggest:
Meaning:
- you are not proving control-plane correctness yet
- add route-level proof before expanding scenarios
1️⃣6️⃣ Artifacts
Artifacts are written to:
Artifacts are typically written under:
Interpret them using the authority boundary already established in the project:
- `topology.resolved.yaml` is generated execution input
- `results.json` is the authoritative verdict artifact
- `results.summary.txt` is explanatory only
Key files:
- `topology.resolved.yaml`
- `results.json`
- `results.summary.txt`
- `artifacts/`
- `artifacts/blast-radius/blast_radius.json`
results.json
results.json is the authoritative verdict artifact.
It explicitly records declared validation items that executed, and when materially relevant, declared validation items that were blocked after entering authoritative execution scope.
Important boundary:
- omission does not mean success
- a blocked declared item should appear explicitly in `results.json`
- failed-invariant records carry a structured `observed_state` payload (see §9 "Failed-Invariant Observed State" and `docs/topology-schema-v1.5.md` §4 for the schema)
topology.resolved.yaml
Contains the fully expanded deterministic model used for execution.
Includes:
- resolved defaults
- auto IP assignments
- normalized topology
- explicit invariant expansion from declared packs
- additive EVPN-resolved fields when EVPN runtime substrate is used
Structured State Diff (Advisory Only)
Cassian Gate can produce a structured pre/post operational state diff when state capture is explicitly enabled for both phases.
This artifact is:
- advisory only
- non-authoritative
- deterministic
- generated only from the explicitly captured state
It does not:
- change verdicts
- change exit codes
- replace `results.json`
- score differences as good or bad
How it works
When enabled, Cassian Gate captures the declared command/profile state:
- once before tests (`pre`)
- once after tests (`post`)
It then compares those two captured state sets and writes a structured diff artifact.
This is a diff between:
- pre-state captured command output
- post-state captured command output
for the same run.
It is not a diff between:
- two different runs
- two different topologies
- baseline vs candidate config directories
- intended config vs actual config
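To make "pre vs post for the same run" concrete, here is a minimal sketch of a path-level diff between two captures. The capture shape (a dict keyed by command string, holding parsed JSON output) and the output entry format are hypothetical, not the actual `state_diff.json` schema; the FRR command string in the test is an illustrative key only. Paths are sorted so repeated runs over the same captures produce the same diff.

```python
_MISSING = object()


def _flatten(obj, prefix: str = "") -> dict:
    """Flatten parsed JSON into {path: scalar} so nested changes get stable paths."""
    if isinstance(obj, dict):
        items = {}
        for key in sorted(obj):
            items.update(_flatten(obj[key], f"{prefix}/{key}"))
        return items
    if isinstance(obj, list):
        items = {}
        for index, value in enumerate(obj):
            items.update(_flatten(value, f"{prefix}/{index}"))
        return items
    return {prefix: obj}


def state_diff(pre: dict, post: dict) -> list[dict]:
    """List added, removed, and changed scalar paths between two captures."""
    before, after = _flatten(pre), _flatten(post)
    changes = []
    for path in sorted(set(before) | set(after)):
        old = before.get(path, _MISSING)
        new = after.get(path, _MISSING)
        if old == new and type(old) is type(new):
            continue  # unchanged (type check keeps 1 and True distinct)
        entry = {"path": path}
        if old is not _MISSING:
            entry["pre"] = old
        if new is not _MISSING:
            entry["post"] = new
        changes.append(entry)
    return changes
```

Note that this sketch, like the artifact described above, reports differences without scoring them as good or bad.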
Command Example
Phase 1a expanded the built-in FRR profile set (now: frr-routing-basic, frr-bgp-basic, frr-ospf-basic, frr-interfaces-basic, frr-comprehensive) and switched FRR probes to JSON form (vtysh -c "show ... json") with Linux iproute2 primitives for the interfaces profile. See docs/cli-reference-v1.md for the full --state-capture / --state-profile flag reference and per-profile descriptions.
Artifact Path
What to inspect
Inspect the structured diff for the captured objects, changed elements, and supporting evidence relevant to your review.
Operator meaning
Use this artifact when you want to understand:
- what operational state changed during the run
- which captured command surfaces changed between pre and post
- supporting evidence for review or explanation
Keep the authority boundary clear:
- `results.json` = authoritative verdict surface
- `state_diff.json` = supporting evidence only
Blast Radius (Advisory Only)
Cassian Gate can produce a blast radius artifact that shows:
- what the executed tests/scenarios directly covered
- what additional nodes/links are potentially affected based on deterministic topology connectivity
This artifact is:
- advisory only
- non-authoritative
- deterministic
- generated during Collect
It does not:
- change verdicts
- change exit codes
- replace `results.json`
- score severity or risk
- infer live routing/runtime behavior
Artifact Path
Supporting results.json Surface
results.json may also include a clearly labeled non-authoritative supporting section:
This remains:
- supporting evidence only
- non-authoritative
- not part of verdict logic
Keep the authority boundary clear:
- `results.json` verdict fields = authoritative
- `results.json` `blast_radius` section = supporting evidence only
- `artifacts/blast-radius/blast_radius.json` = detailed advisory artifact
What it contains
Inspect the blast-radius artifact for the covered scope, potentially affected objects, and other supporting evidence relevant to your review.
Operator meaning
Use this artifact when you want to understand:
- what your declared tests directly touched
- what else is connected to that tested scope
- where additional coverage may be useful
Example
Important Boundary
Blast radius currently reflects:
- resolved topology structure
- declared coverage surfaces
- deterministic conservative graph expansion
It does not currently prove:
- live routing impact
- actual traffic path usage
- runtime failure propagation
- business severity
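"Deterministic conservative graph expansion" can be pictured as a bounded breadth-first walk over declared links, starting from the nodes the tests covered. The sketch below is an illustration under assumptions, not Cassian Gate's algorithm: the hop limit `max_hops`, the link representation, and the output keys are all hypothetical. Neighbors are visited in sorted order so the result is stable, and, matching the boundary above, nothing here consults live routing state.

```python
from collections import deque


def blast_radius(links: list[tuple[str, str]], covered: set[str],
                 max_hops: int = 1) -> dict:
    """Conservative, deterministic expansion from the covered scope."""
    adjacency: dict[str, set[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    reached = set(covered)
    frontier = deque((node, 0) for node in sorted(covered))
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, depth + 1))

    # Links inside the reached set that are not entirely within covered scope.
    affected_links = sorted(
        tuple(sorted(link)) for link in links
        if set(link) <= reached and not set(link) <= covered
    )
    return {
        "covered": sorted(covered),
        "potentially_affected_nodes": sorted(reached - covered),
        "potentially_affected_links": affected_links,
    }
```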
1️⃣7️⃣ Common Operator Tasks
Validate a topology:
Validate contrib content structurally:
Run validation gate:
Note:
- `cassian test <topology.yaml>` now requires at least one declared test or scenario
- if you only want to prove deploy/provision smoke behavior, use exploration mode instead of gate mode
Validate invariant-pack compatibility:
Run invariant-pack gate proof:
Validate invalid pack misuse handling:
Replay a previous gate deterministically:
Explore a lab interactively:
Bring up EVPN runtime substrate:
Run a routing attribute invariant proof:
Run a route advertisement invariant proof:
Run an EVPN invariant proof:
Replay an EVPN proof deterministically:
Clean up labs:
Run scenario testing:
Run a specific scenario in exploration mode:
Run a scenario with an explicit wait step:
Run a grey-failure scenario:
Replay the same grey-failure scenario deterministically:
Run a blast radius proof:
Inspect blast radius output:
Inspect structured state diff output:
Inspect a blocked declared-item result:
Look for:
- the declared test present in `tests`
- `observed: blocked`
- `verdict: fail`
- summary counts reflecting the blocked item
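The checklist above can be scripted against a loaded `results.json`. This minimal sketch assumes `tests` is a list of dict records; the `name` field in the test data is hypothetical. Remember that omission does not mean success: a blocked item should appear in this list, not be missing from `tests`.

```python
from collections import Counter


def blocked_items(results: dict) -> list[dict]:
    """Declared tests recorded as blocked (observed: blocked, verdict: fail)."""
    return [
        t for t in results.get("tests", [])
        if t.get("observed") == "blocked" and t.get("verdict") == "fail"
    ]


def verdict_counts(results: dict) -> Counter:
    """Tally verdicts so they can be compared with the summary counts."""
    return Counter(t.get("verdict") for t in results.get("tests", []))
```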
Use AI to explain a failure from the most recent run:
Use AI against a specific lab:
Use AI to expand validation coverage:
Use optional online-enriched AI rendering:
cassian validate-contrib — Structural validation for contrib content
Validate supported contrib content without running any lifecycle phases.
Command:
Purpose:
- checks contrib content structurally
- rejects malformed or unsupported contrib layout
- does not deploy anything
- does not create lab artifacts
- does not affect verdicts, replay, or authority
Important boundary:
validate-contrib is:
- structural only
- non-authoritative
- explicit only
It does not:
- run resolve → deploy → test lifecycle phases
- produce PASS / FAIL validation verdicts
- generate `results.json`
- validate runtime behavior
- score content quality
- infer meaning or intent
Supported contrib surfaces are limited to the contrib content types documented by the current project documentation.
Typical behavior:
- validates only the path you explicitly pass
- checks for supported contrib layout and required structure
- rejects unsupported or malformed contrib content
Examples:
Typical exit semantics follow the standard structural-validation pattern:
- accepted supported contrib content returns success
- invalid or unsupported contrib content is rejected as a usage / contract error
1️⃣8️⃣ Exit Codes
| Code | Meaning |
|---|---|
| 0 | PASS |
| 1 | Test failure |
| 2 | Usage / contract error |
Examples:
- invariant truth mismatch → `1`
- validation failure after declared proof ran → `1`
- unsupported EVPN topology shape → `2`
- invalid invariant declaration → `2`
- incompatible pack contents → `2`
- zero-assertion gate run (`cassian test <topology.yaml>` with no tests/scenarios) → `2`
- valid contrib validation (`cassian validate-contrib contrib/`) → `0`
- invalid contrib structure (`cassian validate-contrib <broken-path>`) → `2`
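In a CI job, the three exit codes map directly onto pipeline outcomes. The sketch below wraps the documented `cassian test <topology.yaml>` invocation with Python's `subprocess`; the helper names are hypothetical, and any code outside 0/1/2 is reported as unexpected rather than guessed at. The exit code is passed through unchanged, so the CI system still sees Cassian Gate's own verdict.

```python
import subprocess
import sys

EXIT_MEANINGS = {0: "PASS", 1: "test failure", 2: "usage / contract error"}


def classify(returncode: int) -> str:
    return EXIT_MEANINGS.get(returncode, f"unexpected exit code {returncode}")


def run_gate(topology: str) -> int:
    """Run gate mode and pass its exit code through unchanged."""
    proc = subprocess.run(["cassian", "test", topology], check=False)
    print(f"cassian test {topology}: {classify(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

A CI step could then call `sys.exit(run_gate("topology.yaml"))`, where `topology.yaml` stands in for your own topology file.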
Misuse / usage / contract error example:
Typical outcome:
- the command is rejected before validation runs
- the failure is treated as a usage / contract error
Meaning:
- this is misuse / invalid invocation
- validation did not run
- exit code remains `2`
Validation failure example:
Meaning:
- the system ran validation correctly
- the declared proof failed
- exit code remains `1`
A FAIL can also mean a declared validation item was blocked after authoritative execution began.
In that case, inspect results.json.
Typical blocked-result shape:
These UX clarifications do not change:
- lifecycle order
- authority model
- verdict semantics
- artifact schema
- replay authority
- deterministic execution
- exit code contract
1️⃣9️⃣ First 10 Minutes
Recommended onboarding workflow:
Note:
- `cassian test <topology.yaml>` now requires at least one declared test or scenario
- if you only want to prove deploy/provision smoke behavior, use exploration mode instead of gate mode
For exploration:
For EVPN runtime + proof:
For AI-assisted explanation after a gate run: