Topology lifecycle
Deploy, monitor, stop, recover, and redeploy — the day-to-day of running apps on Samoza OS.
A topology is the unit of deployment. This page walks through what happens to one across its lifetime — from topo submit to topo stop, with everything that can happen in between.
Submit
zshell[0] >> topo submit ./my-app.mex
Submitting topology from ./my-app.mex...
Topology created:
ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Name: my-app
State: pending
Spaces: 3
zrms validates the MEX, checks signatures, and queues the topology for placement. The topology stays pending until every space has been assigned a zhost.
Placement
zrms picks a zhost for each space. The decision considers:
- Available capacity (CPU, memory).
- Declared capabilities the space needs (e.g. an IO space wanting a camera lands on a node with one).
- Spatial constraints from the MEX (a space pinned to a region or a specific kind of zhost).
- Affinity / anti-affinity rules.
When every space has a placement and zrun confirms instantiation, the topology becomes running.
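The criteria above can be read as a chain of filters applied to each candidate zhost. The sketch below is a minimal illustration of that idea, not zrms's actual algorithm: the `ZHost` and `Space` types, their field names, and the greedy first-fit strategy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ZHost:
    # Hypothetical view of a zhost; field names are illustrative only.
    name: str
    free_cpu: float
    free_mem_mb: int
    capabilities: set = field(default_factory=set)
    region: str = ""


@dataclass
class Space:
    # Hypothetical view of a space's declared requirements.
    name: str
    cpu: float
    mem_mb: int
    needs: set = field(default_factory=set)   # required capabilities
    region: str = ""                          # "" means no region pin
    avoid: set = field(default_factory=set)   # anti-affinity: space names


def eligible(space, host, placed):
    """Return True if host passes every filter for space.

    placed maps already-placed space names to zhost names.
    """
    if host.free_cpu < space.cpu or host.free_mem_mb < space.mem_mb:
        return False                          # capacity
    if not space.needs <= host.capabilities:
        return False                          # declared capabilities
    if space.region and space.region != host.region:
        return False                          # spatial constraint
    neighbours = {s for s, h in placed.items() if h == host.name}
    return not (space.avoid & neighbours)     # anti-affinity


def place(spaces, hosts):
    """Greedy first-fit; returns {space: zhost}, or None if a space is stuck."""
    placed = {}
    for space in spaces:
        host = next((h for h in hosts if eligible(space, h, placed)), None)
        if host is None:
            return None                       # topology would remain pending
        host.free_cpu -= space.cpu            # reserve capacity on the chosen host
        host.free_mem_mb -= space.mem_mb
        placed[space.name] = host.name
    return placed
```

A real scheduler would likely score candidates rather than take the first match, but the filter order shows why a single unmet capability or constraint is enough to block placement.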
States
| State | What it means |
|---|---|
| pending | Submitted, awaiting placement. |
| running | All spaces are alive on their assigned zhosts. |
| degraded | One or more spaces have failed; recovery is in progress. |
| stopped | Manually stopped; spaces unloaded. |
| failed | Recovery couldn’t place all spaces; manual intervention needed. |
The transitions you’ll commonly see:
pending → running (normal startup)
running → degraded → running (a zhost failed; recovery succeeded)
running → degraded → failed (recovery couldn't find a home)
running → stopped (operator-initiated stop)
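If you script against topology state (for example, in a monitoring check), it can help to encode the common transitions as a lookup table. The sketch below covers only the transitions listed above; the runtime may permit others, and the names `State`, `COMMON_TRANSITIONS`, and the helper functions are hypothetical.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DEGRADED = "degraded"
    STOPPED = "stopped"
    FAILED = "failed"


# Only the commonly seen transitions; a real runtime may allow more.
COMMON_TRANSITIONS = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.DEGRADED, State.STOPPED},
    State.DEGRADED: {State.RUNNING, State.FAILED},
    State.STOPPED: set(),
    State.FAILED: set(),
}


def is_common_transition(src, dst):
    """True if src -> dst is one of the transitions listed above."""
    return dst in COMMON_TRANSITIONS[src]


def is_common_path(path):
    """True if every consecutive pair in path is a common transition."""
    return all(is_common_transition(a, b) for a, b in zip(path, path[1:]))
```

A history such as pending, running, degraded, running passes `is_common_path`; anything else is worth a closer look with topo status.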
Inspect
zshell[0] >> topo status <id>
Topology: my-app
ID: a1b2c3d4-...
State: running
Spaces: 3
Created: 2026-04-29T10:30:00Z
Updated: 2026-04-30T14:22:11Z
Placements:
SPACE ZHOST STATE
--------------------------------------------
dashboard edge-node-1 running
api edge-node-2 running
store edge-node-3 running
For a single space’s deeper detail:
zshell[1] >> spaces status dashboard
For mesh-wide visibility — useful if you suspect a routing problem:
zshell[2] >> mesh spaces
Stop
zshell[0] >> topo stop a1b2c3d4-...
Stopping topology a1b2c3d4-...
Topology a1b2c3d4-... stopped.
A graceful stop:
- zrms marks the topology stopped.
- zrun calls on_close() on each space.
- Each space gets a moment to flush state, finish in-flight messages, etc.
- WASM instances are unloaded; UI assets are removed from zedge.
- Spaces unregister from the mesh; mesh spaces no longer lists them.
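To make the ordering concrete, here is a small simulation of a graceful stop. It assumes the steps run strictly in the order listed, which the text implies but does not guarantee. Only on_close() comes from this page; the `Space` stand-in, the dictionary layout, and the log strings are invented for illustration.

```python
class Space:
    """Stand-in for a running space; only on_close() is named in the docs."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def on_close(self):
        # A real space would flush state and finish in-flight messages here.
        self.log.append(f"on_close:{self.name}")


def graceful_stop(topology, log):
    """Record the stop steps in the order described above."""
    topology["state"] = "stopped"               # zrms marks the topology
    log.append("state:stopped")
    for space in topology["spaces"]:
        space.on_close()                        # zrun calls on_close()
    for space in topology["spaces"]:
        log.append(f"unload:{space.name}")      # WASM unloaded, UI assets removed
    for space in topology["spaces"]:
        log.append(f"unregister:{space.name}")  # dropped from the mesh listing
    return log
```

The practical point for app authors is that on_close() is the last hook a space gets before it is unloaded, so any state that must survive a stop should be flushed there.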
Redeploy
Two patterns.
Quick iteration (dev). Stop, rebuild, submit:
zshell[0] >> topo stop <id>
# rebuild your .mex
zshell[1] >> topo submit ./build/my-app.mex
Versioned deploy (prod). Bump the version field in your MEX manifest.yaml, sign, submit:
zshell[0] >> topo submit ./build/my-app-v1.2.0.mex
zrms routes new traffic according to the rollout policy in the manifest. (Today the only supported policy is replace; future versions will support canary and blue/green.)
Recovery in detail
When a zhost stops sending heartbeats:
ClusterManager: zhost-delta lastHeartbeat > 30s
│
▼
Mark zhost-delta unavailable
│
▼
Find topologies with spaces on zhost-delta
│
▼
For each: state = degraded
│
▼
Recovery: try to re-place those spaces on healthy zhosts
│
├── Successful re-placement → state = running
│
└── No suitable zhost → state = failed
Recovery respects the MEX’s placement constraints. If a space requires a camera and no other zhost has one, recovery cannot succeed automatically: the topology becomes failed, and topo status reports the reason.
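The recovery flow can be sketched in a few lines. This is an illustration under stated assumptions rather than the ClusterManager's real code: the 30-second threshold comes from the diagram, while the function names, the dictionary layout, the `unplaced` field, and the `candidates_for` callback (which stands in for constraint-aware placement) are hypothetical.

```python
HEARTBEAT_TIMEOUT_S = 30  # threshold shown in the recovery flow


def find_unavailable(last_heartbeat, now):
    """Return zhosts whose last heartbeat is older than the timeout."""
    return {h for h, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT_S}


def recover(topology, dead, candidates_for):
    """Re-place spaces that were running on dead zhosts.

    topology: {"state": str, "placements": {space: zhost}}
    candidates_for: callable(space) -> zhosts that satisfy the space's
        placement constraints (hypothetical helper).
    """
    affected = [s for s, h in topology["placements"].items() if h in dead]
    if not affected:
        return topology                        # nothing on the dead zhosts
    topology["state"] = "degraded"
    for space in affected:
        options = [h for h in candidates_for(space) if h not in dead]
        if not options:
            topology["state"] = "failed"       # no suitable zhost
            topology["unplaced"] = space
            return topology
        topology["placements"][space] = options[0]
    topology["state"] = "running"              # every space re-placed
    return topology
```

Note that a single space without a suitable zhost is enough to move the whole topology to failed, which matches the camera example above.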
Failed topologies — what to do
A failed topology is one the runtime gave up on. Investigate:
zshell[0] >> topo status <id>
# Look at "FailureReason" — it'll say something like:
# "no zhost satisfies capability requirement 'sensors.camera'"
# "insufficient capacity on remaining zhosts"
Common fixes:
- Add a zhost that meets the constraint. This brings the missing capacity or hardware online.
- Loosen the constraint in the MEX. Do this if the original requirement was overly strict.
- Stop and re-submit elsewhere. Do this if the original placement was correct but conditions have changed.
See also
- Cluster operations — adding/removing zhosts, capacity planning.
- Troubleshooting — common problems and diagnostics.
- MEX format — what your topology declared.