
Topology lifecycle

Deploy, monitor, stop, recover, and redeploy — the day-to-day of running apps on Samoza OS.

A topology is the unit of deployment. This page walks through what happens to one across its lifetime — from topo submit to topo stop, with everything that can happen in between.

Submit

zshell[0] >> topo submit ./my-app.mex
Submitting topology from ./my-app.mex...

Topology created:
  ID:     a1b2c3d4-e5f6-7890-abcd-ef1234567890
  Name:   my-app
  State:  pending
  Spaces: 3

zrms validates the MEX, checks signatures, and queues the topology for placement. The topology stays pending until every space has a zhost and zrun has confirmed instantiation.

Placement

zrms picks a zhost for each space. The decision considers:

  • Available capacity (CPU, memory).
  • Declared capabilities the space needs (e.g. an IO space wanting a camera lands on a node with one).
  • Spatial constraints from the MEX (a space pinned to a region or a specific kind of zhost).
  • Affinity / anti-affinity rules.

When every space has a placement and zrun confirms instantiation, the topology becomes running.
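The placement criteria above can be pictured as a filter-then-pick loop. The sketch below is a hypothetical Python model, not zrms code. The ZHost and SpaceSpec classes, their fields, the greedy one-space-at-a-time order, and the "most free CPU wins" tie-break are all assumptions. Affinity rules are left out; only anti-affinity is modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical data model; field names are illustrative, not the MEX schema.
@dataclass
class ZHost:
    name: str
    cpu: float                      # free CPU cores
    mem_mb: int                     # free memory in MB
    capabilities: set[str] = field(default_factory=set)
    region: str = ""
    kind: str = ""

@dataclass
class SpaceSpec:
    name: str
    cpu: float
    mem_mb: int
    capabilities: set[str] = field(default_factory=set)
    region: Optional[str] = None    # spatial pin, if any
    kind: Optional[str] = None      # required kind of zhost, if any
    anti_affinity: set[str] = field(default_factory=set)  # spaces it must not share a zhost with

def fits(space: SpaceSpec, host: ZHost, free: dict, placement: dict[str, str]) -> bool:
    cpu, mem = free[host.name]
    neighbours = {s for s, h in placement.items() if h == host.name}
    return (cpu >= space.cpu and mem >= space.mem_mb                      # capacity
            and space.capabilities <= host.capabilities                   # declared capabilities
            and (space.region is None or host.region == space.region)     # spatial pin
            and (space.kind is None or host.kind == space.kind)           # kind of zhost
            and not (neighbours & space.anti_affinity))                   # anti-affinity

def place(spaces: list[SpaceSpec], hosts: list[ZHost]) -> Optional[dict[str, str]]:
    """Greedy placement: {space: zhost}, or None if any space has no home."""
    free = {h.name: [h.cpu, h.mem_mb] for h in hosts}  # remaining capacity; inputs untouched
    placement: dict[str, str] = {}
    for space in spaces:
        candidates = [h for h in hosts if fits(space, h, free, placement)]
        if not candidates:
            return None
        best = max(candidates, key=lambda h: free[h.name][0])  # most free CPU; first wins ties
        free[best.name][0] -= space.cpu
        free[best.name][1] -= space.mem_mb
        placement[space.name] = best.name
    return placement

hosts = [ZHost("edge-node-1", 4, 4096),
         ZHost("edge-node-2", 2, 2048, {"sensors.camera"}),
         ZHost("edge-node-3", 4, 8192)]
spaces = [SpaceSpec("dashboard", 1, 512),
          SpaceSpec("camera-io", 0.5, 256, {"sensors.camera"}),
          SpaceSpec("store", 2, 4096, anti_affinity={"dashboard"})]
print(place(spaces, hosts))
# {'dashboard': 'edge-node-1', 'camera-io': 'edge-node-2', 'store': 'edge-node-3'}
```

Returning None for the whole topology when one space has no candidate mirrors the rule that a topology only becomes running once every space has a placement.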

States

State      What it means
----------------------------------------------------------------------
pending    Submitted, awaiting placement.
running    All spaces are alive on their assigned zhosts.
degraded   One or more spaces have failed; recovery is in progress.
stopped    Manually stopped; spaces unloaded.
failed     Recovery couldn’t place all spaces; manual intervention needed.

The transitions you’ll commonly see:

pending → running                    (normal startup)
running → degraded → running         (a zhost failed; recovery succeeded)
running → degraded → failed          (recovery couldn't find a home)
running → stopped                    (operator-initiated stop)
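If you script against topo status, it can help to encode the common transitions above as a small table. The Python sketch below is a hypothetical client-side check, not how zrms tracks state. It only knows the four paths listed here, so a transition the runtime allows but this page doesn't list would be reported as unexpected.

```python
from enum import Enum

class TopoState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DEGRADED = "degraded"
    STOPPED = "stopped"
    FAILED = "failed"

# The common transitions listed above; not necessarily exhaustive.
COMMON_TRANSITIONS = {
    (TopoState.PENDING, TopoState.RUNNING),    # normal startup
    (TopoState.RUNNING, TopoState.DEGRADED),   # a zhost failed
    (TopoState.DEGRADED, TopoState.RUNNING),   # recovery succeeded
    (TopoState.DEGRADED, TopoState.FAILED),    # recovery couldn't find a home
    (TopoState.RUNNING, TopoState.STOPPED),    # operator-initiated stop
}

def unexpected_steps(history: list[str]) -> list[tuple[str, str]]:
    """Return each consecutive pair of observed states that isn't a common transition."""
    states = [TopoState(s) for s in history]   # raises ValueError on an unknown state name
    return [(a.value, b.value) for a, b in zip(states, states[1:])
            if a != b                          # repeated samples of one state are not transitions
            and (a, b) not in COMMON_TRANSITIONS]

print(unexpected_steps(["pending", "running", "degraded", "running", "stopped"]))  # []
print(unexpected_steps(["pending", "failed"]))  # [('pending', 'failed')]
```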

Inspect

zshell[0] >> topo status <id>
Topology: my-app
  ID:        a1b2c3d4-...
  State:     running
  Spaces:    3
  Created:   2026-04-29T10:30:00Z
  Updated:   2026-04-30T14:22:11Z

Placements:
  SPACE                ZHOST          STATE
  --------------------------------------------
  dashboard            edge-node-1    running
  api                  edge-node-2    running
  store                edge-node-3    running
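To watch placements from a script, one option is to capture the topo status text and pull out the Placements table. This Python sketch assumes you already have the output as a string (how you capture it depends on your setup) and that the columns look exactly like the sample above. It is not an official interface, and any change to the output format would break it.

```python
def parse_placements(status_text: str) -> dict[str, tuple[str, str]]:
    """Map each space to (zhost, state) using the Placements table of topo status."""
    rows: dict[str, tuple[str, str]] = {}
    in_table = False
    for line in status_text.splitlines():
        cells = line.split()
        if cells[:3] == ["SPACE", "ZHOST", "STATE"]:
            in_table = True          # header row: data rows follow
            continue
        if not in_table or not cells or set(line.strip()) == {"-"}:
            continue                 # before the table, a blank line, or the dashed rule
        if len(cells) == 3:
            space, zhost, state = cells
            rows[space] = (zhost, state)
    return rows

SAMPLE = """\
Placements:
  SPACE                ZHOST          STATE
  --------------------------------------------
  dashboard            edge-node-1    running
  api                  edge-node-2    running
  store                edge-node-3    running
"""

placements = parse_placements(SAMPLE)
not_running = [s for s, (_, state) in placements.items() if state != "running"]
print(not_running)  # []
```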

For deeper detail on a single space:

zshell[1] >> spaces status dashboard

For mesh-wide visibility — useful if you suspect a routing problem:

zshell[2] >> mesh spaces

Stop

zshell[0] >> topo stop a1b2c3d4-...
Stopping topology a1b2c3d4-...
Topology a1b2c3d4-... stopped.

A graceful stop:

  1. zrms marks the topology stopped.
  2. zrun calls on_close() on each space.
  3. Each space gets a moment to flush state, finish in-flight messages, etc.
  4. WASM instances are unloaded; UI assets are removed from zedge.
  5. Spaces unregister from the mesh, so the mesh spaces command no longer lists them.
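The ordering of those five steps can be modeled with asyncio. In this hypothetical Python sketch, the Space class, on_close() as a coroutine, and the fixed per-space grace period are all assumptions (this page only says each space gets "a moment"). It also assumes the on_close() calls run concurrently and that unload and unregister happen only after every space has closed or run out of time.

```python
import asyncio

class Space:
    """Hypothetical stand-in for a running space (not the zrun API)."""
    def __init__(self, name: str, flush_seconds: float = 0.0):
        self.name = name
        self.flush_seconds = flush_seconds  # how long on_close() takes
        self.loaded = True                  # WASM instance / UI assets present
        self.registered = True              # visible to mesh spaces

    async def on_close(self) -> None:
        # Stand-in for flushing state and finishing in-flight messages.
        await asyncio.sleep(self.flush_seconds)

async def close_one(space: Space, grace: float) -> str:
    try:
        await asyncio.wait_for(space.on_close(), timeout=grace)
        return f"{space.name}: closed"
    except asyncio.TimeoutError:
        return f"{space.name}: grace period expired"

async def graceful_stop(topology: dict, spaces: list[Space], grace: float = 1.0) -> list[str]:
    topology["state"] = "stopped"                                           # 1. mark stopped
    results = await asyncio.gather(*(close_one(s, grace) for s in spaces))  # 2-3. on_close + grace
    for s in spaces:
        s.loaded = False                                                    # 4. unload
    for s in spaces:
        s.registered = False                                                # 5. leave the mesh
    return list(results)

topo = {"state": "running"}
spaces = [Space("dashboard", 0.01), Space("api", 0.01), Space("store", 5.0)]
print(asyncio.run(graceful_stop(topo, spaces, grace=0.1)))
# ['dashboard: closed', 'api: closed', 'store: grace period expired']
```

Bounding each on_close() with a timeout keeps one slow space from holding up the whole stop.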

Redeploy

Two patterns.

Quick iteration (dev). Stop, rebuild, submit:

zshell[0] >> topo stop <id>
# rebuild your .mex
zshell[1] >> topo submit ./build/my-app.mex

Versioned deploy (prod). Bump the version field in your MEX manifest.yaml, sign, submit:

zshell[0] >> topo submit ./build/my-app-v1.2.0.mex

zrms routes new traffic according to the rollout policy in the manifest. Today the only policy is replace; future versions will support canary and blue/green.
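For the versioned flow, the bump itself is mechanical. This Python sketch assumes the manifest's version field holds a plain MAJOR.MINOR.PATCH string, as the my-app-v1.2.0.mex file name suggests. It does not read or write manifest.yaml, whose full schema isn't shown here, and the artifact name simply follows the pattern in the example above.

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part {part!r}; expected major, minor or patch")

def is_newer(candidate: str, current: str) -> bool:
    """Compare numerically, so 1.10.0 sorts after 1.9.0."""
    return tuple(map(int, candidate.split("."))) > tuple(map(int, current.split(".")))

new = bump("1.1.3", "minor")
print(new, is_newer(new, "1.1.3"))   # 1.2.0 True
print(f"my-app-v{new}.mex")          # my-app-v1.2.0.mex
```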

Recovery in detail

When a zhost stops sending heartbeats:

ClusterManager: zhost-delta lastHeartbeat > 30s
        │
        ▼
Mark zhost-delta unavailable
        │
        ▼
Find topologies with spaces on zhost-delta
        │
        ▼
For each: state = degraded
        │
        ▼
Recovery: try to re-place those spaces on healthy zhosts
        │
        ├── Successful re-placement → state = running
        │
        └── No suitable zhost → state = failed

Recovery respects the MEX’s placement constraints. If a space requires a camera and there’s no other zhost with a camera, recovery cannot succeed automatically — that topology becomes failed, and you’ll see topo status explain why.
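The recovery flow and the constraint rule above can be modeled in a few lines. This is a hypothetical Python sketch, not ClusterManager code. The Host and Topology classes are illustrative, the 30-second timeout is taken from the flow above, capacity is ignored so that only the capability constraint decides whether a space can move, and the transient degraded state is passed through within a single call.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT_S = 30.0  # "lastHeartbeat > 30s" in the flow above

@dataclass
class Host:
    name: str
    last_heartbeat: float                       # seconds, same clock as `now`
    capabilities: set[str] = field(default_factory=set)
    available: bool = True

@dataclass
class Topology:
    name: str
    placements: dict[str, str]                  # space -> zhost name
    requirements: dict[str, set[str]] = field(default_factory=dict)  # space -> capabilities
    state: str = "running"

def recover(hosts: dict[str, Host], topologies: list[Topology], now: float) -> None:
    # Mark zhosts with a stale heartbeat unavailable.
    for h in hosts.values():
        if now - h.last_heartbeat > HEARTBEAT_TIMEOUT_S:
            h.available = False
    healthy = [h for h in hosts.values() if h.available]

    for topo in topologies:
        # Find spaces stranded on an unavailable zhost.
        stranded = [s for s, hn in topo.placements.items() if not hosts[hn].available]
        if not stranded:
            continue
        topo.state = "degraded"
        for space in stranded:
            need = topo.requirements.get(space, set())
            # Re-place only onto a healthy zhost that satisfies the space's constraints.
            target = next((h for h in healthy if need <= h.capabilities), None)
            if target is None:
                topo.state = "failed"           # no suitable zhost
                break
            topo.placements[space] = target.name
        else:
            topo.state = "running"              # every stranded space found a home

hosts = {
    "zhost-alpha": Host("zhost-alpha", last_heartbeat=95.0),
    "zhost-delta": Host("zhost-delta", last_heartbeat=60.0, capabilities={"sensors.camera"}),
}
web = Topology("web", {"api": "zhost-delta"})
cam = Topology("cam", {"camera-io": "zhost-delta"}, {"camera-io": {"sensors.camera"}})
recover(hosts, [web, cam], now=100.0)
print(web.state, web.placements)  # running {'api': 'zhost-alpha'}
print(cam.state)                  # failed
```

The camera case is the one described above: the only zhost offering sensors.camera is the one that went silent, so there is nowhere valid to move the space.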

Failed topologies — what to do

A failed topology is one the runtime gave up on. Investigate:

zshell[0] >> topo status <id>
# Look at "FailureReason" — it'll say something like:
#   "no zhost satisfies capability requirement 'sensors.camera'"
#   "insufficient capacity on remaining zhosts"
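To reason about which of those two messages you are likely to see, here is a hypothetical Python sketch of the diagnosis: check capabilities first, then capacity. The Host class and the check order are assumptions, and the message strings are modeled on the examples above for readability; the real FailureReason text comes from zrms and may differ.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Host:
    name: str
    free_cpu: float
    free_mem_mb: int
    capabilities: set[str] = field(default_factory=set)

def failure_reason(cpu: float, mem_mb: int, required: set[str],
                   hosts: list[Host]) -> Optional[str]:
    """None if some remaining zhost could take the space, else a reason string."""
    capable = [h for h in hosts if required <= h.capabilities]
    if required and not capable:
        # Prefer naming capabilities that no remaining zhost offers at all;
        # otherwise they exist, just never together on one zhost.
        missing = [c for c in sorted(required)
                   if not any(c in h.capabilities for h in hosts)]
        names = missing or sorted(required)
        return "no zhost satisfies capability requirement " + ", ".join(f"'{c}'" for c in names)
    if not any(h.free_cpu >= cpu and h.free_mem_mb >= mem_mb for h in capable):
        return "insufficient capacity on remaining zhosts"
    return None

remaining = [Host("edge-node-1", 2.0, 2048),
             Host("edge-node-2", 0.5, 256, {"sensors.camera"})]
print(failure_reason(0.5, 256, {"sensors.lidar"}, remaining))
# no zhost satisfies capability requirement 'sensors.lidar'
print(failure_reason(1.0, 512, {"sensors.camera"}, remaining))
# insufficient capacity on remaining zhosts
```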

Common fixes:

  • Add a zhost that meets the constraint. This brings the missing capacity or hardware online.
  • Loosen the constraint in the MEX. Use this when the original requirement was overly strict.
  • Stop and re-submit elsewhere. Use this when the original placement was correct but conditions have changed.

See also