
Instance ↔ Console Integration Guide

This is the definitive reference for how an instance integrates with the Console. It covers both directions of communication, authentication mechanisms, and all request/response contracts.


Overview

There are three directions of communication:

| Direction | Auth Mechanism | Purpose |
|---|---|---|
| Instance → Console | Instance Registration Token (Bearer) | Startup signal, heartbeat, usage event ingestion |
| Admin → Console | Zitadel JWT (Bearer) | Token rotation, maintenance mode, maintenance poster, decommission |
| Console → Instance | Zitadel Worker Token (Bearer) | Health polling, tenant/user provisioning |

Part 1: Instance → Console

Authentication

When the Console creates an instance, it generates a 256-bit random token, writes the raw token to the configured key vault (Azure Key Vault), and stores only the SHA-256 hash in PostgreSQL. The raw token never appears in Console API responses or logs.

How it works end-to-end:

Console creates instance
  → generates raw token + SHA-256 hash
  → writes raw token to vault:  instance-{instanceID}
  → stores hash in PostgreSQL:  instance_token_hash
  → returns secretRef to admin: "instance-507f1f77bcf86cd799439011"

Instance boots
  → reads raw token from vault using its Managed Identity
  → sends token as Bearer on every request to Console

Console verifies request
  → SHA-256 hashes the incoming token
  → looks up hash in PostgreSQL → match = authenticated

Request header:

Authorization: Bearer <raw-token-from-vault>

What is stored where:

| Location | Value | Who reads it |
|---|---|---|
| Azure Key Vault | Raw token (plaintext) | Instance pods only (Managed Identity) |
| Console DB | SHA-256 hash | Console API only (never exposed) |

Token expiry: Tokens do not expire on a schedule. They are invalidated only by admin rotation or instance deletion. Routine rotation is handled via the rotate-token endpoint (see Part 2: Admin → Console).

DR / new pod: A replacement instance pod reads the token from the vault on startup — no manual injection, no k8s secret coordination.

Vault Setup

Azure RBAC required:

| Principal | Role | Purpose |
|---|---|---|
| Console API / Worker pods | Key Vault Secrets Officer | Read + write + delete secrets |
| Instance pods | Key Vault Secrets User | Read secrets only |

Secret naming convention: instance-{instanceID} (e.g., instance-507f1f77bcf86cd799439011)

Console configuration:

```env
SECRET_STORE_PROVIDER=azure-keyvault
AZURE_KEY_VAULT_URL=https://my-vault.vault.azure.net/
```

Auth uses DefaultAzureCredential — resolves automatically via Managed Identity on AKS, or AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID env vars for non-AKS deployments.

Local development:

```env
SECRET_STORE_PROVIDER=noop
```

Tokens are not persisted — Console logs a warning. Use noop only for local dev.


Endpoint: Startup Signal

Every instance pod calls this on boot. The response always includes the current instance status — the instance must read this and apply its availability middleware accordingly.

POST /api/v1/server/instances/:id/startup
Authorization: Bearer <instance-registration-token>
Content-Type: application/json

Request body (optional):

| Field | Type | Required | Description |
|---|---|---|---|
| podName | string | no | Kubernetes pod name (from HOSTNAME env var) |
| version | string | no | Deployed version string (e.g., v1.2.3) |

```json
{
  "podName": "instance-1-abc123",
  "version": "v1.2.3"
}
```

The body is optional. An empty body is accepted.

Success Responses (200 OK)

First boot — instance was in provisioning state, now activated:

```json
{
  "status": "active",
  "firstBoot": true,
  "message": "Instance is now active."
}
```

Subsequent boot — instance already active, boot event recorded:

```json
{
  "status": "active",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

Boot while in maintenance — boot event recorded, instance remains in maintenance:

```json
{
  "status": "maintenance",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

The instance middleware must return 503 to tenant traffic when status is maintenance.

Boot while degraded — boot event recorded, Console will re-evaluate on next heartbeat:

```json
{
  "status": "degraded",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

Boot while decommissioned — Console acknowledges but tells instance to self-block:

```json
{
  "status": "decommissioned",
  "firstBoot": false,
  "message": "Instance is decommissioned. Tenant traffic must be blocked."
}
```

Returned as 200, not 4xx — so the instance can read the status and block traffic. The instance must not serve tenant requests when status is decommissioned.

Error Responses

| Condition | HTTP | Body |
|---|---|---|
| Invalid instance ID format | 400 | {"error": "Invalid id"} |
| Token doesn't match this instance | 403 | {"error": "Token does not match instance"} |
| Instance not found | 404 | {"error": "Instance not found"} |
| Internal error fetching instance | 500 | {"error": "Failed to get instance"} |
| Internal error activating instance | 500 | {"error": "Failed to activate instance"} |

Side Effects

  • First boot only: Transitions instance status provisioning → active.
  • Every boot (except decommissioned): Inserts a boot event row with instance_id, pod_name, version, booted_at. Boot events auto-expire after 30 days.
  • Decommissioned: No state change, no boot event recorded.

What Console infers from boot events

| Signal | How |
|---|---|
| Crash loop | Same podName appearing repeatedly in a short window |
| Rollout | New version appearing across different pod names |
| Replica count | Count of distinct podName values in a recent time window |
| Restart | Same podName + same version appearing again |

The first boot is the only way to transition out of provisioning. The Console does not poll health or provision tenants until the instance is active.


Endpoint: Heartbeat

Instances push resource metrics and receive their current Console-side status in every response. The instance must update its cached status from the response and apply availability middleware accordingly.

POST /api/v1/server/instances/:id/heartbeat
Authorization: Bearer <instance-registration-token>
Content-Type: application/json

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| status | string | yes | Instance-reported status: active, degraded |
| cpuPercent | float64 | yes | Current CPU utilization (0–100) |
| memoryPercent | float64 | yes | Current memory utilization (0–100) |
| diskPercent | float64 | yes | Current disk utilization (0–100) |
| activeTenantCount | int | yes | Number of tenants currently active on this instance |
| version | string | yes | Deployed version string (e.g., v1.2.3) |

```json
{
  "status": "active",
  "cpuPercent": 45.2,
  "memoryPercent": 62.8,
  "diskPercent": 78.5,
  "activeTenantCount": 3,
  "version": "v1.2.3"
}
```

Success Responses (200 OK)

Normal heartbeat — metrics recorded, Console-side status returned:

```json
{
  "recorded": true,
  "status": "active",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is maintenance — metrics recorded, instance should block tenant traffic:

```json
{
  "recorded": true,
  "status": "maintenance",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is degraded — metrics recorded (Console may have auto-set degraded due to threshold breach):

```json
{
  "recorded": true,
  "status": "degraded",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is decommissioned — metrics NOT recorded, instance must self-block:

```json
{
  "recorded": false,
  "status": "decommissioned",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

When recorded: false, the Console has acknowledged the signal but written nothing. The instance should stop serving tenant traffic immediately.

Error Responses

| Condition | HTTP | Body |
|---|---|---|
| Invalid instance ID format | 400 | {"error": "Invalid id"} |
| Token doesn't match this instance | 403 | {"error": "Token does not match instance"} |
| Internal error fetching instance | 500 | {"error": "Failed to get instance"} |
| Internal error recording metrics | 500 | {"error": "Failed to record heartbeat"} |

Side Effects (non-decommissioned only)

  • Updates last_heartbeat_at on the instance record.
  • Updates last_cpu_percent, last_memory_percent, last_disk_percent, last_active_tenant_count, last_version.
  • If any metric exceeds configured thresholds, the Console overrides status to degraded regardless of the instance-reported value.

Degraded auto-detection thresholds (configurable per instance)

| Metric | Default |
|---|---|
| CPU | 80% |
| Memory | 85% |
| Disk | 90% |

Instance middleware behaviour by status

| status in response | Tenant traffic | Notes |
|---|---|---|
| active | Allow | |
| degraded | Allow | Console auto-set; instance continues serving |
| maintenance | Block (503) | Optionally surface maintenance poster message |
| decommissioned | Block (503) | recorded: false; instance should not re-activate |

Recommendation: Send a heartbeat every 30–60 seconds. The degraded watcher marks an instance degraded if no heartbeat is received within the configured timeout.


Endpoint: Usage Event Ingestion

Instances call this endpoint to push billable usage events to the Console. The Console aggregates these hourly and applies them against the customer's billing plan.

POST /api/v1/server/instances/:id/usage
Authorization: Bearer <instance-registration-token>
Content-Type: application/json

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| tenantId | string | yes | ID of the tenant |
| meter | string | yes | Billable dimension name (see well-known meters below) |
| value | number | yes | Usage quantity |
| unit | string | yes | Unit label (e.g., count, gb) |
| periodStart | string (RFC3339) | yes | Start of the usage period |
| periodEnd | string (RFC3339) | yes | End of the usage period |

```json
{
  "tenantId": "507f1f77bcf86cd799439011",
  "meter": "api_calls",
  "value": 1500,
  "unit": "count",
  "periodStart": "2026-03-08T00:00:00Z",
  "periodEnd": "2026-03-08T01:00:00Z"
}
```

Response (202 Accepted — new event):

```json
{
  "status": "accepted"
}
```

Response (200 OK — duplicate, already recorded):

```json
{
  "status": "duplicate, ignored"
}
```

Idempotency: Requests are deduplicated on the composite key (instanceId, tenantId, meter, periodStart). Retrying the same event is safe.

Well-known meter names:

| Meter | Description |
|---|---|
| active_projects | Number of active projects |
| active_users | Number of active users |
| api_calls | Total API calls made |
| storage_gb | Storage consumed in GB |
| workflow_executions | Number of workflow runs |
| data_migrations | Number of data migrations executed |
| integrations | Number of active integrations |

Custom meters are supported — any string value is accepted, but only well-known meters are subject to plan limit enforcement.


For real-time tenant status updates (suspend/resume while the instance is running), the Console pushes changes directly to the instance via POST /internal/tenant-status. The instance writes the change to its local DB and updates its in-memory cache. On restart, the instance rebuilds the cache from its own local DB — no Console call is needed. See Tenant Suspension for full details.


Part 2: Admin → Console

These endpoints are called by Exto operators via the Console admin API. Auth is a Zitadel JWT — the instance registration token is not used here.


Token Rotation

Use when a token is suspected compromised. The new token is written to the vault first, then the hash is updated in the database. The old token is immediately invalid once the database hash is updated.

POST /api/v1/instances/:id/rotate-token
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "secretRef": "instance-507f1f77bcf86cd799439011",
  "message": "Token rotated and written to vault. The instance will use the new token on its next vault secret refresh. Previous token is immediately invalid."
}
```

What happens (vault-first ordering):

  1. New 256-bit random token generated.
  2. Raw token written to vault (instance-{id}) — vault auto-versions the old value.
  3. New SHA-256 hash written to PostgreSQL — old token is invalidated at this point.
  4. The instance picks up the new token on its next vault secret refresh (no restart needed if using CSI driver or secret refresh).

The raw token is never returned to the caller — it goes directly to the vault. This is intentional for SOC 2 compliance.


Decommission

Permanently retires an instance. All Console worker outbound calls stop immediately. All non-archived tenants on the instance are suspended — they must be manually restored after migrating to a new instance.

POST /api/v1/instances/:id/decommission
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "status": "decommissioned",
  "tenantsSuspended": 4,
  "message": "Instance decommissioned. All tenant access suspended. Console worker calls stopped."
}
```

| Field | Description |
|---|---|
| status | Always "decommissioned" |
| tenantsSuspended | Number of tenants that were active/provisioning and are now suspended |
| message | Human-readable summary |

Error responses:

| Condition | HTTP | Body |
|---|---|---|
| Instance not found | 404 | {"error": "Instance not found"} |
| Already decommissioned | 409 | {"error": "Instance is already decommissioned"} |
| Internal error | 500 | {"error": "Failed to decommission instance"} |

What happens immediately:

  1. Instance status set to decommissioned in the database.
  2. All non-archived, non-suspended tenants on this instance are set to suspended.
  3. On the instance's next heartbeat or startup, Console returns status: "decommissioned" — the instance middleware must block all tenant traffic.

What Console stops doing:

| Worker job | Behaviour after decommission |
|---|---|
| Health Poller | Skips this instance |
| Degraded Watcher | Skips this instance (no timeout enforcement) |
| Tenant Provisioner | Skips tenants assigned to this instance |
| User Provisioner | Skips users for tenants on this instance |
| Heartbeats | Still accepted — returns status: "decommissioned" so the instance self-blocks |

What is preserved:

  • Instance record and connection config
  • All usage events and aggregates
  • All tenant records (status changed to suspended, not deleted)
  • All boot events (expire naturally after 30 days)

To restore tenants after migrating to a new instance:

  1. Reassign each tenant to the new instance via PUT /api/v1/tenants/:id.
  2. Restore each tenant via POST /api/v1/tenants/:id/restore.
  3. The Console worker will re-provision them to the new instance automatically.

Decommission does not delete the instance. To delete, use DELETE /api/v1/instances/:id (blocked until all tenants are removed).


Set Maintenance Mode

Places an instance into maintenance mode. The instance should block tenant traffic while in this state. Console worker jobs (health polling, heartbeat recording) continue — provisioning of new tenants and users is paused until maintenance is cleared.

POST /api/v1/instances/:id/maintenance
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "status": "maintenance",
  "message": "Instance is now in maintenance mode."
}
```

Error responses:

| Condition | Status | Body |
|---|---|---|
| Instance is decommissioned | 409 | {"error": "cannot set maintenance on a decommissioned instance"} |
| Already in maintenance | 409 | {"error": "instance is already in maintenance"} |

Clear Maintenance Mode

Returns an instance from maintenance mode to active.

DELETE /api/v1/instances/:id/maintenance
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "status": "active",
  "message": "Maintenance lifted. Instance is now active."
}
```

Error responses:

| Condition | Status | Body |
|---|---|---|
| Not in maintenance | 409 | {"error": "instance is not in maintenance"} |

Set Maintenance Poster

Attaches a user-facing maintenance message to the instance. This is independent of maintenance mode — a poster can be set while the instance is still active (e.g., to announce upcoming maintenance).

PUT /api/v1/instances/:id/maintenance-poster
Authorization: Bearer <zitadel-admin-jwt>
Content-Type: application/json

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | yes | Markdown-formatted maintenance message |
| scheduledStart | string (ISO8601) | no | When maintenance is scheduled to begin |
| scheduledEnd | string (ISO8601) | no | When maintenance is scheduled to end |

```json
{
  "message": "Scheduled maintenance for database migration.",
  "scheduledStart": "2026-03-15T02:00:00Z",
  "scheduledEnd": "2026-03-15T04:00:00Z"
}
```

Response (200 OK):

```json
{
  "message": "Scheduled maintenance for database migration.",
  "scheduledStart": "2026-03-15T02:00:00Z",
  "scheduledEnd": "2026-03-15T04:00:00Z",
  "setBy": "zitadel-subject-id",
  "setAt": "2026-03-10T12:00:00Z"
}
```

Side effects:

  • Saves the poster to the instance record. Does not change the instance status.
  • setBy is populated from the caller's Zitadel subject claim.

Clear Maintenance Poster

Removes the maintenance poster from the instance. Does not affect the instance status.

DELETE /api/v1/instances/:id/maintenance-poster
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "cleared": true
}
```

Part 3: Console → Instance

The Console's background worker makes outbound HTTP calls to instances for health monitoring, tenant provisioning, and user lifecycle management.

Authentication

The Console authenticates to instances using a Zitadel Worker Token, obtained via the OAuth 2.0 client_credentials flow.

Request header (all Console → Instance calls):

Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json

The token is issued by Zitadel using:

  • CONSOLE_SERVICE_CLIENT_ID
  • CONSOLE_SERVICE_CLIENT_SECRET

Important: The console-worker machine user in Zitadel must have its Access Token Type set to JWT (not opaque). This is configured during bootstrap (ACCESS_TOKEN_TYPE_JWT). If the token type is changed to opaque, instances will reject the token with a 401 because they validate via JWKS (not introspection).

Instances must validate this JWT against the Zitadel JWKS endpoint:

GET {ZITADEL_ISSUER}/oauth/v2/keys

Instance Connection Config

When an instance is registered in the Console, an InstanceConnection record is stored alongside it. This holds the URLs the Console uses to reach the instance, plus vault key references for sensitive credentials.

| Field | Description |
|---|---|
| apiBaseUrl | Internal base URL for provisioning calls (e.g., https://instance-1.internal) |
| healthCheckUrl | Full URL for health polling (e.g., https://instance-1.internal/internal/health) |
| dbURIRef | Vault key name for the database connection URI (e.g., conn-{connId}-dburi) |
| oidcClientId | Zitadel OIDC client ID — not sensitive, stored directly in the database |
| oidcClientSecretRef | Vault key name for the OIDC client secret (e.g., conn-{connId}-oidc-secret) |
| zitadelAppId | Zitadel app ID (used for rotation/deletion) |

How connection secrets are stored:

Admin calls UpsertConnection with raw dbURI + oidcSecret
  → Console writes dbURI      to vault: conn-{connId}-dburi
  → Console writes oidcSecret to vault: conn-{connId}-oidc-secret
  → Database stores key references only — never the raw values

Migration runner needs DB URI
  → reads conn.db_uri_ref from database        ("conn-{connId}-dburi")
  → calls vault.Get("conn-{connId}-dburi") → raw URI returned in memory only

Raw credentials are never written to the database or returned in API responses.


Instance Ingress Requirements

The Console Worker calls /internal/* endpoints on each instance via the public ingress. The instance's ingress must route /internal to the Go backend service (exto-go), not to the web frontend (SPA).

Common problem: Many instance ingresses use nginx.ingress.kubernetes.io/rewrite-target: /$1 with regex path captures (e.g., /api/(.*)). If /internal is added to the same ingress, the rewrite strips the prefix — /internal/health becomes /health, which doesn't match the Go route. The request either hits the SPA catch-all (/(.*)) and returns HTML, or hits the Go backend with a wrong path and returns 404.

Solution: Create a separate ingress resource for /internal without the rewrite-target annotation, so the path passes through to the Go backend as-is:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: go-internal-ingress
  namespace: <instance-namespace>
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "250m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "660"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "660"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "660"
    # No rewrite-target — /internal/health passes through as-is
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - <instance-domain>
      secretName: <tls-secret>
  rules:
    - host: <instance-domain>
      http:
        paths:
          - path: /internal
            pathType: Prefix
            backend:
              service:
                name: <exto-go-service>
                port:
                  number: 80
```
Verify it works:

```bash
# Should return 401 (auth required) — confirms the Go backend is receiving the request
curl -s -o /dev/null -w "%{http_code}" https://<instance-domain>/internal/health
# Expected: 401

# If you get HTML or 404, the ingress is not routing correctly
```

Endpoint: Health Check

The Console polls this endpoint on a configurable interval (default: every 5 minutes) to record performance metrics.

GET {healthCheckUrl}
Authorization: Bearer <zitadel-worker-token>

Your instance must respond with:

| Field | Type | Description |
|---|---|---|
| latency50ms | float64 | p50 response latency in milliseconds |
| latency95ms | float64 | p95 response latency in milliseconds |
| latency99ms | float64 | p99 response latency in milliseconds |
| errorRate | float64 | Error rate as a fraction (e.g., 0.005 = 0.5%) |
| uptimeSecs | int64 | Total uptime in seconds since last restart |

```json
{
  "latency50ms": 25.5,
  "latency95ms": 45.2,
  "latency99ms": 78.9,
  "errorRate": 0.001,
  "uptimeSecs": 864000
}
```

Expected response: HTTP 2xx. The Console times out after 10 seconds.

Polling interval: Configurable via HEALTH_POLL_INTERVAL_SECS (default: 300 seconds).


Endpoint: Provision Tenant

When a new tenant is assigned to an instance, the Console calls this endpoint to initialize the tenant workspace on the instance. The Console retries every 2 minutes until it gets a success response.

POST {apiBaseUrl}/internal/provision-tenant
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json

Request body:

| Field | Type | Description |
|---|---|---|
| tenantId | string | ID of the tenant |
| name | string | Human-readable tenant slug/name |
| env | string | Environment: production, staging, dev |

```json
{
  "tenantId": "507f1f77bcf86cd799439011",
  "name": "acme-corp",
  "env": "production"
}
```

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure and will be retried.

Side effects on success:

  • The TenantInstanceBinding state transitions from pending → active.
  • An EventTenantProvisioningComplete notification is dispatched.

Your instance must be idempotent on this endpoint — it may be called more than once for the same tenantId.


Endpoint: Provision User

When a user is invited or assigned to a tenant, the Console calls this endpoint to create the user account on the instance. The Console retries every 5 minutes for unprovisioned users.

POST {apiBaseUrl}/internal/provision-user
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json

Request body:

| Field | Type | Description |
|---|---|---|
| zitadelUserId | string | The user's unique Zitadel ID |
| tenantId | string | ID of the tenant |
| roles | string[] | Roles assigned to this user (e.g., ["admin", "viewer"]) |
| email | string | User's email address |
| displayName | string | User's display name |

```json
{
  "zitadelUserId": "user-zitadel-id",
  "tenantId": "507f1f77bcf86cd799439011",
  "roles": ["admin", "viewer"],
  "email": "user@example.com",
  "displayName": "John Doe"
}
```

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure and retried.

Your instance must be idempotent on this endpoint — the same user may be sent multiple times.


Endpoint: Deprovision User

When a user is removed from a tenant, the Console calls this endpoint to delete or deactivate the user on the instance.

DELETE {apiBaseUrl}/internal/provision-user/{zitadelUserId}?tenantId={tenantId}
Authorization: Bearer <zitadel-worker-token>

Path / query parameters:

| Parameter | Location | Description |
|---|---|---|
| zitadelUserId | path | The user's Zitadel ID |
| tenantId | query string | ID of the tenant |

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure.


Part 4: Summary of All Endpoints

Instance → Console (Instance Token)

| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /api/v1/server/instances/:id/startup | Instance Token | Boot signal (activates + logs boot event) |
| POST | /api/v1/server/instances/:id/heartbeat | Instance Token | Push resource metrics |
| POST | /api/v1/server/instances/:id/usage | Instance Token | Ingest usage events |

Admin → Console (Zitadel JWT)

| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /api/v1/instances/:id/rotate-token | Zitadel JWT | Rotate instance token (old token invalidated immediately) |
| POST | /api/v1/instances/:id/decommission | Zitadel JWT | Retire instance, stop all worker calls, preserve data |
| POST | /api/v1/instances/:id/maintenance | Zitadel JWT | Enter maintenance mode |
| DELETE | /api/v1/instances/:id/maintenance | Zitadel JWT | Exit maintenance mode, return to active |
| PUT | /api/v1/instances/:id/maintenance-poster | Zitadel JWT | Set maintenance message and schedule |
| DELETE | /api/v1/instances/:id/maintenance-poster | Zitadel JWT | Clear maintenance message |

Console → Instance (Zitadel Worker Token)

| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | {healthCheckUrl} | Zitadel Worker Token | Poll health metrics |
| POST | {apiBaseUrl}/internal/provision-tenant | Zitadel Worker Token | Initialize tenant workspace |
| POST | {apiBaseUrl}/internal/provision-user | Zitadel Worker Token | Create user on instance |
| DELETE | {apiBaseUrl}/internal/provision-user/{userId} | Zitadel Worker Token | Remove user from instance |
| POST | {apiBaseUrl}/internal/tenant-status | Zitadel Worker Token | Push tenant suspend/resume to instance |

Part 5: Console Worker Schedule

The Console background worker runs these jobs automatically:

| Job | Interval | Skips instances in status | Outbound calls |
|---|---|---|---|
| Health Poller | Every 5 min (configurable) | decommissioned | GET {healthCheckUrl} per active instance |
| Degraded Watcher | Every 60 sec | decommissioned, maintenance, provisioning | None (heartbeat timeout detection only) |
| Tenant Provisioner | Every 30 sec | decommissioned, maintenance, provisioning | POST /internal/provision-tenant for pending tenants |
| User Provisioner | Every 5 min | decommissioned | POST /internal/provision-user for pending users |
| Usage Aggregation | Every hour | — | None (internal Console processing) |
| Dunning | Every 6 hr | — | None (email dispatch only) |
| Migration Runner | Every 30 sec | — | Migration step calls (see migration docs) |
| Customer Purge | Every 6 hr | — | None (internal Console cleanup) |

Part 6: Instance Lifecycle

Instance Created

  ├─ Console generates token → writes raw token to Azure Key Vault → stores hash in PostgreSQL
  ├─ Console creates Zitadel OIDC app → stores clientId + clientSecret ref in instance_connections
  ├─ Console returns secretRef: "instance-{id}" (vault key name, not the token)


Instance Boots (every pod start — first boot, restarts, new replicas, DR)

  ├─ Reads raw token from vault using Managed Identity
  ├─ Stores token in memory
  ├─ Calls POST /startup with {podName, version}
  │    ├─ Response always includes "status" field — instance caches this
  │    ├─ status = "provisioning"     → first boot: transitions to "active"
  │    ├─ status = "active"           → subsequent boot: boot event recorded
  │    ├─ status = "maintenance"      → boot event recorded, middleware blocks tenant traffic
  │    ├─ status = "degraded"         → boot event recorded, middleware allows tenant traffic
  │    └─ status = "decommissioned"   → no boot event, middleware blocks all tenant traffic


Instance Active

  ├─ Heartbeat loop (every 30–60s): POST /heartbeat → cache "status" from response
  │    ├─ status = "active"           → serve tenant traffic normally
  │    ├─ status = "maintenance"      → middleware returns 503 to tenants
  │    ├─ status = "degraded"         → serve tenant traffic (Console auto-set, not admin action)
  │    └─ status = "decommissioned"   → recorded: false → middleware blocks all tenant traffic

  ├─ Console polls health → GET {healthCheckUrl} (every 5 min)

  ├─ Tenant assigned to instance
  │    └─ Console worker calls POST /internal/provision-tenant
  │         ├─ Skipped if instance is: maintenance | decommissioned | provisioning
  │         └─ On success: tenant status transitions provisioning → active

  ├─ User invited to tenant
  │    └─ Console worker calls POST /internal/provision-user

  ├─ User removed from tenant
  │    └─ Console worker calls DELETE /internal/provision-user/{id}

  ├─ Instance pushes usage events → POST /usage (per billing period)

  ├─ Admin sets maintenance mode
  │    ├─ POST /maintenance → status: active → maintenance
  │    ├─ Optional: PUT /maintenance-poster (sets user-facing message, independent of status)
  │    ├─ Instance learns on next heartbeat → middleware returns 503
  │    └─ DELETE /maintenance → status: maintenance → active
  │         └─ Instance learns on next heartbeat → middleware resumes serving

  └─ Admin decommissions instance
       ├─ POST /decommission → status: * → decommissioned
       ├─ All non-archived tenants suspended immediately
       ├─ Instance learns on next heartbeat (recorded: false) or startup
       └─ Instance middleware blocks all tenant traffic

            └─ To delete the instance:
                 ├─ Reassign all tenants to a new instance
                 ├─ Restore tenants: POST /tenants/:id/restore
                 └─ DELETE /instances/:id  (blocked until zero active tenants)

Part 7: Configuration Reference

Variables the Console requires (in .env):

```env
# Console API server
API_ADDR=:8000

# PostgreSQL
DATABASE_URL=postgres://console:password@localhost:5432/console?sslmode=disable

# Zitadel — JWT validation (Console API)
ZITADEL_ISSUER=https://auth.example.com
ZITADEL_API_URL=https://auth.example.com
ZITADEL_PAT=<personal-access-token>
ZITADEL_ADMIN_ORG_ID=<console-admin-org-id>
EXTOID_PROJECT_ID=<project-id>

# Zitadel — Console service account credentials (used by both API and worker)
CONSOLE_SERVICE_CLIENT_ID=<machine-user-client-id>
CONSOLE_SERVICE_CLIENT_SECRET=<machine-user-client-secret>

# Zitadel — webhook verification (Zitadel → Console)
# Signing key returned by Zitadel when creating the webhook target (from .bootstrap.env)
ZITADEL_WEBHOOK_SECRET=<signing-key-from-bootstrap>

# Worker
HEALTH_POLL_INTERVAL_SECS=300

# Secret store (SOC 2 required: use azure-keyvault in all non-local environments)
SECRET_STORE_PROVIDER=azure-keyvault        # or "noop" for local dev only
AZURE_KEY_VAULT_URL=https://my-vault.vault.azure.net/

# Email
EMAIL_PROVIDER=sendgrid
EMAIL_API_KEY=<api-key>
EMAIL_FROM_ADDR=noreply@exto360.com
```

Part 8: Security Notes

  • Instance tokens are hashed (SHA-256) at rest. The Console never stores the raw token.
  • Console → Instance calls use short-lived Zitadel JWTs. Instances should validate the JWT signature using Zitadel's JWKS endpoint and check token expiry.
  • Webhook calls from Zitadel to Console are HMAC-SHA256 verified using ZITADEL_WEBHOOK_SECRET.
  • All Console → Instance endpoints (/internal/*) should only be accessible from the Console worker — consider network-level restrictions in addition to token validation.
  • The raw instance token is never returned by any API — it exists only in the vault, and instances read it via Managed Identity. If a token is suspected compromised, rotate it via POST /api/v1/instances/:id/rotate-token; the old token is invalidated immediately.