
Instance ↔ Console Integration Guide

This is the definitive reference for how an instance integrates with the Console. It covers both directions of communication, authentication mechanisms, and all request/response contracts.


Overview

There are three directions of communication:

| Direction | Auth Mechanism | Purpose |
|---|---|---|
| Instance → Console | Instance Registration Token (Bearer) | Startup signal, heartbeat, usage event ingestion |
| Admin → Console | Zitadel JWT (Bearer) | Token rotation, maintenance mode, maintenance poster, decommission |
| Console → Instance | Zitadel Worker Token (Bearer) | Health polling, tenant/user provisioning |

Part 1: Instance → Console

Authentication

When the Console creates an instance, it generates a 256-bit random token, writes the raw token to the configured key vault (Azure Key Vault), and stores only the SHA-256 hash in PostgreSQL. The raw token never appears in Console API responses or logs.

How it works end-to-end:

Console creates instance
  → generates raw token + SHA-256 hash
  → writes raw token to vault:  instance-{instanceID}
  → stores hash in PostgreSQL:  instance_token_hash
  → returns secretRef to admin: "instance-507f1f77bcf86cd799439011"

Instance boots
  → reads raw token from vault using its Managed Identity
  → sends token as Bearer on every request to Console

Console verifies request
  → SHA-256 hashes the incoming token
  → looks up hash in PostgreSQL → match = authenticated

Request header:

Authorization: Bearer <raw-token-from-vault>

What is stored where:

| Location | Value | Who reads it |
|---|---|---|
| Azure Key Vault | Raw token (plaintext) | Instance pods only (Managed Identity) |
| Console DB | SHA-256 hash | Console API only (never exposed) |

Token expiry: Tokens do not expire on a schedule. They are invalidated only by admin rotation or instance deletion. Routine rotation is handled via the rotate-token endpoint (see Part 2: Admin → Console).

DR / new pod: A replacement instance pod reads the token from the vault on startup — no manual injection, no k8s secret coordination.

Vault Setup

Azure RBAC required:

| Principal | Role | Purpose |
|---|---|---|
| Console API / Worker pods | Key Vault Secrets Officer | Read + write + delete secrets |
| Instance pods | Key Vault Secrets User | Read secrets only |

Secret naming convention: instance-{instanceID} (e.g., instance-507f1f77bcf86cd799439011)

Console configuration:

```env
SECRET_STORE_PROVIDER=azure-keyvault
AZURE_KEY_VAULT_URL=https://my-vault.vault.azure.net/
```

Auth uses DefaultAzureCredential — resolves automatically via Managed Identity on AKS, or AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID env vars for non-AKS deployments.

Local development:

```env
SECRET_STORE_PROVIDER=noop
```

Tokens are not persisted — Console logs a warning. Use noop only for local dev.


Endpoint: Startup Signal

Every instance pod calls this on boot. The response always includes the current instance status — the instance must read this and apply its availability middleware accordingly.

POST /api/v1/server/instances/:id/startup
Authorization: Bearer <instance-registration-token>
Content-Type: application/json

Request body (optional):

| Field | Type | Required | Description |
|---|---|---|---|
| podName | string | no | Kubernetes pod name (from HOSTNAME env var) |
| version | string | no | Deployed version string (e.g., v1.2.3) |

```json
{
  "podName": "instance-1-abc123",
  "version": "v1.2.3"
}
```

The body is optional. An empty body is accepted.

Success Responses (200 OK)

First boot — instance was in provisioning state, now activated:

```json
{
  "status": "active",
  "firstBoot": true,
  "message": "Instance is now active."
}
```

Subsequent boot — instance already active, boot event recorded:

```json
{
  "status": "active",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

Boot while in maintenance — boot event recorded, instance remains in maintenance:

```json
{
  "status": "maintenance",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

The instance middleware must return 503 to tenant traffic when status is maintenance.

Boot while degraded — boot event recorded, Console will re-evaluate on next heartbeat:

```json
{
  "status": "degraded",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

Boot while decommissioned — Console acknowledges but tells instance to self-block:

```json
{
  "status": "decommissioned",
  "firstBoot": false,
  "message": "Instance is decommissioned. Tenant traffic must be blocked."
}
```

Returned as 200, not 4xx — so the instance can read the status and block traffic. The instance must not serve tenant requests when status is decommissioned.

Error Responses

| Condition | HTTP | Body |
|---|---|---|
| Invalid instance ID format | 400 | {"error": "Invalid id"} |
| Token doesn't match this instance | 403 | {"error": "Token does not match instance"} |
| Instance not found | 404 | {"error": "Instance not found"} |
| Internal error fetching instance | 500 | {"error": "Failed to get instance"} |
| Internal error activating instance | 500 | {"error": "Failed to activate instance"} |

Side Effects

  • First boot only: Transitions instance status provisioning → active.
  • Every boot (except decommissioned): Inserts a boot event row with instance_id, pod_name, version, booted_at. Boot events auto-expire after 30 days.
  • Decommissioned: No state change, no boot event recorded.

What Console infers from boot events

| Signal | How |
|---|---|
| Crash loop | Same podName appearing repeatedly in a short window |
| Rollout | New version appearing across different pod names |
| Replica count | Count of distinct podName values in a recent time window |
| Restart | Same podName + same version appearing again |

The first boot is the only way to transition out of provisioning. The Console does not poll health or provision tenants until the instance is active.


Endpoint: Heartbeat

Instances push resource metrics and receive their current Console-side status in every response. The instance must update its cached status from the response and apply availability middleware accordingly.

POST /api/v1/server/instances/:id/heartbeat
Authorization: Bearer <instance-registration-token>
Content-Type: application/json

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| status | string | yes | Instance-reported status: active, degraded |
| cpuPercent | float64 | yes | Current CPU utilization (0–100) |
| memoryPercent | float64 | yes | Current memory utilization (0–100) |
| diskPercent | float64 | yes | Current disk utilization (0–100) |
| activeTenantCount | int | yes | Number of tenants currently active on this instance |
| version | string | yes | Deployed version string (e.g., v1.2.3) |

```json
{
  "status": "active",
  "cpuPercent": 45.2,
  "memoryPercent": 62.8,
  "diskPercent": 78.5,
  "activeTenantCount": 3,
  "version": "v1.2.3"
}
```

Success Responses (200 OK)

Normal heartbeat — metrics recorded, Console-side status returned:

```json
{
  "recorded": true,
  "status": "active",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is maintenance — metrics recorded, instance should block tenant traffic:

```json
{
  "recorded": true,
  "status": "maintenance",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is degraded — metrics recorded (Console may have auto-set degraded due to threshold breach):

```json
{
  "recorded": true,
  "status": "degraded",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is decommissioned — metrics NOT recorded, instance must self-block:

```json
{
  "recorded": false,
  "status": "decommissioned",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

When recorded: false, the Console has acknowledged the signal but written nothing. The instance should stop serving tenant traffic immediately.

Error Responses

| Condition | HTTP | Body |
|---|---|---|
| Invalid instance ID format | 400 | {"error": "Invalid id"} |
| Token doesn't match this instance | 403 | {"error": "Token does not match instance"} |
| Internal error fetching instance | 500 | {"error": "Failed to get instance"} |
| Internal error recording metrics | 500 | {"error": "Failed to record heartbeat"} |

Side Effects (non-decommissioned only)

  • Updates last_heartbeat_at on the instance record.
  • Updates last_cpu_percent, last_memory_percent, last_disk_percent, last_active_tenant_count, last_version.
  • If any metric exceeds configured thresholds, the Console overrides status to degraded regardless of the instance-reported value.

Degraded auto-detection thresholds (configurable per instance)

| Metric | Default |
|---|---|
| CPU | 80% |
| Memory | 85% |
| Disk | 90% |

Instance middleware behaviour by status

| status in response | Tenant traffic | Notes |
|---|---|---|
| active | Allow | |
| degraded | Allow | Console auto-set; instance continues serving |
| maintenance | Block (503) | Optionally surface maintenance poster message |
| decommissioned | Block (503) | recorded: false; instance should not re-activate |

Recommendation: Send a heartbeat every 30–60 seconds. The degraded watcher marks an instance degraded if no heartbeat is received within the configured timeout.


Endpoint: Usage Event Ingestion

Instances call this endpoint to push billable usage events to the Console. The Console aggregates these hourly and applies them against the customer's billing plan.

POST /api/v1/server/instances/:id/usage
Authorization: Bearer <instance-registration-token>
Content-Type: application/json

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| tenantId | string | yes | ID of the tenant |
| meter | string | yes | Billable dimension name (see well-known meters below) |
| value | number | yes | Usage quantity |
| unit | string | yes | Unit label (e.g., count, gb) |
| periodStart | string (RFC3339) | yes | Start of the usage period |
| periodEnd | string (RFC3339) | yes | End of the usage period |

```json
{
  "tenantId": "507f1f77bcf86cd799439011",
  "meter": "api_calls",
  "value": 1500,
  "unit": "count",
  "periodStart": "2026-03-08T00:00:00Z",
  "periodEnd": "2026-03-08T01:00:00Z"
}
```

Response (202 Accepted — new event):

```json
{
  "status": "accepted"
}
```

Response (200 OK — duplicate, already recorded):

```json
{
  "status": "duplicate, ignored"
}
```

Idempotency: Requests are deduplicated on the composite key (instanceId, tenantId, meter, periodStart). Retrying the same event is safe.

Well-known meter names:

| Meter | Description |
|---|---|
| active_projects | Number of active projects |
| active_users | Number of active users |
| api_calls | Total API calls made |
| storage_gb | Storage consumed in GB |
| workflow_executions | Number of workflow runs |
| data_migrations | Number of data migrations executed |
| integrations | Number of active integrations |

Custom meters are supported — any string value is accepted, but only well-known meters are subject to plan limit enforcement.


For real-time tenant status updates (suspend/resume while the instance is running), the Console pushes changes directly to the instance via POST /internal/tenant-status. The instance writes the change to its local DB and updates its in-memory cache. On restart, the instance rebuilds the cache from its own local DB — no Console call is needed. See Tenant Suspension for full details.


Part 2: Admin → Console

These endpoints are called by Exto operators via the Console admin API. Auth is a Zitadel JWT — the instance registration token is not used here.


Token Rotation

Use when a token is suspected compromised. The new token is written to the vault first, then the hash is updated in the database. The old token is immediately invalid once the database hash is updated.

POST /api/v1/instances/:id/rotate-token
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "secretRef": "instance-507f1f77bcf86cd799439011",
  "message": "Token rotated and written to vault. The instance will use the new token on its next vault secret refresh. Previous token is immediately invalid."
}
```

What happens (vault-first ordering):

  1. New 256-bit random token generated.
  2. Raw token written to vault (instance-{id}) — vault auto-versions the old value.
  3. New SHA-256 hash written to PostgreSQL — old token is invalidated at this point.
  4. The instance picks up the new token on its next vault secret refresh (no restart needed if using CSI driver or secret refresh).

The raw token is never returned to the caller — it goes directly to the vault. This is intentional for SOC 2 compliance.


Decommission

Permanently retires an instance. All Console worker outbound calls stop immediately. All non-archived tenants on the instance are suspended — they must be manually restored after migrating to a new instance.

POST /api/v1/instances/:id/decommission
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "status": "decommissioned",
  "tenantsSuspended": 4,
  "message": "Instance decommissioned. All tenant access suspended. Console worker calls stopped."
}
```

| Field | Description |
|---|---|
| status | Always "decommissioned" |
| tenantsSuspended | Number of tenants that were active/provisioning and are now suspended |
| message | Human-readable summary |

Error responses:

| Condition | HTTP | Body |
|---|---|---|
| Instance not found | 404 | {"error": "Instance not found"} |
| Already decommissioned | 409 | {"error": "Instance is already decommissioned"} |
| Internal error | 500 | {"error": "Failed to decommission instance"} |

What happens immediately:

  1. Instance status set to decommissioned in the database.
  2. All non-archived, non-suspended tenants on this instance are set to suspended.
  3. On the instance's next heartbeat or startup, Console returns status: "decommissioned" — the instance middleware must block all tenant traffic.

What Console stops doing:

| Worker job | Behaviour after decommission |
|---|---|
| Health Poller | Skips this instance |
| Degraded Watcher | Skips this instance (no timeout enforcement) |
| Tenant Provisioner | Skips tenants assigned to this instance |
| User Provisioner | Skips users for tenants on this instance |
| Heartbeats | Still accepted — returns status: "decommissioned" so the instance self-blocks |

What is preserved:

  • Instance record and connection config
  • All usage events and aggregates
  • All tenant records (status changed to suspended, not deleted)
  • All boot events (expire naturally after 30 days)

To restore tenants after migrating to a new instance:

  1. Reassign each tenant to the new instance via PUT /api/v1/tenants/:id.
  2. Restore each tenant via POST /api/v1/tenants/:id/restore.
  3. The Console worker will re-provision them to the new instance automatically.

Decommission does not delete the instance. To delete, use DELETE /api/v1/instances/:id (blocked until all tenants are removed).


Set Maintenance Mode

Places an instance into maintenance mode. The instance should block tenant traffic while in this state. Console worker jobs (health polling, heartbeat recording) continue — provisioning of new tenants and users is paused until maintenance is cleared.

POST /api/v1/instances/:id/maintenance
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "status": "maintenance",
  "message": "Instance is now in maintenance mode."
}
```

Error responses:

| Condition | Status | Body |
|---|---|---|
| Instance is decommissioned | 409 | {"error": "cannot set maintenance on a decommissioned instance"} |
| Already in maintenance | 409 | {"error": "instance is already in maintenance"} |

Clear Maintenance Mode

Returns an instance from maintenance mode to active.

DELETE /api/v1/instances/:id/maintenance
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "status": "active",
  "message": "Maintenance lifted. Instance is now active."
}
```

Error responses:

| Condition | Status | Body |
|---|---|---|
| Not in maintenance | 409 | {"error": "instance is not in maintenance"} |

Set Maintenance Poster

Attaches a user-facing maintenance message to the instance. This is independent of maintenance mode — a poster can be set while the instance is still active (e.g., to announce upcoming maintenance).

PUT /api/v1/instances/:id/maintenance-poster
Authorization: Bearer <zitadel-admin-jwt>
Content-Type: application/json

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | yes | Markdown-formatted maintenance message |
| scheduledStart | string (ISO8601) | no | When maintenance is scheduled to begin |
| scheduledEnd | string (ISO8601) | no | When maintenance is scheduled to end |

```json
{
  "message": "Scheduled maintenance for database migration.",
  "scheduledStart": "2026-03-15T02:00:00Z",
  "scheduledEnd": "2026-03-15T04:00:00Z"
}
```

Response (200 OK):

```json
{
  "message": "Scheduled maintenance for database migration.",
  "scheduledStart": "2026-03-15T02:00:00Z",
  "scheduledEnd": "2026-03-15T04:00:00Z",
  "setBy": "zitadel-subject-id",
  "setAt": "2026-03-10T12:00:00Z"
}
```

Side effects:

  • Saves the poster to the instance record. Does not change the instance status.
  • setBy is populated from the caller's Zitadel subject claim.

Clear Maintenance Poster

Removes the maintenance poster from the instance. Does not affect the instance status.

DELETE /api/v1/instances/:id/maintenance-poster
Authorization: Bearer <zitadel-admin-jwt>

No request body.

Response (200 OK):

```json
{
  "cleared": true
}
```

Part 3: Console → Instance

The Console's background worker makes outbound HTTP calls to instances for health monitoring, tenant provisioning, and user lifecycle management.

Authentication

The Console authenticates to instances using a Zitadel Worker Token, obtained via the OAuth 2.0 client_credentials flow.

Request header (all Console → Instance calls):

Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json

The token is issued by Zitadel using:

  • CONSOLE_SERVICE_CLIENT_ID
  • CONSOLE_SERVICE_CLIENT_SECRET

Important: The console-worker machine user in Zitadel must have its Access Token Type set to JWT (not opaque). This is configured during bootstrap (ACCESS_TOKEN_TYPE_JWT). If the token type is changed to opaque, instances will reject the token with a 401 because they validate via JWKS (not introspection).

Instances must validate this JWT against the Zitadel JWKS endpoint:

GET {ZITADEL_ISSUER}/oauth/v2/keys

Instance Connection Config

When an instance is registered in the Console, an InstanceConnection record is stored alongside it. This holds the URLs the Console uses to reach the instance, plus vault key references for sensitive credentials.

| Field | Description |
|---|---|
| apiBaseUrl | Internal base URL for provisioning calls (e.g., https://instance-1.internal) |
| healthCheckUrl | Full URL for health polling (e.g., https://instance-1.internal/internal/health) |
| dbURIRef | Vault key name for the database connection URI (e.g., conn-{connId}-dburi) |
| oidcClientId | Zitadel OIDC client ID — not sensitive, stored directly in the database |
| oidcClientSecretRef | Vault key name for the OIDC client secret (e.g., conn-{connId}-oidc-secret) |
| zitadelAppId | Zitadel app ID (used for rotation/deletion) |

How connection secrets are stored:

Admin calls UpsertConnection with raw dbURI + oidcSecret
  → Console writes dbURI      to vault: conn-{connId}-dburi
  → Console writes oidcSecret to vault: conn-{connId}-oidc-secret
  → Database stores key references only — never the raw values

Migration runner needs DB URI
  → reads conn.db_uri_ref from database        ("conn-{connId}-dburi")
  → calls vault.Get("conn-{connId}-dburi") → raw URI returned in memory only

Raw credentials are never written to the database or returned in API responses.


Instance Ingress Requirements

The Console Worker calls /internal/* endpoints on each instance via the public ingress. The instance's ingress must route /internal to the Go backend service (exto-go), not to the web frontend (SPA).

Common problem: Many instance ingresses use nginx.ingress.kubernetes.io/rewrite-target: /$1 with regex path captures (e.g., /api/(.*)). If /internal is added to the same ingress, the rewrite strips the prefix — /internal/health becomes /health, which doesn't match the Go route. The request either hits the SPA catch-all (/(.*)) and returns HTML, or hits the Go backend with a wrong path and returns 404.

Solution: Create a separate ingress resource for /internal without the rewrite-target annotation, so the path passes through to the Go backend as-is:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: go-internal-ingress
  namespace: <instance-namespace>
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "250m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "660"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "660"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "660"
    # No rewrite-target — /internal/health passes through as-is
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - <instance-domain>
      secretName: <tls-secret>
  rules:
    - host: <instance-domain>
      http:
        paths:
          - path: /internal
            pathType: Prefix
            backend:
              service:
                name: <exto-go-service>
                port:
                  number: 80
```
Verify it works:

```bash
# Should return 401 (auth required) — confirms the Go backend is receiving the request
curl -s -o /dev/null -w "%{http_code}" https://<instance-domain>/internal/health
# Expected: 401

# If you get HTML or 404, the ingress is not routing correctly
```

Endpoint: Health Check

The Console polls this endpoint on a configurable interval (default: every 5 minutes) to record performance metrics.

GET {healthCheckUrl}
Authorization: Bearer <zitadel-worker-token>

Your instance must respond with:

| Field | Type | Description |
|---|---|---|
| latency50ms | float64 | p50 response latency in milliseconds |
| latency95ms | float64 | p95 response latency in milliseconds |
| latency99ms | float64 | p99 response latency in milliseconds |
| errorRate | float64 | Error rate as a fraction (e.g., 0.005 = 0.5%) |
| uptimeSecs | int64 | Total uptime in seconds since last restart |

```json
{
  "latency50ms": 25.5,
  "latency95ms": 45.2,
  "latency99ms": 78.9,
  "errorRate": 0.001,
  "uptimeSecs": 864000
}
```

Expected response: HTTP 2xx. The Console times out after 10 seconds.

Polling interval: Configurable via HEALTH_POLL_INTERVAL_SECS (default: 300 seconds).


Endpoint: Provision Tenant

When a new tenant is assigned to an instance, the Console calls this endpoint to initialize the tenant workspace on the instance. The Console retries every 2 minutes until it gets a success response.

POST {apiBaseUrl}/internal/provision-tenant
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json

Request body:

| Field | Type | Description |
|---|---|---|
| tenantId | string | ID of the tenant |
| name | string | Human-readable tenant slug/name |
| env | string | Environment: production, staging, dev |

```json
{
  "tenantId": "507f1f77bcf86cd799439011",
  "name": "acme-corp",
  "env": "production"
}
```

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure and will be retried.

Side effects on success:

  • The TenantInstanceBinding state transitions from pending → active.
  • An EventTenantProvisioningComplete notification is dispatched.

Your instance must be idempotent on this endpoint — it may be called more than once for the same tenantId.


Endpoint: Provision User

When a user is invited or assigned to a tenant, the Console calls this endpoint to create the user account on the instance. The Console retries every 5 minutes for unprovisioned users.

POST {apiBaseUrl}/internal/provision-user
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json

Request body:

| Field | Type | Description |
|---|---|---|
| zitadelUserId | string | The user's unique Zitadel ID |
| tenantId | string | ID of the tenant |
| roles | string[] | Roles assigned to this user (e.g., ["admin", "viewer"]) |
| email | string | User's email address |
| displayName | string | User's display name |

```json
{
  "zitadelUserId": "user-zitadel-id",
  "tenantId": "507f1f77bcf86cd799439011",
  "roles": ["admin", "viewer"],
  "email": "user@example.com",
  "displayName": "John Doe"
}
```

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure and retried.

Your instance must be idempotent on this endpoint — the same user may be sent multiple times.


Endpoint: Deprovision User

When a user is removed from a tenant, the Console calls this endpoint to delete or deactivate the user on the instance.

DELETE {apiBaseUrl}/internal/provision-user/{zitadelUserId}?tenantId={tenantId}
Authorization: Bearer <zitadel-worker-token>

Path / query parameters:

| Parameter | Location | Description |
|---|---|---|
| zitadelUserId | path | The user's Zitadel ID |
| tenantId | query string | ID of the tenant |

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure.


Part 4: Summary of All Endpoints

Instance → Console (Instance Token)

| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /api/v1/server/instances/:id/startup | Instance Token | Boot signal (activates + logs boot event) |
| POST | /api/v1/server/instances/:id/heartbeat | Instance Token | Push resource metrics |
| POST | /api/v1/server/instances/:id/usage | Instance Token | Ingest usage events |

Admin → Console (Zitadel JWT)

| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /api/v1/instances/:id/rotate-token | Zitadel JWT | Rotate instance token (old token invalidated immediately) |
| POST | /api/v1/instances/:id/decommission | Zitadel JWT | Retire instance, stop all worker calls, preserve data |
| POST | /api/v1/instances/:id/maintenance | Zitadel JWT | Enter maintenance mode |
| DELETE | /api/v1/instances/:id/maintenance | Zitadel JWT | Exit maintenance mode, return to active |
| PUT | /api/v1/instances/:id/maintenance-poster | Zitadel JWT | Set maintenance message and schedule |
| DELETE | /api/v1/instances/:id/maintenance-poster | Zitadel JWT | Clear maintenance message |

Console → Instance (Zitadel Worker Token)

| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | {healthCheckUrl} | Zitadel Worker Token | Poll health metrics |
| POST | {apiBaseUrl}/internal/provision-tenant | Zitadel Worker Token | Initialize tenant workspace |
| POST | {apiBaseUrl}/internal/provision-user | Zitadel Worker Token | Create user on instance |
| DELETE | {apiBaseUrl}/internal/provision-user/{userId} | Zitadel Worker Token | Remove user from instance |
| POST | {apiBaseUrl}/internal/tenant-status | Zitadel Worker Token | Push tenant suspend/resume to instance |

Part 5: Console Worker Schedule

The Console background worker runs these jobs automatically:

| Job | Interval | Skips instances in status | Outbound calls |
|---|---|---|---|
| Health Poller | Every 5 min (configurable) | decommissioned | GET {healthCheckUrl} per active instance |
| Degraded Watcher | Every 60 sec | decommissioned, maintenance, provisioning | None (heartbeat timeout detection only) |
| Tenant Provisioner | Every 30 sec | decommissioned, maintenance, provisioning | POST /internal/provision-tenant for pending tenants |
| User Provisioner | Every 5 min | decommissioned | POST /internal/provision-user for pending users |
| Usage Aggregation | Every hour | — | None (internal Console processing) |
| Dunning | Every 6 hr | — | None (email dispatch only) |
| Migration Runner | Every 30 sec | — | Migration step calls (see migration docs) |
| Customer Purge | Every 6 hr | — | None (internal Console cleanup) |

Part 6: Instance Lifecycle

Instance Created

  ├─ Console generates token → writes raw token to Azure Key Vault → stores hash in PostgreSQL
  ├─ Console creates Zitadel OIDC app → stores clientId + clientSecret ref in instance_connections
  ├─ Console returns secretRef: "instance-{id}" (vault key name, not the token)


Instance Boots (every pod start — first boot, restarts, new replicas, DR)

  ├─ Reads raw token from vault using Managed Identity
  ├─ Stores token in memory
  ├─ Calls POST /startup with {podName, version}
  │    ├─ Response always includes "status" field — instance caches this
  │    ├─ status = "provisioning"     → first boot: transitions to "active"
  │    ├─ status = "active"           → subsequent boot: boot event recorded
  │    ├─ status = "maintenance"      → boot event recorded, middleware blocks tenant traffic
  │    ├─ status = "degraded"         → boot event recorded, middleware allows tenant traffic
  │    └─ status = "decommissioned"   → no boot event, middleware blocks all tenant traffic


Instance Active

  ├─ Heartbeat loop (every 30–60s): POST /heartbeat → cache "status" from response
  │    ├─ status = "active"           → serve tenant traffic normally
  │    ├─ status = "maintenance"      → middleware returns 503 to tenants
  │    ├─ status = "degraded"         → serve tenant traffic (Console auto-set, not admin action)
  │    └─ status = "decommissioned"   → recorded: false → middleware blocks all tenant traffic

  ├─ Console polls health → GET {healthCheckUrl} (every 5 min)

  ├─ Tenant assigned to instance
  │    └─ Console worker calls POST /internal/provision-tenant
  │         ├─ Skipped if instance is: maintenance | decommissioned | provisioning
  │         └─ On success: tenant status transitions provisioning → active

  ├─ User invited to tenant
  │    └─ Console worker calls POST /internal/provision-user

  ├─ User removed from tenant
  │    └─ Console worker calls DELETE /internal/provision-user/{id}

  ├─ Instance pushes usage events → POST /usage (per billing period)

  ├─ Admin sets maintenance mode
  │    ├─ POST /maintenance → status: active → maintenance
  │    ├─ Optional: PUT /maintenance-poster (sets user-facing message, independent of status)
  │    ├─ Instance learns on next heartbeat → middleware returns 503
  │    └─ DELETE /maintenance → status: maintenance → active
  │         └─ Instance learns on next heartbeat → middleware resumes serving

  └─ Admin decommissions instance
       ├─ POST /decommission → status: * → decommissioned
       ├─ All non-archived tenants suspended immediately
       ├─ Instance learns on next heartbeat (recorded: false) or startup
       └─ Instance middleware blocks all tenant traffic

            └─ To delete the instance:
                 ├─ Reassign all tenants to a new instance
                 ├─ Restore tenants: POST /tenants/:id/restore
                 └─ DELETE /instances/:id  (blocked until zero active tenants)

Part 7: Configuration Reference

Variables the Console requires (in .env):

```env
# Console API server
API_ADDR=:8000

# PostgreSQL
DATABASE_URL=postgres://console:password@localhost:5432/console?sslmode=disable

# Zitadel — JWT validation (Console API)
ZITADEL_ISSUER=https://auth.example.com
ZITADEL_API_URL=https://auth.example.com
ZITADEL_PAT=<personal-access-token>
ZITADEL_ADMIN_ORG_ID=<console-admin-org-id>
EXTOID_PROJECT_ID=<project-id>

# Zitadel — Console service account credentials (used by both API and worker)
CONSOLE_SERVICE_CLIENT_ID=<machine-user-client-id>
CONSOLE_SERVICE_CLIENT_SECRET=<machine-user-client-secret>

# Zitadel — webhook verification (Zitadel → Console)
# Signing key returned by Zitadel when creating the webhook target (from .bootstrap.env)
ZITADEL_WEBHOOK_SECRET=<signing-key-from-bootstrap>

# Worker
HEALTH_POLL_INTERVAL_SECS=300

# Secret store (SOC 2 required: use azure-keyvault in all non-local environments)
SECRET_STORE_PROVIDER=azure-keyvault        # or "noop" for local dev only
AZURE_KEY_VAULT_URL=https://my-vault.vault.azure.net/

# Email
EMAIL_PROVIDER=sendgrid
EMAIL_API_KEY=<api-key>
EMAIL_FROM_ADDR=noreply@exto360.com
```

Part 8: Security Notes

  • Instance tokens are hashed (SHA-256) at rest. The Console never stores the raw token.
  • Console → Instance calls use short-lived Zitadel JWTs. Instances should validate the JWT signature using Zitadel's JWKS endpoint and check token expiry.
  • Webhook calls from Zitadel to Console are HMAC-SHA256 verified using ZITADEL_WEBHOOK_SECRET.
  • All Console → Instance endpoints (/internal/*) should only be accessible from the Console worker — consider network-level restrictions in addition to token validation.
  • The raw instance token is never returned by any API — it exists only in the vault, and instances read it via Managed Identity. If a token is suspected compromised, rotate it via POST /api/v1/instances/:id/rotate-token; the old token is invalidated immediately.