# Instance ↔ Console Integration Guide
This is the definitive reference for how an instance integrates with the Console. It covers both directions of communication, authentication mechanisms, and all request/response contracts.
## Overview
There are three directions of communication:
| Direction | Auth Mechanism | Purpose |
|---|---|---|
| Instance → Console | Instance Registration Token (Bearer) | Startup signal, heartbeat, usage event ingestion |
| Admin → Console | Zitadel JWT (Bearer) | Token rotation, maintenance mode, maintenance poster, decommission |
| Console → Instance | Zitadel Worker Token (Bearer) | Health polling, tenant/user provisioning |
## Part 1: Instance → Console
### Authentication
When the Console creates an instance, it generates a 256-bit random token, writes the raw token to the configured key vault (Azure Key Vault), and stores only the SHA-256 hash in PostgreSQL. The raw token never appears in Console API responses or logs.
How it works end-to-end:

```
Console creates instance
  → generates raw token + SHA-256 hash
  → writes raw token to vault: instance-{instanceID}
  → stores hash in PostgreSQL: instance_token_hash
  → returns secretRef to admin: "instance-507f1f77bcf86cd799439011"

Instance boots
  → reads raw token from vault using its Managed Identity
  → sends token as Bearer on every request to Console

Console verifies request
  → SHA-256 hashes the incoming token
  → looks up hash in PostgreSQL → match = authenticated
```

Request header:

```
Authorization: Bearer <raw-token-from-vault>
```

What is stored where:
| Location | Value | Who reads it |
|---|---|---|
| Azure Key Vault | Raw token (plaintext) | Instance pods only (Managed Identity) |
| Console DB | SHA-256 hash | Console API only (never exposed) |
Token expiry: Tokens do not expire on a schedule. They are invalidated only by admin rotation or instance deletion. Routine rotation is handled via the `rotate-token` endpoint (see Part 2: Admin → Console).
DR / new pod: A replacement instance pod reads the token from the vault on startup — no manual injection, no k8s secret coordination.
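The verification step can be sketched in Go (a minimal illustration — `hashToken` and `verifyToken` are hypothetical names, not the actual Console code). The Console compares the SHA-256 of the presented token against the stored hash in constant time:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashToken computes the hex-encoded SHA-256 digest the Console stores
// in PostgreSQL (instance_token_hash). The raw token is never persisted.
func hashToken(rawToken string) string {
	sum := sha256.Sum256([]byte(rawToken))
	return hex.EncodeToString(sum[:])
}

// verifyToken hashes the presented bearer token and compares it against
// the stored hash in constant time, so timing leaks nothing.
func verifyToken(presented, storedHash string) bool {
	h := hashToken(presented)
	return subtle.ConstantTimeCompare([]byte(h), []byte(storedHash)) == 1
}

func main() {
	stored := hashToken("raw-token-from-vault") // what the Console DB holds
	fmt.Println(verifyToken("raw-token-from-vault", stored)) // true
	fmt.Println(verifyToken("wrong-token", stored))          // false
}
```

Because only the hash is stored, a database leak never exposes a usable token; the constant-time comparison additionally avoids leaking hash prefixes through timing.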
### Vault Setup
Azure RBAC required:
| Principal | Role | Purpose |
|---|---|---|
| Console API / Worker pods | Key Vault Secrets Officer | Read + write + delete secrets |
| Instance pods | Key Vault Secrets User | Read secrets only |
Secret naming convention: `instance-{instanceID}` (e.g., `instance-507f1f77bcf86cd799439011`)
Console configuration:
```env
SECRET_STORE_PROVIDER=azure-keyvault
AZURE_KEY_VAULT_URL=https://my-vault.vault.azure.net/
```

Auth uses `DefaultAzureCredential` — resolves automatically via Managed Identity on AKS, or `AZURE_CLIENT_ID` / `AZURE_CLIENT_SECRET` / `AZURE_TENANT_ID` env vars for non-AKS deployments.
Local development:
```env
SECRET_STORE_PROVIDER=noop
```

Tokens are not persisted — Console logs a warning. Use `noop` only for local dev.
### Endpoint: Startup Signal
Every instance pod calls this on boot. The response always includes the current instance status — the instance must read this and apply its availability middleware accordingly.
```
POST /api/v1/server/instances/:id/startup
Authorization: Bearer <instance-registration-token>
Content-Type: application/json
```

Request body (optional):

| Field | Type | Required | Description |
|---|---|---|---|
| podName | string | no | Kubernetes pod name (from HOSTNAME env var) |
| version | string | no | Deployed version string (e.g., v1.2.3) |
```json
{
  "podName": "instance-1-abc123",
  "version": "v1.2.3"
}
```

The body is optional. An empty body is accepted.
#### Success Responses (200 OK)
First boot — instance was in provisioning state, now activated:
```json
{
  "status": "active",
  "firstBoot": true,
  "message": "Instance is now active."
}
```

Subsequent boot — instance already active, boot event recorded:
```json
{
  "status": "active",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

Boot while in maintenance — boot event recorded, instance remains in maintenance:
```json
{
  "status": "maintenance",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

The instance middleware must return 503 to tenant traffic when `status` is `maintenance`.
Boot while degraded — boot event recorded, Console will re-evaluate on next heartbeat:
```json
{
  "status": "degraded",
  "firstBoot": false,
  "message": "Boot event recorded."
}
```

Boot while decommissioned — Console acknowledges but tells instance to self-block:
```json
{
  "status": "decommissioned",
  "firstBoot": false,
  "message": "Instance is decommissioned. Tenant traffic must be blocked."
}
```

Returned as 200, not 4xx — so the instance can read the status and block traffic. The instance must not serve tenant requests when `status` is `decommissioned`.
#### Error Responses
| Condition | HTTP | Body |
|---|---|---|
| Invalid instance ID format | 400 | {"error": "Invalid id"} |
| Token doesn't match this instance | 403 | {"error": "Token does not match instance"} |
| Instance not found | 404 | {"error": "Instance not found"} |
| Internal error fetching instance | 500 | {"error": "Failed to get instance"} |
| Internal error activating instance | 500 | {"error": "Failed to activate instance"} |
#### Side Effects
- First boot only: Transitions instance status `provisioning` → `active`.
- Every boot (except decommissioned): Inserts a boot event row with `instance_id`, `pod_name`, `version`, `booted_at`. Boot events auto-expire after 30 days.
- Decommissioned: No state change, no boot event recorded.
#### What Console infers from boot events
| Signal | How |
|---|---|
| Crash loop | Same podName appearing repeatedly in a short window |
| Rollout | New version appearing across different pod names |
| Replica count | Count of distinct podName values in a recent time window |
| Restart | Same podName + same version appearing again |
The first boot is the only way to transition out of `provisioning`. The Console does not poll health or provision tenants until the instance is `active`.
### Endpoint: Heartbeat
Instances push resource metrics and receive their current Console-side status in every response. The instance must update its cached status from the response and apply availability middleware accordingly.
```
POST /api/v1/server/instances/:id/heartbeat
Authorization: Bearer <instance-registration-token>
Content-Type: application/json
```

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| status | string | yes | Instance-reported status: active, degraded |
| cpuPercent | float64 | yes | Current CPU utilization (0–100) |
| memoryPercent | float64 | yes | Current memory utilization (0–100) |
| diskPercent | float64 | yes | Current disk utilization (0–100) |
| activeTenantCount | int | yes | Number of tenants currently active on this instance |
| version | string | yes | Deployed version string (e.g., v1.2.3) |
```json
{
  "status": "active",
  "cpuPercent": 45.2,
  "memoryPercent": 62.8,
  "diskPercent": 78.5,
  "activeTenantCount": 3,
  "version": "v1.2.3"
}
```

#### Success Responses (200 OK)
Normal heartbeat — metrics recorded, Console-side status returned:
```json
{
  "recorded": true,
  "status": "active",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is maintenance — metrics recorded, instance should block tenant traffic:
```json
{
  "recorded": true,
  "status": "maintenance",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is degraded — metrics recorded (Console may have auto-set degraded due to threshold breach):
```json
{
  "recorded": true,
  "status": "degraded",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

Heartbeat when Console-side status is decommissioned — metrics NOT recorded, instance must self-block:
```json
{
  "recorded": false,
  "status": "decommissioned",
  "timestamp": "2026-03-08T10:30:00Z"
}
```

When `recorded: false`, the Console has acknowledged the signal but written nothing. The instance should stop serving tenant traffic immediately.
#### Error Responses
| Condition | HTTP | Body |
|---|---|---|
| Invalid instance ID format | 400 | {"error": "Invalid id"} |
| Token doesn't match this instance | 403 | {"error": "Token does not match instance"} |
| Internal error fetching instance | 500 | {"error": "Failed to get instance"} |
| Internal error recording metrics | 500 | {"error": "Failed to record heartbeat"} |
#### Side Effects (non-decommissioned only)
- Updates `last_heartbeat_at` on the instance record.
- Updates `last_cpu_percent`, `last_memory_percent`, `last_disk_percent`, `last_active_tenant_count`, `last_version`.
- If any metric exceeds configured thresholds, the Console overrides status to `degraded` regardless of the instance-reported value.
#### Degraded auto-detection thresholds (configurable per instance)
| Metric | Default |
|---|---|
| CPU | 80% |
| Memory | 85% |
| Disk | 90% |
#### Instance middleware behaviour by status

| status in response | Tenant traffic | Notes |
|---|---|---|
| active | Allow | |
| degraded | Allow | Console auto-set; instance continues serving |
| maintenance | Block (503) | Optionally surface maintenance poster message |
| decommissioned | Block (503) | recorded: false; instance should not re-activate |
Recommendation: Send a heartbeat every 30–60 seconds. The degraded watcher marks an instance degraded if no heartbeat is received within the configured timeout.
### Endpoint: Usage Event Ingestion
Instances call this endpoint to push billable usage events to the Console. The Console aggregates these hourly and applies them against the customer's billing plan.
```
POST /api/v1/server/instances/:id/usage
Authorization: Bearer <instance-registration-token>
Content-Type: application/json
```

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| tenantId | string | yes | ID of the tenant |
| meter | string | yes | Billable dimension name (see well-known meters below) |
| value | number | yes | Usage quantity |
| unit | string | yes | Unit label (e.g., count, gb) |
| periodStart | string (RFC3339) | yes | Start of the usage period |
| periodEnd | string (RFC3339) | yes | End of the usage period |
```json
{
  "tenantId": "507f1f77bcf86cd799439011",
  "meter": "api_calls",
  "value": 1500,
  "unit": "count",
  "periodStart": "2026-03-08T00:00:00Z",
  "periodEnd": "2026-03-08T01:00:00Z"
}
```

Response (202 Accepted — new event):
```json
{
  "status": "accepted"
}
```

Response (200 OK — duplicate, already recorded):
```json
{
  "status": "duplicate, ignored"
}
```

Idempotency: Requests are deduplicated on the composite key `(instanceId, tenantId, meter, periodStart)`. Retrying the same event is safe.
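The dedup rule can be illustrated in Go — here an in-memory map stands in for what is presumably a unique index in the Console's database:

```go
package main

import (
	"fmt"
)

// UsageEvent holds the request-body fields relevant to deduplication.
type UsageEvent struct {
	InstanceID  string
	TenantID    string
	Meter       string
	PeriodStart string // RFC3339
}

// dedupKey builds the composite key (instanceId, tenantId, meter, periodStart).
func dedupKey(e UsageEvent) string {
	return e.InstanceID + "|" + e.TenantID + "|" + e.Meter + "|" + e.PeriodStart
}

// ingest records a new event ("accepted") or ignores a retry ("duplicate").
// The map stands in for the Console's unique index; the real store is SQL.
func ingest(seen map[string]bool, e UsageEvent) string {
	k := dedupKey(e)
	if seen[k] {
		return "duplicate, ignored"
	}
	seen[k] = true
	return "accepted"
}

func main() {
	seen := map[string]bool{}
	e := UsageEvent{"inst-1", "507f1f77bcf86cd799439011", "api_calls", "2026-03-08T00:00:00Z"}
	fmt.Println(ingest(seen, e)) // accepted  (202 in the API)
	fmt.Println(ingest(seen, e)) // duplicate, ignored  (200 in the API)
}
```

Note that `periodEnd` and `value` are not part of the key: re-sending the same period with a different value is still treated as a duplicate, so instances should compute final values before pushing a period.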
Well-known meter names:
| Meter | Description |
|---|---|
| active_projects | Number of active projects |
| active_users | Number of active users |
| api_calls | Total API calls made |
| storage_gb | Storage consumed in GB |
| workflow_executions | Number of workflow runs |
| data_migrations | Number of data migrations executed |
| integrations | Number of active integrations |
Custom meters are supported — any string value is accepted, but only well-known meters are subject to plan limit enforcement.
For real-time tenant status updates (suspend/resume while the instance is running), the Console pushes changes directly to the instance via `POST /internal/tenant-status`. The instance writes the change to its local DB and updates its in-memory cache. On restart, the instance rebuilds the cache from its own local DB — no Console call is needed. See Tenant Suspension for full details.
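A sketch of such an in-memory cache in Go (type and method names are illustrative, not the instance's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// TenantStatusCache is an illustrative in-memory cache of per-tenant
// status. Writes come from POST /internal/tenant-status pushes (after
// persisting to the local DB); reads happen on every tenant request.
type TenantStatusCache struct {
	mu     sync.RWMutex
	status map[string]string // tenantId → "active" | "suspended"
}

func NewTenantStatusCache() *TenantStatusCache {
	return &TenantStatusCache{status: map[string]string{}}
}

// Set is called when the Console pushes a status change, and on restart
// (once per row) while rebuilding the cache from the local DB.
func (c *TenantStatusCache) Set(tenantID, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.status[tenantID] = status
}

// Suspended reports whether a tenant's traffic should be rejected.
func (c *TenantStatusCache) Suspended(tenantID string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.status[tenantID] == "suspended"
}

func main() {
	cache := NewTenantStatusCache()
	cache.Set("507f1f77bcf86cd799439011", "suspended") // push from Console
	fmt.Println(cache.Suspended("507f1f77bcf86cd799439011")) // true
	cache.Set("507f1f77bcf86cd799439011", "active") // resume
	fmt.Println(cache.Suspended("507f1f77bcf86cd799439011")) // false
}
```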
## Part 2: Admin → Console
These endpoints are called by Exto operators via the Console admin API. Auth is a Zitadel JWT — the instance registration token is not used here.
### Token Rotation
Use when a token is suspected compromised. The new token is written to the vault first, then the hash is updated in the database. The old token is immediately invalid once the database hash is updated.
```
POST /api/v1/instances/:id/rotate-token
Authorization: Bearer <zitadel-admin-jwt>
```

No request body.
Response (200 OK):
```json
{
  "secretRef": "instance-507f1f77bcf86cd799439011",
  "message": "Token rotated and written to vault. The instance will use the new token on its next vault secret refresh. Previous token is immediately invalid."
}
```

What happens (vault-first ordering):
- New 256-bit random token generated.
- Raw token written to vault (`instance-{id}`) — vault auto-versions the old value.
- New SHA-256 hash written to PostgreSQL — old token is invalidated at this point.
- The instance picks up the new token on its next vault secret refresh (no restart needed if using CSI driver or secret refresh).
The raw token is never returned to the caller — it goes directly to the vault. This is intentional for SOC 2 compliance.
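The vault-first ordering can be sketched in Go — `vaultSet` and `dbSetHash` are stand-ins for the real vault client and SQL update, not the Console's actual interfaces:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newInstanceToken generates a 256-bit random token and its SHA-256 hash,
// both hex-encoded — matching what the Console stores (hash only).
func newInstanceToken() (raw, hash string, err error) {
	buf := make([]byte, 32) // 256 bits
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	raw = hex.EncodeToString(buf)
	sum := sha256.Sum256([]byte(raw))
	return raw, hex.EncodeToString(sum[:]), nil
}

// rotate shows the vault-first ordering: write the new raw token to the
// vault, then swap the hash in the DB. If the vault write fails, the old
// token stays valid; once the DB write lands, the old token is dead.
func rotate(instanceID string, vaultSet, dbSetHash func(key, value string) error) error {
	raw, hash, err := newInstanceToken()
	if err != nil {
		return err
	}
	// 1. Vault first — the old value is auto-versioned by the vault.
	if err := vaultSet("instance-"+instanceID, raw); err != nil {
		return err
	}
	// 2. DB second — the old token becomes invalid at this point.
	return dbSetHash(instanceID, hash)
}

func main() {
	var order []string
	err := rotate("507f1f77bcf86cd799439011",
		func(key, _ string) error { order = append(order, "vault:"+key); return nil },
		func(id, _ string) error { order = append(order, "db:"+id); return nil },
	)
	fmt.Println(err, order)
}
```

The ordering matters: writing the vault first means that at every point in time the vault holds a token whose hash is (or is about to be) in the database, so a pod refreshing mid-rotation never ends up with a token that can no longer authenticate.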
### Decommission
Permanently retires an instance. All Console worker outbound calls stop immediately. All non-archived tenants on the instance are suspended — they must be manually restored after migrating to a new instance.
```
POST /api/v1/instances/:id/decommission
Authorization: Bearer <zitadel-admin-jwt>
```

No request body.
Response (200 OK):
```json
{
  "status": "decommissioned",
  "tenantsSuspended": 4,
  "message": "Instance decommissioned. All tenant access suspended. Console worker calls stopped."
}
```

| Field | Description |
|---|---|
| status | Always "decommissioned" |
| tenantsSuspended | Number of tenants that were active/provisioning and are now suspended |
| message | Human-readable summary |
Error responses:
| Condition | HTTP | Body |
|---|---|---|
| Instance not found | 404 | {"error": "Instance not found"} |
| Already decommissioned | 409 | {"error": "Instance is already decommissioned"} |
| Internal error | 500 | {"error": "Failed to decommission instance"} |
What happens immediately:
- Instance status set to `decommissioned` in the database.
- All non-archived, non-suspended tenants on this instance are set to `suspended`.
- On the instance's next heartbeat or startup, Console returns `status: "decommissioned"` — the instance middleware must block all tenant traffic.
What Console stops doing:
| Worker job | Behaviour after decommission |
|---|---|
| Health Poller | Skips this instance |
| Degraded Watcher | Skips this instance (no timeout enforcement) |
| Tenant Provisioner | Skips tenants assigned to this instance |
| User Provisioner | Skips users for tenants on this instance |
| Heartbeats | Still accepted — returns status: "decommissioned" so the instance self-blocks |
What is preserved:
- Instance record and connection config
- All usage events and aggregates
- All tenant records (status changed to `suspended`, not deleted)
- All boot events (expire naturally after 30 days)
To restore tenants after migrating to a new instance:
- Reassign each tenant to the new instance via `PUT /api/v1/tenants/:id`.
- Restore each tenant via `POST /api/v1/tenants/:id/restore`.
- The Console worker will re-provision them to the new instance automatically.

Decommission does not delete the instance. To delete, use `DELETE /api/v1/instances/:id` (blocked until all tenants are removed).
### Set Maintenance Mode
Places an instance into maintenance mode. The instance should block tenant traffic while in this state. Console worker jobs (health polling, heartbeat recording) continue — provisioning of new tenants and users is paused until maintenance is cleared.
```
POST /api/v1/instances/:id/maintenance
Authorization: Bearer <zitadel-admin-jwt>
```

No request body.
Response (200 OK):
```json
{
  "status": "maintenance",
  "message": "Instance is now in maintenance mode."
}
```

Error responses:
| Condition | Status | Body |
|---|---|---|
| Instance is decommissioned | 409 | {"error": "cannot set maintenance on a decommissioned instance"} |
| Already in maintenance | 409 | {"error": "instance is already in maintenance"} |
### Clear Maintenance Mode
Returns an instance from maintenance mode to active.
```
DELETE /api/v1/instances/:id/maintenance
Authorization: Bearer <zitadel-admin-jwt>
```

No request body.
Response (200 OK):
```json
{
  "status": "active",
  "message": "Maintenance lifted. Instance is now active."
}
```

Error responses:
| Condition | Status | Body |
|---|---|---|
| Not in maintenance | 409 | {"error": "instance is not in maintenance"} |
### Set Maintenance Poster
Attaches a user-facing maintenance message to the instance. This is independent of maintenance mode — a poster can be set while the instance is still active (e.g., to announce upcoming maintenance).
```
PUT /api/v1/instances/:id/maintenance-poster
Authorization: Bearer <zitadel-admin-jwt>
Content-Type: application/json
```

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | yes | Markdown-formatted maintenance message |
| scheduledStart | string (ISO8601) | no | When maintenance is scheduled to begin |
| scheduledEnd | string (ISO8601) | no | When maintenance is scheduled to end |
```json
{
  "message": "Scheduled maintenance for database migration.",
  "scheduledStart": "2026-03-15T02:00:00Z",
  "scheduledEnd": "2026-03-15T04:00:00Z"
}
```

Response (200 OK):
```json
{
  "message": "Scheduled maintenance for database migration.",
  "scheduledStart": "2026-03-15T02:00:00Z",
  "scheduledEnd": "2026-03-15T04:00:00Z",
  "setBy": "zitadel-subject-id",
  "setAt": "2026-03-10T12:00:00Z"
}
```

Side effects:
- Saves the poster to the instance record. Does not change the instance status.
- `setBy` is populated from the caller's Zitadel subject claim.
### Clear Maintenance Poster
Removes the maintenance poster from the instance. Does not affect the instance status.
```
DELETE /api/v1/instances/:id/maintenance-poster
Authorization: Bearer <zitadel-admin-jwt>
```

No request body.
Response (200 OK):
```json
{
  "cleared": true
}
```

## Part 3: Console → Instance
The Console's background worker makes outbound HTTP calls to instances for health monitoring, tenant provisioning, and user lifecycle management.
### Authentication
The Console authenticates to instances using a Zitadel Worker Token, obtained via the OAuth 2.0 client_credentials flow.
Request header (all Console → Instance calls):

```
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json
```

The token is issued by Zitadel using:

- `CONSOLE_SERVICE_CLIENT_ID`
- `CONSOLE_SERVICE_CLIENT_SECRET`
Important: The `console-worker` machine user in Zitadel must have its Access Token Type set to JWT (not opaque). This is configured during bootstrap (`ACCESS_TOKEN_TYPE_JWT`). If the token type is changed to opaque, instances will reject the token with a 401 because they validate via JWKS (not introspection).
Instances must validate this JWT against the Zitadel JWKS endpoint:

```
GET {ZITADEL_ISSUER}/oauth/v2/keys
```

### Instance Connection Config
When an instance is registered in the Console, an InstanceConnection record is stored alongside it. This holds the URLs the Console uses to reach the instance, plus vault key references for sensitive credentials.
| Field | Description |
|---|---|
| apiBaseUrl | Internal base URL for provisioning calls (e.g., https://instance-1.internal) |
| healthCheckUrl | Full URL for health polling (e.g., https://instance-1.internal/internal/health) |
| dbURIRef | Vault key name for the database connection URI (e.g., conn-{connId}-dburi) |
| oidcClientId | Zitadel OIDC client ID — not sensitive, stored directly in the database |
| oidcClientSecretRef | Vault key name for the OIDC client secret (e.g., conn-{connId}-oidc-secret) |
| zitadelAppId | Zitadel app ID (used for rotation/deletion) |
How connection secrets are stored:
```
Admin calls UpsertConnection with raw dbURI + oidcSecret
  → Console writes dbURI to vault: conn-{connId}-dburi
  → Console writes oidcSecret to vault: conn-{connId}-oidc-secret
  → Database stores key references only — never the raw values

Migration runner needs DB URI
  → reads conn.db_uri_ref from database ("conn-{connId}-dburi")
  → calls vault.Get("conn-{connId}-dburi") → raw URI returned in memory only
```

Raw credentials are never written to the database or returned in API responses.
### Instance Ingress Requirements
The Console Worker calls `/internal/*` endpoints on each instance via the public ingress. The instance's ingress must route `/internal` to the Go backend service (`exto-go`), not to the web frontend (SPA).

Common problem: Many instance ingresses use `nginx.ingress.kubernetes.io/rewrite-target: /$1` with regex path captures (e.g., `/api/(.*)`). If `/internal` is added to the same ingress, the rewrite strips the prefix — `/internal/health` becomes `/health`, which doesn't match the Go route. The request either hits the SPA catch-all (`/(.*)`) and returns HTML, or hits the Go backend with a wrong path and returns 404.
Solution: Create a separate ingress resource for /internal without the rewrite-target annotation, so the path passes through to the Go backend as-is:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: go-internal-ingress
  namespace: <instance-namespace>
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "250m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "660"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "660"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "660"
    # No rewrite-target — /internal/health passes through as-is
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - <instance-domain>
      secretName: <tls-secret>
  rules:
    - host: <instance-domain>
      http:
        paths:
          - path: /internal
            pathType: Prefix
            backend:
              service:
                name: <exto-go-service>
                port:
                  number: 80
```

Verify it works:
```bash
# Should return 401 (auth required) — confirms the Go backend is receiving the request
curl -s -o /dev/null -w "%{http_code}" https://<instance-domain>/internal/health
# Expected: 401
# If you get HTML or 404, the ingress is not routing correctly
```

### Endpoint: Health Check
The Console polls this endpoint on a configurable interval (default: every 5 minutes) to record performance metrics.
```
GET {healthCheckUrl}
Authorization: Bearer <zitadel-worker-token>
```

Your instance must respond with:

| Field | Type | Description |
|---|---|---|
| latency50ms | float64 | p50 response latency in milliseconds |
| latency95ms | float64 | p95 response latency in milliseconds |
| latency99ms | float64 | p99 response latency in milliseconds |
| errorRate | float64 | Error rate as a fraction (e.g., 0.005 = 0.5%) |
| uptimeSecs | int64 | Total uptime in seconds since last restart |
```json
{
  "latency50ms": 25.5,
  "latency95ms": 45.2,
  "latency99ms": 78.9,
  "errorRate": 0.001,
  "uptimeSecs": 864000
}
```

Expected response: HTTP 2xx. The Console times out after 10 seconds.
Polling interval: Configurable via `HEALTH_POLL_INTERVAL_SECS` (default: 300 seconds).
### Endpoint: Provision Tenant
When a new tenant is assigned to an instance, the Console calls this endpoint to initialize the tenant workspace on the instance. The Console retries every 2 minutes until it gets a success response.
```
POST {apiBaseUrl}/internal/provision-tenant
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json
```

Request body:

| Field | Type | Description |
|---|---|---|
| tenantId | string | ID of the tenant |
| name | string | Human-readable tenant slug/name |
| env | string | Environment: production, staging, dev |
```json
{
  "tenantId": "507f1f77bcf86cd799439011",
  "name": "acme-corp",
  "env": "production"
}
```

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure and will be retried.
Side effects on success:
- The `TenantInstanceBinding` state transitions from `pending` → `active`.
- An `EventTenantProvisioningComplete` notification is dispatched.

Your instance must be idempotent on this endpoint — it may be called more than once for the same `tenantId`.
### Endpoint: Provision User
When a user is invited or assigned to a tenant, the Console calls this endpoint to create the user account on the instance. The Console retries every 5 minutes for unprovisioned users.
```
POST {apiBaseUrl}/internal/provision-user
Authorization: Bearer <zitadel-worker-token>
Content-Type: application/json
```

Request body:

| Field | Type | Description |
|---|---|---|
| zitadelUserId | string | The user's unique Zitadel ID |
| tenantId | string | ID of the tenant |
| roles | string[] | Roles assigned to this user (e.g., ["admin", "viewer"]) |
| email | string | User's email address |
| displayName | string | User's display name |
```json
{
  "zitadelUserId": "user-zitadel-id",
  "tenantId": "507f1f77bcf86cd799439011",
  "roles": ["admin", "viewer"],
  "email": "user@example.com",
  "displayName": "John Doe"
}
```

Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure and retried.
Your instance must be idempotent on this endpoint — the same user may be sent multiple times.
### Endpoint: Deprovision User
When a user is removed from a tenant, the Console calls this endpoint to delete or deactivate the user on the instance.
```
DELETE {apiBaseUrl}/internal/provision-user/{zitadelUserId}?tenantId={tenantId}
Authorization: Bearer <zitadel-worker-token>
```

Path / query parameters:

| Parameter | Location | Description |
|---|---|---|
| zitadelUserId | path | The user's Zitadel ID |
| tenantId | query string | ID of the tenant |
Expected response: HTTP 2xx. Any HTTP >= 400 is treated as failure.
## Part 4: Summary of All Endpoints
### Instance → Console (Instance Token)
| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /api/v1/server/instances/:id/startup | Instance Token | Boot signal (activates + logs boot event) |
| POST | /api/v1/server/instances/:id/heartbeat | Instance Token | Push resource metrics |
| POST | /api/v1/server/instances/:id/usage | Instance Token | Ingest usage events |
### Admin → Console (Zitadel JWT)
| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /api/v1/instances/:id/rotate-token | Zitadel JWT | Rotate instance token (old token invalidated immediately) |
| POST | /api/v1/instances/:id/decommission | Zitadel JWT | Retire instance, stop all worker calls, preserve data |
| POST | /api/v1/instances/:id/maintenance | Zitadel JWT | Enter maintenance mode |
| DELETE | /api/v1/instances/:id/maintenance | Zitadel JWT | Exit maintenance mode, return to active |
| PUT | /api/v1/instances/:id/maintenance-poster | Zitadel JWT | Set maintenance message and schedule |
| DELETE | /api/v1/instances/:id/maintenance-poster | Zitadel JWT | Clear maintenance message |
### Console → Instance (Zitadel Worker Token)
| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | {healthCheckUrl} | Zitadel Worker Token | Poll health metrics |
| POST | {apiBaseUrl}/internal/provision-tenant | Zitadel Worker Token | Initialize tenant workspace |
| POST | {apiBaseUrl}/internal/provision-user | Zitadel Worker Token | Create user on instance |
| DELETE | {apiBaseUrl}/internal/provision-user/{userId} | Zitadel Worker Token | Remove user from instance |
| POST | {apiBaseUrl}/internal/tenant-status | Zitadel Worker Token | Push tenant suspend/resume to instance |
## Part 5: Console Worker Schedule
The Console background worker runs these jobs automatically:
| Job | Interval | Skips instances in status | Outbound calls |
|---|---|---|---|
| Health Poller | Every 5 min (configurable) | decommissioned | GET {healthCheckUrl} per active instance |
| Degraded Watcher | Every 60 sec | decommissioned, maintenance, provisioning | None (heartbeat timeout detection only) |
| Tenant Provisioner | Every 30 sec | decommissioned, maintenance, provisioning | POST /internal/provision-tenant for pending tenants |
| User Provisioner | Every 5 min | decommissioned | POST /internal/provision-user for pending users |
| Usage Aggregation | Every hour | — | None (internal Console processing) |
| Dunning | Every 6 hr | — | None (email dispatch only) |
| Migration Runner | Every 30 sec | — | Migration step calls (see migration docs) |
| Customer Purge | Every 6 hr | — | None (internal Console cleanup) |
## Part 6: Instance Lifecycle
```
Instance Created
│
├─ Console generates token → writes raw token to Azure Key Vault → stores hash in PostgreSQL
├─ Console creates Zitadel OIDC app → stores clientId + clientSecret ref in instance_connections
├─ Console returns secretRef: "instance-{id}" (vault key name, not the token)
│
▼
Instance Boots (every pod start — first boot, restarts, new replicas, DR)
│
├─ Reads raw token from vault using Managed Identity
├─ Stores token in memory
├─ Calls POST /startup with {podName, version}
│   ├─ Response always includes "status" field — instance caches this
│   ├─ status = "provisioning" → first boot: transitions to "active"
│   ├─ status = "active" → subsequent boot: boot event recorded
│   ├─ status = "maintenance" → boot event recorded, middleware blocks tenant traffic
│   ├─ status = "degraded" → boot event recorded, middleware allows tenant traffic
│   └─ status = "decommissioned" → no boot event, middleware blocks all tenant traffic
│
▼
Instance Active
│
├─ Heartbeat loop (every 30–60s): POST /heartbeat → cache "status" from response
│   ├─ status = "active" → serve tenant traffic normally
│   ├─ status = "maintenance" → middleware returns 503 to tenants
│   ├─ status = "degraded" → serve tenant traffic (Console auto-set, not admin action)
│   └─ status = "decommissioned" → recorded: false → middleware blocks all tenant traffic
│
├─ Console polls health → GET {healthCheckUrl} (every 5 min)
│
├─ Tenant assigned to instance
│   └─ Console worker calls POST /internal/provision-tenant
│       ├─ Skipped if instance is: maintenance | decommissioned | provisioning
│       └─ On success: tenant status transitions provisioning → active
│
├─ User invited to tenant
│   └─ Console worker calls POST /internal/provision-user
│
├─ User removed from tenant
│   └─ Console worker calls DELETE /internal/provision-user/{id}
│
├─ Instance pushes usage events → POST /usage (per billing period)
│
├─ Admin sets maintenance mode
│   ├─ POST /maintenance → status: active → maintenance
│   ├─ Optional: PUT /maintenance-poster (sets user-facing message, independent of status)
│   ├─ Instance learns on next heartbeat → middleware returns 503
│   └─ DELETE /maintenance → status: maintenance → active
│       └─ Instance learns on next heartbeat → middleware resumes serving
│
└─ Admin decommissions instance
    ├─ POST /decommission → status: * → decommissioned
    ├─ All non-archived tenants suspended immediately
    ├─ Instance learns on next heartbeat (recorded: false) or startup
    └─ Instance middleware blocks all tenant traffic
        │
        └─ To delete the instance:
            ├─ Reassign all tenants to a new instance
            ├─ Restore tenants: POST /tenants/:id/restore
            └─ DELETE /instances/:id (blocked until zero active tenants)
```

## Part 7: Configuration Reference
Variables the Console requires (in .env):
```env
# Console API server
API_ADDR=:8000

# PostgreSQL
DATABASE_URL=postgres://console:password@localhost:5432/console?sslmode=disable

# Zitadel — JWT validation (Console API)
ZITADEL_ISSUER=https://auth.example.com
ZITADEL_API_URL=https://auth.example.com
ZITADEL_PAT=<personal-access-token>
ZITADEL_ADMIN_ORG_ID=<console-admin-org-id>
EXTOID_PROJECT_ID=<project-id>

# Zitadel — Console service account credentials (used by both API and worker)
CONSOLE_SERVICE_CLIENT_ID=<machine-user-client-id>
CONSOLE_SERVICE_CLIENT_SECRET=<machine-user-client-secret>

# Zitadel — webhook verification (Zitadel → Console)
# Signing key returned by Zitadel when creating the webhook target (from .bootstrap.env)
ZITADEL_WEBHOOK_SECRET=<signing-key-from-bootstrap>

# Worker
HEALTH_POLL_INTERVAL_SECS=300

# Secret store (SOC 2 required: use azure-keyvault in all non-local environments)
SECRET_STORE_PROVIDER=azure-keyvault  # or "noop" for local dev only
AZURE_KEY_VAULT_URL=https://my-vault.vault.azure.net/

# Email
EMAIL_PROVIDER=sendgrid
EMAIL_API_KEY=<api-key>
EMAIL_FROM_ADDR=noreply@exto360.com
```

## Part 8: Security Notes
- Instance tokens are hashed (SHA-256) at rest. The Console never stores the raw token.
- Console → Instance calls use short-lived Zitadel JWTs. Instances should validate the JWT signature using Zitadel's JWKS endpoint and check token expiry.
- Webhook calls from Zitadel to Console are HMAC-SHA256 verified using `ZITADEL_WEBHOOK_SECRET`.
- All Console → Instance endpoints (`/internal/*`) should only be accessible from the Console worker — consider network-level restrictions in addition to token validation.
- The raw instance token is never returned by any Console API; it is written directly to the vault at creation and read by instance pods via Managed Identity. If a token is suspected compromised, invalidate it immediately with the `rotate-token` endpoint (see Part 2).

