
Migration Jobs — Full Implementation Guide

Deep dive on every migration job: dump, import, clone, delete, verify, instance-replace. Covers data model, executor wiring, shared helpers, decision trees, and flow diagrams.


1. Overview

Migrations are async jobs queued via REST API or CLI, executed by a background worker.

Architecture:

mermaid
flowchart LR
    UI[Web UI / CLI] -->|POST migrations| H[Handler]
    H -->|insert Job| DB[(Postgres<br/>jobs/job_logs/job_artifacts)]
    W[WorkerMigrationRunner<br/>30s tick] -->|NextQueuedJob| DB
    W -->|dispatch| EX[Executor]
    EX -->|RunDump/Import/Clone/Delete/Verify| MG[internal/migrate]
    MG -->|read/write| MONGO[(MongoDB)]
    EX -->|logs/artifact| DB
    EX -->|notify| NOTIF[Notifications]

Job types (internal/model/migration.go):

| Type | Purpose |
| --- | --- |
| tenant_copy | Direct instance-to-instance tenant copy |
| tenant_clone | Dump + import (within or across instances) |
| tenant_dump | Standalone tenant export → .zip |
| tenant_import | Archive → target instance |
| tenant_delete | Destructive remove + pre-delete safety dump |
| instance_replace | Full-instance migration |

Status machine:

queued ──(worker pick)──▶ running ──(finish)──▶ succeeded
                             └──(error)──────▶ failed

No job-level retry; batch-level retries are handled inside the migrate package (section 14).


2. Data Model

Tables (internal/db/migrations/):

  • jobs — one row per job. Key fields: type, status, tenant_id, source_instance_id, target_instance_id, dry_run, progress, target_customer_id, target_tenant_name, clone_user_assignments, user_remap (jsonb), reuse_existing_users, dump_file_path, import_file_path, include_collections, exclude_collections, error_message, created_tenant_id, created_tenant_code.
  • job_logs — streamed log entries (level, message).
  • job_artifacts — final JSON reports (migration_report, dump_report, import_report, clone_report, delete_report).

Mongo tenant-scope shapes (four variants; the tenant filter must match all of them):

js
{ tenantId: "ABC1234" }                    // scalar (most collections)
{ tenantID: "ABC1234" }                    // scalar (legacy spelling)
{ tenantIDs: ["ABC1234", ...] }            // array (user multi-tenant)
{ byTenant: { ABC1234: {...} } }           // nested map (user, user-session)

Tenant-namespaced collection prefixes: custom_<code>_, x_<code>_, x_mt_<code>_, cx_s_<code>_. Entire collection belongs to one tenant.
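
The $or filter that migrate.go's CollectionFilter builds from these shapes, as a minimal sketch (the real signature also takes the collection name; driver v2 assumed):

go
// Hedged sketch of the tenant-scope filter, assuming exactly the four
// shapes listed above. The real CollectionFilter(name, code) also takes
// the collection name; it is ignored here for brevity.
package migrate

import "go.mongodb.org/mongo-driver/v2/bson"

func collectionFilter(code string) bson.M {
	return bson.M{"$or": []bson.M{
		{"tenantId": code},                            // scalar
		{"tenantID": code},                            // scalar, legacy spelling
		{"tenantIDs": code},                           // array membership
		{"byTenant." + code: bson.M{"$exists": true}}, // nested map key
	}}
}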


3. Collection Classification

Every collection falls into one bucket. This drives every job's decision tree.

mermaid
flowchart TD
    C[Collection] --> A{system.*?}
    A -->|yes| SKIP[Skip always]
    A -->|no| B{Tenant-namespaced<br/>prefix matches code?}
    B -->|yes| NS[Treat as owned<br/>dump all / drop entire]
    B -->|no| D{In excluded list?<br/>appAudit, version-history,<br/>test, exto-modules-old}
    D -->|yes| SKIP
    D -->|no| E{Name in<br/>special list?}
    E -->|user| U[User path<br/>remap / reuse]
    E -->|user-session| US[Fresh _id,<br/>no idMap walk]
    E -->|customer| CUS[Rewrite name/code,<br/>fresh _id]
    E -->|other| GEN[Classified path<br/>idMap deep-walk,<br/>upsert by _id]

Skip-list for import (importSkipCollections): appAudit, user-session. Skip-list for classify + idMap (idClassifySkipList): user, user-session, customer.
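
The namespace check in collections.go plausibly looks like the sketch below (helper name assumed; prefix list per section 2):

go
// Assumed shape of the namespace detection: a collection is tenant-owned
// when its name starts with a known prefix, the tenant code, and "_".
package migrate

import "strings"

func isTenantNamespaced(coll, code string) bool {
	for _, p := range []string{"custom_", "x_", "x_mt_", "cx_s_"} {
		if strings.HasPrefix(coll, p+code+"_") {
			return true
		}
	}
	return false
}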


4. Shared Helpers

| File | What |
| --- | --- |
| classify.go | Pass-1 _id classification. Returns remapSets per collection. |
| idmap.go | IDMap interface. In-memory → bbolt spill at 500k entries. 100k-entry LRU read cache. |
| stream.go | forEachDoc — streams .jsonl (Extended JSON) or .bson (mongodump) from zip. |
| rewrite.go | rewriteDocObjectIDs deep-walks all OIDs. rewriteDocUserRefs handles known email/OID paths + customFields. |
| tenantrefs.go | rewriteTenantRefs, ensureTenantID (backfill scalar), grantReusedUserMembership (atomic pipeline). |
| retry.go | isTransientWriteErr, upsertBatchDocsWithRetry — exp backoff 1s/3s/9s then split-batch recurse. |
| preflight.go | PreflightTargetTenantUnique — fails if code/name exists in target customer. |
| archive_users.go | Active-user extraction from archive/DB. ApplyEffectiveEmail. |
| collections.go | Named constants + namespace detection. |
| log.go | StderrLogger, [HH:MM:SS] msg format. |
| migrate.go | CollectionFilter(name, code) — the $or filter used everywhere. |
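
The IDMap contract implied by the table, sketched as an interface (shape assumed, not the actual declaration):

go
// Assumed idmap.go contract: map a source _id (hex) to its freshly
// allocated ObjectID. Entries live in memory until the 500k spill
// threshold, then in bbolt behind a 100k-entry LRU read cache.
package migrate

import "go.mongodb.org/mongo-driver/v2/bson"

type IDMap interface {
	Put(srcHex string, newID bson.ObjectID) error
	Get(srcHex string) (bson.ObjectID, bool) // served from LRU, then bbolt
	Close() error                            // drops the spill file
}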

5. Dump Job

Purpose: Export every tenant-owned doc + indexes from one Mongo DB into a single .zip.

Entry: migrate.RunDump(ctx, DumpConfig, logger) → DumpReport.

Flow

mermaid
flowchart TD
    S[Start] --> C[Connect Mongo]
    C --> L[List collections]
    L --> F{Filter: system?<br/>excluded?<br/>tenant-ns?}
    F -->|skip| NX[Next]
    F -->|include| DR{dry-run?}
    DR -->|yes| CNT[Count only]
    DR -->|no| Z[Open .zip entry<br/>dbname/coll.jsonl]
    Z --> ST[Stream cursor,<br/>batch 1000 docs]
    ST --> JSONL[Write Extended JSON<br/>one line per doc]
    JSONL --> IDX[List indexes,<br/>strip _id_/v/ns]
    IDX --> IX[Write<br/>dbname/coll.indexes.jsonl]
    IX --> NX
    NX --> MORE{more colls?}
    MORE -->|yes| F
    MORE -->|no| META[Write _metadata.json]
    META --> R[Build DumpReport]

Archive Layout

<tenant>.zip
├─ _metadata.json                       {tenantId, tenantCode, tenantName, dbName, format, exportedAt}
├─ <dbname>/<coll>.jsonl                one doc per line (Extended JSON)
└─ <dbname>/<coll>.indexes.jsonl        index specs, minus _id_, v, ns
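
A condensed sketch of the per-collection streaming loop behind this layout (driver v2 assumed; index handling elided):

go
// Sketch: stream the filtered cursor and write one canonical Extended
// JSON line per document into the collection's zip entry.
package migrate

import (
	"archive/zip"
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
)

func dumpCollection(ctx context.Context, zw *zip.Writer, db *mongo.Database, coll string, filter any) error {
	w, err := zw.Create(db.Name() + "/" + coll + ".jsonl")
	if err != nil {
		return err
	}
	cur, err := db.Collection(coll).Find(ctx, filter) // CollectionFilter, or bson.D{} for tenant-ns colls
	if err != nil {
		return err
	}
	defer cur.Close(ctx)
	for cur.Next(ctx) {
		line, err := bson.MarshalExtJSON(cur.Current, true, false) // canonical Extended JSON
		if err != nil {
			return err
		}
		if _, err := w.Write(append(line, '\n')); err != nil {
			return err
		}
	}
	return cur.Err()
}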

Decision Tree — "include collection?"

system.* ────────────────────────────▶ skip
in dumpExcludedCollections ──────────▶ skip  (appAudit, version-history, test, exto-modules-old)
matches tenant-ns prefix ────────────▶ include (all docs)
IncludeCollections set & not in it ──▶ skip
ExcludeCollections set & in it ──────▶ skip
otherwise ───────────────────────────▶ include (filter by CollectionFilter)

Errors

  • Connection fail → abort.
  • Per-collection fail → logged, DumpReport.HadErrors=true, continue.

6. Import Job

Purpose: Take a dump archive, write into target Mongo DB as a new tenant with safe _id collision handling, tenant-code rewrite, and user identity reuse/remap.

Entry: migrate.RunImport(ctx, ImportConfig, logger) → MigrationReport.

Phases

mermaid
flowchart TD
    S[Start] --> P0[Phase 0: Preflight<br/>target tenant code/name unique]
    P0 --> P1[Phase 1: Open zip,<br/>read _metadata.json]
    P1 --> P2[Phase 2: Build coll list<br/>user first, rest alpha]
    P2 --> P3[Phase 3: Index preflight<br/>atomic then per-index fallback]
    P3 --> P4[Phase 4: Init IDMap<br/>bbolt spill 500k]
    P4 --> P5[Phase 5: Pre-scan target users<br/>email to OID map]
    P5 --> P6[Phase 6: Pass-1 CLASSIFY<br/>per coll, chunks of 10k _ids]
    P6 --> P7[Phase 7: Pass-2 IMPORT<br/>per coll, per-doc transform]
    P7 --> RPT[MigrationReport]

Pass 1 — Classify (classify.go)

For each non-skip collection, stream src _ids in 10k chunks; query target {_id:{$in:chunk}}:

mermaid
flowchart TD
    ID[src _id] --> Q{in target?}
    Q -->|no| C2[Case 2: keep _id<br/>no remap]
    Q -->|yes| T{same tenant?}
    T -->|yes| C1[Case 1: keep _id<br/>upsert idempotent]
    T -->|no| C3[Case 3: alloc new OID<br/>store src-hex to new-oid in idMap<br/>add to remapSet]
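
Per-chunk classification in sketch form (tenant check simplified to the scalar tenantId shape; case numbers as in the diagram):

go
// Sketch of Pass-1 for one chunk of up to 10k source _ids. Docs already
// present under a different tenant get a fresh ObjectID (Case 3); docs
// absent (Case 2) or owned by the target tenant (Case 1) keep their _id.
package migrate

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

func classifyChunk(ctx context.Context, tgt *mongo.Collection, chunk []bson.ObjectID,
	tgtCode string, idMap IDMap, remapSet map[string]bool) error {

	cur, err := tgt.Find(ctx, bson.M{"_id": bson.M{"$in": chunk}},
		options.Find().SetProjection(bson.M{"_id": 1, "tenantId": 1}))
	if err != nil {
		return err
	}
	defer cur.Close(ctx)

	foreign := map[string]bool{} // src hex -> collides with another tenant
	for cur.Next(ctx) {
		var hit struct {
			ID       bson.ObjectID `bson:"_id"`
			TenantID string        `bson:"tenantId"`
		}
		if err := cur.Decode(&hit); err != nil {
			return err
		}
		foreign[hit.ID.Hex()] = hit.TenantID != tgtCode // Case 1 if same tenant
	}
	for _, id := range chunk {
		if foreign[id.Hex()] { // Case 3: allocate and record a remap
			if err := idMap.Put(id.Hex(), bson.NewObjectID()); err != nil {
				return err
			}
			remapSet[id.Hex()] = true
		}
	}
	return cur.Err()
}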

Pass 2 — Per-Collection Path Decision

mermaid
flowchart TD
    COL[Collection cf] --> S1{in importSkipCollections?<br/>appAudit, user-session}
    S1 -->|yes| SK[Skip]
    S1 -->|no| U1{name == user?}
    U1 -->|no| S2{name in<br/>idClassifySkipList?<br/>user-session, customer}
    U1 -->|yes| R1{ReuseExistingUsers<br/>or UserRemap set?}
    R1 -->|yes| PA[Path A: user reuse/remap]
    R1 -->|no| PB[Path B: skip-list]
    S2 -->|yes| PB
    S2 -->|no| PC[Path C: classified]

Path A — User with Reuse/Remap

mermaid
flowchart TD
    D[user doc] --> P[prepareCollectionDoc<br/>prune byTenant to target only]
    P --> RT[rewriteTenantRefs<br/>src code to tgt code]
    RT --> EM[Extract email lowercased]
    EM --> RM[ApplyEffectiveEmail<br/>explicit then default then src]
    RM --> EX{effective email<br/>in target users?}
    EX -->|yes| REU[action=reused<br/>grantReusedUserMembership<br/>atomic pipeline<br/>tenantIDs = existing minus src plus tgt<br/>populate maps, no insert]
    EX -->|no| INS{email changed?}
    INS -->|yes| IR[action=inserted_renamed<br/>new _id, set email]
    INS -->|no| II[action=inserted<br/>new _id]
    IR --> BA[Batch insert]
    II --> BA
    REU --> NX[Next doc]
    BA --> NX

grantReusedUserMembership pipeline:

js
[
  { $set: {
      tenantIDs: {
        $setUnion: [
          { $filter: { input: "$tenantIDs", cond: { $ne: ["$$this", srcCode] }}},
          [tgtCode]
        ]
      }
  }}
]
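
The same update issued from Go, using the driver's update-with-aggregation-pipeline form (usage sketch, not the exact helper body; driver v2 assumed):

go
// Atomic membership grant: swap the source code for the target code in
// tenantIDs with a single pipeline update (no read-modify-write race).
package migrate

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
)

func grantMembership(ctx context.Context, users *mongo.Collection, userID bson.ObjectID, srcCode, tgtCode string) error {
	update := mongo.Pipeline{{{Key: "$set", Value: bson.M{
		"tenantIDs": bson.M{"$setUnion": bson.A{
			bson.M{"$filter": bson.M{
				"input": "$tenantIDs",
				"cond":  bson.M{"$ne": bson.A{"$$this", srcCode}},
			}},
			bson.A{tgtCode},
		}},
	}}}}
	_, err := users.UpdateByID(ctx, userID, update)
	return err
}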

Path B — Skip-List (user-session, customer)

  • customer: overwrite name = TargetTenantName, code = TargetTenantCode.
  • Fresh _id via bson.NewObjectID().
  • rewriteDocUserRefs on known paths + customFields.
  • No idMap deep-walk.
  • InsertMany (not upsert — fresh ids can't collide).
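
Path B reduced to a sketch (parameter names assumed from the job columns in section 2):

go
// Path B sketch: every doc gets a fresh _id (so plain InsertMany is
// safe); customer docs additionally take the target tenant's name/code.
package migrate

import "go.mongodb.org/mongo-driver/v2/bson"

func prepareSkipListDoc(doc bson.M, collName, targetName, targetCode string) bson.M {
	doc["_id"] = bson.NewObjectID() // fresh id: cannot collide
	if collName == "customer" {
		doc["name"] = targetName
		doc["code"] = targetCode
	}
	return doc // rewriteDocUserRefs still runs; no idMap deep-walk
}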

Path C — Classified (default)

mermaid
flowchart TD
    D[doc] --> P[prepareCollectionDoc]
    P --> RT[rewriteTenantRefs]
    RT --> ET[ensureTenantID<br/>backfill scalar tenantId]
    ET --> ID{src _id in<br/>remapSet for coll?}
    ID -->|yes| SW[Swap: doc._id = idMap.Get hex]
    ID -->|no| KP[Keep src _id]
    SW --> RO[rewriteDocObjectIDs<br/>deep-walk, replace<br/>any OID in idMap]
    KP --> RO
    RO --> RU[rewriteDocUserRefs<br/>known paths + customFields]
    RU --> B[Batch]
    B --> F{batch full?}
    F -->|yes| UP[upsertBatchDocsWithRetry<br/>ReplaceOne upsert by _id]
    UP --> NX[Next doc]
    F -->|no| NX
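
The Path C flush in sketch form: ReplaceOne-with-upsert keyed by _id, which is what makes re-imports idempotent (driver v2 assumed):

go
// Sketch of the happy path inside upsertBatchDocsWithRetry: an unordered
// bulk of ReplaceOne upserts keyed by _id.
package migrate

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

func upsertBatch(ctx context.Context, coll *mongo.Collection, batch []bson.M) error {
	models := make([]mongo.WriteModel, 0, len(batch))
	for _, doc := range batch {
		models = append(models, mongo.NewReplaceOneModel().
			SetFilter(bson.M{"_id": doc["_id"]}).
			SetReplacement(doc).
			SetUpsert(true))
	}
	_, err := coll.BulkWrite(ctx, models, options.BulkWrite().SetOrdered(false))
	return err
}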

Index Preflight

For each coll with .indexes.jsonl:
  parse specs → createIndexes atomic
    → success ✓
    → fail → per-index retry
      → unique-index fail → ABORT entire import (schema violation on target data)
      → non-unique fail → log, continue
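
The fallback, sketched (only the key and unique flag are read from the parsed spec; other index options elided):

go
// Index preflight sketch: try the batch atomically, then per index.
// Only a unique-index failure aborts, since it means existing target
// data would violate the schema being imported.
package migrate

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)

func ensureIndexes(ctx context.Context, coll *mongo.Collection, specs []bson.M) error {
	models := make([]mongo.IndexModel, len(specs))
	for i, s := range specs {
		opt := options.Index()
		if u, _ := s["unique"].(bool); u {
			opt.SetUnique(true)
		}
		models[i] = mongo.IndexModel{Keys: s["key"], Options: opt}
	}
	if _, err := coll.Indexes().CreateMany(ctx, models); err == nil {
		return nil // atomic path succeeded
	}
	for i, s := range specs {
		if _, err := coll.Indexes().CreateOne(ctx, models[i]); err != nil {
			if u, _ := s["unique"].(bool); u {
				return err // abort the entire import
			}
			log.Printf("non-unique index skipped: %v", err)
		}
	}
	return nil
}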

Errors & Idempotency

| Error | Behavior |
| --- | --- |
| target tenant code/name exists | abort (preflight) |
| unique-index build fails | abort |
| transient write (pipe/EOF) | retry 1s/3s/9s → split batch → single-doc recurse |
| per-coll stream error | log, HadErrors=true, continue |
| re-run of same archive | safe — upsert-by-_id is idempotent |

7. Clone Job

Purpose: Copy tenant within or across instances. Thin wrapper: Preflight → Dump → Import.

Entry: migrate.RunClone(ctx, CloneConfig, logger) → MigrationReport.

mermaid
flowchart LR
    S[Start] --> PF[PreflightTargetTenantUnique<br/>target code/name free?]
    PF -->|fail| X[Abort]
    PF -->|ok| D[RunDump to temp zip<br/>progress 0 to 50 percent]
    D --> DR{dry-run?}
    DR -->|yes| DONE[Return, keep archive]
    DR -->|no| I[RunImport from temp zip<br/>progress 50 to 100 percent]
    I --> CLN[defer: delete temp zip]
    CLN --> DONE

Why preflight first: it saves minutes of dump work when the target already has a code/name collision.

ReuseExistingUsers matters here for same-DB clones (source and target in the same Mongo database): without it, user docs would collide on the email unique index.
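
The whole wrapper in sketch form (dumpConfigFor / importConfigFor and the preflight arguments are hypothetical; only the Preflight → Dump → Import ordering is from the flow above):

go
// RunClone's shape: fail fast on preflight, dump to a temp zip, import,
// clean up. Dry-run returns after the dump and keeps the archive.
package migrate

import (
	"context"
	"os"
	"path/filepath"

	"go.mongodb.org/mongo-driver/v2/bson"
)

func runClone(ctx context.Context, cfg CloneConfig, log *StderrLogger) (*MigrationReport, error) {
	if err := PreflightTargetTenantUnique(ctx, cfg.TargetURI, cfg.TargetTenantCode, cfg.TargetTenantName); err != nil {
		return nil, err // saves minutes of dump work
	}
	zipPath := filepath.Join(os.TempDir(), "clone-"+bson.NewObjectID().Hex()+".zip")
	if _, err := RunDump(ctx, dumpConfigFor(cfg, zipPath), log); err != nil {
		return nil, err
	}
	if cfg.DryRun {
		return &MigrationReport{}, nil // keep archive for inspection
	}
	defer os.Remove(zipPath) // temp archive deleted on exit
	return RunImport(ctx, importConfigFor(cfg, zipPath), log)
}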


8. Delete Job

Purpose: Remove all tenant data. A safety dump always runs first (skipped only on dry-run).

Entry: migrate.RunDelete(ctx, DeleteConfig, logger) → DeleteReport.

Flow

mermaid
flowchart TD
    S[Start] --> SD{dry-run?}
    SD -->|no| DUMP[Safety dump<br/>via runDumpForTenant]
    SD -->|yes| L[List collections]
    DUMP --> L
    L --> LOOP[For each coll]
    LOOP --> DEC{classify}
    DEC -->|system prefix| SK[Skip]
    DEC -->|excluded list| SK
    DEC -->|tenant-ns prefix| DROP[Drop collection]
    DEC -->|user / user-session| UPD[UpdateMany:<br/>$pull tenantIDs<br/>$unset byTenant.code<br/>then delete orphans]
    DEC -->|other| DEL{dry-run?}
    DEL -->|yes| CNT[CountDocuments]
    DEL -->|no| DM[DeleteMany CollectionFilter]
    DROP --> NX[Next]
    UPD --> NX
    SK --> NX
    CNT --> NX
    DM --> NX
    NX --> MORE{more?}
    MORE -->|yes| LOOP
    MORE -->|no| VER{Verify flag?}
    VER -->|yes| V[RunVerifyWithDB<br/>embed in report]
    VER -->|no| R[DeleteReport]
    V --> R

Orphan-User Cleanup

After $pull/$unset on user coll:

js
db.user.deleteMany({
  $and: [
    { $or: [{ tenantIDs: { $size: 0 } }, { tenantIDs: { $exists: false } }] },
    { username: { $ne: "dev" } }
  ]
})

dev account preserved (bootstrap).
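
The preceding $pull/$unset step, sketched in Go (driver v2 assumed):

go
// Sketch of the membership strip that precedes the orphan delete: one
// UpdateMany pulls the code from tenantIDs and drops the byTenant entry.
package migrate

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
)

func stripTenantMembership(ctx context.Context, users *mongo.Collection, code string) error {
	filter := bson.M{"$or": []bson.M{
		{"tenantIDs": code},
		{"byTenant." + code: bson.M{"$exists": true}},
	}}
	update := bson.M{
		"$pull":  bson.M{"tenantIDs": code},
		"$unset": bson.M{"byTenant." + code: ""},
	}
	_, err := users.UpdateMany(ctx, filter, update)
	return err
}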

Idempotency

  • $pull / $unset → no-op if already absent.
  • DeleteMany with filter → no-op if no matches.
  • Re-running delete is safe.

9. Verify Job

Purpose: Read-only scan — confirm no tenant data remains.

Entry: migrate.RunVerify or RunVerifyWithDB.

mermaid
flowchart TD
    S[Start] --> L[List collections]
    L --> LOOP[For each non-system coll]
    LOOP --> D{tenant-ns prefix?}
    D -->|yes| C1[Count all docs<br/>finding: coll should be dropped]
    D -->|no| C2[CountDocuments CollectionFilter]
    C1 --> CH{count non-zero?}
    C2 --> CH
    CH -->|yes| F[Add finding<br/>Passed=false]
    CH -->|no| NX[Next]
    F --> NX
    NX --> MORE{more?}
    MORE -->|yes| LOOP
    MORE -->|no| R[VerifyReport]

Passed = true only when zero findings.
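
One verify step in sketch form (Finding shape assumed; CollectionFilter and VerifyReport per earlier sections):

go
// Sketch: count surviving docs under the tenant filter; any non-zero
// count becomes a finding and flips Passed to false.
package migrate

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/mongo"
)

type Finding struct { // assumed shape of a report entry
	Collection string
	Count      int64
}

func verifyCollection(ctx context.Context, db *mongo.Database, coll, code string, rep *VerifyReport) error {
	n, err := db.Collection(coll).CountDocuments(ctx, CollectionFilter(coll, code))
	if err != nil {
		return err
	}
	if n > 0 {
		rep.Findings = append(rep.Findings, Finding{Collection: coll, Count: n})
		rep.Passed = false
	}
	return nil
}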


10. Worker / Executor Wiring

Poller: internal/worker/migration_runner.go — 30s tick.

mermaid
flowchart TD
    T[Ticker 30s] --> N[repo.NextQueuedJob]
    N -->|none| TR[tracker.Record idle]
    N -->|job| RUN[status=running]
    RUN --> DP{job.Type}
    DP -->|tenant_copy| EC[executeTenantCopy]
    DP -->|tenant_dump| ED[executeTenantDump]
    DP -->|tenant_import| EI[executeTenantImport]
    DP -->|tenant_clone| ECL[executeTenantClone]
    DP -->|tenant_delete| EDL[executeTenantDelete]
    DP -->|instance_replace| ER[executeInstanceReplace]
    EC --> FIN[store artifact<br/>status succeeded/failed<br/>notify]
    ED --> FIN
    EI --> FIN
    ECL --> FIN
    EDL --> FIN
    ER --> FIN

Executor Responsibilities

executor_tenant_dump.go:

  • Resolve tenant + instance URI.
  • runDumpForTenant (shared with delete) → migrate.RunDump.
  • Set job.dump_file_path.

executor_tenant_import.go:

  1. Resolve target URI + DB name.
  2. Read archive _metadata.json.
  3. preflightImportUserIDP — scan active users from archive, apply remap, verify each effective email has a Zitadel identity. Fail job if any missing.
  4. tenantRepo.CreateMinimalTenant (unless dry-run).
  5. migrate.RunImport.
  6. On fail → delete the half-created tenant record.
  7. userRepo.OnboardUsersToTenant — create Console membership rows.
  8. pushTenantStatusToInstance — notify instance cache.

executor_tenant_clone.go: same as import but source reads live DB (not archive) for user scan, and uses tenantRepo.CloneTenantRecord.

executor_tenant_delete.go:

  1. Pre-delete safety dump (via runDumpForTenant).
  2. POST /internal/tenant-deprovision to instance API.
  3. migrate.RunDelete.
  4. userRepo.DeleteAssignmentsByTenant.
  5. tenantRepo.SetStatus(archived).

IDP Preflight (internal/worker/idp_preflight.go)

mermaid
flowchart TD
    S[Archive / source DB] --> E[ReadActiveUserEmails<br/>status=1 only]
    E --> R[ApplyEffectiveEmail<br/>per remap config]
    R --> Q[Query Zitadel<br/>for each effective email]
    Q --> CH{all identities<br/>exist?}
    CH -->|no| FAIL[Fail job BEFORE<br/>creating tenant]
    CH -->|yes| OK[Proceed to import/clone]

Prevents an orphaned tenant: without the check, the tenant would be half-created with users who can't log in.


11. REST Endpoints

Handler: internal/handler/migrations/handler.go.

| Method | Path | Creates | Notes |
| --- | --- | --- | --- |
| POST | /api/v1/migrations | tenant_copy | src ≠ tgt instance; ≤3 concurrent |
| POST | /api/v1/migrations/tenant-clone | tenant_clone | defaults tgt=src |
| POST | /api/v1/migrations/tenant-delete | tenant_delete | fails if tenant archived |
| POST | /api/v1/migrations/tenant-dump | tenant_dump | |
| POST | /api/v1/migrations/tenant-import | tenant_import | multipart upload, saved to DumpDir |
| POST | /api/v1/migrations/instance-replace | instance_replace | |
| GET | /api/v1/jobs?status= | | list jobs |
| GET | /api/v1/jobs/:id | | |
| GET | /api/v1/jobs/:id/logs | | JobLog[] |
| GET | /api/v1/jobs/:id/artifacts | | JobArtifact[] |
| POST | /api/v1/jobs/:id/download-token | | 60s, single-use |
| GET | /api/v1/jobs/:id/download?token= | | serves dump .zip |
| GET | /api/v1/instances/:id/jobs | | jobs where instance is src/tgt |

12. CLI

Each CLI binary calls the same migrate.Run* function as the executor, so manual and worker runs behave identically.

| Binary | Flags (key) |
| --- | --- |
| tenant-dump | --mongo-uri, --tenant-code, -o, --dry-run, -v, -r |
| tenant-import | -z, --mongo-uri, --tenant-code, --tenant-name, --batch-size, -m, --reuse-existing-users, --dry-run, -r |
| tenant-delete | --mongo-uri, --tenant-code or -f, --dry-run, --verify, -y, -r |
| tenant-verify | --mongo-uri, --tenant-code or -f, -r |

See tenant-cli.md and tenant-import-idempotent.md for examples.


13. Per-Document Transform Pipeline (Import Pass 2)

mermaid
flowchart TD
    D[Raw doc from zip] --> IS{tenant-namespaced<br/>collection?}
    IS -->|yes| SKIP_TR[Skip prepare/rewrite<br/>entire coll belongs to tenant]
    IS -->|no| PR[prepareCollectionDoc<br/>prune byTenant to tgt only]
    PR --> TR[rewriteTenantRefs<br/>src code to tgt code]
    SKIP_TR --> PATH{path?}
    TR --> PATH
    PATH -->|A user reuse| A_EM[email remap<br/>reuse decision]
    PATH -->|B skip-list| B_CUS[customer name/code<br/>user-session fresh _id]
    PATH -->|C classified| C_ID[ensureTenantID<br/>_id decision]
    C_ID --> C_OID[rewriteDocObjectIDs<br/>deep-walk idMap]
    C_OID --> C_US[rewriteDocUserRefs<br/>emailMap + userOIDMap]
    A_EM --> BAT[Batch]
    B_CUS --> BAT
    C_US --> BAT
    BAT --> FL{batch full?}
    FL -->|yes path A or B| INS[InsertMany]
    FL -->|yes path C| UPS[upsertBatchDocsWithRetry<br/>ReplaceOne upsert]
    FL -->|no| NXT[Next doc]
    INS --> NXT
    UPS --> NXT

User Ref Rewrite Scope (rewriteDocUserRefs)

  • Tier 1: known email paths (createdBy, updatedBy, + array variants, + nested array variants like checklist[].createdBy).
  • Tier 2: full deep-walk on customFields subtree.
  • Out of scope: free-text fields, partial emails, non-standard OID refs.
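
Tier 1 reduced to a sketch (two scalar paths shown; the real list also covers array and nested-array variants):

go
// Sketch of the Tier-1 rewrite: only known email-bearing paths are
// remapped through emailMap; everything else is deliberately untouched.
package migrate

import "strings"

func rewriteKnownUserRefs(doc map[string]any, emailMap map[string]string) {
	for _, path := range []string{"createdBy", "updatedBy"} {
		if email, ok := doc[path].(string); ok {
			if mapped, ok := emailMap[strings.ToLower(email)]; ok {
				doc[path] = mapped
			}
		}
	}
}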

14. Retries, Batching, Memory

| Knob | Value |
| --- | --- |
| Batch size | 1000 docs (configurable) |
| Classify chunk | 10 000 src _ids per $in query |
| IDMap in-mem threshold | 500 000 entries → spill to bbolt |
| IDMap read cache | 100 000-entry LRU on top of bolt |
| Write retry | transient only; 1s / 3s / 9s; then split batch; down to single doc |
| Wire limit | 48 MB per Mongo request; split-batch handles overflow |
| Connections | per-operation open/close |
| Temp files | dumps → DumpDir or /tmp/console-dumps; clone → /tmp/clone-<oid>.zip, deleted on exit |

Transient classification: broken pipe, conn reset, timeout, EOF. Context cancel is NOT transient (user abort stops everything).
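
The retry ladder as a sketch (upsertBatch is the bulk flush sketched in section 6; isTransientWriteErr per section 4):

go
// Sketch of upsertBatchDocsWithRetry's shape: transient-only retries at
// 1s/3s/9s, then split the batch in half and recurse down to single docs.
package migrate

import (
	"context"
	"time"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
)

func flushWithRetry(ctx context.Context, coll *mongo.Collection, batch []bson.M) error {
	var err error
	for _, delay := range []time.Duration{time.Second, 3 * time.Second, 9 * time.Second} {
		if err = upsertBatch(ctx, coll, batch); err == nil || !isTransientWriteErr(err) {
			return err // success, or a permanent error (incl. context cancel)
		}
		time.Sleep(delay)
	}
	if len(batch) > 1 { // still transient: split and recurse
		mid := len(batch) / 2
		if err := flushWithRetry(ctx, coll, batch[:mid]); err != nil {
			return err
		}
		return flushWithRetry(ctx, coll, batch[mid:])
	}
	return err // single doc still failing transiently
}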


15. Full Job Decision Tree (which job should I run?)

Goal ─▶ tenant data out to file ───────────────────────▶ DUMP
Goal ─▶ tenant data in from file ──────────────────────▶ IMPORT
Goal ─▶ same tenant exists twice (new code/name) ──────▶ CLONE (preflight+dump+import)
Goal ─▶ tenant gone, keep safety copy ─────────────────▶ DELETE (safety-dump+drop+$pull)
Goal ─▶ confirm previous delete left nothing ──────────▶ VERIFY
Goal ─▶ copy tenant instance→instance, one shot ───────▶ COPY (tenant_copy executor)
Goal ─▶ migrate whole instance ────────────────────────▶ INSTANCE_REPLACE

16. Error Scenario Summary

| Scenario | Dump | Import | Clone | Delete | Verify |
| --- | --- | --- | --- | --- | --- |
| Connection fail | abort | abort | abort | abort | abort |
| Target code/name collision | n/a | abort (preflight) | abort (preflight) | n/a | n/a |
| Missing IDP identity (preflight) | n/a | abort before tenant create | abort | n/a | n/a |
| Per-collection error | log + continue | log + continue | log + continue | log + continue | log + finding |
| Unique-index build fails | n/a | abort | abort | n/a | n/a |
| Transient write error | n/a | retry + split | retry + split | retry + split | n/a |
| Dry-run | count only | preflight skipped, no writes | dump real, import skipped | count only | same (read-only) |
| Re-run after partial success | same output | idempotent (upsert) | idempotent via import | idempotent ($pull) | same |

17. Key Files Index