Operational runbooks
The authoritative runbooks live in the backend repo at trile/docs/runbooks/ (next to the
code and migrations they reference). This page indexes them so on-call engineers know what
exists.
| Runbook | Use when |
|---|---|
| Balance reconciliation (balance-reconciliation.md) | A reconciliation run reports drift between a wallet balance and the sum of its ledger. Covers running the script, reading the output, and when to escalate. |
| DLQ inspection (dlq-inspection.md) | Jobs are landing in the BullMQ dead-letter queue. Covers inspecting, diagnosing, and recovering failed jobs. |
| Manual webhook replay (manual-replay.md) | The automatic relay poller missed outbox rows, or delivery is degraded. Covers inspecting and replaying failed webhook deliveries by hand. |
| Backup & restore (backup-restore.md) | Validating DB backups, point-in-time recovery, and verifying health after a restore. |
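
For orientation, the "drift" that the balance reconciliation runbook refers to can be pictured as the difference between a wallet's stored balance and the sum of its ledger entries. The sketch below is illustrative only: the `Wallet` and `LedgerEntry` shapes, the field names, and `findDrift` are hypothetical and are not taken from the backend repo. The actual script and its output format are described in balance-reconciliation.md.

```typescript
// Hypothetical shapes for illustration; the real schema lives in the backend repo.
interface Wallet {
  id: string;
  balanceMinor: number; // stored balance, as an integer in minor units (e.g. cents)
}

interface LedgerEntry {
  walletId: string;
  amountMinor: number; // signed integer: credits positive, debits negative
}

interface Drift {
  walletId: string;
  storedMinor: number;
  ledgerSumMinor: number;
  driftMinor: number; // storedMinor - ledgerSumMinor
}

// Returns one record per wallet whose stored balance differs from its ledger sum.
function findDrift(wallets: Wallet[], entries: LedgerEntry[]): Drift[] {
  // Sum ledger amounts per wallet.
  const sums = new Map<string, number>();
  for (const entry of entries) {
    const current = sums.get(entry.walletId) ?? 0;
    sums.set(entry.walletId, current + entry.amountMinor);
  }

  // Compare each stored balance against its ledger sum.
  const drifted: Drift[] = [];
  for (const wallet of wallets) {
    const ledgerSumMinor = sums.get(wallet.id) ?? 0;
    const driftMinor = wallet.balanceMinor - ledgerSumMinor;
    if (driftMinor !== 0) {
      drifted.push({
        walletId: wallet.id,
        storedMinor: wallet.balanceMinor,
        ledgerSumMinor,
        driftMinor,
      });
    }
  }
  return drifted;
}
```

Under these assumptions, a wallet with no ledger entries and a non-zero stored balance is also reported as drifted, because its ledger sum is treated as zero.
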
Related concepts
- The wallet/ledger model the reconciliation runbook acts on: The wallet model.
- The outbox/delivery design behind webhook replay: Backend architecture.
- Queue/process roles (DLQ, scheduler vs. worker): Backend architecture.