Status: Draft — Pending Sprint 1 Completion
Owner: Development + CTO
Review Cycle: Monthly after go-live
Last Updated: 2026-07-03
Dieses Runbook definiert das Backup- und Recovery-Verfahren für Cartlys PostgreSQL-Datenbank. Ziel: RTO < 1h, RPO < 24h für alle produktiven Datenbanken.
Hinweis: Sprint 1 nutzt PostgreSQL 16 via Supabase/Neon. Konkrete Provider-Details werden nach Sprint-1-Abschluss eingetragen.
| Typ | Häufigkeit | Retention | Storage | |
|---|---|---|---|---|
| Automated Daily | pg_dump (full) | Täglich 02:00 UTC | 30 Tage | Backblaze B2 |
| Point-in-Time (PITR) | WAL Archiving | Kontinuierlich | 7 Tage | Backblaze B2 |
| Weekly Full | pg_dump (full) | Sonntag 01:00 UTC | 12 Wochen | Backblaze B2 |
| Monthly Full | pg_dump (full) | Erster Montag/Monat 00:00 UTC | 12 Monate | Backblaze B2 |
| Pre-Migration | pg_dump (full) | Vor jedem Schema-Migration | 90 Tage | Backblaze B2 + S3 |
|| Umgebung | RTO Target | RPO Target | Priorität |
|---------|----------|----------|----------|
| Production | < 1 Stunde | < 24 Stunden | Critical |
| Staging | < 4 Stunden | < 1 Woche | High |
| Development | Kein SLA | Kein SLA | Low |
#!/bin/bash
# supabase-backup.sh — täglicher Full Backup
set -euo pipefail
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/tmp/backups"
BUCKET="cartly-db-backups"
RETENTION_DAYS=30
# Supabase CLI backup
supabase db dump --db-url "${SUPABASE_DB_URL}" \
--file "/tmp/cartly_backup_${DATE}.sql"
# Upload to B2
B2_ACCOUNT_ID="${B2_ACCOUNT_ID}"
B2_APP_KEY="${B2_APP_KEY}"
b2 upload-file "$BUCKET" \
"/tmp/cartly_backup_${DATE}.sql" \
"daily/cartly_backup_${DATE}.sql"
# Cleanup old local files
find /tmp/backups -name "cartly_backup_*.sql" -mtime +2 -delete
echo "[$(date)] Backup completed: cartly_backup_${DATE}.sql"
#!/bin/bash
# neon-backup.sh — täglicher Full Backup via Neon
set -euo pipefail
DATE=$(date +%Y%m%d_%H%M%S)
NEON_API_KEY="${NEON_API_KEY}"
NEON_PROJECT_ID="${NEON_PROJECT_ID}"
# Create Neon branch for backup
BRANCH=$(curl -sS -X POST \
"https://console.neon.tech/api/v2/projects/${NEON_PROJECT_ID}/branches" \
-H "Authorization: Bearer ${NEON_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"name": "backup-'${DATE}'"}' | jq -r '.branch.id')
# Wait for branch to be ready
sleep 15
# Export from branch
pg_dump "${BRANCH_CONNECTION_STRING}" | gzip > "/tmp/cartly_backup_${DATE}.sql.gz"
# Upload to B2
b2 upload-file "cartly-db-backups" \
"/tmp/cartly_backup_${DATE}.sql.gz" \
"daily/cartly_backup_${DATE}.sql.gz"
# Delete branch after backup
curl -sS -X DELETE \
"https://console.neon.tech/api/v2/branches/${BRANCH}" \
-H "Authorization: Bearer ${NEON_API_KEY}"
echo "[$(date)] Backup completed: cartly_backup_${DATE}.sql.gz"
# /etc/cron.d/cartly-backup
# Täglich um 02:00 UTC
0 2 * * * root /opt/cartly/scripts/supabase-backup.sh >> /var/log/cartly-backup.log 2>&1
# Wöchentlich (Sonntag)
0 1 * * 0 root /opt/cartly/scripts/weekly-full-backup.sh >> /var/log/cartly-backup.log 2>&1
#!/bin/bash
# restore-from-backup.sh
set -euo pipefail
DATE=$(date +%Y%m%d)
BACKUP_FILE="cartly_backup_${DATE}.sql"
BUCKET="cartly-db-backups"
# 1. Download latest backup from B2
echo "[$(date)] Downloading backup..."
b2 download-file-by-name "$BUCKET" "daily/${BACKUP_FILE}" /tmp/restore.sql
# 2. Drop existing connections
psql "${PRODUCTION_DB_URL}" -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'cartly';
"
# 3. Drop and recreate database
psql "${PRODUCTION_DB_URL}" -c "DROP DATABASE cartly;"
psql "${PRODUCTION_DB_URL}" -c "CREATE DATABASE cartly;"
# 4. Restore
echo "[$(date)] Restoring database..."
psql "${PRODUCTION_DB_URL}/cartly" < /tmp/restore.sql
# 5. Verify
psql "${PRODUCTION_DB_URL}/cartly" -c "SELECT COUNT(*) FROM users;"
echo "[$(date)] Restore completed successfully"
#!/bin/bash
# pitr-restore.sh — Restore zu einem bestimmten Zeitpunkt
set -euo pipefail
TARGET_TIME="2026-07-03 14:30:00"
BACKUP_DIR="/tmp/backups"
WAL_DIR="/tmp/wal"
# 1. Stop application
systemctl stop cartly
# 2. Restore from base backup
pg_restore -d "${PRODUCTION_DB_URL}/cartly" "${BACKUP_DIR}/base.tar" --clean
# 3. Create recovery.conf / postgresql.conf settings
#wal_level = replica
#archive_mode = on
#restore_command = 'b2 cat cartly-wal/%f > %p'
# 4. Start PostgreSQL in recovery mode
systemctl start postgresql
# 5. Verify data at target time
psql "${PRODUCTION_DB_URL}/cartly" -c "SELECT COUNT(*) FROM audit_log WHERE created_at < '${TARGET_TIME}';"
echo "[$(date)] PITR restore to ${TARGET_TIME} completed"
#!/bin/bash
# verify-backup.sh — monatlicher Restore-Test
set -euo pipefail
STAGING_DB_URL="${STAGING_DB_URL}"
LATEST_BACKUP=$(b2 list-file-names cartly-db-backups/daily | tail -1)
# 1. Download to staging
b2 download-file-by-name "cartly-db-backups" "daily/${LATEST_BACKUP}" /tmp/verify.sql
# 2. Restore to staging
psql "${STAGING_DB_URL}" -c "DROP DATABASE cartly_test;"
psql "${STAGING_DB_URL}" -c "CREATE DATABASE cartly_test;"
psql "${STAGING_DB_URL}/cartly_test" < /tmp/verify.sql
# 3. Verify tables and row counts
TABLES=("users" "companies" "stores" "products" "orders")
for table in "${TABLES[@]}"; do
COUNT=$(psql "${STAGING_DB_URL}/cartly_test" -t -c "SELECT COUNT(*) FROM ${table};")
echo "Table ${table}: ${COUNT} rows"
done
# 4. Spot-check critical data
psql "${STAGING_DB_URL}/cartly_test" -c "
SELECT id, email FROM users WHERE email LIKE '%@test.cartly%';
"
echo "[$(date)] Backup verification completed"
| Check | Threshold | Alert |
|---|---|---|
| Backup file size | < 1KB | SEV-1 Alert |
| Last successful backup | > 26h | SEV-2 Alert |
| Backup upload to B2 | > 1h | SEV-2 Alert |
| Restore test | Failure | SEV-2 Alert |
# healthchecks.io or similar
checks:
- name: daily-backup
url: https://hc-ping.com/XXXX/daily
schedule: "0 3 * * *"
timeout: 1h
expected: 200
alert:
- type: slack
channel: "#cartly-ops"
- type: pagerduty
severity: warning
gpg --symmetric --cipher-algo AES256 \
--batch --passphrase "${BACKUP_PASSPHRASE}" \
cartly_backup_${DATE}.sql
| Secret | Location | Rotation |
|---|---|---|
B2_ACCOUNT_ID |
Railway/Render Env Vars | Nach Incident |
B2_APP_KEY |
Railway/Render Env Vars | Quartalsweise |
SUPABASE_DB_URL |
Supabase Dashboard | Nach Incident |
BACKUP_PASSPHRASE |
Vault/Bitwarden | Jährlich |
| Situation | Kontakt | Reaktionszeit |
|---|---|---|
| Backup fehlgeschlagen | DEV (9f66dba7) | Innerhalb 1h |
| Restore benötigt | DEV (9f66dba7) | Sofort bei SEV-1 |
| B2 Zugang verloren | CTO (b999c0b2) | Innerhalb 4h |
Erstellt: 2026-07-03 von Documentation Agent (a66674bf)
Review: Nach Sprint-1-Abschluss durch CTO + DEV