DB-001: Database Backup Runbook

Status: Draft — Pending Sprint 1 Completion
Owner: Development + CTO
Review Cycle: Monthly after go-live
Last Updated: 2026-07-03


Overview

Dieses Runbook definiert das Backup- und Recovery-Verfahren für Cartlys PostgreSQL-Datenbank. Ziel: RTO < 1h, RPO < 24h für alle produktiven Datenbanken.

Hinweis: Sprint 1 nutzt PostgreSQL 16 via Supabase/Neon. Konkrete Provider-Details werden nach Sprint-1-Abschluss eingetragen.


1. Backup Strategy

1.1 Backup Tiers

Typ Häufigkeit Retention Storage
Automated Daily pg_dump (full) Täglich 02:00 UTC 30 Tage Backblaze B2
Point-in-Time (PITR) WAL Archiving Kontinuierlich 7 Tage Backblaze B2
Weekly Full pg_dump (full) Sonntag 01:00 UTC 12 Wochen Backblaze B2
Monthly Full pg_dump (full) Erster Montag/Monat 00:00 UTC 12 Monate Backblaze B2
Pre-Migration pg_dump (full) Vor jedem Schema-Migration 90 Tage Backblaze B2 + S3

1.2 RTO / RPO Targets

|| Umgebung | RTO Target | RPO Target | Priorität |
|---------|----------|----------|----------|
| Production | < 1 Stunde | < 24 Stunden | Critical |
| Staging | < 4 Stunden | < 1 Woche | High |
| Development | Kein SLA | Kein SLA | Low |


2. Automated Backup Procedure

2.1 PostgreSQL via Supabase

#!/bin/bash
# supabase-backup.sh — täglicher Full Backup
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/tmp/backups"
BUCKET="cartly-db-backups"
RETENTION_DAYS=30

# Supabase CLI backup
supabase db dump --db-url "${SUPABASE_DB_URL}" \
  --file "/tmp/cartly_backup_${DATE}.sql"

# Upload to B2
B2_ACCOUNT_ID="${B2_ACCOUNT_ID}"
B2_APP_KEY="${B2_APP_KEY}"
b2 upload-file "$BUCKET" \
  "/tmp/cartly_backup_${DATE}.sql" \
  "daily/cartly_backup_${DATE}.sql"

# Cleanup old local files
find /tmp/backups -name "cartly_backup_*.sql" -mtime +2 -delete

echo "[$(date)] Backup completed: cartly_backup_${DATE}.sql"

2.2 PostgreSQL via Neon

#!/bin/bash
# neon-backup.sh — täglicher Full Backup via Neon
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
NEON_API_KEY="${NEON_API_KEY}"
NEON_PROJECT_ID="${NEON_PROJECT_ID}"

# Create Neon branch for backup
BRANCH=$(curl -sS -X POST \
  "https://console.neon.tech/api/v2/projects/${NEON_PROJECT_ID}/branches" \
  -H "Authorization: Bearer ${NEON_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"name": "backup-'${DATE}'"}' | jq -r '.branch.id')

# Wait for branch to be ready
sleep 15

# Export from branch
pg_dump "${BRANCH_CONNECTION_STRING}" | gzip > "/tmp/cartly_backup_${DATE}.sql.gz"

# Upload to B2
b2 upload-file "cartly-db-backups" \
  "/tmp/cartly_backup_${DATE}.sql.gz" \
  "daily/cartly_backup_${DATE}.sql.gz"

# Delete branch after backup
curl -sS -X DELETE \
  "https://console.neon.tech/api/v2/branches/${BRANCH}" \
  -H "Authorization: Bearer ${NEON_API_KEY}"

echo "[$(date)] Backup completed: cartly_backup_${DATE}.sql.gz"

2.3 Cron Schedule

# /etc/cron.d/cartly-backup
# Täglich um 02:00 UTC
0 2 * * * root /opt/cartly/scripts/supabase-backup.sh >> /var/log/cartly-backup.log 2>&1

# Wöchentlich (Sonntag)
0 1 * * 0 root /opt/cartly/scripts/weekly-full-backup.sh >> /var/log/cartly-backup.log 2>&1

3. Restore Procedure

3.1 Full Restore from Latest Daily Backup

#!/bin/bash
# restore-from-backup.sh
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_FILE="cartly_backup_${DATE}.sql"
BUCKET="cartly-db-backups"

# 1. Download latest backup from B2
echo "[$(date)] Downloading backup..."
b2 download-file-by-name "$BUCKET" "daily/${BACKUP_FILE}" /tmp/restore.sql

# 2. Drop existing connections
psql "${PRODUCTION_DB_URL}" -c "
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE datname = 'cartly';
"

# 3. Drop and recreate database
psql "${PRODUCTION_DB_URL}" -c "DROP DATABASE cartly;"
psql "${PRODUCTION_DB_URL}" -c "CREATE DATABASE cartly;"

# 4. Restore
echo "[$(date)] Restoring database..."
psql "${PRODUCTION_DB_URL}/cartly" < /tmp/restore.sql

# 5. Verify
psql "${PRODUCTION_DB_URL}/cartly" -c "SELECT COUNT(*) FROM users;"

echo "[$(date)] Restore completed successfully"

3.2 Point-in-Time Recovery (PITR)

#!/bin/bash
# pitr-restore.sh — Restore zu einem bestimmten Zeitpunkt
set -euo pipefail

TARGET_TIME="2026-07-03 14:30:00"
BACKUP_DIR="/tmp/backups"
WAL_DIR="/tmp/wal"

# 1. Stop application
systemctl stop cartly

# 2. Restore from base backup
pg_restore -d "${PRODUCTION_DB_URL}/cartly" "${BACKUP_DIR}/base.tar" --clean

# 3. Create recovery.conf / postgresql.conf settings
#wal_level = replica
#archive_mode = on
#restore_command = 'b2 cat cartly-wal/%f > %p'

# 4. Start PostgreSQL in recovery mode
systemctl start postgresql

# 5. Verify data at target time
psql "${PRODUCTION_DB_URL}/cartly" -c "SELECT COUNT(*) FROM audit_log WHERE created_at < '${TARGET_TIME}';"

echo "[$(date)] PITR restore to ${TARGET_TIME} completed"

4. Backup Verification

4.1 Restore Test Procedure (monatlich)

#!/bin/bash
# verify-backup.sh — monatlicher Restore-Test
set -euo pipefail

STAGING_DB_URL="${STAGING_DB_URL}"
LATEST_BACKUP=$(b2 list-file-names cartly-db-backups/daily | tail -1)

# 1. Download to staging
b2 download-file-by-name "cartly-db-backups" "daily/${LATEST_BACKUP}" /tmp/verify.sql

# 2. Restore to staging
psql "${STAGING_DB_URL}" -c "DROP DATABASE cartly_test;"
psql "${STAGING_DB_URL}" -c "CREATE DATABASE cartly_test;"
psql "${STAGING_DB_URL}/cartly_test" < /tmp/verify.sql

# 3. Verify tables and row counts
TABLES=("users" "companies" "stores" "products" "orders")
for table in "${TABLES[@]}"; do
  COUNT=$(psql "${STAGING_DB_URL}/cartly_test" -t -c "SELECT COUNT(*) FROM ${table};")
  echo "Table ${table}: ${COUNT} rows"
done

# 4. Spot-check critical data
psql "${STAGING_DB_URL}/cartly_test" -c "
  SELECT id, email FROM users WHERE email LIKE '%@test.cartly%';
"

echo "[$(date)] Backup verification completed"

5. Monitoring & Alerts

5.1 Backup Health Checks

Check Threshold Alert
Backup file size < 1KB SEV-1 Alert
Last successful backup > 26h SEV-2 Alert
Backup upload to B2 > 1h SEV-2 Alert
Restore test Failure SEV-2 Alert

5.2 Monitoring Implementation

# healthchecks.io or similar
checks:
  - name: daily-backup
    url: https://hc-ping.com/XXXX/daily
    schedule: "0 3 * * *"
    timeout: 1h
    expected: 200
    alert:
      - type: slack
        channel: "#cartly-ops"
      - type: pagerduty
        severity: warning

6. Backup Encryption


7. Secrets & Credentials

Secret Location Rotation
B2_ACCOUNT_ID Railway/Render Env Vars Nach Incident
B2_APP_KEY Railway/Render Env Vars Quartalsweise
SUPABASE_DB_URL Supabase Dashboard Nach Incident
BACKUP_PASSPHRASE Vault/Bitwarden Jährlich

8. Emergency Contacts

Situation Kontakt Reaktionszeit
Backup fehlgeschlagen DEV (9f66dba7) Innerhalb 1h
Restore benötigt DEV (9f66dba7) Sofort bei SEV-1
B2 Zugang verloren CTO (b999c0b2) Innerhalb 4h

9. Todo / Open Items


Erstellt: 2026-07-03 von Documentation Agent (a66674bf)
Review: Nach Sprint-1-Abschluss durch CTO + DEV