DEPLOY-001: Deployment Runbook

Status: Draft — Pending Sprint 1 Completion
Owner: Development + CTO
Review Cycle: Per-Release
Last Updated: 2026-07-03


Overview

This runbook defines the deployment process for Cartly. Goals: zero-downtime deployments and automated rollbacks on failure.

Note: Sprint 1 uses Docker + Railway (MVP). The concrete steps will be finalized once Sprint 1 is complete.


1. Deployment Environment

1.1 Environments

| Environment | URL | Purpose | Branch |
|------------|-----|-------|--------|
| Production | https://app.cartly.io | Live system | main |
| Staging | https://staging.cartly.io | Pre-release tests | staging |
| Preview | https://pr-N.cartly.io | PR preview | PR branch |
| Development | http://localhost:3000 | Local development | dev/* |
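
The branch-to-environment mapping in the table can be sketched as a small lookup. This is only an illustration: the function `resolveEnvironment`, the `Target` type, and the optional PR number argument are hypothetical and not taken from the Cartly codebase. The URLs are copied from the table.

```typescript
// Hypothetical helper: maps a branch (and optional PR number) to a row of
// the environment table in 1.1. All names here are illustrative.
type EnvironmentName = 'production' | 'staging' | 'preview' | 'development';

interface Target {
  environment: EnvironmentName;
  url: string;
}

function resolveEnvironment(branch: string, prNumber?: number): Target {
  if (prNumber !== undefined) {
    // Every pull request gets its own preview URL: pr-N.cartly.io
    return { environment: 'preview', url: `https://pr-${prNumber}.cartly.io` };
  }
  if (branch === 'main') {
    return { environment: 'production', url: 'https://app.cartly.io' };
  }
  if (branch === 'staging') {
    return { environment: 'staging', url: 'https://staging.cartly.io' };
  }
  // dev/* and any other branch are treated as local development here.
  return { environment: 'development', url: 'http://localhost:3000' };
}
```

For example, `resolveEnvironment('feature/cart', 42)` resolves to the preview environment at `https://pr-42.cartly.io`.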

1.2 Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Railway / Render                        │
│ ┌────────────────────┐   ┌────────────────────┐             │
│ │ web (Node/Fastify) │   │ worker (Queue)     │             │
│ │ Port 3000          │   │ Port 3001          │             │
│ └─────────┬──────────┘   └──────────┬─────────┘             │
│           │                         │                       │
│           └──────────┬──────────────┘                       │
│                      ▼                                      │
│           ┌──────────────────┐                              │
│           │  PostgreSQL 16   │                              │
│           │  (Supabase/Neon)│                              │
│           └──────────────────┘                              │
│                      │                                      │
│           ┌──────────────────┐                              │
│           │  Redis (Upstash) │                              │
│           └──────────────────┘                              │
└─────────────────────────────────────────────────────────────┘

2. CI/CD Pipeline

2.1 GitHub Actions Workflow

# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main, staging]
  pull_request:
    types: [opened, synchronize]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # ─────────────────────────────────────────────────────────────
  test:
    name: Test Suite
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Type check
        run: npm run typecheck
      
      - name: Lint
        run: npm run lint
      
      - name: Unit tests
        run: npm run test:unit -- --coverage
      
      - name: E2E tests
        run: npm run test:e2e
        env:
          DATABASE_URL: ${{ secrets.TEST_DATABASE_URL }}

  # ─────────────────────────────────────────────────────────────
  build:
    name: Build Docker Image
    runs-on: ubuntu-latest
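    needs: test
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/staging'
    # GITHUB_TOKEN needs write access to packages in order to push to ghcr.io
    permissions:
      contents: read
      packages: write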
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-
            type=raw,value={{date 'YYYYMMDD'}}
      
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # ─────────────────────────────────────────────────────────────
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/staging'
    environment: staging
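    steps:
      - name: Deploy to Railway
        # Assumption: this REST endpoint is a placeholder until Sprint 1 is
        # finalized; confirm the call against Railway's public API.
        # The image uses the branch tag pushed by the build job
        # (type=ref,event=branch). The type=sha tag carries only the short
        # SHA, so a tag built from the full commit SHA would not exist.
        run: |
          curl -fsS -X POST "https://api.railway.app/v1/environments/${{ vars.RAILWAY_STAGING_ENV }}/deployments" \
            -H "Authorization: Bearer ${{ secrets.RAILWAY_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"service": "${{ vars.RAILWAY_STAGING_SERVICE }}", "image": "ghcr.io/${{ github.repository }}:staging"}'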

  # ─────────────────────────────────────────────────────────────
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'
    environment: production
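    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      # Migrations run before the deploy (see 3.1) and must be backward
      # compatible (see 4.2), so the old version keeps working until the switch.
      - name: Run database migrations
        run: npx prisma migrate deploy
        env:
          DATABASE_URL: ${{ secrets.PRODUCTION_DATABASE_URL }}

      - name: Blue-Green Deploy on Railway
        run: |
          npm install -g @railway/cli
          railway up --environment production --service web
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}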
      
      - name: Health check
        run: |
          sleep 10
          curl -f https://app.cartly.io/health || exit 1
      
      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1
        with:
          channel-id: '#cartly-deploys'
          payload: |
            {
              "text": "${{ job.status }}: Deploy ${{ github.sha }} to production",
              "blocks": [{
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "*${{ job.status }}*: Deploy `${{ github.sha }}` to *production*\n<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>"
                }
              }]
            }
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
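
To make the three tag rules of the build job easier to reason about, the sketch below computes the tags one would expect for a given branch and commit. It is an approximation, not the action's implementation: the helper `imageTags` is hypothetical, and it assumes docker/metadata-action's default short SHA length of 7 and a UTC date. Branch names that need sanitizing (for example names containing slashes) are not handled.

```typescript
// Hypothetical helper that mirrors the tag rules in the "Extract metadata"
// step. Assumes a 7-character short SHA and a UTC date.
function imageTags(branch: string, sha: string, now: Date): string[] {
  // YYYYMMDD in UTC, derived from the ISO timestamp
  const date = now.toISOString().slice(0, 10).replace(/-/g, '');
  return [
    branch,                         // type=ref,event=branch
    `${branch}-${sha.slice(0, 7)}`, // type=sha,prefix={{branch}}-
    date,                           // type=raw,value={{date 'YYYYMMDD'}}
  ];
}
```

Under these assumptions, a push to `staging` yields a `staging` tag and a `staging-<short sha>` tag. Any deploy step that references an image has to use one of these exact tags.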

3. Deployment Steps

3.1 Standard Deployment (Main Branch)

# 1. Merge PR to main
git checkout main
git pull origin main

# 2. CI/CD takes over automatically:
#    a) Run Test Suite
#    b) Build Docker Image → GHCR
#    c) Run migrations
#    d) Deploy to Railway (Blue-Green)
#    e) Health Check
#    f) Notify Slack

# 3. Verify in production
curl -f https://app.cartly.io/health

3.2 Hotfix Deployment

# 1. Create the branch from main
git checkout -b hotfix/HF-XXX-short-description main

# 2. Implement the fix
git commit -am "fix(HF-XXX): describe fix"

# 3. Push and create the PR
git push -u origin hotfix/HF-XXX-short-description
gh pr create --base main --title "Hotfix HF-XXX" --label hotfix

# 4. Notify the reviewer; the reviewer (not the PR author) approves, then merge
gh pr review --approve
gh pr merge --squash

# 5. The pipeline starts automatically
# Monitoring: gh run watch

3.3 Manual Deployment (Emergency)

# ONLY if CI/CD is unavailable

# 1. Login
echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin

# 2. Build image
DATE=$(date +%Y%m%d)
docker build -t "ghcr.io/cartly/cartly:manual-$DATE" .
docker push "ghcr.io/cartly/cartly:manual-$DATE"

# 3. Railway deploy
railway login
railway link   # select the Cartly project and the production environment
# TODO: the --image flag is unverified; confirm it against the Railway CLI
# reference before relying on this step.
railway up --image "ghcr.io/cartly/cartly:manual-$DATE"

# 4. Health check
curl -f https://app.cartly.io/health

# 5. If problems occur: Railway Dashboard → Rollback

4. Zero-Downtime Deployment

4.1 Blue-Green Strategy (Railway)

Railway's built-in blue-green deployment ensures zero downtime:

  1. Deploy: The new version is started in parallel with the current one
  2. Health Check: Railway probes the /health endpoint
  3. Switch: Traffic is switched over atomically
  4. Cleanup: The old version is stopped only after a successful switch

Before Switch:
  [Green] v1.0.0 ←── Active
  [Blue]  v1.1.0 ←── Standby (testing)

After Switch:
  [Green] v1.0.0 (old)
  [Blue]  v1.1.0 ←── Active
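
The four steps can be modelled as a small pure function. This is a sketch of the decision logic only: it does not call Railway, and the `Slot` and `SwitchResult` types and the function `blueGreenSwitch` are hypothetical names chosen for illustration.

```typescript
// Minimal model of a blue-green switch. Nothing here talks to Railway;
// all types and names are hypothetical.
interface Slot {
  color: 'blue' | 'green';
  version: string;
}

interface SwitchResult {
  active: Slot;        // the slot that receives traffic afterwards
  stopped: Slot | null; // the slot that gets shut down
}

function blueGreenSwitch(
  active: Slot,
  candidate: Slot,
  isHealthy: (slot: Slot) => boolean,
): SwitchResult {
  // Steps 1-2: the candidate runs next to the active slot and is probed.
  if (!isHealthy(candidate)) {
    // Failed health check: traffic stays on the old version and the
    // candidate is discarded.
    return { active, stopped: candidate };
  }
  // Steps 3-4: traffic moves to the candidate, then the old version stops.
  return { active: candidate, stopped: active };
}
```

The property that matters is that the old slot is never stopped before the candidate has passed its health check, which is why a failed deploy leaves the previous version serving traffic.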

4.2 Database Migrations (Zero-Downtime)

// Prisma migration: zero-downtime pattern (expand, backfill, contract)
// Migrations must be backward compatible!
// Steps 1, 3 and 4 ship as separate migrations, with the code deploy in between.
import { PrismaClient } from '@prisma/client';

const db = new PrismaClient();

// 1. Add the NEW column (nullable, default NULL)
await db.$executeRaw`
  ALTER TABLE users
  ADD COLUMN IF NOT EXISTS phone VARCHAR(20) DEFAULT NULL
`;

// 2. Deploy code that uses the new column (but also works without it)

// 3. Backfill data (if needed); the empty string is only a placeholder value
await db.$executeRaw`
  UPDATE users SET phone = '' WHERE phone IS NULL
`;

// 4. Make the column NOT NULL (after all instances have been updated)
await db.$executeRaw`
  ALTER TABLE users ALTER COLUMN phone SET NOT NULL
`;
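
Step 2 is the part that is easiest to get wrong: during the rollout, application code may see rows from before the column existed, rows where it is still NULL, and backfilled rows. A minimal sketch of code that tolerates all three cases follows. The `UserRow` shape and the function `contactLabel` are hypothetical.

```typescript
// Hypothetical row shape: `phone` may be absent (old schema), null (new
// column, not yet backfilled), or a string (backfilled).
interface UserRow {
  email: string;
  phone?: string | null;
}

function contactLabel(user: UserRow): string {
  const phone = user.phone?.trim();
  // Fall back to the email alone whenever the phone is absent or empty.
  return phone ? `${user.email} (${phone})` : user.email;
}
```

Because the function never assumes the column is present or populated, the same build can run before, during, and after the migration steps.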

5. Rollback Procedure

5.1 Automatic Rollback (CI/CD Failure)

# In the GitHub Action:
- name: Health check
  run: |
    for i in {1..5}; do
      if curl -f https://app.cartly.io/health; then
        echo "Health check passed"
        exit 0
      fi
      echo "Attempt $i failed, retrying..."
      sleep 5
    done
    echo "Health check failed after 5 attempts"
    exit 1

On failure: Railway automatically keeps the old version active.
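
The same retry logic can be expressed in TypeScript, for example for a post-deploy script. This is a sketch under assumptions: `waitForHealthy` and the `Probe` type are hypothetical names, and the probe is injected so the loop can be exercised without network access. Unlike the shell loop, it does not sleep after the final failed attempt.

```typescript
// Sketch of the retry loop above. `probe` is injected; all names are
// hypothetical.
type Probe = () => Promise<boolean>;

async function waitForHealthy(
  probe: Probe,
  attempts = 5,
  delayMs = 5000,
): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    // A probe that rejects (for example on a network error) counts as a failure.
    const ok = await probe().catch(() => false);
    if (ok) return true;
    if (i < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

On a runtime with a global `fetch`, a probe against the production endpoint could be `async () => (await fetch('https://app.cartly.io/health')).ok`.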

5.2 Manual Rollback

# Via Railway CLI
# TODO: confirm that the installed CLI version provides this command; otherwise use the dashboard.
railway rollback --environment production

# Via Railway Dashboard
# 1. railway.app → Project → Production
# 2. Deployments → last working version → "Redeploy"

6. Environment Variables & Secrets

6.1 Required Secrets

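| Secret | Description | Configured in |
|--------|-------------|---------------|
| DATABASE_URL | PostgreSQL connection string | Railway Env Vars |
| REDIS_URL | Redis connection string | Railway Env Vars |
| JWT_SECRET | JWT signing secret | Railway Env Vars |
| OPENROUTER_API_KEY | AI API key | Railway Env Vars |
| SENTRY_DSN | Error tracking | Railway Env Vars |
| B2_ACCESS_KEY | Backup storage | Railway Env Vars |
| B2_SECRET_KEY | Backup storage | Railway Env Vars |
| SLACK_BOT_TOKEN | Notifications | GitHub Secrets |
| RAILWAY_TOKEN | Railway deploy token (used in 2.1) | GitHub Secrets |
| TEST_DATABASE_URL | Database for E2E tests (used in 2.1) | GitHub Secrets |
| PRODUCTION_DATABASE_URL | Database for production migrations (used in 2.1) | GitHub Secrets |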
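
A missing secret is cheaper to catch at startup than at the first request. The sketch below checks the secrets that 6.1 lists under Railway Env Vars. The helper `missingSecrets` is hypothetical, and the environment is passed in as a plain object so the check stays testable.

```typescript
// Names taken from 6.1 (Railway Env Vars); the helper itself is hypothetical.
const REQUIRED_SECRETS = [
  'DATABASE_URL',
  'REDIS_URL',
  'JWT_SECRET',
  'OPENROUTER_API_KEY',
  'SENTRY_DSN',
  'B2_ACCESS_KEY',
  'B2_SECRET_KEY',
] as const;

function missingSecrets(env: Record<string, string | undefined>): string[] {
  // A secret counts as missing when it is unset or only whitespace.
  return REQUIRED_SECRETS.filter((name) => !env[name]?.trim());
}
```

At process start, one option is to call `missingSecrets(process.env)` and exit with an error that names the missing keys, without ever logging their values.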

6.2 Secrets Rotation

# Secrets are managed in Railway
# After a change: railway redeploy --environment production

# IMPORTANT: Never put secrets in code or commits!
# On a leak: rotate immediately and inform the CTO

7. Health Check Endpoints

7.1 Backend Health Check

// GET /health
{
  "status": "ok",
  "version": process.env.npm_package_version,
  "uptime": process.uptime(),
  "checks": {
    "database": "ok",
    "redis": "ok",
    "migrations": "ok"
  }
}

// GET /health/live (Kubernetes liveness probe)
{ "status": "ok" }

// GET /health/ready (Kubernetes readiness probe)
// "database" is "ok" when `await db.$queryRaw`SELECT 1`` succeeds,
// "redis" is "ok" when `await redis.ping()` succeeds.
{
  "status": "ok",
  "checks": {
    "database": "ok",
    "redis": "ok"
  }
}
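
One way to build the readiness response is to run every dependency check and map success or failure to a status string. The sketch below assumes each check is an async function that resolves on success and rejects on failure. The names `readiness`, `Check`, and `Readiness` are hypothetical, as is the `"fail"` status value.

```typescript
// Hypothetical aggregation for /health/ready.
type Check = () => Promise<unknown>;
type CheckStatus = 'ok' | 'fail';

interface Readiness {
  status: CheckStatus;
  checks: Record<string, CheckStatus>;
}

async function readiness(checks: Record<string, Check>): Promise<Readiness> {
  // Run all checks in parallel; a check that throws or rejects maps to "fail".
  const results = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]): Promise<[string, CheckStatus]> => {
        try {
          await check();
          return [name, 'ok'];
        } catch {
          return [name, 'fail'];
        }
      },
    ),
  );
  const report: Record<string, CheckStatus> = {};
  for (const [name, status] of results) {
    report[name] = status;
  }
  const allOk = results.every(([, status]) => status === 'ok');
  return { status: allOk ? 'ok' : 'fail', checks: report };
}

// Possible wiring, assuming `db` is a Prisma client and `redis` a Redis client:
//   readiness({
//     database: () => db.$queryRaw`SELECT 1`,
//     redis: () => redis.ping(),
//   });
```

A route handler would typically answer with HTTP 200 when `status` is `"ok"` and 503 otherwise, so that the platform's health check sees the failure.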

7.2 Frontend Health

// Next.js: app/api/health/route.ts
export async function GET() {
  return Response.json({ 
    status: 'ok',
    env: process.env.NODE_ENV,
    timestamp: new Date().toISOString()
  });
}

8. Monitoring & Alerts

8.1 Deployment Metrics

| Metric | Threshold | Alert |
|--------|-----------|-------|
| Deployment duration | > 10 min | SEV-2 |
| Health check failures | > 3 | SEV-2 |
| Error rate post-deploy | > 1% | SEV-2 |
| Memory usage post-deploy | > 85% | SEV-3 |
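
The thresholds translate directly into a rule check. The sketch below copies the limits from the table. The `DeployMetrics` shape, its field names and units, and the function `evaluateDeploy` are hypothetical, and how the metrics are collected is out of scope.

```typescript
// Thresholds copied from the table in 8.1; type and function names are
// hypothetical.
interface DeployMetrics {
  durationMinutes: number;
  healthCheckFailures: number;
  errorRatePercent: number;
  memoryUsagePercent: number;
}

type Severity = 'SEV-2' | 'SEV-3';

interface Alert {
  metric: keyof DeployMetrics;
  severity: Severity;
}

function evaluateDeploy(m: DeployMetrics): Alert[] {
  const alerts: Alert[] = [];
  // Every comparison is strict, matching the ">" in the table.
  if (m.durationMinutes > 10) alerts.push({ metric: 'durationMinutes', severity: 'SEV-2' });
  if (m.healthCheckFailures > 3) alerts.push({ metric: 'healthCheckFailures', severity: 'SEV-2' });
  if (m.errorRatePercent > 1) alerts.push({ metric: 'errorRatePercent', severity: 'SEV-2' });
  if (m.memoryUsagePercent > 85) alerts.push({ metric: 'memoryUsagePercent', severity: 'SEV-3' });
  return alerts;
}
```

A deploy that takes exactly 10 minutes raises no alert, because the table defines each threshold as strictly greater than the limit.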

8.2 Post-Deploy Checklist

# After every production deploy:
□ 1. Verify the health check (https://app.cartly.io/health)
□ 2. Check the Sentry dashboard (no new errors)
□ 3. Verify the database migration (tables, indexes)
□ 4. Test the login flow (create a new session)
□ 5. Test one critical user flow (checkout/order)
□ 6. Confirm the deployment in Slack #cartly-deploys

9. Todo / Open Items


10. Emergency Contacts

| Situation | Contact | Response time |
|-----------|---------|---------------|
| Deployment failed | DEV (9f66dba7) | Within 30 min |
| Production down | DEV + CTO | Immediately |
| Data loss | DEV + CTO | Immediately |

Created: 2026-07-03 by Documentation Agent (a66674bf)
Review: After Sprint 1 completion by CTO + DEV