ADR-005: Monitoring & Observability Stack

Status: Accepted
Date: 2026-07-03
Deciders: CTO (b999c0b2), DEV (9f66dba7)


Context

Cartly must be monitored to a production-ready standard. During incidents, the team must be able to find the root cause quickly. For a SaaS product with retail customers, uptime and data correctness are critical.

Requirements:

Decision

Stack:

| Purpose | Tool | Tier | Reason |
|---|---|---|---|
| Error Tracking | Sentry | Free → Pay | Best JS/Node integration, source maps, GDPR-compliant |
| Uptime Monitoring | Better Uptime | Free | 60 checks/min, status page included |
| Log Aggregation | Self-hosted (Grafana Loki) | Free | GDPR-compliant (no US cloud), scalable |
| Metrics | Grafana + Prometheus | Free | Standard, good Railway/Neon integration |
| Status Page | Better Uptime Public Status | Free | Customer-facing, automatic updates |
| Alerts | PagerDuty / Slack | Pay | CI integration, escalation chains |

Why Sentry + Better Uptime:

Why NOT Datadog/New Relic:

Consequences

Positive:

Negative:

Monitoring Requirements (Cartly-specific)

API:
  - Error Rate: > 1% → SEV-2 Alert
  - P99 Latency: > 2s → SEV-3 Alert
  - Health Check failures: > 3 → SEV-2 Alert
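
As a rough illustration of how the first two API thresholds could be computed from raw request data, here is a minimal TypeScript sketch. The RequestSample shape, the function names, the choice to count only 5xx responses as errors, and the nearest-rank percentile method are all assumptions made for this example; none of them come from the Cartly codebase or from any of the tools listed above. The health check rule is left out because it depends on how the uptime checker reports failures.

```typescript
// Hypothetical shape of one recorded API request (assumption, not Cartly code).
interface RequestSample {
  durationMs: number;
  statusCode: number;
}

type ApiSeverity = "SEV-2" | "SEV-3";

// Fraction of requests that ended in a 5xx response.
// Assumption: only server errors count toward the error rate.
function errorRate(samples: RequestSample[]): number {
  if (samples.length === 0) return 0;
  const failed = samples.filter((s) => s.statusCode >= 500).length;
  return failed / samples.length;
}

// Nearest-rank percentile: the smallest value such that at least
// p percent of all values are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Applies the two thresholds from the list above:
// error rate > 1% -> SEV-2, P99 latency > 2s -> SEV-3.
function apiAlerts(samples: RequestSample[]): ApiSeverity[] {
  const alerts: ApiSeverity[] = [];
  if (errorRate(samples) > 0.01) alerts.push("SEV-2");
  const p99Ms = percentile(samples.map((s) => s.durationMs), 99);
  if (p99Ms > 2000) alerts.push("SEV-3");
  return alerts;
}
```

In practice these values would more likely come from Prometheus queries than from in-process arrays; the sketch only makes the arithmetic behind the thresholds explicit.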

Database (Neon):
  - Connection Pool: > 80% utilized → SEV-3
  - Slow Queries: > 1s → SEV-3
  - Replication Lag: > 5s → SEV-2

Business:
  - Failed Payments: > 0 → SEV-1
  - Auth Failures (brute force): > 10/min → SEV-2
  - Data Export Requests: GDPR compliance tracking
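
The thresholds above can also be read as a small rule table. The following TypeScript sketch shows one possible encoding, under stated assumptions: the metric keys (for example "api.error_rate_percent"), the AlertRule type, and the evaluate function are hypothetical names invented for this example and do not correspond to any Prometheus, Grafana, or Sentry API. Data export requests are omitted because the list treats them as a tracking item rather than an alert.

```typescript
type Severity = "SEV-1" | "SEV-2" | "SEV-3";

interface AlertRule {
  metric: string;    // hypothetical metric key, not a real Prometheus metric name
  threshold: number; // the rule fires when the value is strictly greater
  severity: Severity;
}

// One entry per threshold in the lists above.
const RULES: AlertRule[] = [
  { metric: "api.error_rate_percent", threshold: 1, severity: "SEV-2" },
  { metric: "api.p99_latency_seconds", threshold: 2, severity: "SEV-3" },
  { metric: "api.health_check_failures", threshold: 3, severity: "SEV-2" },
  { metric: "db.pool_utilization_percent", threshold: 80, severity: "SEV-3" },
  { metric: "db.slow_query_seconds", threshold: 1, severity: "SEV-3" },
  { metric: "db.replication_lag_seconds", threshold: 5, severity: "SEV-2" },
  { metric: "business.failed_payments", threshold: 0, severity: "SEV-1" },
  { metric: "business.auth_failures_per_minute", threshold: 10, severity: "SEV-2" },
];

const SEVERITY_ORDER: Severity[] = ["SEV-1", "SEV-2", "SEV-3"];

// Returns every rule whose metric exceeds its threshold, most severe first.
// Metrics missing from the input are skipped rather than treated as zero.
function evaluate(metrics: Record<string, number>): AlertRule[] {
  return RULES
    .filter((rule) => {
      const value = metrics[rule.metric];
      return value !== undefined && value > rule.threshold;
    })
    .sort(
      (a, b) =>
        SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
    );
}
```

Keeping the thresholds in a single data structure means a change to this ADR maps to a one-line change in the rule table. In a real deployment the same rules would probably be expressed as alerting rules in the chosen metrics stack instead of application code.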

Related


Accepted: 2026-07-03