Service level objectives (SLOs)
This page defines customer-facing reliability targets for the SREDSimplify production environment (prd). It is a living document: when architecture or traffic changes, update SLOs and linked runbooks together.
Scope
| Surface | Users | Notes |
|---|---|---|
| Web application (Next.js) | End customers and internal operators | Includes marketing pages and authenticated workspace |
| API (Spring Boot) | Web client and integrations | JWT sent in a custom auth header, per the API contract |
| Python document service | Invoked from backend workflows | Long-running AI and document jobs |
Availability SLOs (draft)
These are draft targets until historical metrics back them. Until then, treat the percentages as design goals for setting alerting thresholds, not as commitments.
| Service | Monthly availability target | Measurement window or basis |
|---|---|---|
| Web + API (synthetic or edge checks) | 99.5% | Rolling 30 days |
| Background document jobs | 99.0% | Job success rate over completed jobs |
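As a rough illustration of how the two rows above could be evaluated, the sketch below computes availability from synthetic check results and a success rate over completed jobs. The `CheckResult` shape, the function names, and the choice to report 1.0 when there is no data are all assumptions made for this example, not part of the existing tooling.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One synthetic or edge check outcome (hypothetical shape)."""
    ok: bool


def availability(results: list[CheckResult]) -> float:
    """Fraction of successful checks in the window.

    Returns 1.0 when there are no results (an assumption: no data is
    treated as no known failure).
    """
    if not results:
        return 1.0
    return sum(r.ok for r in results) / len(results)


def job_success_rate(succeeded: int, failed: int) -> float:
    """Success rate over completed jobs only (succeeded + failed)."""
    completed = succeeded + failed
    if completed == 0:
        return 1.0
    return succeeded / completed


def meets_target(observed: float, target: float) -> bool:
    """True when the observed ratio is at or above the SLO target."""
    return observed >= target


# Draft targets from the table above.
WEB_API_TARGET = 0.995
DOCUMENT_JOBS_TARGET = 0.99
```

In-flight or cancelled jobs are deliberately left out of `job_success_rate`, matching the "over completed jobs" wording in the table; whether retries count as separate jobs is left open here.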
Error budget (conceptual)
For a 99.5% availability target over a 30-day window, the error budget is roughly 3.6 hours of combined outage (0.5% of 720 hours). When burn is high:
- Triage with on-call or engineering lead.
- Open or update a tracking issue with customer impact.
- Link a postmortem if a user-visible failure occurred (example).
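The budget arithmetic can be sketched as below, assuming a 30-day window of 720 hours. The function names are hypothetical, and this page does not define what counts as "high" burn, so the sketch only reports the fraction consumed.

```python
def error_budget_hours(target: float, window_days: int = 30) -> float:
    """Allowed downtime in hours for an availability target over the window."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    return (1.0 - target) * window_days * 24


def budget_consumed(downtime_hours: float, target: float, window_days: int = 30) -> float:
    """Fraction of the error budget used so far (1.0 means fully spent)."""
    budget = error_budget_hours(target, window_days)
    if budget == 0:
        raise ValueError("a 100% target leaves no error budget")
    return downtime_hours / budget


# 99.5% over 30 days: 0.5% of 720 hours.
print(round(error_budget_hours(0.995), 2))  # 3.6
```

For example, 1.8 hours of outage in the window would consume about half of the 99.5% budget.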
Latency (draft)
| Path class | Target (p95) | Notes |
|---|---|---|
| Authenticated workspace shell | Under 2s TTFB at edge | Excludes long AI runs |
| Core REST mutations | Under 5s server-side | AI-heavy endpoints may use async patterns |
Document concrete probes and dashboards in your observability tool of choice; keep deep links out of this repo if they rotate frequently.
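For reference, a p95 check against the draft targets above might look like the following sketch. It uses the nearest-rank percentile definition, which may differ slightly from what a given observability tool reports; the path-class keys and function names are assumptions for illustration.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Ceiling of 0.95 * n using integer arithmetic (1-based rank).
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]


# Draft p95 targets from the table above, in milliseconds.
# The keys are hypothetical labels for the two path classes.
TARGETS_MS = {
    "workspace_shell_ttfb": 2000,
    "core_rest_mutation": 5000,
}


def within_target(path_class: str, samples_ms: list[float]) -> bool:
    """True when the p95 of the samples is under the target for the path class."""
    return p95(samples_ms) < TARGETS_MS[path_class]
```

Samples from long AI runs would need to be filtered out before calling `within_target` for the workspace shell, since the target excludes them.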
Dependencies that affect SLOs
- PostgreSQL: primary data store; see the database runbooks linked from the Architecture hub.
- Redis: quotas, auth tokens, and rate limits; see the Redis high memory runbook.
- External LLM and document providers: third-party outages may consume error budget even when there is no code defect on our side.
Related reading
- Redis high memory runbook
- Login outage postmortem
- Backend deployment notes
- Tooling reference for CI/CD and release entry points