Runbook: Python Worker Degraded

Trigger: AI runs fail, indexing stalls, or the Python worker health endpoints report a degraded state.

Impact: PRECHECK, T661, LOGBOOK, and file indexing workflows may fail or stall.

Health checks

Check the worker health endpoints first:

```bash
curl -fsS http://localhost:7002/api/v1/health/
curl -fsS http://localhost:7002/api/v1/health/readiness
curl -fsS http://localhost:7002/api/v1/health/liveness
```

If the worker listens on port 8000 in your environment, substitute that port in the URLs above.
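The three checks above can be wrapped in one function that prints a status line per endpoint and returns non-zero if any endpoint fails. This is a minimal sketch, not part of the worker tooling: `check_worker_health` and `WORKER_BASE_URL` are hypothetical names, and it assumes a healthy endpoint answers with HTTP 200.

```bash
# Sketch: probe each worker health endpoint and print one line per endpoint.
# WORKER_BASE_URL is a hypothetical variable; the default matches the port used above.
# Assumption: a healthy endpoint returns HTTP 200.
check_worker_health() {
  local base_url="${1:-${WORKER_BASE_URL:-http://localhost:7002}}"
  local failed=0 path code
  for path in /api/v1/health/ /api/v1/health/readiness /api/v1/health/liveness; do
    # -o /dev/null discards the body; -w prints only the HTTP status code.
    # curl reports 000 when no HTTP response was received (e.g. connection refused).
    code="$(curl -sS -o /dev/null -m 5 -w '%{http_code}' "${base_url}${path}" 2>/dev/null)" || true
    if [ "$code" = "200" ]; then
      echo "OK   ${path} (${code})"
    else
      echo "FAIL ${path} (${code:-000})"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_worker_health                        # default base URL
#   check_worker_health http://localhost:8000  # worker on port 8000
```

A status of 000 usually points at reachability rather than worker health, which helps separate the failure classes described in this runbook.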

Common failure classes

Java cannot reach Python

Check:

  • PYTHON_SERVICE_BASE_URL in the backend
  • network reachability from backend to worker
  • whether the worker process is actually listening on the expected port
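To test reachability from the backend host or container, one option is to extract the host and port from `PYTHON_SERVICE_BASE_URL` and attempt a plain TCP connection. The helper names below are hypothetical. The sketch assumes bash and the coreutils `timeout` command are available, and it does not handle IPv6 literals or URLs that contain credentials.

```bash
# Print "host port" for a base URL such as http://worker:7002/api.
# Defaults the port to 80 (http) or 443 (https) when the URL omits it.
url_host_port() {
  local url="$1" scheme rest hostport host port
  scheme="${url%%://*}"      # text before "://"
  rest="${url#*://}"         # text after "://"
  hostport="${rest%%/*}"     # drop any path
  host="${hostport%%:*}"     # drop any ":port"
  if [ "$hostport" = "$host" ]; then
    if [ "$scheme" = "https" ]; then port=443; else port=80; fi
  else
    port="${hostport##*:}"
  fi
  printf '%s %s\n' "$host" "$port"
}

# Attempt a TCP connection with bash's /dev/tcp redirection (bash-only feature).
# Returns 0 if the connection opens within 3 seconds.
can_connect() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (run where the backend runs, so the network path is the same):
#   read -r host port < <(url_host_port "$PYTHON_SERVICE_BASE_URL")
#   can_connect "$host" "$port" && echo reachable || echo unreachable
```

If the TCP connection succeeds from the backend host but the backend still cannot call the worker, the base URL path or the internal token is a more likely cause than the network.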

Internal token mismatch

A token mismatch typically shows up as failed backend-to-worker calls even though both services report healthy.

Check:

  • INTERNAL_API_TOKEN on both sides
  • whether recent secret rotation updated both services consistently
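One way to compare `INTERNAL_API_TOKEN` across services without pasting the secret into a terminal log or chat is to compare short hashes. A minimal sketch, assuming `sha256sum` from coreutils is installed on both hosts; `token_fingerprint` is a hypothetical helper name.

```bash
# Print the first 12 hex characters of the SHA-256 of a secret, so the two
# sides can be compared without exposing the token itself.
# printf '%s' avoids appending a newline, which would change the hash.
token_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -c1-12
}

# Usage (run once in the backend environment and once in the worker environment):
#   token_fingerprint "$INTERNAL_API_TOKEN"
```

Different fingerprints mean the values differ. A trailing space or newline picked up from a secret file is enough to change the fingerprint, so it is worth checking for that after a rotation.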

Readiness fails because dependencies are missing

Check:

  • OPENAI_API_KEY
  • REDIS_URL
  • PGVECTOR_CONNECTION_STRING and related settings when RAG_ENABLED=true
  • R2 settings if indexing or download URL flows are failing
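The checklist above can be run as a quick scan of the worker's environment. The sketch below only tests whether each variable is set and non-empty; it does not validate the values. `missing_worker_settings` is a hypothetical name, and the R2 settings are left out because this runbook does not list their variable names.

```bash
# Report readiness-related settings that are unset or empty.
# Variable names are taken from the checklist above.
# PGVECTOR_CONNECTION_STRING is only required when RAG_ENABLED=true.
missing_worker_settings() {
  local required="OPENAI_API_KEY REDIS_URL" name value missing=0
  if [ "${RAG_ENABLED:-false}" = "true" ]; then
    required="$required PGVECTOR_CONNECTION_STRING"
  fi
  for name in $required; do
    # Look up the variable whose name is stored in $name.
    # eval is safe here because the names come from the fixed list above.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "MISSING $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (inside the worker's shell or container, so its environment applies):
#   missing_worker_settings && echo "all required settings present"
```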

Long-running run failures

Check:

  • backend-side Python client timeouts
  • worker logs around task execution
  • whether selected files are actually READY

Indexing-specific checks

If uploads succeed but files never become READY:

  1. Confirm the upload was acknowledged through the confirm endpoint.
  2. Check file indexStatus values in the application.
  3. Check worker readiness and any RAG prerequisites.
  4. Determine whether the problem affects all files or only a specific file or content type.
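For the last step, grouping stalled files by content type makes it easier to see whether the problem is broad or isolated. The sketch below assumes a hypothetical tab-separated export with the columns file ID, content type, and indexStatus. How that export is produced depends on the application and is not covered here. Only `READY` comes from this runbook; any other status values are illustrative.

```bash
# Count files that are not READY, grouped by content type.
# Input on stdin (hypothetical export format), one file per line, tab-separated:
#   file_id <TAB> content_type <TAB> indexStatus
# Output: content_type <TAB> count, sorted by content type.
stalled_by_content_type() {
  awk -F '\t' '$3 != "READY" { n[$2]++ }
               END { for (t in n) printf "%s\t%d\n", t, n[t] }' | sort
}

# Usage:
#   stalled_by_content_type < files.tsv
```

Counts spread across every content type suggest a worker or dependency problem. A single content type dominating the list suggests a parsing or extraction issue for that type.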

Escalation

If customer-visible run execution is degraded for more than 5 minutes:

  1. Page the backend or AI workflow owner.
  2. Capture sample failing session IDs and file IDs.
  3. Record whether the failure is health, auth, timeout, or indexing related.
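To keep escalation notes consistent, the captured details can be written in a fixed shape. This is a sketch only: `incident_note` and its output fields are hypothetical, and the four failure classes are the ones named in step 3.

```bash
# Print a short incident note: UTC timestamp, failure class, and sample IDs.
# Rejects any class other than the four named in the escalation steps.
incident_note() {
  local class="$1"
  shift
  case "$class" in
    health|auth|timeout|indexing) ;;
    *) echo "unknown failure class: $class" >&2; return 2 ;;
  esac
  printf 'time_utc: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf 'failure_class: %s\n' "$class"
  printf 'sample_ids: %s\n' "$*"
}

# Usage (the IDs shown are placeholders):
#   incident_note timeout session-123 file-456
```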

Post-action

  1. Update the relevant API reference page if a new operational caveat was discovered.
  2. Add a postmortem if the incident was customer-visible.