Operations handbook (weekly health + incident + upgrades)¶
This handbook defines the recurring operator workflow for running SDETKit as a platform.
1) Weekly health review¶
Cadence: every Wednesday.
Checklist:
- Review KPI board from the Top-tier program dashboard.
- Confirm each workstream has owner, status, and evidence links.
- Identify top 3 risks and assign mitigation owners.
- Publish a mid-week status note.
2) Incident triage flow¶
- Detect signal (failed release gate, doctor regression, schema drift).
- Assign severity via Support and escalation model.
- Start incident record and assign incident commander.
- Mitigate and communicate using severity update cadence.
- Close incident with root cause and follow-up tasks.
3) Upgrade planning lane¶
Run this monthly before release train updates:
- Review Policy compatibility matrix.
- Inventory legacy usage via migration tooling/docs.
- Run canonical path in staging repos.
- Publish upgrade plan (timeline, owners, rollback plan).
- Approve rollout wave sequence (startup -> scale -> regulated).
4) Release owner checklist¶
For every version cut:
- Validate gate fast and gate release artifacts are attached.
- Confirm doctor output is healthy.
- Confirm docs navigation reflects new/changed controls.
- Confirm compatibility/deprecation notes are updated.
- Confirm support/escalation notices are prepared if needed.
5) Evidence policy¶
Every operational claim should reference one of:
- canonical gate artifacts
- dashboard status entries
- incident records
- release notes with migration guidance