Adaptive Investigation Roadmap¶
Executive summary¶
SDETKit is evolving from a collection of strong diagnostic and maintenance tools into a connected deterministic repo investigator and, later, a guarded auto-fix system.
The product spine is:
detect → diagnose → recommend → plan → prove → classify → trend → candidate → probation → policy proposal → dry run → guarded PR auto-fix → remember outcome
The investigation spine is:
scan → narrow → reproduce → classify → recommend → verify → remember
The immediate goal is not to enable broad automation. The immediate goal is to make SDETKit better at understanding failures, explaining why they matter, recommending safe next actions, collecting proof, and remembering outcomes. Auto-fix should launch later, only for narrow mechanical classes with repeated proof and explicit policy gates.
The most important architecture rule is: do not build a second investigator brain. The new investigate surface should be a thin human-friendly front door over existing shared engines. The shared classification brain remains adaptive_diagnosis; maintenance, review, mission-control, boost/index, and forensics should call into it or contribute evidence to it.
Current foundation¶
SDETKit already has the right base pieces. The roadmap should connect and deepen these systems rather than replace them.
Existing intelligence and evidence layers¶
- `adaptive_diagnosis` already acts as the shared failure classification layer.
- `maintenance_autopilot` already acts as the CI/autopilot caller and writes diagnosis, remediation, safe-fix, and learning artifacts.
- `review` orchestrates doctor, inspect, readiness, comparison, probe planning, contradiction clustering, confidence scoring, and history-aware evidence.
- `mission-control` bundles release evidence, gate/doctor/readiness steps, stdout/stderr artifacts, findings, next actions, Doctor Cortex, and run history.
- `boost` and `index` scan repo shape, high-signal files, risk markers, hotspots, symbols, adaptive memory, and risk hygiene.
- `forensics` compares run records and builds deterministic repro/evidence bundles.
- PR quality comments and maintenance issue comments already publish operator-facing summaries.
Recent maintenance roadmap already merged¶
Recent PRs established a maintenance intelligence chain:
maintenance run
↓
maintenance priority rollup
↓
maintenance policy decisions
↓
policy decision history
↓
policy memory context
↓
adaptive maintenance recommendations
↓
recommendation eligibility diagnostics
↓
maintenance action plan
The current merged behavior is intentionally conservative:
diagnostic_only: true
automation_allowed: false
auto_fix_enabled: false
That is correct. The system can now recommend and plan, but it should not auto-fix until the proof, category, trend, candidate, probation, and policy proposal layers are mature.
Real workflow guide from PR #1155¶
A real product gap around async/client/helper envelope pagination parity showed what SDETKit should learn to do automatically:
scan → narrow → reproduce → classify → recommend → verify → remember
The manual process surfaced recurring failure classes that should become first-class diagnosis families:
- formatting drift from pre-commit / Ruff format
- Ruff fixable lint
- missing test dependencies
- Python runtime compatibility problems
- local WSL or `/mnt/c` environment friction
- broken test doubles
- missing public API parity
- git branch divergence
- remote branch drift after bot or remote updates
- product logic failures
- unknown review-required failures
Architecture roles¶
Keep responsibilities separate and composable.
| Component | Role | Should do | Should not do |
|---|---|---|---|
| `adaptive_diagnosis` | Shared failure-classification brain | Classify failure logs and structured evidence into deterministic diagnosis families | Own CI, comments, repo scanning, or remediation execution |
| `maintenance_autopilot` | CI/autopilot caller | Invoke checks, collect artifacts, call diagnosis/policy layers, optionally commit only approved safe fixes | Become the diagnosis brain |
| `review` | Evidence orchestration and decision layer | Combine doctor/inspect/readiness/probe evidence into review decisions | Duplicate boost/index/forensics classification logic |
| `mission-control` | Release evidence bundle | Package release confidence evidence and run history | Decide low-level fix policies |
| `boost` / `index` | Repo and surface scan | Produce repo shape, symbols, hotspots, risk markers, high-signal files | Emit final failure decisions alone |
| `forensics` | Compare/bundle/repro evidence | Compare runs, preserve logs, generate repro bundles | Choose remediation policy |
| `investigate` | Human-friendly front door | Wrap diagnosis, repo scan, surface scan, and evidence bundles into simple commands | Duplicate adaptive diagnosis, boost, index, review, or maintenance logic |
| safe-fix policy | Guardrail layer | Decide which diagnosis classes may become auto-fix candidates | Auto-fix product/API gaps or ambiguous failures |
| outcome memory | Learning loop | Record repeated signals, proof, fixes, failures, PR outcomes, time-to-green | Override policy gates without proof |
Product principles¶
- Classify before recommending. Every recommendation should be grounded in a diagnosis family or explicit unknown/review-required state.
- Recommend before fixing. Recommendations should explain what to do, why, and what proof is needed.
- Proof before candidates. An item should not become an auto-fix candidate until the proof checklist is satisfiable and history supports it.
- Candidates before policy. A candidate registry should exist before any auto-fix policy proposal.
- Policy before execution. Auto-fix should only run after an explicit policy PR or equivalent reviewed configuration change.
- Dry run before write. Planned changes should be visible before any modifying run.
- PR-only execution. Guarded auto-fix should open a PR, never push directly to `main`.
- Outcome memory closes the loop. Every attempt, success, failure, human edit, revert, and proof command should feed future diagnosis and recommendations.
Phase-by-phase roadmap¶
Baseline readiness — Upgrade the shared diagnosis brain¶
Goal¶
Teach adaptive_diagnosis the local investigation failure classes that appeared during real repo work, so later roadmap layers consume one shared classification model.
PR 1: Expand adaptive diagnosis for local investigation failures¶
Suggested branch:
feature/adaptive-diagnosis-local-investigation-failures
Suggested PR title:
Expand adaptive diagnosis for local investigation failures
Diagnosis families to add:
PRE_COMMIT_FORMAT_DRIFT
RUFF_FIXABLE_LINT
MISSING_TEST_DEPENDENCY
PYTHON_RUNTIME_COMPATIBILITY
LOCAL_ENVIRONMENT_FRICTION
BROKEN_TEST_DOUBLE
MISSING_PUBLIC_API_PARITY
GIT_BRANCH_DIVERGED
REMOTE_BRANCH_DRIFT
PRODUCT_LOGIC_FAILURE
UNKNOWN_REVIEW_REQUIRED
Representative signal mapping:
| Diagnosis | Example signals | Default route | Auto-fix candidate? |
|---|---|---|---|
| `PRE_COMMIT_FORMAT_DRIFT` | `ruff-format`, `end-of-file-fixer`, `files were modified by this hook` | safe mechanical review or auto-fix candidate | yes, later |
| `RUFF_FIXABLE_LINT` | Ruff fixable lint output, `--fix` suggestion | narrow mechanical candidate | yes, later |
| `MISSING_TEST_DEPENDENCY` | `ModuleNotFoundError`, missing `hypothesis`, missing `yaml` | environment/dependency guidance | no |
| `PYTHON_RUNTIME_COMPATIBILITY` | `ImportError: cannot import name 'UTC' from datetime` | compatibility PR | no |
| `LOCAL_ENVIRONMENT_FRICTION` | venv/pip hangs, slow paths under `/mnt/c`, WSL friction | environment guidance | no |
| `BROKEN_TEST_DOUBLE` | `TypeError: Resp() takes no arguments`, broken mock/init | review-first test fix | no by default |
| `MISSING_PUBLIC_API_PARITY` | async method missing while sync method exists, helper/API mismatch | product implementation | no |
| `GIT_BRANCH_DIVERGED` | push rejected, fetch first, non-fast-forward | command guidance | no |
| `REMOTE_BRANCH_DRIFT` | local branch behind PR branch after bot/remote update | sync guidance | no |
| `PRODUCT_LOGIC_FAILURE` | deterministic assertion failure in product behavior | review-first product fix | no |
| `UNKNOWN_REVIEW_REQUIRED` | no confident class | review-first | no |
Output contract:
{
"schema_version": "sdetkit.adaptive.diagnosis.v2",
"classification": "MISSING_PUBLIC_API_PARITY",
"confidence": "high",
"product_logic_likely": true,
"test_bug_likely": false,
"environment_likely": false,
"git_workflow_likely": false,
"formatting_likely": false,
"safe_to_auto_fix": false,
"requires_human_review": true,
"summary": "Missing async public API parity detected.",
"why_it_matters": "The async client lacks a public method available on the sync client.",
"next_action": "Add async parity and focused helper-level coverage.",
"proof_commands": [
"PYTHONPATH=src python -m pytest -q tests/test_netclient_envelope_parity.py"
],
"memory_lookup_key": "diagnosis:MISSING_PUBLIC_API_PARITY:netclient"
}
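The signal mapping and output contract above could be exercised by a small rule-based classifier. The following is only a minimal sketch: the `classify_log` name, the regex patterns, the confidence values, and the reduced field set are assumptions for illustration, not the actual `adaptive_diagnosis` implementation.

```python
import re

# Hypothetical signal rules: (diagnosis family, regex over the failure log).
# Order matters: the first matching rule wins.
RULES = [
    ("PRE_COMMIT_FORMAT_DRIFT", r"files were modified by this hook|ruff-format"),
    ("MISSING_TEST_DEPENDENCY", r"ModuleNotFoundError"),
    ("PYTHON_RUNTIME_COMPATIBILITY", r"cannot import name 'UTC'"),
    ("BROKEN_TEST_DOUBLE", r"TypeError: \w+\(\) takes no arguments"),
    ("GIT_BRANCH_DIVERGED", r"non-fast-forward|fetch first"),
]


def classify_log(log_text: str) -> dict:
    """Return a reduced diagnosis record for a failure log."""
    classification = "UNKNOWN_REVIEW_REQUIRED"  # fallback when no rule matches
    for family, pattern in RULES:
        if re.search(pattern, log_text):
            classification = family
            break
    known = classification != "UNKNOWN_REVIEW_REQUIRED"
    return {
        "schema_version": "sdetkit.adaptive.diagnosis.v2",
        "classification": classification,
        "confidence": "high" if known else "low",  # assumed confidence scale
        # Conservative default in this sketch: nothing is auto-fixable yet.
        "safe_to_auto_fix": False,
        "requires_human_review": True,
    }
```

Keeping the rules in an ordered table makes each class easy to cover with one deterministic fixture, and anything unmatched falls through to `UNKNOWN_REVIEW_REQUIRED`.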
Tests:
PYTHONPATH=src python -m pytest -q tests/test_adaptive_diagnosis.py
python -m pre_commit run -a
./scripts/pr_preflight.sh
Acceptance criteria:
- Each new class has a deterministic test fixture.
- Unknown cases fall back to `UNKNOWN_REVIEW_REQUIRED`.
- Only narrow mechanical classes can return `safe_to_auto_fix: true`.
- Product/API/test/runtime/dependency/git-drift classes remain review-first.
Release readiness — Align maintenance action categories with diagnosis classes¶
Goal¶
Let maintenance action plans consume richer diagnosis categories instead of inventing separate category logic.
PR 2: Add maintenance action categories using diagnosis classes¶
Suggested branch:
feature/maintenance-action-diagnosis-categories
Suggested PR title:
Classify maintenance actions with adaptive diagnosis classes
Outputs:
artifacts/maintenance-action-categories.json
artifacts/maintenance-action-categories.md
JSON contract:
{
"schema_version": "sdetkit.maintenance.action_categories.v1",
"diagnostic_only": true,
"automation_allowed": false,
"category_count": 10,
"counts_by_category": {
"formatting": 1,
"tests": 1,
"security": 1,
"workflow_hygiene": 1
},
"items": [
{
"rank": 1,
"signal": "Run ruff check",
"memory_lookup_key": "maintenance-action:lint_check:ruff-check",
"diagnosis_class": "RUFF_FIXABLE_LINT",
"category": "lint",
"risk_level": "low",
"safe_fix_route": "candidate_later",
"review_required": true,
"reason": "Ruff lint may be mechanically fixable, but policy proof is still required."
}
]
}
Markdown contract:
# Maintenance action categories
- diagnostic only: **True**
- automation allowed: **False**
- categories: **N**
## Category mix
| Category | Count | Safe-fix route |
|---|---:|---|
## Classified actions
| Rank | Category | Diagnosis | Risk | Signal | Route |
|---:|---|---|---|---|---|
Tests:
PYTHONPATH=src python -m pytest -q \
tests/test_maintenance_action_categories.py \
tests/test_maintenance_on_demand_action_categories_workflow.py
./scripts/pr_preflight.sh
Acceptance criteria:
- Uses `adaptive_diagnosis` classes where possible.
- Does not enable auto-fix.
- Uploads JSON/Markdown artifacts.
- Adds an issue-comment section after action plan and before lower-level recommendation detail.
Platform readiness — Add proof checklist¶
Goal¶
Turn action-plan items into explicit evidence requirements. This is the bridge between “recommend” and “can progress.”
PR 3: Publish maintenance proof checklist¶
Suggested branch:
feature/maintenance-proof-checklist
Suggested PR title:
Publish maintenance proof checklist
Outputs:
artifacts/maintenance-proof-checklist.json
artifacts/maintenance-proof-checklist.md
JSON contract:
{
"schema_version": "sdetkit.maintenance.proof_checklist.v1",
"diagnostic_only": true,
"automation_allowed": false,
"proof_item_count": 10,
"complete_count": 0,
"missing_count": 10,
"items": [
{
"rank": 8,
"signal": "Run pytest -q",
"memory_lookup_key": "maintenance-action:tests_check:run-tests",
"diagnosis_class": "PRODUCT_LOGIC_FAILURE",
"required_proof": "Attach passing pytest output.",
"proof_status": "missing",
"proof_commands": [
"python -m pytest -q"
],
"required_artifacts": [
"pytest output"
],
"can_progress_to_candidate": false,
"blocking_reason": "Review proof has not been attached."
}
]
}
Markdown contract:
# Maintenance proof checklist
- diagnostic only: **True**
- proof items: **N**
- missing proof: **N**
## Proof checklist
| Rank | Signal | Diagnosis | Proof status | Required proof | Can progress |
|---:|---|---|---|---|---|
Tests:
PYTHONPATH=src python -m pytest -q \
tests/test_maintenance_proof_checklist.py \
tests/test_maintenance_on_demand_proof_checklist_workflow.py
./scripts/pr_preflight.sh
Acceptance criteria:
- Every action-plan item gets a proof row.
- Missing proof blocks candidate progression.
- Mechanical classes still require repeated/history evidence before auto-fix policy changes.
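The blocking rule in these criteria can be sketched as a pure function over one checklist item. This is an illustrative assumption, not the shipped logic: `evaluate_proof_item`, the `"complete"` status value, and the second blocking reason are hypothetical names chosen to match the JSON contract above.

```python
# Assumption: only these families are mechanical enough to ever progress.
MECHANICAL_CLASSES = frozenset({"PRE_COMMIT_FORMAT_DRIFT", "RUFF_FIXABLE_LINT"})


def evaluate_proof_item(item: dict) -> dict:
    """Return a copy of a checklist item with progression fields filled in."""
    result = dict(item)
    if item.get("proof_status") != "complete":
        # Missing or partial proof always blocks candidate progression.
        result["can_progress_to_candidate"] = False
        result["blocking_reason"] = "Review proof has not been attached."
    elif item.get("diagnosis_class") not in MECHANICAL_CLASSES:
        # Review-first classes never progress, even with proof attached.
        result["can_progress_to_candidate"] = False
        result["blocking_reason"] = "Diagnosis class is review-first."
    else:
        result["can_progress_to_candidate"] = True
        result["blocking_reason"] = ""
    return result
```

Progressing to candidate status here does not enable any fix; history and policy gates would still apply downstream.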
Operational readiness — Add signal trends¶
Goal¶
Use memory/history to distinguish one-off signals from repeated signals and prior successful fixes.
PR 4: Publish maintenance signal trend summary¶
Suggested branch:
feature/maintenance-signal-trends
Suggested PR title:
Publish maintenance signal trend summary
Outputs:
artifacts/maintenance-signal-trends.json
artifacts/maintenance-signal-trends.md
JSON contract:
{
"schema_version": "sdetkit.maintenance.signal_trends.v1",
"diagnostic_only": true,
"automation_allowed": false,
"signals": [
{
"memory_lookup_key": "maintenance-action:lint_check:ruff-check",
"signal": "Run ruff check",
"diagnosis_class": "RUFF_FIXABLE_LINT",
"seen_count": 4,
"recent_count": 2,
"safe_fix_attempts": 1,
"safe_fix_successes": 1,
"trend": "recurring",
"trend_confidence": "medium",
"recommendation_impact": "candidate_later"
}
]
}
Tests:
PYTHONPATH=src python -m pytest -q \
tests/test_maintenance_signal_trends.py \
tests/test_maintenance_on_demand_signal_trends_workflow.py
./scripts/pr_preflight.sh
Acceptance criteria:
- Uses policy decision history and memory context.
- Repeated signals are visible.
- Trends affect recommendations only diagnostically at this stage.
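One way to separate one-off signals from repeated ones is a small labeling function over the counts in the contract. The thresholds and the `one_off` and `dormant` labels below are assumptions; only `recurring` appears in the example JSON.

```python
def classify_trend(seen_count: int, recent_count: int) -> str:
    """Label a signal from its total and recent observation counts."""
    if seen_count <= 1:
        return "one_off"  # hypothetical label: seen at most once
    if recent_count == 0:
        return "dormant"  # hypothetical label: repeated, but not recently
    return "recurring"


def trend_record(signal: dict) -> dict:
    """Attach a diagnostic-only trend label to a signal record."""
    record = dict(signal)
    record["trend"] = classify_trend(signal["seen_count"], signal["recent_count"])
    return record
```

The label is purely descriptive at this stage; it does not change `automation_allowed`.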
Adoption readiness — Add human-friendly investigation front door¶
Goal¶
Expose shared adaptive diagnosis directly to humans through a thin command surface.
PR 5: Add failure investigation command¶
Suggested branch:
feature/investigate-failure-command
Suggested PR title:
Add failure investigation command
Commands:
python -m sdetkit investigate failure --log quality.log --format json
python -m sdetkit investigate failure --log quality.log --format markdown
Output fields:
classification
confidence
likely type
recommended next command
proof commands
safe-fix eligibility
memory lookup key
Example Markdown:
# Failure investigation
- classification: **MISSING_PUBLIC_API_PARITY**
- confidence: **high**
- likely type: **product/API gap**
- safe to auto-fix: **False**
- requires human review: **True**
## Why
The log shows an AttributeError for a missing async method while the sync method exists.
## Next action
Add async parity and helper-level coverage, then run the focused test slice.
## Proof commands
```bash
PYTHONPATH=src python -m pytest -q tests/test_netclient_envelope_parity.py
```
Tests:
PYTHONPATH=src python -m pytest -q \
tests/test_investigate_failure.py \
tests/test_adaptive_diagnosis.py
python -m pre_commit run -a
Acceptance criteria:
- Calls `adaptive_diagnosis`; does not duplicate classification logic.
- Supports JSON and Markdown.
- Exits nonzero only for malformed inputs, not for diagnosed failures.
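The exit-code rule can be illustrated with a thin `argparse` wrapper. This is a sketch under stated assumptions: `diagnose` is a hypothetical stand-in for the shared `adaptive_diagnosis` call, and the exit code `2` for a missing log file is an assumed convention.

```python
import argparse
import json
import sys
from pathlib import Path


def diagnose(log_text: str) -> dict:
    """Placeholder for the shared adaptive_diagnosis call (hypothetical)."""
    return {"classification": "UNKNOWN_REVIEW_REQUIRED", "confidence": "low"}


def main(argv: list) -> int:
    parser = argparse.ArgumentParser(prog="investigate-failure")
    parser.add_argument("--log", required=True)
    parser.add_argument("--format", choices=["json", "markdown"], default="json")
    args = parser.parse_args(argv)

    path = Path(args.log)
    if not path.is_file():
        print(f"log file not found: {path}", file=sys.stderr)
        return 2  # malformed input is the only nonzero path

    result = diagnose(path.read_text(encoding="utf-8", errors="replace"))
    if args.format == "json":
        print(json.dumps(result, sort_keys=True, indent=2))
    else:
        print("# Failure investigation")
        for key in sorted(result):
            print(f"- {key}: **{result[key]}**")
    return 0  # a diagnosed failure is still a successful investigation
```

Because the wrapper only parses arguments and renders output, all classification stays in the shared engine.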
Scale readiness — Add repository investigation summary¶
Goal¶
Use boost and index evidence to choose where to investigate next.
PR 6: Add repository investigation summary¶
Suggested branch:
feature/investigate-repo-summary
Suggested PR title:
Add repository investigation summary
Command:
python -m sdetkit investigate repo --root . --format json
JSON contract:
{
"schema_version": "sdetkit.investigate.repo.v1",
"repo_shape": {
"source_files": 328,
"test_files": 591,
"workflow_files": 18
},
"top_surfaces": [
{
"name": "netclient",
"production_files": [
"src/sdetkit/netclient.py"
],
"test_files": [
"tests/test_netclient.py"
],
"reason": "bounded HTTP client surface with sync/async/API/CLI parity risk",
"recommended_next_probe": "investigate surface --surface netclient"
}
]
}
Tests:
PYTHONPATH=src python -m pytest -q tests/test_investigate_repo.py
./scripts/pr_preflight.sh
Acceptance criteria:
- Wraps or consumes boost/index outputs.
- Produces concise operator-facing surface choices.
- Does not replace boost or index.
Phase 7 — Add focused surface investigation¶
Goal¶
Automate the manual narrowing work performed for netclient and similar surfaces.
PR 7: Add focused surface investigation¶
Suggested branch:
feature/investigate-surface
Suggested PR title:
Add focused surface investigation
Command:
python -m sdetkit investigate surface --root . --surface netclient --format json
JSON contract:
{
"schema_version": "sdetkit.investigate.surface.v1",
"surface": "netclient",
"production_files": [
"src/sdetkit/netclient.py",
"src/sdetkit/apiclient.py",
"src/sdetkit/apiget.py"
],
"test_files": [
"tests/test_netclient.py"
],
"public_symbols": [
"SdetHttpClient.get_json_list_paginated_envelope",
"SdetAsyncHttpClient.get_json_list_paginated"
],
"parity_risks": [
{
"kind": "sync_async_method_gap",
"sync_symbol": "get_json_list_paginated_envelope",
"async_symbol": "get_json_list_paginated_envelope",
"status": "missing"
}
],
"recommended_probe": "write focused parity repro"
}
Tests:
PYTHONPATH=src python -m pytest -q tests/test_investigate_surface.py
./scripts/pr_preflight.sh
Phase 8 — Add deterministic parity detectors¶
Goal¶
Catch sync/async/helper/CLI/public-mode parity gaps before humans manually find them.
PR 8: Detect public API parity gaps¶
Suggested branch:
feature/public-api-parity-detectors
Suggested PR title:
Detect public API parity gaps
Detector families:
SYNC_ASYNC_METHOD_GAP
SYNC_ASYNC_HELPER_GAP
CLI_BACKEND_PARITY_GAP
PUBLIC_MODE_UNTESTED
JSON contract:
{
"schema_version": "sdetkit.investigate.parity.v1",
"surface": "netclient",
"findings": [
{
"kind": "SYNC_ASYNC_METHOD_GAP",
"severity": "warning",
"sync_symbol": "SdetHttpClient.get_json_list_paginated_envelope",
"async_symbol": "SdetAsyncHttpClient.get_json_list_paginated_envelope",
"status": "missing",
"recommended_test": "focused sync/async parity test"
}
]
}
Tests:
PYTHONPATH=src python -m pytest -q tests/test_public_api_parity_detectors.py
./scripts/pr_preflight.sh
Acceptance criteria:
- Deterministic AST-based checks.
- No import-time side effects.
- Detects known PR #1155-style gap from a fixture.
Phase 9 — Generate investigation evidence bundles¶
Goal¶
Write durable evidence artifacts for candidate freeze, audit result, proof commands, and investigation JSON.
PR 9: Write investigation candidate evidence¶
Suggested branch:
feature/investigation-evidence-writer
Suggested PR title:
Write investigation candidate evidence
Command:
python -m sdetkit investigate evidence \
--classification MISSING_PUBLIC_API_PARITY \
--surface netclient \
--out-dir build/investigate/netclient
Generated files:
build/investigate/netclient/CANDIDATE_FREEZE.md
build/investigate/netclient/AUDIT_RESULT.md
build/investigate/netclient/proof-commands.md
build/investigate/netclient/investigation.json
Tests:
PYTHONPATH=src python -m pytest -q tests/test_investigate_evidence.py
./scripts/pr_preflight.sh
Phase 10 — Route investigation diagnoses through safe-fix policy¶
Goal¶
Connect diagnosis classes to safe-fix eligibility without enabling broad automation.
PR 10: Route investigation diagnoses through safe-fix policy¶
Suggested branch:
feature/investigation-safe-fix-policy-routing
Suggested PR title:
Route investigation diagnoses through safe-fix policy
Policy matrix:
| Diagnosis | Auto-fix? | Route |
|---|---|---|
| `PRE_COMMIT_FORMAT_DRIFT` | yes, later | safe mechanical |
| `RUFF_FIXABLE_LINT` | yes, narrow, later | safe mechanical |
| `GIT_BRANCH_DIVERGED` | no | command guidance |
| `REMOTE_BRANCH_DRIFT` | no | sync guidance |
| `MISSING_TEST_DEPENDENCY` | no | environment guidance |
| `PYTHON_RUNTIME_COMPATIBILITY` | no | compatibility PR |
| `LOCAL_ENVIRONMENT_FRICTION` | no | local environment guidance |
| `BROKEN_TEST_DOUBLE` | no by default | review-first test fix |
| `MISSING_PUBLIC_API_PARITY` | no | product implementation |
| `PRODUCT_LOGIC_FAILURE` | no | review-first product fix |
| `UNKNOWN_REVIEW_REQUIRED` | no | review-first |
Tests:
PYTHONPATH=src python -m pytest -q tests/test_investigation_safe_fix_policy.py
./scripts/pr_preflight.sh
Acceptance criteria:
- Broad diagnosis does not imply broad auto-fix.
- Only mechanical classes can be candidates.
- Candidate still requires history/proof/policy before execution.
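The policy matrix lends itself to a plain lookup table with a review-first default. The sketch below is an assumption about shape only: `route_diagnosis` and the `candidate_later` field are hypothetical, and the table lists only a few rows for brevity.

```python
# Partial copy of the policy matrix: family -> (route, may become a candidate later).
POLICY_MATRIX = {
    "PRE_COMMIT_FORMAT_DRIFT": ("safe mechanical", True),
    "RUFF_FIXABLE_LINT": ("safe mechanical", True),
    "GIT_BRANCH_DIVERGED": ("command guidance", False),
    "MISSING_PUBLIC_API_PARITY": ("product implementation", False),
    "PRODUCT_LOGIC_FAILURE": ("review-first product fix", False),
}


def route_diagnosis(classification: str) -> dict:
    """Map a diagnosis family to its safe-fix route without enabling any fix."""
    # Anything not listed, including unknown classes, stays review-first.
    route, candidate_later = POLICY_MATRIX.get(classification, ("review-first", False))
    return {
        "classification": classification,
        "route": route,
        "candidate_later": candidate_later,
        "auto_fix_enabled": False,  # routing never turns automation on
    }
```

Defaulting unlisted classes to review-first means a newly added diagnosis family cannot become a candidate by accident.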
Phase 11 — Publish investigation summaries in PR comments¶
Goal¶
When CI fails, PR comments should show classification, confidence, safe-fix status, next proof command, and memory context.
PR 11: Publish investigation summaries for PR failures¶
Suggested branch:
feature/pr-investigation-summaries
Suggested PR title:
Publish investigation summaries for PR failures
Comment section example:
### Failure investigation
- classification: **PRE_COMMIT_FORMAT_DRIFT**
- confidence: **high**
- safe-fix status: **candidate later**
- next command: `python -m pre_commit run -a`
- memory: seen 2 times, fixed manually 2 times
Tests:
PYTHONPATH=src python -m pytest -q tests/test_pr_investigation_summary_workflow.py
./scripts/pr_preflight.sh
Phase 12 — Remember investigation outcomes¶
Goal¶
Turn investigation outputs into durable memory that improves future recommendations, eligibility, action plans, safe-fix candidates, and risk scoring.
PR 12: Record investigation outcome memory¶
Suggested branch:
feature/investigation-outcome-memory
Suggested PR title:
Record investigation outcome memory
Memory fields:
{
"schema_version": "sdetkit.investigation.outcome_memory.v1",
"records": [
{
"classification": "PRE_COMMIT_FORMAT_DRIFT",
"surface": "tests",
"affected_files": ["tests/test_example.py"],
"proof_command": "python -m pre_commit run -a",
"safe_fix_outcome": "manual_success",
"manual_fix_outcome": "merged",
"pr_number": 1152,
"merged": true,
"time_to_green_seconds": 420
}
]
}
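Records in this shape could feed the memory line shown in PR comments. The following sketch assumes a hypothetical `summarize_memory` helper and treats `safe_fix_outcome == "manual_success"` as the marker for a manual fix.

```python
def summarize_memory(records: list, classification: str) -> str:
    """Build a short memory summary line for one diagnosis family."""
    matching = [r for r in records if r.get("classification") == classification]
    manual = sum(1 for r in matching if r.get("safe_fix_outcome") == "manual_success")
    return f"seen {len(matching)} times, fixed manually {manual} times"
```

For example, two stored `PRE_COMMIT_FORMAT_DRIFT` records that both ended in `manual_success` would render as "seen 2 times, fixed manually 2 times".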
Tests:
PYTHONPATH=src python -m pytest -q tests/test_investigation_outcome_memory.py
./scripts/pr_preflight.sh
Phase 13 — Safe-fix candidate registry¶
Goal¶
Publish candidate status for classes that may eventually become automatable.
PR 13: Publish safe-fix candidate registry¶
Suggested branch:
feature/safe-fix-candidate-registry
Suggested PR title:
Publish safe-fix candidate registry
JSON contract:
{
"schema_version": "sdetkit.safe_fix.candidates.v1",
"automation_allowed": false,
"candidates": [
{
"candidate_key": "diagnosis:PRE_COMMIT_FORMAT_DRIFT",
"category": "formatting",
"risk_level": "low",
"required_history_count": 3,
"required_success_count": 3,
"allowed_commands": ["python -m pre_commit run -a"],
"forbidden_paths": [".github/workflows"],
"rollback_required": true,
"current_status": "OBSERVE_MORE"
}
]
}
Phase 14 — Auto-fix probation report¶
Goal¶
Decide which candidates are not ready, need more observation, are blocked, or are ready for a policy PR.
PR 14: Publish auto-fix probation report¶
Suggested branch:
feature/auto-fix-probation-report
Suggested PR title:
Publish auto-fix probation report
Statuses:
NOT_READY
OBSERVE_MORE
READY_FOR_POLICY_PR
BLOCKED
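A probation decision could be derived from the registry thresholds (`required_history_count`, `required_success_count`) and observed history. The mapping below is an assumed interpretation of the four statuses, and `probation_status` is a hypothetical name.

```python
def probation_status(
    candidate: dict, history_count: int, success_count: int, blocked: bool = False
) -> str:
    """Decide a probation status for one safe-fix candidate."""
    if blocked:
        return "BLOCKED"  # an explicit block overrides any amount of history
    if history_count == 0:
        return "NOT_READY"  # nothing observed yet
    if (
        history_count < candidate["required_history_count"]
        or success_count < candidate["required_success_count"]
    ):
        return "OBSERVE_MORE"
    return "READY_FOR_POLICY_PR"
```

Even `READY_FOR_POLICY_PR` only signals that a policy proposal may be drafted; it does not enable execution.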
Phase 15 — Policy proposal generator¶
Goal¶
Generate proposed policy changes when proof exists. Do not execute them.
PR 15: Publish maintenance policy proposals¶
Suggested branch:
feature/maintenance-policy-proposals
Suggested PR title:
Publish maintenance policy proposals
Example output:
# Maintenance policy proposal
## Proposal
Allow Ruff format drift to be fixed automatically in PR-only mode.
## Why
- 5 repeated reviewed successes
- no human edits after auto-format
- preflight passed every time
## Scope
- tests only
- no workflow files
- no security-sensitive files
## Required checks
- `python -m pre_commit run -a`
- `./scripts/pr_preflight.sh`
## Rollback
Required.
Phase 16 — Auto-fix dry-run planner¶
Goal¶
Show exact planned changes without modifying files.
PR 16: Publish auto-fix dry-run plan¶
Suggested branch:
feature/auto-fix-dry-run-plan
Suggested PR title:
Publish auto-fix dry-run plan
Safety:
no file writes
no commits
no PR creation
no allowlist expansion
Phase 17 — Guarded PR-only auto-fix¶
Goal¶
Enable auto-fix only for approved safe mechanical classes and only through PRs.
PR 17: Enable guarded PR-only auto-fix¶
Suggested branch:
feature/guarded-pr-auto-fix
Suggested PR title:
Enable guarded PR-only auto-fix
Rules:
never push directly to main
only approved candidate classes
only allowed commands
only allowed paths
must show diff
must run preflight
must attach proof
must open PR
must record outcome
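Several of these rules can be checked before anything runs. The sketch below assumes hypothetical `plan` and `policy` dictionaries and reuses the `allowed_commands` and `forbidden_paths` ideas from the candidate registry; it covers only the class, command, path, and PR rules.

```python
def guard_violations(plan: dict, policy: dict) -> list:
    """Return reasons a planned auto-fix must not run; empty means allowed."""
    violations = []
    if plan["candidate_key"] not in policy["approved_candidates"]:
        violations.append("candidate class is not approved")
    for command in plan["commands"]:
        if command not in policy["allowed_commands"]:
            violations.append(f"command not allowed: {command}")
    for path in plan["changed_paths"]:
        # A path is forbidden if it equals or sits under a forbidden prefix.
        if any(
            path == prefix or path.startswith(prefix.rstrip("/") + "/")
            for prefix in policy["forbidden_paths"]
        ):
            violations.append(f"path is forbidden: {path}")
    if not plan.get("opens_pr", False):
        violations.append("change must be delivered through a PR")
    return violations
```

Returning every violation rather than stopping at the first keeps the report useful as proof attached to the PR or the outcome record.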
Phase 18 — Auto-fix outcome memory¶
Goal¶
Record every guarded auto-fix attempt, success, failure, revert, human edit, and check outcome.
PR 18: Record auto-fix outcome memory¶
Suggested branch:
feature/auto-fix-outcome-memory
Suggested PR title:
Record auto-fix outcome memory
Memory fields:
{
"schema_version": "sdetkit.auto_fix.outcome_memory.v1",
"records": [
{
"candidate_key": "diagnosis:PRE_COMMIT_FORMAT_DRIFT",
"attempted": true,
"succeeded": true,
"failed": false,
"reverted": false,
"human_edited": false,
"checks_passed": true,
"checks_failed": false,
"pr_number": 1200
}
]
}
Command surfaces to add¶
Add commands gradually, only when the underlying shared modules exist.
python -m sdetkit investigate failure --log quality.log --format json
python -m sdetkit investigate failure --log quality.log --format markdown
python -m sdetkit investigate repo --root . --format json
python -m sdetkit investigate surface --root . --surface netclient --format json
python -m sdetkit investigate evidence --classification MISSING_PUBLIC_API_PARITY --surface netclient --out-dir build/investigate/netclient
Potential package entrypoint later:
sdetkit investigate failure --log quality.log --format markdown
sdetkit investigate repo --root . --format markdown
sdetkit investigate surface --root . --surface netclient --format markdown
JSON and Markdown output conventions¶
Every roadmap artifact should follow these rules:
JSON conventions¶
- Include `schema_version`.
- Include `diagnostic_only` when the artifact is not allowed to mutate behavior.
- Include `automation_allowed` when relevant.
- Include stable keys for memory lookup.
- Use deterministic sorting for counts/maps.
- Include enough fields for downstream consumers.
- Avoid hidden behavior only present in code.
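These conventions can be enforced in one small rendering helper. This is a sketch with a hypothetical `render_artifact` name; it relies on `json.dumps(..., sort_keys=True)` for deterministic key ordering and defaults to the conservative safety flags.

```python
import json


def render_artifact(schema_version: str, payload: dict, diagnostic_only: bool = True) -> str:
    """Serialize an artifact with required fields and stable key order."""
    document = {
        **payload,
        # Required fields are set last so a payload cannot override them.
        "schema_version": schema_version,
        "diagnostic_only": diagnostic_only,
        "automation_allowed": False,
    }
    return json.dumps(document, sort_keys=True, indent=2) + "\n"
```

With sorted keys, two runs that build the same counts in a different insertion order produce byte-identical files, which keeps artifact diffs meaningful.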
Markdown conventions¶
- Start with a clear H1.
- Show safety state near the top.
- Include counts before details.
- Use compact tables for operator scanning.
- Include “What to do next” or equivalent.
- Include proof commands when actionable.
- Keep issue-comment sections truncation-safe.
Safety boundaries¶
Strict safety rules for the whole roadmap:
Product/API gaps stay review-first.
Broken test doubles stay review-first by default.
Runtime compatibility issues stay review-first.
Missing dependencies stay guidance/review-first.
Git branch drift stays command guidance.
Security findings stay review-first.
Unknown classifications stay review-first.
Auto-fix must never push directly to main.
Auto-fix must never broaden policy implicitly.
Auto-fix must never run without proof, policy, allowed commands, allowed paths, and PR-only guardrails.
What stays diagnostic-only¶
These layers should remain diagnostic-only unless a later policy PR explicitly changes behavior:
- adaptive diagnosis classification
- failure investigation command
- repo investigation summary
- surface investigation summary
- parity detectors
- maintenance action categories
- proof checklist
- signal trends
- candidate registry
- probation report
- policy proposal generator
- dry-run planner
- PR/CI investigation summaries
- outcome memory recording
What can become safe mechanical auto-fix later¶
Only narrow mechanical classes can become candidates, and only after proof/history/policy gates:
| Class | Candidate? | Notes |
|---|---|---|
| `PRE_COMMIT_FORMAT_DRIFT` | yes | Only through pre-commit/formatters, PR-only, allowed paths |
| `RUFF_FIXABLE_LINT` | yes, narrow | Only approved Ruff fixable rules, PR-only, allowed paths |
| `GIT_BRANCH_DIVERGED` | no | Command guidance only |
| `REMOTE_BRANCH_DRIFT` | no | Sync guidance only |
| `MISSING_TEST_DEPENDENCY` | no | Dependency/environment guidance or explicit PR |
| `PYTHON_RUNTIME_COMPATIBILITY` | no | Compatibility PR required |
| `LOCAL_ENVIRONMENT_FRICTION` | no | Local guidance only |
| `BROKEN_TEST_DOUBLE` | no by default | Test behavior can be semantically wrong |
| `MISSING_PUBLIC_API_PARITY` | no | Product implementation required |
| `PRODUCT_LOGIC_FAILURE` | no | Product review required |
| `UNKNOWN_REVIEW_REQUIRED` | no | No automation until classified |
How memory/history feeds recommendations¶
Memory should become the connective tissue of the system.
Inputs to remember¶
- diagnosis class
- source surface
- affected files
- failure log hash or signature
- proof command
- proof result
- PR number
- merged/not merged
- checks passed/failed
- whether human edited the fix
- whether auto-fix was reverted
- time to green
- recurrence count
- last seen timestamp
Consumers of memory¶
- recommendations
- priority rollups
- eligibility diagnostics
- action plans
- proof checklists
- category classifier
- signal trends
- candidate registry
- probation report
- policy proposals
- PR quality comments
- mission-control release bundles
- surface risk scoring
Memory-driven promotion path¶
first observation
↓
review required
↓
proof attached
↓
repeated successful evidence
↓
candidate later
↓
probation
↓
ready for policy PR
↓
dry run
↓
guarded PR-only auto-fix
↓
outcome memory
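The promotion path reads naturally as an ordered list in which a signal may only advance one stage at a time. The sketch below assumes that strict one-step rule; `next_stage` and `can_promote` are hypothetical helper names.

```python
PROMOTION_PATH = [
    "first observation",
    "review required",
    "proof attached",
    "repeated successful evidence",
    "candidate later",
    "probation",
    "ready for policy PR",
    "dry run",
    "guarded PR-only auto-fix",
    "outcome memory",
]


def next_stage(current: str) -> str:
    """Return the stage after `current`; the final stage maps to itself."""
    index = PROMOTION_PATH.index(current)
    return PROMOTION_PATH[min(index + 1, len(PROMOTION_PATH) - 1)]


def can_promote(current: str, target: str) -> bool:
    """Allow promotion only to the immediately following stage."""
    return PROMOTION_PATH.index(target) == PROMOTION_PATH.index(current) + 1
```

Rejecting skipped stages is what keeps a first observation from jumping straight to guarded auto-fix.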
First 5 PRs to execute¶
1. Expand adaptive diagnosis for local investigation failures¶
Branch:
feature/adaptive-diagnosis-local-investigation-failures
Title:
Expand adaptive diagnosis for local investigation failures
Why first:
- It upgrades the shared brain.
- It captures the real manual workflow.
- It prevents action categories/proof checklists from inventing separate classification logic.
2. Classify maintenance actions with adaptive diagnosis classes¶
Branch:
feature/maintenance-action-diagnosis-categories
Title:
Classify maintenance actions with adaptive diagnosis classes
Why second:
- Maintenance action plans need diagnosis classes.
- Later auto-fix safety depends on category.
3. Publish maintenance proof checklist¶
Branch:
feature/maintenance-proof-checklist
Title:
Publish maintenance proof checklist
Why third:
- Every future candidate needs explicit proof requirements.
4. Publish maintenance signal trend summary¶
Branch:
feature/maintenance-signal-trends
Title:
Publish maintenance signal trend summary
Why fourth:
- The system needs repeated-history strength before candidate/probation logic.
5. Add failure investigation command¶
Branch:
feature/investigate-failure-command
Title:
Add failure investigation command
Why fifth:
- Once the shared diagnosis brain is richer, expose it as a human-friendly command.
GitHub Project board mapping¶
Use GitHub Project #2 as the execution board for this roadmap.
Recommended project name:
SDETKit Adaptive Investigation Roadmap
Recommended views¶
| View | Purpose | Group/sort by |
|---|---|---|
| Roadmap | Main execution board | Status |
| Phases | See roadmap progress by phase | Phase |
| Safety lane | Separate diagnostic-only work from future auto-fix work | Safety Route |
| PR queue | Track the next small PRs to execute | Priority, Status |
| Automation ladder | Track candidate/probation/policy/dry-run/auto-fix maturity | Safety Route |
Recommended fields¶
| Field | Values |
|---|---|
| Status | Backlog, Ready, In Progress, In Review, Merged, Blocked, Later |
| Phase | Baseline readiness through Phase 18 |
| Safety Route | Diagnostic Only, Review First, Safe Mechanical Candidate, Probation, Policy Proposal, Guarded PR Auto-Fix |
| Priority | P0, P1, P2 |
| Artifact Type | JSON, Markdown, CLI, Workflow, Memory, Policy, Tests |
| Depends On | Linked issue or PR |
| Proof Status | Missing, Partial, Complete, Not Required |
| Automation Status | Not Eligible, Observe More, Candidate Later, Ready for Policy PR, Blocked |
| Owner | Maintainer or automation lane owner |
Suggested initial project issues¶
| Order | Issue title | Phase | Safety Route |
|---|---|---|---|
| 1 | Expand adaptive diagnosis for local investigation failures | Phase 1 | Diagnostic Only |
| 2 | Classify maintenance actions with adaptive diagnosis classes | Phase 2 | Diagnostic Only |
| 3 | Publish maintenance proof checklist | Phase 3 | Diagnostic Only |
| 4 | Publish maintenance signal trend summary | Phase 4 | Diagnostic Only |
| 5 | Add failure investigation command | Phase 5 | Diagnostic Only |
| 6 | Add repository investigation summary | Phase 6 | Diagnostic Only |
| 7 | Add focused surface investigation | Phase 7 | Diagnostic Only |
| 8 | Detect public API parity gaps | Phase 8 | Review First |
| 9 | Write investigation candidate evidence | Phase 9 | Diagnostic Only |
| 10 | Route investigation diagnoses through safe-fix policy | Phase 10 | Review First |
| 11 | Publish investigation summaries for PR failures | Phase 11 | Diagnostic Only |
| 12 | Record investigation outcome memory | Phase 12 | Diagnostic Only |
| 13 | Publish safe-fix candidate registry | Phase 13 | Safe Mechanical Candidate |
| 14 | Publish auto-fix probation report | Phase 14 | Probation |
| 15 | Publish maintenance policy proposals | Phase 15 | Policy Proposal |
| 16 | Publish auto-fix dry-run plan | Phase 16 | Policy Proposal |
| 17 | Enable guarded PR-only auto-fix | Phase 17 | Guarded PR Auto-Fix |
| 18 | Record auto-fix outcome memory | Phase 18 | Guarded PR Auto-Fix |
Board execution rules¶
- Keep only one or two P0 items in progress at a time.
- Do not move an item to Ready unless its dependency issue or PR is merged.
- Do not move any auto-fix item past Probation unless proof, trend, candidate registry, and policy proposal artifacts exist.
- Keep Phase 17 as a major milestone, not a near-term task. It should stay blocked until Phases 1-16 are stable.
- Every issue should include the safety route and whether behavior is diagnostic-only, review-first, or an approved mechanical candidate.
Definition of done for every roadmap PR¶
Every PR in this roadmap should meet this checklist before merge.
Required for every PR¶
- PR scope is one small roadmap step.
- PR body explains why this step exists in the product spine.
- JSON schema is added or updated if the PR emits machine-readable output.
- Markdown output is added if the PR is operator-facing.
- Deterministic tests are added for every new diagnosis, policy, artifact, or command path.
- Existing behavior remains diagnostic-only unless the phase explicitly changes that.
- No auto-fix behavior is enabled unless the PR is a later policy-approved auto-fix phase.
- Rollback plan is included in the PR body.
- ./scripts/pr_preflight.sh passes.
Extra requirements for workflow PRs¶
- Workflow YAML is validated by pre-commit.
- Artifact upload paths are deterministic.
- Reporting steps fail safely: a reporting error must neither create a new CI failure nor mask an existing one.
- PR or maintenance comments stay truncation-safe.
- Comment sections are ordered so high-level diagnosis appears before low-level details.
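The truncation and ordering requirements above can be sketched as follows. This is a minimal illustration, not SDETKit's actual comment code: the function name build_comment, the (priority, title, body) section shape, the notice text, and the default 60000-character limit are all assumptions.

```python
from typing import List, Tuple

# Hypothetical notice appended when the comment is cut short.
TRUNCATION_NOTICE = "\n\n_(truncated; see uploaded artifacts for full details)_"


def build_comment(sections: List[Tuple[int, str, str]], max_chars: int = 60000) -> str:
    """Render sections in priority order and truncate the tail safely.

    Each section is (priority, title, body). Lower priority renders first,
    so high-level diagnosis stays visible even when low-level details are cut.
    """
    # Sort by priority, then title, so the output is deterministic.
    ordered = sorted(sections, key=lambda s: (s[0], s[1]))
    text = "\n\n".join(f"### {title}\n{body}" for _, title, body in ordered)
    if len(text) <= max_chars:
        return text
    if max_chars <= len(TRUNCATION_NOTICE):
        # Not enough room for the notice; hard-cut instead.
        return text[:max_chars]
    # Reserve room for the notice so the result stays within max_chars.
    keep = max_chars - len(TRUNCATION_NOTICE)
    return text[:keep].rstrip() + TRUNCATION_NOTICE
```

Because truncation always removes the tail, placing the diagnosis section first means the part an operator most needs survives any cut.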
Extra requirements for diagnosis/classification PRs¶
- Every new diagnosis class has at least one positive fixture.
- Unknown or ambiguous logs fall back to UNKNOWN_REVIEW_REQUIRED.
- Confidence is deterministic and explainable.
- Product/API/test/runtime/dependency/security/git-drift classes remain review-first.
- Only approved mechanical classes may set safe-to-auto-fix signals.
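A minimal sketch of these classification rules, assuming a simple pattern table. The class names FORMAT_DRIFT and TEST_FAILURE, the patterns, the confidence values, and the Diagnosis shape are hypothetical and do not describe the real adaptive_diagnosis API; only UNKNOWN_REVIEW_REQUIRED comes from the requirements above.

```python
import re
from dataclasses import dataclass
from typing import Tuple

UNKNOWN = "UNKNOWN_REVIEW_REQUIRED"

# Hypothetical rules: (diagnosis class, pattern, fixed confidence, safe_to_auto_fix).
# Only the mechanical class carries a safe-to-auto-fix signal.
RULES: Tuple[Tuple[str, str, float, bool], ...] = (
    ("FORMAT_DRIFT", r"would reformat", 0.9, True),
    ("TEST_FAILURE", r"\bFAILED\b.*::", 0.8, False),
)


@dataclass(frozen=True)
class Diagnosis:
    diagnosis_class: str
    confidence: float
    reason: str
    safe_to_auto_fix: bool


def classify(log_text: str) -> Diagnosis:
    """Classify a log; zero or multiple matching rules fall back to UNKNOWN."""
    matches = [rule for rule in RULES if re.search(rule[1], log_text)]
    if len(matches) != 1:
        reason = "no rule matched" if not matches else "multiple rules matched"
        return Diagnosis(UNKNOWN, 0.0, reason, False)
    name, pattern, confidence, safe = matches[0]
    # Confidence comes from the rule table, so it is repeatable and explainable.
    return Diagnosis(name, confidence, f"matched pattern {pattern!r}", safe)
```

Each rule in such a table would need at least one positive fixture, and the ambiguous case (two rules matching) doubles as a fixture for the fallback path.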
Extra requirements for memory/history PRs¶
- Records have stable IDs or stable memory lookup keys.
- Appends are idempotent where possible.
- Repeated-signal rollups are deterministic.
- Missing or malformed optional history files degrade safely.
- Memory never overrides safety policy without explicit proof and policy gates.
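The first four memory requirements can be illustrated with a small JSONL store. This is a sketch under stated assumptions: the file format, the field names (id, run_id, diagnosis_class, signal), and the helper names are hypothetical, not SDETKit's actual memory layout.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def record_key(diagnosis_class: str, signal: str) -> str:
    """Stable lookup key: the same inputs always hash to the same ID."""
    raw = f"{diagnosis_class}\n{signal}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


def load_history(path: Path) -> List[Dict]:
    """Read JSONL history; a missing file or malformed line is skipped, not fatal."""
    if not path.exists():
        return []
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Keep only records with the string fields the rollup relies on.
        if (
            isinstance(item, dict)
            and isinstance(item.get("id"), str)
            and isinstance(item.get("run_id"), str)
        ):
            records.append(item)
    return records


def append_once(path: Path, diagnosis_class: str, signal: str, run_id: str) -> bool:
    """Append a record unless this (key, run) pair is already stored."""
    key = record_key(diagnosis_class, signal)
    existing = {(r["id"], r["run_id"]) for r in load_history(path)}
    if (key, run_id) in existing:
        return False
    record = {"id": key, "run_id": run_id, "diagnosis_class": diagnosis_class, "signal": signal}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return True
```

Re-running the same job appends nothing, so repeated-signal counts derived from this file reflect distinct runs rather than retries.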
Extra requirements for safe-fix or auto-fix PRs¶
- The PR states the exact allowed commands.
- The PR states the exact allowed file/path scope.
- The PR states forbidden paths.
- The PR requires proof commands.
- The PR includes rollback behavior.
- The PR never pushes directly to main.
- The PR never mutates fork PRs.
- The PR records outcome memory.
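One way to make these requirements checkable is a small policy object with a single gate function. The sketch below is illustrative only: SafeFixPolicy, check_fix, and their fields are hypothetical names, and the real safe-fix policy may be structured differently.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SafeFixPolicy:
    allowed_commands: Tuple[str, ...]  # exact command strings
    allowed_paths: Tuple[str, ...]     # directory prefixes a fix may touch
    forbidden_paths: Tuple[str, ...]   # prefixes that always block a fix
    proof_commands: Tuple[str, ...]    # commands that must pass afterwards


def _under(path: str, prefixes: Sequence[str]) -> bool:
    """True if path sits at or below any prefix, compared by path component."""
    parts = PurePosixPath(path).parts
    for prefix in prefixes:
        prefix_parts = PurePosixPath(prefix).parts
        if parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False


def check_fix(
    policy: SafeFixPolicy,
    command: str,
    changed_files: Sequence[str],
    push_branch: str,
    from_fork: bool,
) -> Tuple[bool, str]:
    """Return (allowed, reason); any failed gate blocks the fix."""
    if not policy.proof_commands:
        return False, "policy declares no proof commands"
    if from_fork:
        return False, "fork PRs are never mutated"
    if push_branch == "main":
        return False, "direct pushes to main are never allowed"
    if command not in policy.allowed_commands:
        return False, f"command not allowed: {command}"
    for path in changed_files:
        pure = PurePosixPath(path)
        if pure.is_absolute() or ".." in pure.parts:
            return False, f"unsafe path: {path}"
        # Forbidden paths win over allowed paths.
        if _under(path, policy.forbidden_paths):
            return False, f"forbidden path: {path}"
        if not _under(path, policy.allowed_paths):
            return False, f"path outside allowed scope: {path}"
    return True, "all gates passed"
```

Forbidden paths are checked before allowed paths so that a broad allowed prefix cannot accidentally expose a protected subdirectory. Rollback behavior and outcome memory are outside the scope of this sketch.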
Major milestone gate for guarded auto-fix¶
Phase 17, guarded PR-only auto-fix, is a major milestone and must not start until the earlier investigation, proof, trend, candidate, probation, policy proposal, and dry-run layers are stable.
Before Phase 17 starts, the project should have:
- Stable diagnosis classes for local investigation failures.
- Maintenance actions classified by diagnosis.
- Proof checklist artifacts.
- Signal trend artifacts.
- Investigation CLI front door.
- Repo and surface investigation summaries.
- Parity detectors.
- Investigation evidence writer.
- Safe-fix policy routing.
- PR/CI investigation summaries.
- Investigation outcome memory.
- Safe-fix candidate registry.
- Auto-fix probation report.
- Maintenance policy proposal generator.
- Auto-fix dry-run planner.
Only after those layers are proven should guarded PR-only auto-fix move from Blocked to Ready.
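The gate above amounts to an all-or-nothing check over the fifteen prerequisite layers. A minimal sketch, assuming each layer's stability is tracked as a boolean; the layer identifiers and the gate_status function are hypothetical names derived from the list, not existing SDETKit code.

```python
from typing import Dict, List, Tuple

# Hypothetical identifiers, one per prerequisite in the list above.
REQUIRED_LAYERS: Tuple[str, ...] = (
    "stable_diagnosis_classes",
    "maintenance_actions_classified",
    "proof_checklist_artifacts",
    "signal_trend_artifacts",
    "investigation_cli",
    "repo_and_surface_summaries",
    "parity_detectors",
    "investigation_evidence_writer",
    "safe_fix_policy_routing",
    "pr_ci_investigation_summaries",
    "investigation_outcome_memory",
    "safe_fix_candidate_registry",
    "auto_fix_probation_report",
    "policy_proposal_generator",
    "auto_fix_dry_run_planner",
)


def gate_status(stable: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return the board status for guarded auto-fix plus any missing layers.

    A layer absent from the mapping counts as not stable, so the gate
    defaults to Blocked.
    """
    missing = [layer for layer in REQUIRED_LAYERS if not stable.get(layer, False)]
    return ("Blocked" if missing else "Ready", missing)
```

Returning the missing layers alongside the status keeps the gate explainable: the board item can list exactly what still blocks it.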
Final roadmap line¶
detect → diagnose → recommend → plan → prove → classify → trend → candidate → probation → policy proposal → dry run → guarded PR auto-fix → remember outcome
This is the product direction. Every PR should either strengthen one stage of this path or improve the evidence flow between stages.