Adaptive Investigation Roadmap¶

Executive summary¶

SDETKit is evolving from a collection of strong diagnostic and maintenance tools into a connected, deterministic repo investigator and, later, a guarded auto-fix system.

The product spine is:

detect → diagnose → recommend → plan → prove → classify → trend → candidate → probation → policy proposal → dry run → guarded PR auto-fix → remember outcome

The investigation spine is:

scan → narrow → reproduce → classify → recommend → verify → remember

The immediate goal is not broad automation. It is to make SDETKit better at understanding failures, explaining why they matter, recommending safe next actions, collecting proof, and remembering outcomes. Auto-fix should launch later, and only for narrow mechanical classes with repeated proof and explicit policy gates.

The most important architecture rule is: do not build a second investigator brain. The new investigate surface should be a thin human-friendly front door over existing shared engines. The shared classification brain remains adaptive_diagnosis; maintenance, review, mission-control, boost/index, and forensics should call into it or contribute evidence to it.

Current foundation¶

SDETKit already has the right base pieces. The roadmap should connect and deepen these systems rather than replace them.

Existing intelligence and evidence layers¶

adaptive_diagnosis already acts as the shared failure classification layer.
maintenance_autopilot already acts as the CI/autopilot caller and writes diagnosis, remediation, safe-fix, and learning artifacts.
review orchestrates doctor, inspect, readiness, comparison, probe planning, contradiction clustering, confidence scoring, and history-aware evidence.
mission-control bundles release evidence, gate/doctor/readiness steps, stdout/stderr artifacts, findings, next actions, Doctor Cortex, and run history.
boost and index scan repo shape, high-signal files, risk markers, hotspots, symbols, adaptive memory, and risk hygiene.
forensics compares run records and builds deterministic repro/evidence bundles.
PR quality comments and maintenance issue comments already publish operator-facing summaries.

Maintenance roadmap work already merged¶

Recent PRs established a maintenance intelligence chain:

maintenance run
  ↓
maintenance priority rollup
  ↓
maintenance policy decisions
  ↓
policy decision history
  ↓
policy memory context
  ↓
adaptive maintenance recommendations
  ↓
recommendation eligibility diagnostics
  ↓
maintenance action plan

The current merged behavior is intentionally conservative:

diagnostic_only: true
automation_allowed: false
auto_fix_enabled: false

This conservative default is intentional. The system can now recommend and plan, but it should not auto-fix until the proof, category, trend, candidate, probation, and policy proposal layers are mature.

Real workflow guide from PR #1155¶

A real product gap around async/client/helper envelope pagination parity showed what SDETKit should learn to do automatically:

scan → narrow → reproduce → classify → recommend → verify → remember

The manual process surfaced recurring failure classes that should become first-class diagnosis families:

formatting drift from pre-commit / Ruff format
Ruff fixable lint
missing test dependencies
Python runtime compatibility problems
local WSL or /mnt/c environment friction
broken test doubles
missing public API parity
git branch divergence
remote branch drift after bot or remote updates
product logic failures
unknown review-required failures

Architecture roles¶

Keep responsibilities separate and composable.

| Component | Role | Should do | Should not do |
|---|---|---|---|
| `adaptive_diagnosis` | Shared failure-classification brain | Classify failure logs and structured evidence into deterministic diagnosis families | Own CI, comments, repo scanning, or remediation execution |
| `maintenance_autopilot` | CI/autopilot caller | Invoke checks, collect artifacts, call diagnosis/policy layers, optionally commit only approved safe fixes | Become the diagnosis brain |
| `review` | Evidence orchestration and decision layer | Combine doctor/inspect/readiness/probe evidence into review decisions | Duplicate boost/index/forensics classification logic |
| `mission-control` | Release evidence bundle | Package release confidence evidence and run history | Decide low-level fix policies |
| `boost` / `index` | Repo and surface scan | Produce repo shape, symbols, hotspots, risk markers, high-signal files | Emit final failure decisions alone |
| `forensics` | Compare/bundle/repro evidence | Compare runs, preserve logs, generate repro bundles | Choose remediation policy |
| `investigate` | Human-friendly front door | Wrap diagnosis, repo scan, surface scan, and evidence bundles into simple commands | Duplicate adaptive diagnosis, boost, index, review, or maintenance logic |
| safe-fix policy | Guardrail layer | Decide which diagnosis classes may become auto-fix candidates | Auto-fix product/API gaps or ambiguous failures |
| outcome memory | Learning loop | Record repeated signals, proof, fixes, failures, PR outcomes, time-to-green | Override policy gates without proof |

Product principles¶

Classify before recommending. Every recommendation should be grounded in a diagnosis family or explicit unknown/review-required state.
Recommend before fixing. Recommendations should explain what to do, why, and what proof is needed.
Proof before candidates. An item should not become an auto-fix candidate until the proof checklist is satisfiable and history supports it.
Candidates before policy. A candidate registry should exist before any auto-fix policy proposal.
Policy before execution. Auto-fix should only run after an explicit policy PR or equivalent reviewed configuration change.
Dry run before write. Planned changes should be visible before any modifying run.
PR-only execution. Guarded auto-fix should open a PR, never push directly to main.
Outcome memory closes the loop. Every attempt, success, failure, human edit, revert, and proof command should feed future diagnosis and recommendations.

Phase-by-phase roadmap¶

Baseline readiness — Upgrade the shared diagnosis brain¶

Goal¶

Teach adaptive_diagnosis the local investigation failure classes that appeared during real repo work, so later roadmap layers consume one shared classification model.

PR 1: Expand adaptive diagnosis for local investigation failures¶

Suggested branch:

feature/adaptive-diagnosis-local-investigation-failures

Suggested PR title:

Expand adaptive diagnosis for local investigation failures

Diagnosis families to add:

PRE_COMMIT_FORMAT_DRIFT
RUFF_FIXABLE_LINT
MISSING_TEST_DEPENDENCY
PYTHON_RUNTIME_COMPATIBILITY
LOCAL_ENVIRONMENT_FRICTION
BROKEN_TEST_DOUBLE
MISSING_PUBLIC_API_PARITY
GIT_BRANCH_DIVERGED
REMOTE_BRANCH_DRIFT
PRODUCT_LOGIC_FAILURE
UNKNOWN_REVIEW_REQUIRED

Representative signal mapping:

| Diagnosis | Example signals | Default route | Auto-fix candidate? |
|---|---|---|---|
| `PRE_COMMIT_FORMAT_DRIFT` | `ruff-format`, `end-of-file-fixer`, `files were modified by this hook` | safe mechanical review or auto-fix candidate | yes, later |
| `RUFF_FIXABLE_LINT` | Ruff fixable lint output, `--fix` suggestion | narrow mechanical candidate | yes, later |
| `MISSING_TEST_DEPENDENCY` | `ModuleNotFoundError`, missing `hypothesis`, missing `yaml` | environment/dependency guidance | no |
| `PYTHON_RUNTIME_COMPATIBILITY` | `ImportError: cannot import name 'UTC' from datetime` | compatibility PR | no |
| `LOCAL_ENVIRONMENT_FRICTION` | venv/pip hangs, slow paths under `/mnt/c`, WSL friction | environment guidance | no |
| `BROKEN_TEST_DOUBLE` | `TypeError: Resp() takes no arguments`, broken mock/init | review-first test fix | no by default |
| `MISSING_PUBLIC_API_PARITY` | async method missing while sync method exists, helper/API mismatch | product implementation | no |
| `GIT_BRANCH_DIVERGED` | push rejected, fetch first, non-fast-forward | command guidance | no |
| `REMOTE_BRANCH_DRIFT` | local branch behind PR branch after bot/remote update | sync guidance | no |
| `PRODUCT_LOGIC_FAILURE` | deterministic assertion failure in product behavior | review-first product fix | no |
| `UNKNOWN_REVIEW_REQUIRED` | no confident class | review-first | no |

Output contract:

{
  "schema_version": "sdetkit.adaptive.diagnosis.v2",
  "classification": "MISSING_PUBLIC_API_PARITY",
  "confidence": "high",
  "product_logic_likely": true,
  "test_bug_likely": false,
  "environment_likely": false,
  "git_workflow_likely": false,
  "formatting_likely": false,
  "safe_to_auto_fix": false,
  "requires_human_review": true,
  "summary": "Missing async public API parity detected.",
  "why_it_matters": "The async client lacks a public method available on the sync client.",
  "next_action": "Add async parity and focused helper-level coverage.",
  "proof_commands": [
    "PYTHONPATH=src python -m pytest -q tests/test_netclient_envelope_parity.py"
  ],
  "memory_lookup_key": "diagnosis:MISSING_PUBLIC_API_PARITY:netclient"
}

Tests:

PYTHONPATH=src python -m pytest -q tests/test_adaptive_diagnosis.py
python -m pre_commit run -a
./scripts/pr_preflight.sh

Acceptance criteria:

Each new class has a deterministic test fixture.
Unknown cases fall back to UNKNOWN_REVIEW_REQUIRED.
Only narrow mechanical classes can return safe_to_auto_fix: true.
Product/API/test/runtime/dependency/git-drift classes remain review-first.
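
The acceptance criteria above can be sketched as a small ordered rule table. This is a minimal illustration, not the actual adaptive_diagnosis implementation: the `RULES` table, the `classify` function, and the regular expressions are assumptions made for this example. Only the diagnosis family names and the example signal strings come from the mapping above.

```python
import re

# Hypothetical rule table: (diagnosis family, pattern).
# The first matching rule wins, which keeps the result deterministic.
RULES = [
    ("PRE_COMMIT_FORMAT_DRIFT", re.compile(r"files were modified by this hook")),
    ("MISSING_TEST_DEPENDENCY", re.compile(r"ModuleNotFoundError")),
    ("PYTHON_RUNTIME_COMPATIBILITY", re.compile(r"cannot import name 'UTC'")),
    ("BROKEN_TEST_DOUBLE", re.compile(r"TypeError: \w+\(\) takes no arguments")),
    ("GIT_BRANCH_DIVERGED", re.compile(r"non-fast-forward|fetch first")),
]

# Only narrow mechanical classes may ever report safe_to_auto_fix: true.
MECHANICAL_CLASSES = {"PRE_COMMIT_FORMAT_DRIFT", "RUFF_FIXABLE_LINT"}


def classify(log_text: str) -> dict:
    """Return a reduced diagnosis record for a failure log (illustrative)."""
    for family, pattern in RULES:
        if pattern.search(log_text):
            return {
                "classification": family,
                "safe_to_auto_fix": family in MECHANICAL_CLASSES,
            }
    # No confident class: fall back to the explicit review-required state.
    return {"classification": "UNKNOWN_REVIEW_REQUIRED", "safe_to_auto_fix": False}
```

Keeping the mechanical classes in one explicit set makes it easy for a test fixture to assert that no other family can ever return `safe_to_auto_fix: true`.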

Release readiness — Align maintenance action categories with diagnosis classes¶

Goal¶

Let maintenance action plans consume richer diagnosis categories instead of inventing separate category logic.

PR 2: Add maintenance action categories using diagnosis classes¶

Suggested branch:

feature/maintenance-action-diagnosis-categories

Suggested PR title:

Classify maintenance actions with adaptive diagnosis classes

Outputs:

artifacts/maintenance-action-categories.json
artifacts/maintenance-action-categories.md

JSON contract:

{
  "schema_version": "sdetkit.maintenance.action_categories.v1",
  "diagnostic_only": true,
  "automation_allowed": false,
  "category_count": 5,
  "counts_by_category": {
    "formatting": 1,
    "lint": 1,
    "security": 1,
    "tests": 1,
    "workflow_hygiene": 1
  },
  "items": [
    {
      "rank": 1,
      "signal": "Run ruff check",
      "memory_lookup_key": "maintenance-action:lint_check:ruff-check",
      "diagnosis_class": "RUFF_FIXABLE_LINT",
      "category": "lint",
      "risk_level": "low",
      "safe_fix_route": "candidate_later",
      "review_required": true,
      "reason": "Ruff lint may be mechanically fixable, but policy proof is still required."
    }
  ]
}

Markdown contract:

# Maintenance action categories

- diagnostic only: **True**
- automation allowed: **False**
- categories: **N**

## Category mix

| Category | Count | Safe-fix route |
|---|---:|---|

## Classified actions

| Rank | Category | Diagnosis | Risk | Signal | Route |
|---:|---|---|---|---|---|

Tests:

PYTHONPATH=src python -m pytest -q \
  tests/test_maintenance_action_categories.py \
  tests/test_maintenance_on_demand_action_categories_workflow.py
./scripts/pr_preflight.sh

Acceptance criteria:

Uses adaptive_diagnosis classes where possible.
Does not enable auto-fix.
Uploads JSON/Markdown artifacts.
Adds an issue-comment section after action plan and before lower-level recommendation detail.
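
As a rough sketch of how the JSON contract might be assembled from already-classified items: the `build_category_artifact` helper is hypothetical, and treating `category_count` as the number of distinct categories is an assumption, since the contract above does not define it.

```python
from collections import Counter


def build_category_artifact(items: list[dict]) -> dict:
    """Assemble an action-categories payload from classified items (sketch)."""
    counts = Counter(item["category"] for item in items)
    return {
        "schema_version": "sdetkit.maintenance.action_categories.v1",
        # This layer only reports; it never enables automation.
        "diagnostic_only": True,
        "automation_allowed": False,
        # Assumption: category_count is the number of distinct categories.
        "category_count": len(counts),
        # Sorted keys keep the artifact stable between runs.
        "counts_by_category": dict(sorted(counts.items())),
        "items": sorted(items, key=lambda item: item["rank"]),
    }
```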

Platform readiness — Add proof checklist¶

Goal¶

Turn action-plan items into explicit evidence requirements. This is the bridge between “recommend” and “can progress.”

PR 3: Publish maintenance proof checklist¶

Suggested branch:

feature/maintenance-proof-checklist

Suggested PR title:

Publish maintenance proof checklist

Outputs:

artifacts/maintenance-proof-checklist.json
artifacts/maintenance-proof-checklist.md

JSON contract:

{
  "schema_version": "sdetkit.maintenance.proof_checklist.v1",
  "diagnostic_only": true,
  "automation_allowed": false,
  "proof_item_count": 10,
  "complete_count": 0,
  "missing_count": 10,
  "items": [
    {
      "rank": 8,
      "signal": "Run pytest -q",
      "memory_lookup_key": "maintenance-action:tests_check:run-tests",
      "diagnosis_class": "PRODUCT_LOGIC_FAILURE",
      "required_proof": "Attach passing pytest output.",
      "proof_status": "missing",
      "proof_commands": [
        "python -m pytest -q"
      ],
      "required_artifacts": [
        "pytest output"
      ],
      "can_progress_to_candidate": false,
      "blocking_reason": "Review proof has not been attached."
    }
  ]
}

Markdown contract:

# Maintenance proof checklist

- diagnostic only: **True**
- proof items: **N**
- missing proof: **N**

## Proof checklist

| Rank | Signal | Diagnosis | Proof status | Required proof | Can progress |
|---:|---|---|---|---|---|

Tests:

PYTHONPATH=src python -m pytest -q \
  tests/test_maintenance_proof_checklist.py \
  tests/test_maintenance_on_demand_proof_checklist_workflow.py
./scripts/pr_preflight.sh

Acceptance criteria:

Every action-plan item gets a proof row.
Missing proof blocks candidate progression.
Mechanical classes still require repeated/history evidence before auto-fix policy changes.
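
The gating described above could look roughly like the following. The `evaluate_proof_item` helper and the exact wording of the blocking reasons are assumptions for illustration; the field names follow the JSON contract.

```python
# Assumption: only these classes can ever progress toward candidacy.
MECHANICAL_CLASSES = {"PRE_COMMIT_FORMAT_DRIFT", "RUFF_FIXABLE_LINT"}


def evaluate_proof_item(item: dict, attached_artifacts: set[str]) -> dict:
    """Fill in proof status and candidate progression for one action item."""
    missing = sorted(set(item["required_artifacts"]) - attached_artifacts)
    if missing:
        status, reason = "missing", "Missing proof: " + ", ".join(missing)
    elif item["diagnosis_class"] not in MECHANICAL_CLASSES:
        status, reason = "complete", "Diagnosis class is review-first."
    else:
        status, reason = "complete", ""
    return {
        **item,
        "proof_status": status,
        # Progression only means "may be considered"; history and policy
        # gates still apply before any auto-fix policy change.
        "can_progress_to_candidate": reason == "",
        "blocking_reason": reason,
    }
```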

Operational readiness — Add signal trends¶

Goal¶

Use memory/history to distinguish one-off signals from repeated signals and prior successful fixes.

PR 4: Publish maintenance signal trend summary¶

Suggested branch:

feature/maintenance-signal-trends

Suggested PR title:

Publish maintenance signal trend summary

Outputs:

artifacts/maintenance-signal-trends.json
artifacts/maintenance-signal-trends.md

JSON contract:

{
  "schema_version": "sdetkit.maintenance.signal_trends.v1",
  "diagnostic_only": true,
  "automation_allowed": false,
  "signals": [
    {
      "memory_lookup_key": "maintenance-action:lint_check:ruff-check",
      "signal": "Run ruff check",
      "diagnosis_class": "RUFF_FIXABLE_LINT",
      "seen_count": 4,
      "recent_count": 2,
      "safe_fix_attempts": 1,
      "safe_fix_successes": 1,
      "trend": "recurring",
      "trend_confidence": "medium",
      "recommendation_impact": "candidate_later"
    }
  ]
}

Tests:

PYTHONPATH=src python -m pytest -q \
  tests/test_maintenance_signal_trends.py \
  tests/test_maintenance_on_demand_signal_trends_workflow.py
./scripts/pr_preflight.sh

Acceptance criteria:

Uses policy decision history and memory context.
Repeated signals are visible.
At this stage, trends inform recommendations but do not change behavior.
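
A minimal sketch of how repeated signals might be counted from history. The input shape (one list of memory lookup keys per run, oldest first), the `recent_window` default, the two-run threshold, and the `one_off` label are all assumptions for this example; only `recurring` appears in the contract above.

```python
from collections import Counter


def summarize_trends(runs: list[list[str]], recent_window: int = 3) -> list[dict]:
    """Count how often each memory lookup key appears across runs."""
    # set(run) ensures a key is counted once per run.
    seen = Counter(key for run in runs for key in set(run))
    recent = Counter(key for run in runs[-recent_window:] for key in set(run))
    return [
        {
            "memory_lookup_key": key,
            "seen_count": seen[key],
            "recent_count": recent[key],
            "trend": "recurring" if seen[key] >= 2 else "one_off",
        }
        for key in sorted(seen)  # deterministic output order
    ]
```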

Adoption readiness — Add human-friendly investigation front door¶

Goal¶

Expose shared adaptive diagnosis directly to humans through a thin command surface.

PR 5: Add failure investigation command¶

Suggested branch:

feature/investigate-failure-command

Suggested PR title:

Add failure investigation command

Commands:

python -m sdetkit investigate failure --log quality.log --format json
python -m sdetkit investigate failure --log quality.log --format markdown

Output fields:

classification
confidence
likely type
recommended next command
proof commands
safe-fix eligibility
memory lookup key

Example Markdown:

# Failure investigation

- classification: **MISSING_PUBLIC_API_PARITY**
- confidence: **high**
- likely type: **product/API gap**
- safe to auto-fix: **False**
- requires human review: **True**

## Why

The log shows an AttributeError for a missing async method while the sync method exists.

## Next action

Add async parity and helper-level coverage, then run the focused test slice.

## Proof commands

```bash
PYTHONPATH=src python -m pytest -q tests/test_netclient_envelope_parity.py
```

Tests:

PYTHONPATH=src python -m pytest -q \
  tests/test_investigate_failure.py \
  tests/test_adaptive_diagnosis.py
python -m pre_commit run -a

Acceptance criteria:

Calls adaptive_diagnosis; does not duplicate classification logic.
Supports JSON and Markdown.
Exits nonzero only for malformed inputs, not for diagnosed failures.
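
The exit-code rule above can be sketched with `argparse`. This is an assumption-heavy outline rather than the real command: the local `classify` function is a hypothetical stand-in for the shared adaptive_diagnosis call, and the exit code `2` for an unreadable log is an illustrative choice.

```python
import argparse
import json
import sys
from pathlib import Path


def classify(log_text: str) -> dict:
    """Hypothetical stand-in for the shared adaptive_diagnosis call."""
    if "ModuleNotFoundError" in log_text:
        return {"classification": "MISSING_TEST_DEPENDENCY", "safe_to_auto_fix": False}
    return {"classification": "UNKNOWN_REVIEW_REQUIRED", "safe_to_auto_fix": False}


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="sdetkit investigate failure")
    parser.add_argument("--log", required=True)
    parser.add_argument("--format", choices=["json", "markdown"], default="json")
    args = parser.parse_args(argv)

    log_path = Path(args.log)
    if not log_path.is_file():
        # Malformed input is the only case that exits nonzero.
        print(f"error: cannot read log file: {log_path}", file=sys.stderr)
        return 2

    result = classify(log_path.read_text(encoding="utf-8", errors="replace"))
    if args.format == "json":
        print(json.dumps(result, indent=2, sort_keys=True))
    else:
        print("# Failure investigation\n")
        for key, value in result.items():
            print(f"- {key}: **{value}**")
    # A diagnosed failure is still a successful investigation.
    return 0
```

A real entry point would typically end with `raise SystemExit(main())`, so the return value becomes the process exit code.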

Scale readiness — Add repository investigation summary¶

Goal¶

Use boost and index evidence to choose where to investigate next.

PR 6: Add repository investigation summary¶

Suggested branch:

feature/investigate-repo-summary

Suggested PR title:

Add repository investigation summary

Command:

python -m sdetkit investigate repo --root . --format json

JSON contract:

{
  "schema_version": "sdetkit.investigate.repo.v1",
  "repo_shape": {
    "source_files": 328,
    "test_files": 591,
    "workflow_files": 18
  },
  "top_surfaces": [
    {
      "name": "netclient",
      "production_files": [
        "src/sdetkit/netclient.py"
      ],
      "test_files": [
        "tests/test_netclient.py"
      ],
      "reason": "bounded HTTP client surface with sync/async/API/CLI parity risk",
      "recommended_next_probe": "investigate surface --surface netclient"
    }
  ]
}

Tests:

PYTHONPATH=src python -m pytest -q tests/test_investigate_repo.py
./scripts/pr_preflight.sh

Acceptance criteria:

Wraps or consumes boost/index outputs.
Produces concise operator-facing surface choices.
Does not replace boost or index.

Phase 7 — Add focused surface investigation¶

Goal¶

Automate the manual narrowing work performed for netclient and similar surfaces.

PR 7: Add focused surface investigation¶

Suggested branch:

feature/investigate-surface

Suggested PR title:

Add focused surface investigation

Command:

python -m sdetkit investigate surface --root . --surface netclient --format json

JSON contract:

{
  "schema_version": "sdetkit.investigate.surface.v1",
  "surface": "netclient",
  "production_files": [
    "src/sdetkit/netclient.py",
    "src/sdetkit/apiclient.py",
    "src/sdetkit/apiget.py"
  ],
  "test_files": [
    "tests/test_netclient.py"
  ],
  "public_symbols": [
    "SdetHttpClient.get_json_list_paginated_envelope",
    "SdetAsyncHttpClient.get_json_list_paginated"
  ],
  "parity_risks": [
    {
      "kind": "sync_async_method_gap",
      "sync_symbol": "get_json_list_paginated_envelope",
      "async_symbol": "get_json_list_paginated_envelope",
      "status": "missing"
    }
  ],
  "recommended_probe": "write focused parity repro"
}

Tests:

PYTHONPATH=src python -m pytest -q tests/test_investigate_surface.py
./scripts/pr_preflight.sh

Phase 8 — Add deterministic parity detectors¶

Goal¶

Catch sync/async/helper/CLI/public-mode parity gaps before humans manually find them.

PR 8: Detect public API parity gaps¶

Suggested branch:

feature/public-api-parity-detectors

Suggested PR title:

Detect public API parity gaps

Detector families:

SYNC_ASYNC_METHOD_GAP
SYNC_ASYNC_HELPER_GAP
CLI_BACKEND_PARITY_GAP
PUBLIC_MODE_UNTESTED

JSON contract:

{
  "schema_version": "sdetkit.investigate.parity.v1",
  "surface": "netclient",
  "findings": [
    {
      "kind": "SYNC_ASYNC_METHOD_GAP",
      "severity": "warning",
      "sync_symbol": "SdetHttpClient.get_json_list_paginated_envelope",
      "async_symbol": "SdetAsyncHttpClient.get_json_list_paginated_envelope",
      "status": "missing",
      "recommended_test": "focused sync/async parity test"
    }
  ]
}

Tests:

PYTHONPATH=src python -m pytest -q tests/test_public_api_parity_detectors.py
./scripts/pr_preflight.sh

Acceptance criteria:

Deterministic AST-based checks.
No import-time side effects.
Detects known PR #1155-style gap from a fixture.
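
A minimal sketch of an AST-based sync/async method gap check, using only the standard `ast` module so the inspected code is parsed and never imported. The helper names `public_methods` and `find_sync_async_gaps` and the inline fixture are assumptions for illustration; the class and method names mirror the surface investigation example above.

```python
import ast


def public_methods(tree: ast.Module, class_name: str) -> set[str]:
    """Collect public method names defined directly on a class."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                and not item.name.startswith("_")
            }
    return set()


def find_sync_async_gaps(source: str, sync_class: str, async_class: str) -> list[dict]:
    """Report public sync methods that have no async counterpart."""
    tree = ast.parse(source)  # parsing only; no import-time side effects
    missing = public_methods(tree, sync_class) - public_methods(tree, async_class)
    return [
        {
            "kind": "SYNC_ASYNC_METHOD_GAP",
            "sync_symbol": f"{sync_class}.{name}",
            "async_symbol": f"{async_class}.{name}",
            "status": "missing",
        }
        for name in sorted(missing)  # sorted for deterministic findings
    ]


# Hypothetical fixture reproducing the shape of the PR #1155-style gap.
FIXTURE = '''
class SdetHttpClient:
    def get_json_list_paginated(self): ...
    def get_json_list_paginated_envelope(self): ...

class SdetAsyncHttpClient:
    async def get_json_list_paginated(self): ...
'''

findings = find_sync_async_gaps(FIXTURE, "SdetHttpClient", "SdetAsyncHttpClient")
```

This sketch only compares methods defined directly in the class body. Inherited methods, mixins, and dynamically attached attributes would need additional handling.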

Phase 9 — Generate investigation evidence bundles¶

Goal¶

Write durable evidence artifacts for candidate freeze, audit result, proof commands, and investigation JSON.

PR 9: Write investigation candidate evidence¶

Suggested branch:

feature/investigation-evidence-writer

Suggested PR title:

Write investigation candidate evidence

Command:

python -m sdetkit investigate evidence \
  --classification MISSING_PUBLIC_API_PARITY \
  --surface netclient \
  --out-dir build/investigate/netclient

Generated files:

build/investigate/netclient/CANDIDATE_FREEZE.md
build/investigate/netclient/AUDIT_RESULT.md
build/investigate/netclient/proof-commands.md
build/investigate/netclient/investigation.json

Tests:

PYTHONPATH=src python -m pytest -q tests/test_investigate_evidence.py
./scripts/pr_preflight.sh

Phase 10 — Route investigation diagnoses through safe-fix policy¶

Goal¶

Connect diagnosis classes to safe-fix eligibility without enabling broad automation.

PR 10: Route investigation diagnoses through safe-fix policy¶

Suggested branch:

feature/investigation-safe-fix-policy-routing

Suggested PR title:

Route investigation diagnoses through safe-fix policy

Policy matrix:

| Diagnosis | Auto-fix? | Route |
|---|---|---|
| `PRE_COMMIT_FORMAT_DRIFT` | yes, later | safe mechanical |
| `RUFF_FIXABLE_LINT` | yes, narrow, later | safe mechanical |
| `GIT_BRANCH_DIVERGED` | no | command guidance |
| `REMOTE_BRANCH_DRIFT` | no | sync guidance |
| `MISSING_TEST_DEPENDENCY` | no | environment guidance |
| `PYTHON_RUNTIME_COMPATIBILITY` | no | compatibility PR |
| `LOCAL_ENVIRONMENT_FRICTION` | no | local environment guidance |
| `BROKEN_TEST_DOUBLE` | no by default | review-first test fix |
| `MISSING_PUBLIC_API_PARITY` | no | product implementation |
| `PRODUCT_LOGIC_FAILURE` | no | review-first product fix |
| `UNKNOWN_REVIEW_REQUIRED` | no | review-first |

Tests:

PYTHONPATH=src python -m pytest -q tests/test_investigation_safe_fix_policy.py
./scripts/pr_preflight.sh

Acceptance criteria:

Broad diagnosis does not imply broad auto-fix.
Only mechanical classes can be candidates.
Candidate still requires history/proof/policy before execution.
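
The policy matrix could be expressed as a plain lookup, sketched below. The `POLICY_MATRIX` structure and the `route` function are hypothetical. The point of the sketch is that being a candidate never implies that auto-fix is allowed at this stage.

```python
# (may become a candidate later, route) per diagnosis, copied from the matrix.
POLICY_MATRIX = {
    "PRE_COMMIT_FORMAT_DRIFT": (True, "safe mechanical"),
    "RUFF_FIXABLE_LINT": (True, "safe mechanical"),
    "GIT_BRANCH_DIVERGED": (False, "command guidance"),
    "REMOTE_BRANCH_DRIFT": (False, "sync guidance"),
    "MISSING_TEST_DEPENDENCY": (False, "environment guidance"),
    "PYTHON_RUNTIME_COMPATIBILITY": (False, "compatibility PR"),
    "LOCAL_ENVIRONMENT_FRICTION": (False, "local environment guidance"),
    "BROKEN_TEST_DOUBLE": (False, "review-first test fix"),
    "MISSING_PUBLIC_API_PARITY": (False, "product implementation"),
    "PRODUCT_LOGIC_FAILURE": (False, "review-first product fix"),
    "UNKNOWN_REVIEW_REQUIRED": (False, "review-first"),
}


def route(diagnosis: str) -> dict:
    """Map a diagnosis class to its safe-fix route (illustrative)."""
    # Unrecognized classes are treated like UNKNOWN_REVIEW_REQUIRED.
    candidate, route_name = POLICY_MATRIX.get(diagnosis, (False, "review-first"))
    return {
        "diagnosis": diagnosis,
        "route": route_name,
        "candidate": candidate,
        # Execution still requires history, proof, and an explicit policy.
        "auto_fix_allowed": False,
    }
```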

Phase 11 — Publish investigation summaries in PR comments¶

Goal¶

When CI fails, PR comments should show classification, confidence, safe-fix status, next proof command, and memory context.

PR 11: Publish investigation summaries for PR failures¶

Suggested branch:

feature/pr-investigation-summaries

Suggested PR title:

Publish investigation summaries for PR failures

Comment section example:

### Failure investigation

- classification: **PRE_COMMIT_FORMAT_DRIFT**
- confidence: **high**
- safe-fix status: **candidate later**
- next command: `python -m pre_commit run -a`
- memory: seen 2 times, fixed manually 2 times

Tests:

PYTHONPATH=src python -m pytest -q tests/test_pr_investigation_summary_workflow.py
./scripts/pr_preflight.sh

Phase 12 — Remember investigation outcomes¶

Goal¶

Turn investigation outputs into durable memory that improves future recommendations, eligibility, action plans, safe-fix candidates, and risk scoring.

PR 12: Record investigation outcome memory¶

Suggested branch:

feature/investigation-outcome-memory

Suggested PR title:

Record investigation outcome memory

Memory fields:

{
  "schema_version": "sdetkit.investigation.outcome_memory.v1",
  "records": [
    {
      "classification": "PRE_COMMIT_FORMAT_DRIFT",
      "surface": "tests",
      "affected_files": ["tests/test_example.py"],
      "proof_command": "python -m pre_commit run -a",
      "safe_fix_outcome": "manual_success",
      "manual_fix_outcome": "merged",
      "pr_number": 1152,
      "merged": true,
      "time_to_green_seconds": 420
    }
  ]
}

Tests:

PYTHONPATH=src python -m pytest -q tests/test_investigation_outcome_memory.py
./scripts/pr_preflight.sh
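
One way the memory records might feed the PR comment line shown earlier ("seen 2 times, fixed manually 2 times") is sketched here. The `memory_line` helper is hypothetical, and reading `safe_fix_outcome == "manual_success"` as "fixed manually" is an assumption based on the example record.

```python
def memory_line(records: list[dict], classification: str) -> str:
    """Summarize prior outcomes for one diagnosis class."""
    matching = [r for r in records if r["classification"] == classification]
    # Assumption: "manual_success" marks a fix that a human applied.
    manual = sum(1 for r in matching if r.get("safe_fix_outcome") == "manual_success")
    return f"memory: seen {len(matching)} times, fixed manually {manual} times"
```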

Phase 13 — Safe-fix candidate registry¶

Goal¶

Publish candidate status for classes that may eventually become automatable.

PR 13: Publish safe-fix candidate registry¶

Suggested branch:

feature/safe-fix-candidate-registry

Suggested PR title:

Publish safe-fix candidate registry

JSON contract:

{
  "schema_version": "sdetkit.safe_fix.candidates.v1",
  "automation_allowed": false,
  "candidates": [
    {
      "candidate_key": "diagnosis:PRE_COMMIT_FORMAT_DRIFT",
      "category": "formatting",
      "risk_level": "low",
      "required_history_count": 3,
      "required_success_count": 3,
      "allowed_commands": ["python -m pre_commit run -a"],
      "forbidden_paths": [".github/workflows"],
      "rollback_required": true,
      "current_status": "OBSERVE_MORE"
    }
  ]
}

Phase 14 — Auto-fix probation report¶

Goal¶

Decide which candidates are not ready, need more observation, are blocked, or are ready for a policy PR.

PR 14: Publish auto-fix probation report¶

Suggested branch:

feature/auto-fix-probation-report

Suggested PR title:

Publish auto-fix probation report

Statuses:

NOT_READY
OBSERVE_MORE
READY_FOR_POLICY_PR
BLOCKED
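
A sketch of how these statuses might be derived from a candidate registry entry and its observed history. The mapping from counts to statuses is an assumption; the roadmap names the statuses and the `required_history_count` / `required_success_count` fields but does not define the thresholds' exact semantics.

```python
def probation_status(
    candidate: dict,
    history_count: int,
    success_count: int,
    blocked: bool = False,
) -> str:
    """Derive a probation status for one candidate (illustrative rules)."""
    if blocked:
        return "BLOCKED"
    if history_count == 0:
        # Nothing observed yet, so there is nothing to evaluate.
        return "NOT_READY"
    enough_history = history_count >= candidate["required_history_count"]
    enough_success = success_count >= candidate["required_success_count"]
    if enough_history and enough_success:
        return "READY_FOR_POLICY_PR"
    return "OBSERVE_MORE"
```

Even `READY_FOR_POLICY_PR` only means a policy proposal may be drafted; it does not enable execution.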

Phase 15 — Policy proposal generator¶

Goal¶

Generate proposed policy changes when proof exists. Do not execute them.

PR 15: Publish maintenance policy proposals¶

Suggested branch:

feature/maintenance-policy-proposals

Suggested PR title:

Publish maintenance policy proposals

Example output:

# Maintenance policy proposal

## Proposal

Allow Ruff format drift to be fixed automatically in PR-only mode.

## Why

- 5 repeated reviewed successes
- no human edits after auto-format
- preflight passed every time

## Scope

- tests only
- no workflow files
- no security-sensitive files

## Required checks

- `python -m pre_commit run -a`
- `./scripts/pr_preflight.sh`

## Rollback

Required.

Phase 16 — Auto-fix dry-run planner¶

Goal¶

Show exact planned changes without modifying files.

PR 16: Publish auto-fix dry-run plan¶

Suggested branch:

feature/auto-fix-dry-run-plan

Suggested PR title:

Publish auto-fix dry-run plan

Safety:

no file writes
no commits
no PR creation
no allowlist expansion
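
A dry-run plan can show the exact change as a unified diff computed in memory, without writing anything. The sketch below uses the standard `difflib` module; the `plan_diff` helper name and the `a/` and `b/` path prefixes are illustrative assumptions.

```python
import difflib


def plan_diff(path: str, before: str, after: str) -> str:
    """Return a unified diff for a planned change; nothing is written."""
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    # An empty string means the planned fix would change nothing.
    return "".join(diff_lines)
```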

Phase 17 — Guarded PR-only auto-fix¶

Goal¶

Enable auto-fix only for approved safe mechanical classes and only through PRs.

PR 17: Enable guarded PR-only auto-fix¶

Suggested branch:

feature/guarded-pr-auto-fix

Suggested PR title:

Enable guarded PR-only auto-fix

Rules:

never push directly to main
only approved candidate classes
only allowed commands
only allowed paths
must show diff
must run preflight
must attach proof
must open PR
must record outcome
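
Several of these rules lend themselves to a pre-execution check, sketched below. The `plan` and `policy` dictionary shapes, the `guardrail_violations` name, and the `"pull_request"` target value are assumptions made for this example. The diff, preflight, and outcome-recording rules are not covered here.

```python
def _under(path: str, prefix: str) -> bool:
    """True when path is the prefix itself or lives beneath it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def guardrail_violations(plan: dict, policy: dict) -> list[str]:
    """List every rule a planned auto-fix would break (empty means OK)."""
    violations = []
    if plan["candidate_key"] not in policy["approved_candidates"]:
        violations.append("candidate class is not approved")
    for command in plan["commands"]:
        if command not in policy["allowed_commands"]:
            violations.append(f"command not allowed: {command}")
    for path in plan["changed_paths"]:
        if not any(_under(path, allowed) for allowed in policy["allowed_paths"]):
            violations.append(f"path not allowed: {path}")
    if plan["target"] != "pull_request":
        violations.append("changes must be delivered through a PR")
    if not plan["proof_attached"]:
        violations.append("proof is missing")
    return violations
```

Returning the full list of violations, rather than stopping at the first, gives the operator one complete explanation of why a run was refused.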

Phase 18 — Auto-fix outcome memory¶

Goal¶

Record every guarded auto-fix attempt, success, failure, revert, human edit, and check outcome.

PR 18: Record auto-fix outcome memory¶

Suggested branch:

feature/auto-fix-outcome-memory

Suggested PR title:

Record auto-fix outcome memory

Memory fields:

{
  "schema_version": "sdetkit.auto_fix.outcome_memory.v1",
  "records": [
    {
      "candidate_key": "diagnosis:PRE_COMMIT_FORMAT_DRIFT",
      "attempted": true,
      "succeeded": true,
      "failed": false,
      "reverted": false,
      "human_edited": false,
      "checks_passed": true,
      "checks_failed": false,
      "pr_number": 1200
    }
  ]
}

Command surfaces to add¶

Add commands gradually, only when the underlying shared modules exist.

python -m sdetkit investigate failure --log quality.log --format json
python -m sdetkit investigate failure --log quality.log --format markdown
python -m sdetkit investigate repo --root . --format json
python -m sdetkit investigate surface --root . --surface netclient --format json
python -m sdetkit investigate evidence --classification MISSING_PUBLIC_API_PARITY --surface netclient --out-dir build/investigate/netclient

Potential package entrypoint later:

sdetkit investigate failure --log quality.log --format markdown
sdetkit investigate repo --root . --format markdown
sdetkit investigate surface --root . --surface netclient --format markdown

JSON and Markdown output conventions¶

Every roadmap artifact should follow these rules:

JSON conventions¶

Include schema_version.
Include diagnostic_only when the artifact is not allowed to mutate behavior.
Include automation_allowed when relevant.
Include stable keys for memory lookup.
Use deterministic sorting for counts/maps.
Include enough fields for downstream consumers.
Avoid hidden behavior that exists only in code and is not reflected in the artifact.
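
A small sketch of how these conventions might be enforced at serialization time. The `render_artifact` helper is hypothetical; `json.dumps` with `sort_keys=True` is the standard-library way to get stable key order.

```python
import json


def render_artifact(payload: dict) -> str:
    """Serialize an artifact so identical data yields identical text."""
    if "schema_version" not in payload:
        raise ValueError("artifact is missing schema_version")
    # sort_keys gives stable key order; the trailing newline keeps
    # end-of-file hooks from rewriting the file.
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"
```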

Markdown conventions¶

Start with a clear H1.
Show safety state near the top.
Include counts before details.
Use compact tables for operator scanning.
Include “What to do next” or equivalent.
Include proof commands when actionable.
Keep issue-comment sections truncation-safe.

Safety boundaries¶

Strict safety rules for the whole roadmap:

Product/API gaps stay review-first.
Broken test doubles stay review-first by default.
Runtime compatibility issues stay review-first.
Missing dependencies stay guidance/review-first.
Git branch drift stays command guidance.
Security findings stay review-first.
Unknown classifications stay review-first.
Auto-fix must never push directly to main.
Auto-fix must never broaden policy implicitly.
Auto-fix must never run without proof, policy, allowed commands, allowed paths, and PR-only guardrails.

What stays diagnostic-only¶

These layers should remain diagnostic-only unless a later policy PR explicitly changes behavior:

adaptive diagnosis classification
failure investigation command
repo investigation summary
surface investigation summary
parity detectors
maintenance action categories
proof checklist
signal trends
candidate registry
probation report
policy proposal generator
dry-run planner
PR/CI investigation summaries
outcome memory recording

What can become safe mechanical auto-fix later¶

Only narrow mechanical classes can become candidates, and only after proof/history/policy gates:

| Class | Candidate? | Notes |
|---|---|---|
| `PRE_COMMIT_FORMAT_DRIFT` | yes | Only through pre-commit/formatters, PR-only, allowed paths |
| `RUFF_FIXABLE_LINT` | yes, narrow | Only approved Ruff fixable rules, PR-only, allowed paths |
| `GIT_BRANCH_DIVERGED` | no | Command guidance only |
| `REMOTE_BRANCH_DRIFT` | no | Sync guidance only |
| `MISSING_TEST_DEPENDENCY` | no | Dependency/environment guidance or explicit PR |
| `PYTHON_RUNTIME_COMPATIBILITY` | no | Compatibility PR required |
| `LOCAL_ENVIRONMENT_FRICTION` | no | Local guidance only |
| `BROKEN_TEST_DOUBLE` | no by default | Test behavior can be semantically wrong |
| `MISSING_PUBLIC_API_PARITY` | no | Product implementation required |
| `PRODUCT_LOGIC_FAILURE` | no | Product review required |
| `UNKNOWN_REVIEW_REQUIRED` | no | No automation until classified |

How memory/history feeds recommendations¶

Memory should become the connective tissue of the system.

Inputs to remember¶

diagnosis class
source surface
affected files
failure log hash or signature
proof command
proof result
PR number
merged/not merged
checks passed/failed
whether human edited the fix
whether auto-fix was reverted
time to green
recurrence count
last seen timestamp

Consumers of memory¶

recommendations
priority rollups
eligibility diagnostics
action plans
proof checklists
category classifier
signal trends
candidate registry
probation report
policy proposals
PR quality comments
mission-control release bundles
surface risk scoring

Memory-driven promotion path¶

first observation
  ↓
review required
  ↓
proof attached
  ↓
repeated successful evidence
  ↓
candidate later
  ↓
probation
  ↓
ready for policy PR
  ↓
dry run
  ↓
guarded PR-only auto-fix
  ↓
outcome memory

First 5 PRs to execute¶

1. Expand adaptive diagnosis for local investigation failures¶

Branch:

feature/adaptive-diagnosis-local-investigation-failures

Title:

Expand adaptive diagnosis for local investigation failures

Why first:

It upgrades the shared brain.
It captures the real manual workflow.
It prevents action categories/proof checklists from inventing separate classification logic.

2. Classify maintenance actions with adaptive diagnosis classes¶

Branch:

feature/maintenance-action-diagnosis-categories

Title:

Classify maintenance actions with adaptive diagnosis classes

Why second:

Maintenance action plans need diagnosis classes.
Later auto-fix safety depends on category.

3. Publish maintenance proof checklist¶

Branch:

feature/maintenance-proof-checklist

Title:

Publish maintenance proof checklist

Why third:

Every future candidate needs explicit proof requirements.

4. Publish maintenance signal trend summary¶

Branch:

feature/maintenance-signal-trends

Title:

Publish maintenance signal trend summary

Why fourth:

The system needs evidence of repeated history before candidate and probation logic can be trusted.

5. Add failure investigation command¶

Branch:

feature/investigate-failure-command

Title:

Add failure investigation command

Why fifth:

Once the shared diagnosis brain is richer, expose it as a human-friendly command.

GitHub Project board mapping¶

Use GitHub Project #2 as the execution board for this roadmap.

Recommended project name:

SDETKit Adaptive Investigation Roadmap

Recommended views¶

| View | Purpose | Group/sort by |
|---|---|---|
| Roadmap | Main execution board | Status |
| Phases | See roadmap progress by phase | Phase |
| Safety lane | Separate diagnostic-only work from future auto-fix work | Safety Route |
| PR queue | Track the next small PRs to execute | Priority, Status |
| Automation ladder | Track candidate/probation/policy/dry-run/auto-fix maturity | Safety Route |

Recommended fields¶

Field	Values
Status	Backlog, Ready, In Progress, In Review, Merged, Blocked, Later
Phase	Phase 1 through Phase 18
Safety Route	Diagnostic Only, Review First, Safe Mechanical Candidate, Probation, Policy Proposal, Guarded PR Auto-Fix
Priority	P0, P1, P2
Artifact Type	JSON, Markdown, CLI, Workflow, Memory, Policy, Tests
Depends On	Linked issue or PR
Proof Status	Missing, Partial, Complete, Not Required
Automation Status	Not Eligible, Observe More, Candidate Later, Ready for Policy PR, Blocked
Owner	Maintainer or automation lane owner

Suggested initial project issues¶

Order	Issue title	Phase	Safety Route
1	Expand adaptive diagnosis for local investigation failures	Phase 1	Diagnostic Only
2	Classify maintenance actions with adaptive diagnosis classes	Phase 2	Diagnostic Only
3	Publish maintenance proof checklist	Phase 3	Diagnostic Only
4	Publish maintenance signal trend summary	Phase 4	Diagnostic Only
5	Add failure investigation command	Phase 5	Diagnostic Only
6	Add repository investigation summary	Phase 6	Diagnostic Only
7	Add focused surface investigation	Phase 7	Diagnostic Only
8	Detect public API parity gaps	Phase 8	Review First
9	Write investigation candidate evidence	Phase 9	Diagnostic Only
10	Route investigation diagnoses through safe-fix policy	Phase 10	Review First
11	Publish investigation summaries for PR failures	Phase 11	Diagnostic Only
12	Record investigation outcome memory	Phase 12	Diagnostic Only
13	Publish safe-fix candidate registry	Phase 13	Safe Mechanical Candidate
14	Publish auto-fix probation report	Phase 14	Probation
15	Publish maintenance policy proposals	Phase 15	Policy Proposal
16	Publish auto-fix dry-run plan	Phase 16	Policy Proposal
17	Enable guarded PR-only auto-fix	Phase 17	Guarded PR Auto-Fix
18	Record auto-fix outcome memory	Phase 18	Guarded PR Auto-Fix

Board execution rules¶

Keep only one or two P0 items in progress at a time.
Do not move an item to Ready unless its dependency issue or PR is merged.
Do not move any auto-fix item past Probation unless proof, trend, candidate registry, and policy proposal artifacts exist.
Keep Phase 17 as a major milestone, not a near-term task. It should stay blocked until Phases 1-16 are stable.
Every issue should include the safety route and whether behavior is diagnostic-only, review-first, or an approved mechanical candidate.
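
The rule about not moving an auto-fix item past Probation could be enforced mechanically rather than by convention. The sketch below is one possible shape, assuming the Safety Route values listed in the recommended fields are ordered as written; `can_move_to` and the artifact keys in `REQUIRED_ARTIFACTS` are hypothetical names chosen for illustration.

```python
# Safety Route values in the order given in the recommended fields table.
SAFETY_ROUTES = (
    "Diagnostic Only",
    "Review First",
    "Safe Mechanical Candidate",
    "Probation",
    "Policy Proposal",
    "Guarded PR Auto-Fix",
)

# Hypothetical keys for the four artifacts the board rule names.
REQUIRED_ARTIFACTS = ("proof", "trend", "candidate_registry", "policy_proposal")


def can_move_to(target_route: str, artifacts_present: set[str]) -> tuple[bool, list[str]]:
    """Return (allowed, missing_artifacts) for moving an item to target_route.

    Routes up to and including Probation are always allowed by this rule.
    Anything past Probation needs every required artifact to exist.
    An unknown route raises ValueError from tuple.index.
    """
    past_probation = SAFETY_ROUTES.index(target_route) > SAFETY_ROUTES.index("Probation")
    missing = [name for name in REQUIRED_ARTIFACTS if name not in artifacts_present]
    if past_probation and missing:
        return False, missing
    return True, []
```

Returning the missing artifact names alongside the verdict gives the board owner an actionable reason when a move is refused.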

Definition of done for every roadmap PR¶

Every PR in this roadmap should meet this checklist before merge.

Required for every PR¶

- PR scope is one small roadmap step.
- PR body explains why this step exists in the product spine.
- JSON schema is added or updated if the PR emits machine-readable output.
- Markdown output is added if the PR is operator-facing.
- Deterministic tests are added for every new diagnosis, policy, artifact, or command path.
- Existing behavior remains diagnostic-only unless the phase explicitly changes that.
- No auto-fix behavior is enabled unless the PR is a later policy-approved auto-fix phase.
- Rollback plan is included in the PR body.
- ./scripts/pr_preflight.sh passes.

Extra requirements for workflow PRs¶

- Workflow YAML is validated by pre-commit.
- Artifact upload paths are deterministic.
- Reporting steps fail safely: a reporting error must neither create a new CI failure nor mask an existing one.
- PR or maintenance comments stay truncation-safe.
- Comment sections are ordered so high-level diagnosis appears before low-level details.

Extra requirements for diagnosis/classification PRs¶

- Every new diagnosis class has at least one positive fixture.
- Unknown or ambiguous logs fall back to UNKNOWN_REVIEW_REQUIRED.
- Confidence is deterministic and explainable.
- Product/API/test/runtime/dependency/security/git-drift classes remain review-first.
- Only approved mechanical classes may set safe-to-auto-fix signals.
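
The fallback and determinism requirements above can be illustrated with a small rule-based classifier. This is a sketch only: the two rule names and their regular expressions are invented for the example and are not real adaptive_diagnosis classes. Only `UNKNOWN_REVIEW_REQUIRED` comes from this roadmap.

```python
import re
from dataclasses import dataclass

UNKNOWN = "UNKNOWN_REVIEW_REQUIRED"

# Hypothetical rule table: (diagnosis class, pattern). Not SDETKit's real classes.
RULES = (
    ("MISSING_DEPENDENCY", re.compile(r"ModuleNotFoundError|No module named")),
    ("FORMAT_DRIFT", re.compile(r"would reformat")),
)


@dataclass(frozen=True)
class Diagnosis:
    diagnosis_class: str
    confidence: float
    reason: str  # human-readable explanation of how the class was chosen


def classify(log_text: str) -> Diagnosis:
    """Classify a log deterministically; anything unclear is review-required."""
    matches = [name for name, pattern in RULES if pattern.search(log_text)]
    if len(matches) != 1:
        # Zero matches is unknown; more than one match is ambiguous.
        reason = "no rule matched" if not matches else "ambiguous: " + ", ".join(matches)
        return Diagnosis(UNKNOWN, 0.0, reason)
    return Diagnosis(matches[0], 1.0, "single rule matched: " + matches[0])
```

Because the result depends only on the log text and a fixed rule order, the same input always yields the same class, confidence, and reason, which is what makes a positive fixture per class a meaningful test.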

Extra requirements for memory/history PRs¶

- Records have stable IDs or stable memory lookup keys.
- Appends are idempotent where possible.
- Repeated-signal rollups are deterministic.
- Missing or malformed optional history files degrade safely.
- Memory never overrides safety policy without explicit proof and policy gates.
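
As one way to picture stable IDs, idempotent appends, and safe degradation together, the sketch below stores records as JSON Lines. The file layout, the field names, and the helpers `record_id`, `load_history`, and `append_once` are assumptions made for the example, not the existing memory format.

```python
import hashlib
import json
from pathlib import Path


def record_id(diagnosis_class: str, signal: str) -> str:
    """Derive a stable ID from record content, so reruns produce the same key."""
    key = f"{diagnosis_class}\n{signal}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def load_history(path: Path) -> list[dict]:
    """Read history, skipping malformed lines; a missing file means no history."""
    if not path.exists():
        return []
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # degrade safely instead of failing the run
        if isinstance(item, dict) and "id" in item:
            records.append(item)
    return records


def append_once(path: Path, diagnosis_class: str, signal: str) -> bool:
    """Append a record unless one with the same ID exists. Returns True if written."""
    rid = record_id(diagnosis_class, signal)
    if any(record["id"] == rid for record in load_history(path)):
        return False
    record = {"id": rid, "diagnosis_class": diagnosis_class, "signal": signal}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return True
```

Writing with `sort_keys=True` keeps each serialized record byte-identical across runs, which helps when history files are diffed or rolled up.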

Extra requirements for safe-fix or auto-fix PRs¶

- The PR states the exact allowed commands.
- The PR states the exact allowed file/path scope.
- The PR states forbidden paths.
- The PR requires proof commands.
- The PR includes rollback behavior.
- The PR never pushes directly to main.
- The PR never mutates fork PRs.
- The PR records outcome memory.
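
The first three requirements above (exact commands, allowed scope, forbidden paths) lend themselves to a declarative policy object that is checked before any fix runs. The following is a hedged sketch: `SafeFixPolicy`, its fields, and the example values are hypothetical, and glob matching via `fnmatchcase` is just one possible way to express path scope.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class SafeFixPolicy:
    """Illustrative policy shape; field names are assumptions."""

    allowed_commands: tuple[str, ...]
    allowed_paths: tuple[str, ...]    # glob patterns a fix may touch
    forbidden_paths: tuple[str, ...]  # glob patterns that always block a fix

    def permits(self, command: str, changed_files: list[str]) -> bool:
        """Allow only an exact allowed command touching only in-scope files."""
        if command not in self.allowed_commands:
            return False
        for path in changed_files:
            # Forbidden patterns win over allowed patterns.
            if any(fnmatchcase(path, pattern) for pattern in self.forbidden_paths):
                return False
            if not any(fnmatchcase(path, pattern) for pattern in self.allowed_paths):
                return False
        return True


# Example policy with made-up values, for illustration only.
example_policy = SafeFixPolicy(
    allowed_commands=("ruff format",),
    allowed_paths=("docs/*", "src/*"),
    forbidden_paths=(".github/*", "src/secrets/*"),
)
```

Checking forbidden patterns first means a path that matches both lists is rejected, which keeps the default conservative. `fnmatchcase` is used instead of `fnmatch` so matching does not vary with the operating system's case rules.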

Major milestone gate for guarded auto-fix¶

Phase 17, guarded PR-only auto-fix, is a major milestone and must not start until the earlier investigation, proof, trend, candidate, probation, policy proposal, and dry-run layers are stable.

Before Phase 17 starts, the project should have:

- Stable diagnosis classes for local investigation failures.
- Maintenance actions classified by diagnosis.
- Proof checklist artifacts.
- Signal trend artifacts.
- Investigation CLI front door.
- Repo and surface investigation summaries.
- Parity detectors.
- Investigation evidence writer.
- Safe-fix policy routing.
- PR/CI investigation summaries.
- Investigation outcome memory.
- Safe-fix candidate registry.
- Auto-fix probation report.
- Maintenance policy proposal generator.
- Auto-fix dry-run planner.

Only after those layers are proven should guarded PR-only auto-fix move from Blocked to Ready.

Final roadmap line¶

detect → diagnose → recommend → plan → prove → classify → trend → candidate → probation → policy proposal → dry run → guarded PR auto-fix → remember outcome

This is the product direction. Every PR should either strengthen one stage of this path or improve the evidence flow between stages.