Adaptive Intelligence Toolkit Roadmap¶
This roadmap converts the recent adaptive-diagnosis work into a product plan for making SDETKit feel like a serious, evidence-first engineering copilot instead of a static CI helper.
The goal is not to hard-code millions of brittle rules. The goal is to combine a seeded scenario catalog, repo-specific learning memory, generated evidence, and safe remediation policy so each run answers four questions:
- What happened? Detect the first real failure signal, not just any non-empty log.
- What is the most likely scenario? Rank candidate failure families across CI, test, dependency, environment, security, release, and docs lanes.
- What should a human check first? Produce the smallest review-first checklist and proof commands.
- What can be fixed safely? Allow automation only for narrow, proven, mechanical changes.
Current strengths after the latest kit run¶
| Strength | Why it matters | Current proof surface |
|---|---|---|
| Evidence-first adaptive diagnosis | Green logs stay clear, while real unclassified failures remain review-first. | adaptive_diagnosis.analyze_evidence() emits clear for no-signal evidence and UNKNOWN_REVIEW_REQUIRED only for failure-like evidence. |
| Seeded scenario intelligence | First runs no longer start from zero; they start from a catalog of common CI and quality failures. | SEEDED_SCENARIO_DB covers pytest, Ruff, mypy, coverage, package installs, policy gates, Docker, security, release, docs, platform, cache, network, and time-related failures. |
| Combinatorial odds coverage | The kit can reason over a billion-plus environment/scenario combinations without storing a billion static rows. | ODDS_EXPANSION_AXES multiplies scenario count, failure signals, runners, Python versions, dependency states, filesystems, network states, test shapes, runtime states, and change types. |
| Review-first safety posture | Unknown failures do not become safe auto-fix candidates by accident. | safe_to_auto_fix remains limited to the narrow safe allowlist. |
| Actionable first checks | Unknown review output includes candidate scenarios, checks, and proof commands instead of a generic warning. | Candidate evidence includes candidate_scenarios=... and candidate_odds=.... |
| Documentation and operator posture | The repo already has strong docs, adoption paths, evidence references, CI guidance, release/process docs, and roadmap structure. | MkDocs navigation exposes first-proof, operator/evidence, reference, advanced, and roadmap sections. |
What still needs to become stronger¶
| Gap | Risk if ignored | Upgrade direction |
|---|---|---|
| Scenario data is still embedded in Python | The catalog grows harder to review, version, extend, and ship as packs. | Move scenario definitions to versioned JSON/YAML rule packs with schema validation. |
| Candidate scoring is heuristic-only | Similar signals can rank confusingly when logs are noisy. | Add weighted scoring using historical outcomes, repo-local memory, and confidence calibration. |
| Learning memory is not yet fully closed-loop with scenario outcomes | The kit can suggest candidates, but it does not yet continuously promote/examplete scenarios based on whether fixes worked. | Record accepted diagnosis, applied fix, proof command result, recurrence, and false-positive feedback. |
| Remediation remains narrow | This is safe, but users will want more assisted fixes after trust builds. | Add staged remediation lanes: explain-only, patch-plan, dry-run patch, guarded same-repo PR, and post-fix proof. |
| Evidence UI can still be easier to consume | Large JSON is powerful but not always persuasive for new users. | Add compact dashboards, markdown summaries, and PR comment sections that show evidence progression. |
| Multi-repo and enterprise pack behavior needs stronger contracts | Teams need repeatable policy and learning across many repositories. | Add organization-level scenario packs, policy overlays, shared learning exports, and privacy-preserving aggregation. |
| Benchmarking and examples need more real failure fixtures | A powerful product needs believable proof, not only unit tests. | Add fixture suites for common CI failures and publish before/after case studies. |
Next upgrade roadmap¶
Baseline readiness — Externalize the adaptive intelligence database¶
Outcome: the seeded brain becomes a maintainable data product.
- Add
schemas/adaptive-scenario-pack.schema.json. - Move the built-in scenario catalog to
src/sdetkit/data/adaptive_scenarios.jsonoryaml. - Add loader validation with stable fields:
code,title,signals,keywords,checks,commands,risk_band,prior_weight, and optionaltags. - Support layered packs:
- built-in SDETKit pack,
- repo-local
.sdetkit/adaptive/scenarios.json, - organization pack,
- private enterprise pack.
- Add tests that reject malformed packs and prove deterministic ordering.
Release readiness — Close the learning loop¶
Outcome: the kit learns from actual run outcomes, not only from seed data.
- Record every adaptive diagnosis attempt as a learning event:
- matched signals,
- candidate scenarios,
- selected primary diagnosis,
- recommended checks,
- proof commands,
- whether proof passed,
- whether fix was accepted,
- recurrence count,
- false-positive marker.
- Add promotion/exampletion rules: Done: summaries now promote scenarios when proof/fix feedback succeeds, examplete false positives, increase risk for recurring failures, and lower confidence for thin evidence.
- Add
sdetkit adaptive learn summarizeto show top recurring scenarios and weakest lanes. Done: the CLI now rolls JSONL diagnosis events intotop_recurring_scenariosandweakest_lanes.
Platform readiness — Build the trust-grade operator experience¶
Outcome: anyone trying the repo can see why SDETKit is valuable in one run.
- Add a single generated
build/sdetkit/operator-brief.mdcontaining: Done viasdetkit adaptive brief. - gate result,
- adaptive diagnosis,
- scenario candidates,
- first proof command,
- safe-fix decision,
- next owner action.
- Add a short PR comment mode: Done via
sdetkit adaptive brief --format comment. - green run: no fake adaptive block,
- known safe mechanical issue: scoped auto-fix path,
- unknown failure: review-first candidate scenarios and checks.
- Add screenshots or sample PR comments in docs for the top 10 scenarios. Done:
docs/adaptive-product-proof-gallery.mdnow shows green, safe-fix, unknown-review, recurring-learning, top-scenario, and portfolio rollup examples.
Operational readiness — Expand safe remediation without weakening safety¶
Outcome: more fixes are assisted, but unknown failures remain human-reviewed.
- Keep current safe auto-fix route narrow.
- Add a second lane: assisted patch plan for non-mechanical cases. Done:
sdetkit adaptive patch-planemits review-only patch steps, guardrails, proof commands, and rollback notes without allowing mutation. - Require four gates before any non-format PR automation:
- deterministic reproduction,
- scenario confidence threshold,
- changed-file scope limit,
- post-fix proof command. Done: fix-audit summaries now flag missing proof, block failed proof, and emit release recommendations.
- Add fix-audit records for every automated change. Done:
sdetkit adaptive fix-audit recordstores safe-fix and assisted patch-plan decisions with guardrails, proof commands, changed-file scope, rollback notes, and outcomes.
Adoption readiness — Make it enterprise-scale¶
Outcome: SDETKit becomes a cross-repo quality intelligence layer.
- Add portfolio rollups:
- top failing scenario families,
- recurrence by repo,
- flake hotspots,
- dependency drift hotspots,
- remediation success rate,
- mean time to first actionable proof.
- Add governance controls: Done:
sdetkit adaptive enterprise-governance reportandanonymize-learningcover pack approvals, policy override boundaries, security-sensitive scenario isolation, and anonymized learning export. - pack approval workflow,
- policy overrides,
- security-sensitive scenario isolation,
- anonymized learning export.
- Add adapters for GitHub Actions, GitLab, Jenkins, and local-only operation. Done:
sdetkit adaptive integration-adapter validatechecks required adaptive artifact inputs and provider upload targets.
Completion checkpoint¶
The Adaptive Intelligence execution plan is now complete across the immediate backlog, Operational readiness safe-remediation expansion, and Adoption readiness enterprise-scale lanes. CLI discoverability for every adaptive lane is covered by regression tests so the next work can move into a fresh roadmap wave without losing these command surfaces. The next wave is tracked in adaptive-next-wave-roadmap.md.
Big-win differentiators to protect¶
- No fake intelligence. If evidence is green, stay quiet.
- Unknown is review-first. Unknown failure evidence must never be guessed into safe auto-fix.
- Proof commands are part of the product. A diagnosis without a proof path is not enough.
- Learning is local and inspectable. Teams should see what the kit learned and why.
- Scenario packs are versioned assets. The brain should be reviewable, testable, and portable.
- Automation earns trust. Start with diagnosis, then safe mechanical fixes, then guarded patch plans.
Suggested immediate backlog¶
| Priority | Work item | Acceptance check |
|---|---|---|
| P0 | Extract scenario DB to a schema-validated pack | Done: built-in scenarios now load from src/sdetkit/data/adaptive_scenarios.json, validate through loader rules, and are documented against schemas/adaptive-scenario-pack.schema.json. |
| P0 | Add learning event records for adaptive diagnosis | Done: sdetkit adaptive learn record writes JSONL events with matched signals, candidates, selected primary diagnosis, checks, proof commands, recurrence count, and outcome placeholders. |
| P0 | Add operator brief artifact | Done: python -m sdetkit adaptive brief generates build/sdetkit/operator-brief.md from gate, diagnosis, learning, and safe-fix artifacts. |
| P1 | Add fixture corpus for top scenarios | Done: tests/fixtures/adaptive_logs/ covers 20 realistic log fixtures with expected primary diagnosis, first proof command, candidate scenario, and safe-fix posture assertions. |
| P1 | Add candidate confidence calibration | Done: adaptive diagnosis can consume learning-summary calibration to boost/examplete candidate scenario ranking and emit candidate_calibration evidence. |
| P1 | Add docs example gallery | Done: docs/adaptive-product-proof-gallery.md shows green, safe-fix, unknown-review, recurring-learning, top-10 scenario, and portfolio rollup examples. |
| P2 | Add org-level pack overlay | Done: layered packs emit source metadata, governance validation rejects unapproved duplicate-code overrides, and docs/governance-and-org-packs.md documents approval expectations. |
| P2 | Add portfolio rollup | Done: sdetkit adaptive portfolio-rollup rolls multiple adaptive diagnosis outputs into a top-risk scenario report with recurrence by repo, candidate mentions, release recommendation, and next owner action. |
Progress tracking¶
Current measurable progress, latest movement, next-PR recommendation, and follow-up queue are tracked in adaptive-intelligence-progress-tracker.md.
Definition of “real big win”¶
SDETKit becomes a big win when a new repo can run one command and get:
- a trustworthy green/noise-free result when everything passes,
- a human-readable diagnosis when something fails,
- candidate scenarios that feel like they came from an experienced SDET,
- proof commands that narrow the fix immediately,
- safe automation only where the risk is genuinely mechanical,
- learning memory that gets better with every run,
- and a roadmap of evidence that engineering leaders can trust.