v1.3.1: Exhaustive decision-tree search + UX refinements

The v1.3.0 saturation termination silently capped the search after only the heuristic-favored part of the tree, leaving most per-set ceiling cells stuck at "0 specs" and hiding genuinely-feasible 3-spec plans in maximize-count mode. Replace with full exhaustive enumeration plus a batch of UX refinements that emerged during testing. Algorithm: - Drop the saturation early-termination entirely. Search now runs the full open-set cartesian product to completion; the iteration cap is also removed so no scenario exits partial. - Add mode-dependent DFS child ordering: priority-order keeps the priority-target-first heuristic; maximize-count orders children by descending count of qualifications for reachable specs (generalist courses tried first). - Make the (count, priorityScore) comparator mode-aware: priority-order ranks by (priorityScore, count) so the user's top spec surfaces; maximize-count ranks by (count, priorityScore) so the highest count wins. The same rule drives both top-K position and per-cell ceiling selection (and the Recommended badge). - Add an evaluated boolean to each ChoiceOutcome and set it on first leaf evaluation. Distinguishes "still searching" from "evaluated, no specs achieved" so the UI never shows misleading 0 specs for a cell the search hasn't reached yet. - Throttled progress events (~100ms) carrying iterations / total leaf count, drive both the per-set spinner and the global progress bar. UI: - Top Plans header shows a horizontal progress bar with "iterations / total · NN%" while the search runs; collapses to "Search complete · N explored" on completion. - Per-set spinner next to each elective set heading while any choice in that set is unevaluated. - Per-cell pulsing dot + "searching" text for unevaluated cells. - Replace the "(HCR, BNK, ...)" text labels on each course with color-coded SpecTag pills using a new fixed per-spec palette (app/src/data/specColors.ts). Same palette applied to the Top Plans achievement badges so the two views are visually consistent. - "Top outcome if picked ↓" caption above the right side of each open elective set so the spec tags are clearly identified as decision-tree outcomes (not the course's own qualifications). - Recommended badge moved inline next to the course name (instead of on a separate row below) to keep button heights stable. Tests: - Replace the saturation early-termination test with an exhaustion test asserting every cell ends with evaluated: true and partial: false. - Add mode-dependent ordering test (max-count visits Climate Finance before Corporate Governance in fall3). - Add evaluated-flag transition test. - Add throttled progress-event test (>= ~100ms between consecutive emits). - Performance smoke updated to a 60s budget for the exhaustive user-scenario search; 8-open-set typical case completes in ~7s. Files: solver/decisionTree.ts, solver/priority.ts (already shipped), data/specColors.ts (new), components/{TopPlans,CourseSelection}.tsx, state/appState.ts, workers/decisionTree.worker.ts, __tests__/searchDecisionTree.test.ts, vite.config.ts, CHANGELOG.md, openspec/changes/decision-tree-exhaustive-search/* (full change spec).
2026-05-09 15:47:56 -04:00
parent 4b80fac500
commit cb49123930
16 changed files with 780 additions and 110 deletions
@@ -0,0 +1,99 @@
+## Context
+
+v1.3.0 introduced a streaming decision-tree search (`searchDecisionTree`) with two early-termination criteria: a hard iteration cap (`MAX_TREE_ITERATIONS = 10000`) and a saturation criterion (`SATURATION_LIMIT = 500` iterations of no top-K change). The saturation criterion fires too eagerly because the top-K updates only when a *strictly better* outcome is inserted; many leaves produce duplicate outcome classes that don't update top-K but still increment the saturation counter.
+
+Two visible consequences in the user's testing:
+
+1. Per-set ceiling table shows "0 specs" for many courses. Root cause: cells are initialized with `{count: 0, specs: []}` and only updated when a leaf containing them is evaluated. The DFS, ordered by the priority-target heuristic, exhausts target-favoring branches first. Saturation fires before the DFS backtracks to non-target choices in early sets.
+2. Top Plans in maximize-count mode shows only 2-spec outcomes when 3-spec combinations exist. Root cause: the same saturation fires before the search reaches the part of the tree where 3-spec-feasible combinations live (typically generalist-course-heavy combinations).
+
+Investigation in `app/src/solver/decisionTree.ts:204-225` confirmed both behaviors. A diagnostic showed the user's reproduction case (8 open sets, 49,152 leaves) saturates at ~500 iterations, leaving most cells unevaluated.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Per-set ceiling cells reflect the true best outcome for every (set, course) pair after search completes
+- Top-K reflects the genuinely-best plans achievable for the given pin/ranking, in either mode
+- UI distinguishes "still searching" from "search complete, this course achieves nothing"
+- Search remains responsive: high-quality results appear in the stream within the first ~hundred iterations even though full search takes seconds
+
+**Non-Goals:**
+- Sub-second exhaustive search for the 8-open-set worst case (50K leaves × ~1ms LP = ~50s is acceptable in a worker)
+- Replacing the LP solver or re-architecting the optimizer
+- Caching across runs
+- Configurable iteration cap from the UI
+
+## Decisions
+
+### Drop saturation termination entirely (not "make it smarter")
+
+The user explicitly chose Approach A: exhaustive. Smarter saturation criteria (e.g., "stable for N iters AND every cell visited at least once") add complexity without reaching demonstrated correctness — there's always a pathological combination that defeats the heuristic. Exhaustive is simpler, demonstrably correct, and feasible in a worker.
+
+**Alternative considered:** Two-phase search (fast heuristic + background exhaustive sweep). Rejected — added complexity (phase transition events, partial state), and the user's preference for "exhaustive" was explicit. The progress UI absorbs the same UX pain (user sees the search running) without the implementation cost.
+
+### Mode-dependent enumeration ordering
+
+The current target-first heuristic biases toward priority-order mode's expected outcome (high-priority spec achieved early). For maximize-count mode, ordering by qualification breadth is a better fit because generalist courses lead to higher-count plans. Both heuristics are cheap to compute (one upper-bounds map lookup, then a per-course count) and run once per analysis.
+
+For maximize-count, the score per course is `count of (specId in course.qualifications) where upperBounds[specId] >= 9`. Sorting children by this score descending puts the generalist courses first. Stable sort keeps declaration order on ties.
+
+**Alternative considered:** A single unified ordering (e.g., always order by qualification breadth). Rejected — for priority-order mode, the user's priority is the meaningful signal; using breadth would suppress the priority spec early in the stream.
+
+### `evaluated: boolean` on `ChoiceOutcome`
+
+Adds one boolean per (set, course) cell — negligible overhead. Cleaner than a sentinel value (e.g., `ceilingCount = -1`) that consumers might forget to handle and would need to be filtered everywhere ceilingCount is read. Boolean has obvious semantics in the UI ("not yet known" vs "known, value is 0").
+
+**Alternative considered:** Omit cells entirely until evaluated. Rejected — the UI needs the courseName to render the row; restructuring to fetch course names from `coursesBySet` would push more responsibility into the renderer for no clear win.
+
+### Per-set "Recommended" derivation in UI, not in worker
+
+The recommended choice is a function of `analysis.choices` and the comparator. Computing in the UI keeps the worker protocol simple (no new field), avoids a duplicate computation, and lets the UI re-render cheaply on each choiceUpdate.
+
+The comparator: `(ceilingCount desc, priorityScore desc)`. Same as the top-K. The "Recommended" course in a set is the one whose ceiling best matches the user's overall objective.
+
+### Throttled `progress` event (≈100ms)
+
+Without throttling, the worker would emit a progress event per iteration — 50,000 events × 1ms = 50s of message overhead. With ~100ms throttling: ~500 events per search, each tiny. Implementation: track `lastProgressEmit` timestamp; emit if `Date.now() - lastProgressEmit >= 100`.
+
+**Alternative considered:** No progress events; rely on `topKUpdate` for activity signal. Rejected — top-K updates fire only when something changes; long stretches of "exploring duplicates" would look like a frozen UI.
+
+### Mode-aware comparator (emerged during implementation)
+
+After dropping saturation, exhaustive search surfaces 3-spec non-HCR plans that beat 2-spec HCR plans on the original (count, priorityScore) comparator. This conflicted with v1.3.0 spec scenarios that asserted HCR appears at `topK[0]` in priority-order mode with HCR ranked first. Resolution: make both the top-K and per-cell ceiling comparators mode-dependent.
+
+- `priority-order` mode: `(priorityScore desc, count desc, key asc)` — surfaces the user's top-priority spec even when higher-count alternatives exist
+- `maximize-count` mode: `(count desc, priorityScore desc, key asc)` — surfaces the maximum number of specs achievable
+
+Both comparators share the deterministic `assignmentKey` tiebreaker for streaming stability. The CourseSelection "Recommended" badge uses the same mode-dependent rule so cell recommendations align with the top-K ranking.
+
+**Alternative considered:** Keep the count-first comparator and let exhaustive search reveal high-count alternatives. Rejected — contradicts user-stated intent ("HCR top priority should surface") and breaks v1.3.0 spec scenarios.
+
+### `MAX_TREE_ITERATIONS = 100,000`
+
+Empirically, 8-open-set worst case is ~50K leaves. 100K provides 2× headroom. Larger scenarios (10+ open sets) would still be capped, with `partial: true` displayed. The fallback to "empty choices" already exists for `openSetIds.length > MAX_OPEN_SETS_FOR_ENUMERATION` (= 9), so this cap rarely fires in practice.
+
+## Risks / Trade-offs
+
+- **50s search feels slow** → Progress UI + streamed top-K make it feel active; user can adopt a plan partway through if they like what's shown
+- **Worker CPU usage during search** → Acceptable; runs in a worker thread, doesn't block UI; user can change pins to abort and restart
+- **Throttled progress means iteration count "jumps"** → Cosmetic only; UI doesn't depend on monotonic small steps
+- **`evaluated: false` initial state for every cell** → Slightly verbose payloads; choiceUpdate already sends the full set's choices array, so the change is one boolean field per cell (negligible)
+- **Mode switch mid-search** → Current behavior already terminates and restarts the worker on any pin/ranking/mode change; unchanged
+- **Tests need amendments** → Saturation tests removed (3 tests); exhaustion test added; mode-ordering test added; per-cell evaluated transition test added
+
+## Migration Plan
+
+Single-PR change. No data migration. Steps:
+
+1. Algorithm + worker + state changes; tests updated
+2. UI updates: per-cell evaluated rendering, per-set spinner, global progress, recommended badge
+3. Browser-verify both modes against the v1.3.0 reproduction scenario; confirm exhaustive search completes and all cells are populated
+4. Bump version (`1.3.1`); CHANGELOG entry; ship
+
+Rollback: revert the change; v1.3.0 behavior restored. No persistent state to migrate.
+
+## Open Questions
+
+- **Recommended badge for ties** — if two choices in a set have identical `(count, priorityScore)`, currently the comparator's deterministic tiebreaker (assignmentKey) picks one. UI shows just one Recommended. Acceptable for v1; could be revisited if confusing.
+- **Should "Recommended" still show before search completes** — derived from current ceilings, so it updates as the search streams. Possibly confusing if the recommendation flips mid-search. Initial behavior: show as soon as any choice has `evaluated: true`; let it update with the stream.
+- **Future: progressive `partial` flag during search** — out of scope. Today, `partial` only matters at the cap, which fires rarely.