v1.3.0: Streamed top-K decision-tree plans + priority-aware ceiling

Fixes the bug where a specialization could show "Achievable" while no per-set ceiling cell surfaces a path to it.

Reproduction: pin SP2=Business of Health & Medical Care, SP4=Foundations of Fintech, SP5=Corporate Finance, SE1=GIE; rank HCR first. Healthcare showed Achievable but every ceiling cell excluded HCR.

Root cause: computeCeiling used strict > on count alone, so the first equal-count combination found won permanently and HCR-including outcomes were never recorded.

Changes:
- Replace per-(set, choice) computeCeiling loop with a single full-tree searchDecisionTree DFS. Both the per-set ceiling table and a new ranked top-K plan list (default K=10) are populated from one enumeration.
- Comparison rule everywhere is (count desc, priority score desc, deterministic-tiebreak). priorityScore extracted from optimizer.ts into a shared priority.ts module used by both call sites.
- Heuristic enumeration ordering: select the first reachable ranked spec as priorityTarget; reorder DFS children at every level so target-qualifying courses are tried first. High-priority outcomes surface in early iterations instead of being blocked by less-relevant equal-count results.
- Bounded search: terminate on saturation (top-K stable for 500 iterations) or hard cap (10000 iterations); set partial=true if cap hit. Mitigates the worst-case enumeration cost.
- Worker protocol: tagged-union response with topKUpdate, choiceUpdate (per-cell, replaces per-set setComplete), and allComplete events.
- App state adds topPlans/topPlansPartial slices and an adoptPlan action that pins a plan's full course assignment in one click. Also fixes loadState's stale "ranking.length !== 14" check (now uses SPECIALIZATIONS.length so HCR-era saved state restores correctly).
- New TopPlans component renders the ranked list with adopt buttons, placed above CourseSelection in the right column.
- 17 new tests in searchDecisionTree.test.ts covering priority scoring, bounded ranked list, comparison rule, target selection, the user's reproduction scenario, streaming monotonicity, saturation termination, and a performance smoke test (< 5s for the 8-open-set case).
- Existing decisionTree.test.ts: one test amended for per-cell streaming semantics; remaining 3 unchanged and passing.
2026-05-09 14:51:32 -04:00
parent 4d6f81d1e5
commit 4b80fac500
15 changed files with 1099 additions and 145 deletions
@@ -0,0 +1,130 @@
+## Context
+
+The EMBA Specialization Solver's "Decision Tree" view computes, for each open elective set, the ceiling outcome (best achievable specialization count and which specs) for each course choice. Implementation: `analyzeDecisionTree` (`app/src/solver/decisionTree.ts:90`) runs a per-(set, choice) loop calling `computeCeiling`, which itself enumerates the cartesian product of remaining open sets, runs the optimizer per leaf, and returns the best result by count.
+
+After adding the Healthcare specialization (J27 update), a contradiction surfaced: HCR shows status "Achievable" but no per-set ceiling cell shows HCR as part of its outcome. Reproduction:
+
+```
+Pin: SP2=spr2-health-medical, SP4=spr4-fintech,
+     SP5=spr5-corporate-finance, SE1=sum1-global-immersion
+Rank: HCR first
+Result: HCR status = 'achievable' (upper bound = 10 ≥ 9)
+        Decision tree: 0 of 32 ceilings include HCR
+```
+
+Diagnostic test confirmed: `priorityOrder` returns `[HCR, BNK]` when fed an HCR-friendly 12-course pin set, so HCR genuinely *is* achievable. The bug is in `computeCeiling`'s comparison (`decisionTree.ts:55`):
+
+```ts
+if (result.achieved.length > bestCount) {
+  bestCount = result.achieved.length;
+  bestSpecs = result.achieved;
+}
+```
+
+Strict `>` means the first equal-count result found wins permanently. Combined with declaration-order enumeration, finance-heavy combinations (which appear early in the tree) yield non-HCR `[FIN, MTO]` outcomes that block HCR-including outcomes from ever being recorded.
+
+The user also wants a richer view than per-set ceilings: a streamed ranked list of complete plans (`PlanOutcome`s, top K=10), each with its full course assignment, achieved specs, and priority score, so they can pick a complete plan rather than reasoning about set choices independently.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Decision-tree outcomes that include the user's top-priority spec surface naturally — both in the per-set table and in a new ranked top-K plan list
+- One enumeration produces both views (no duplicated work)
+- Both views update progressively with monotonic improvement (entries only enter or move up)
+- Search is bounded: terminates on saturation (top-K stable) or hard iteration cap, with a `partial` flag if cap hit
+- "Achievable" status stays permissive (per user's intent: it indicates reachability anywhere in the tree, regardless of whether a path has been found)
+
+**Non-Goals:**
+- Replacing the per-set ceiling table — both views remain
+- Restructuring the optimizer or LP feasibility checker
+- Changing optimizer score weights or rank tiebreakers
+- Designing the visual placement of the new "Top Plans" panel — out of scope here, follow-up brainstorm
+- User-configurable K — fixed at 10 for this change
+
+## Decisions
+
+### Single full-tree DFS instead of nested per-choice loop
+
+Today's structure: outer loop over (setId, choice), each calling `computeCeiling`, which itself enumerates remaining sets. That's `O(sets × choices × ∏ other-sets-courses)` leaf evaluations in total, so every full path is enumerated up to `setCount` times (once for each set whose choice it contains).
+
+New structure: one DFS over the cartesian product of all open-set courses. Each leaf evaluates the optimizer once. Per-set ceilings update as side effects ("for each (setId, courseId) in this combination, is this leaf's outcome better than the current ceiling for that cell?"). Top-K updates as side effects too.
+
+**Alternative considered:** Keep the nested loop and just fix the comparison. Rejected — the algorithm needs to materialize complete plans anyway for the top-K view, and the nested loop's per-choice context isn't useful for that. Switching paradigms is cleaner than bolting top-K onto two enumeration layers.
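
The single-pass structure described in this section can be sketched roughly as follows. This is a minimal illustration, not the project's `searchDecisionTree`: the names `OpenSet`, `LeafOutcome`, `beats`, and `enumerateOnce` are hypothetical, and `evaluateLeaf` stands in for the optimizer call, whose real signature is not shown here.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
interface OpenSet { setId: string; courseIds: string[]; }
interface LeafOutcome { count: number; priorityScore: number; }
type Assignment = Record<string, string>; // setId -> chosen courseId

// True when `a` beats `b` on (count desc, priorityScore desc).
function beats(a: LeafOutcome, b: LeafOutcome | undefined): boolean {
  if (b === undefined) return true;
  if (a.count !== b.count) return a.count > b.count;
  return a.priorityScore > b.priorityScore;
}

// One DFS over the cartesian product of open-set courses. Each leaf is
// evaluated exactly once and updates the ceiling of every
// (setId, courseId) cell on its path as a side effect.
function enumerateOnce(
  sets: OpenSet[],
  evaluateLeaf: (a: Assignment) => LeafOutcome, // stand-in for the optimizer
): { ceilings: Record<string, LeafOutcome>; leaves: number } {
  const ceilings: Record<string, LeafOutcome> = {};
  const current: Assignment = {};
  let leaves = 0;

  function dfs(depth: number): void {
    if (depth === sets.length) {
      leaves++;
      const outcome = evaluateLeaf(current);
      for (const s of sets) {
        const cell = s.setId + ":" + current[s.setId];
        if (beats(outcome, ceilings[cell])) ceilings[cell] = outcome;
      }
      return;
    }
    const set = sets[depth];
    for (const courseId of set.courseIds) {
      current[set.setId] = courseId;
      dfs(depth + 1);
    }
  }

  dfs(0);
  return { ceilings, leaves };
}
```

For two open sets with 2 and 3 courses, this visits 6 leaves, where a nested per-choice loop would evaluate 2×3 + 3×2 = 12.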
+
+### Comparison tuple `(count, priorityScore, deterministic-tiebreak)`
+
+`priorityScore(specs, ranking)` matches the optimizer's existing definition (`optimizer.ts:71-74`): `sum over specs of (15 - rankIndex(spec))`. Same formula in both modules to avoid drift; extracted into a shared utility.
+
+Tiebreaker on a deterministic hash of `courseAssignments` ensures streaming order is stable across runs and across worker restarts. Without it, two equally-ranked plans could "swap" position on every emit, causing UI flicker.
+
+**Alternative considered:** Compare only `(count, priorityScore)` and accept whichever inserted first when equal. Rejected — non-deterministic order makes monotonicity tests unstable and produces visible flicker if two plans tie.
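
As a rough sketch of the comparison tuple, the code below orders plans by count, then priority score, then a deterministic key. Several things here are assumptions: `rankIndex` is taken to be the 0-based position in the ranking, every spec is assumed to appear in the ranking, and `assignmentKey` is a hypothetical canonical string standing in for the hash function, which this document does not specify.

```typescript
// Hypothetical plan shape for illustration.
interface PlanOutcome {
  courseAssignments: Record<string, string>; // setId -> courseId
  achieved: string[];                         // achieved spec IDs
}

// Sum over specs of (15 - rankIndex). Assumes rankIndex is the 0-based
// position in `ranking` and that every spec appears in `ranking`.
function priorityScore(specs: string[], ranking: string[]): number {
  return specs.reduce((sum, spec) => sum + (15 - ranking.indexOf(spec)), 0);
}

// Canonical key standing in for the hash: sorted "setId=courseId" pairs.
function assignmentKey(a: Record<string, string>): string {
  return Object.keys(a).sort().map((k) => k + "=" + a[k]).join("|");
}

// Comparator for sorting: negative when `a` should rank ahead of `b`.
function comparePlans(a: PlanOutcome, b: PlanOutcome, ranking: string[]): number {
  // 1. More achieved specs first.
  if (a.achieved.length !== b.achieved.length) {
    return b.achieved.length - a.achieved.length;
  }
  // 2. Higher priority score first.
  const pa = priorityScore(a.achieved, ranking);
  const pb = priorityScore(b.achieved, ranking);
  if (pa !== pb) return pb - pa;
  // 3. Deterministic tiebreak so equal plans never swap between emits.
  const ka = assignmentKey(a.courseAssignments);
  const kb = assignmentKey(b.courseAssignments);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}
```

Under these assumptions, with a ranking that starts `[HCR, BNK, FIN, MTO]`, an outcome of `[HCR, BNK]` scores 15 + 14 = 29 and `[FIN, MTO]` scores 13 + 12 = 25, so the HCR plan ranks first even though both achieve two specs.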
+
+### `priorityTarget` heuristic = first reachable spec in user's ranking
+
+Selected once per `analyzeDecisionTree` call. We walk the ranking in order and pick the first specId whose `upperBound >= 9`. If no spec is reachable, `priorityTarget = null` and reordering is skipped (no-op).
+
+Why "reachable" not just "ranking[0]": if the user's #1 spec has no possible path to 9 credits given the pinned + open universe, prioritizing it would just delay finding good results. Walking to the first reachable one is cheap (one upper-bound array lookup per spec).
+
+**Alternative considered:** Always use `ranking[0]` regardless of reachability. Rejected — wastes the heuristic on impossible specs in cases where the user has a long ranking and their top picks are gated by missed required courses.
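
The target selection can be sketched in a few lines. The function name `selectPriorityTarget` and the `upperBounds` lookup shape (specId to credit upper bound) are assumptions made for illustration; only the "first ranked spec with `upperBound >= 9`, else `null`" rule comes from the text above.

```typescript
// Threshold from the `upperBound >= 9` rule above.
const REQUIRED_CREDITS = 9;

// Returns the first ranked spec whose upper bound reaches the threshold,
// or null when none does (in which case child reordering is skipped).
function selectPriorityTarget(
  ranking: string[],
  upperBounds: Record<string, number>, // hypothetical: specId -> credit upper bound
): string | null {
  for (const specId of ranking) {
    const bound = upperBounds[specId];
    if (bound !== undefined && bound >= REQUIRED_CREDITS) return specId;
  }
  return null;
}
```

For example, with ranking `[MTO, HCR, FIN]` and upper bounds of 6, 10, and 12 respectively, the sketch skips MTO and selects HCR.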
+
+### Heuristic ordering of DFS children
+
+Per open set, courses qualifying for `priorityTarget` move to the front (stable sort, ties keep declaration order). Cancelled courses are still skipped (existing behavior).
+
+This causes the FIRST combinations evaluated to include all `priorityTarget`-qualifying choices simultaneously. With the user's ranking (HCR first), the optimizer evaluates an HCR-feasible pin set on iteration 1 and inserts an HCR-achieving outcome immediately into top-K and the relevant per-set ceilings.
+
+**Alternative considered:** Branch-and-bound style pruning. Rejected — significantly more code, harder to verify correct, and the simple reordering already gives ~order-of-magnitude speedup for the common case.
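
A minimal sketch of the child reordering, written as a stable partition rather than a sort so that declaration order within each group is preserved by construction. The function name and both predicates are hypothetical; how the real code decides whether a course qualifies for `priorityTarget` is not shown in this document.

```typescript
// Stable partition: qualifying courses first, then the rest, each group
// keeping declaration order. Cancelled courses are dropped.
function orderChildren(
  courseIds: string[],
  qualifies: (courseId: string) => boolean,   // hypothetical: counts toward priorityTarget
  isCancelled: (courseId: string) => boolean, // hypothetical cancellation check
): string[] {
  const live = courseIds.filter((c) => !isCancelled(c));
  const front = live.filter((c) => qualifies(c));
  const back = live.filter((c) => !qualifies(c));
  return front.concat(back);
}
```

When `priorityTarget` is `null`, passing `() => false` as `qualifies` leaves the declaration order unchanged, which matches the no-op behavior described earlier.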
+
+### Two complementary terminators: hard cap + saturation
+
+- `MAX_TREE_ITERATIONS = 10000`: absolute upper bound. Returns `{ partial: true }` if hit.
+- `SATURATION_LIMIT = 500`: stop if top-K hasn't changed in the last 500 iterations.
+
+Saturation handles the typical case (top-K converges quickly with the heuristic). Hard cap handles pathological cases (large open-set count, long search space).
+
+**Alternative considered:** Time-based cap (e.g., 5000ms). Rejected — JS time measurement in a worker is fiddly, and iteration count is a more deterministic test surface. Time cap could be added later if needed.
+
+**Alternative considered:** Run to exhaustion. Rejected: for ≥8 open sets the cartesian product is in the tens of thousands of combinations; full enumeration takes seconds to minutes and provides diminishing returns once top-K saturates.
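
The two terminators can be combined in a single driver loop, sketched below. `runBounded`, `SearchStatus`, and the `step` callback are hypothetical names; `step` is assumed to evaluate the next leaf and report whether top-K changed, returning `null` once the tree is exhausted. Whether the real code reports saturation separately is not stated here, so the `saturated` field is an illustrative addition.

```typescript
const MAX_TREE_ITERATIONS = 10000;
const SATURATION_LIMIT = 500;

interface SearchStatus {
  iterations: number;
  partial: boolean;   // true only when the hard cap stopped the search
  saturated: boolean; // true when top-K was stable for the saturation window
}

// Drives the search until exhaustion, saturation, or the hard cap.
function runBounded(
  step: () => boolean | null, // hypothetical: true if top-K changed, null if exhausted
  maxIterations: number = MAX_TREE_ITERATIONS,
  saturationLimit: number = SATURATION_LIMIT,
): SearchStatus {
  let iterations = 0;
  let sinceChange = 0;
  while (iterations < maxIterations) {
    const changed = step();
    if (changed === null) {
      return { iterations, partial: false, saturated: false };
    }
    iterations++;
    sinceChange = changed ? 0 : sinceChange + 1;
    if (sinceChange >= saturationLimit) {
      return { iterations, partial: false, saturated: true };
    }
  }
  // Cap reached. This sketch does not peek ahead, so a tree with exactly
  // `maxIterations` leaves is also reported as partial.
  return { iterations, partial: true, saturated: false };
}
```

Counting iterations rather than elapsed time keeps the stopping point reproducible in tests: a search whose top-K last changes on iteration 3 stops at iteration 503 on every run.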
+
+### `BoundedRankedList<T>` as a sorted array, not a heap
+
+K ≤ 50 in practice (fixed at 10 for this change). Sorted insertion is `O(K)` per insert. A heap would shave a constant factor but complicates the "did the visible list change?" check (which drives the streaming emits). The simpler structure is fast enough and easier to reason about.
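
A sorted-array version of the list might look like the sketch below. Only the class name comes from this document; the method names and the boolean "changed" return value are assumptions chosen to show how the "did the visible list change?" check falls out of the insert position. The sketch does not deduplicate, on the assumption that each leaf of the DFS is a distinct plan.

```typescript
class BoundedRankedList<T> {
  private items: T[] = [];
  private readonly capacity: number;
  private readonly compare: (a: T, b: T) => number;

  // `compare` returns a negative number when `a` ranks ahead of `b`.
  constructor(capacity: number, compare: (a: T, b: T) => number) {
    this.capacity = capacity;
    this.compare = compare;
  }

  // Inserts in sorted position; returns true if the visible list changed.
  insert(item: T): boolean {
    let i = 0;
    // Linear scan is O(K); equal items keep first-inserted-first order.
    while (i < this.items.length && this.compare(this.items[i], item) <= 0) i++;
    if (i >= this.capacity) return false; // would land past the cut-off
    this.items.splice(i, 0, item);
    if (this.items.length > this.capacity) this.items.pop();
    return true;
  }

  // Returns a copy so callers cannot mutate the internal order.
  toArray(): T[] {
    return this.items.slice();
  }
}
```

Because an insert that lands past the cut-off returns `false` without touching the array, the caller can emit a streaming update only when `insert` returns `true`.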
+
+### Worker emits per-cell `choiceUpdate`, not per-set `setComplete`
+
+Today, the worker emits one event when an entire set's analysis finishes. Under streaming, a set's ceilings update incrementally as combinations are evaluated. Per-cell events let the UI re-render exactly the changed cell instead of re-rendering the whole set's row.
+
+**Alternative considered:** Coalesce per-set events on a 100ms timer. Rejected for now — per-cell is simpler and the message volume (a few hundred events per analysis, each <1KB) is well within worker `postMessage` throughput. Coalescing can be added later in the UI layer if needed.
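
The per-cell protocol lends itself to a tagged union that the UI reduces into state. In the sketch below, only the three event names (`topKUpdate`, `choiceUpdate`, `allComplete`) and the `topPlans`/`topPlansPartial` slice names come from the commit; every payload field, the `ViewState` shape, and `applyResponse` are hypothetical.

```typescript
// Hypothetical payload shapes for illustration.
interface CeilingCell { setId: string; courseId: string; count: number; specs: string[]; }
interface RankedPlan {
  courseAssignments: Record<string, string>;
  achieved: string[];
  priorityScore: number;
}

// Tagged union: the `type` field discriminates the payload.
type WorkerResponse =
  | { type: "topKUpdate"; plans: RankedPlan[] }
  | { type: "choiceUpdate"; cell: CeilingCell }
  | { type: "allComplete"; partial: boolean };

interface ViewState {
  topPlans: RankedPlan[];
  topPlansPartial: boolean;
  ceilings: Record<string, CeilingCell>; // keyed by "setId:courseId"
  done: boolean;
}

// Pure reducer: each event replaces only the slice it concerns, so a
// choiceUpdate touches exactly one ceiling cell.
function applyResponse(state: ViewState, msg: WorkerResponse): ViewState {
  switch (msg.type) {
    case "topKUpdate":
      return { ...state, topPlans: msg.plans };
    case "choiceUpdate": {
      const key = msg.cell.setId + ":" + msg.cell.courseId;
      return { ...state, ceilings: { ...state.ceilings, [key]: msg.cell } };
    }
    case "allComplete":
      return { ...state, done: true, topPlansPartial: msg.partial };
  }
}
```

On the main thread, a handler such as `worker.onmessage = (e) => { state = applyResponse(state, e.data); }` would feed each event through the reducer.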
+
+### "Achievable" status semantics unchanged
+
+Per user's stated intent: "Achievable" should mean "the spec is reachable somewhere in the remaining decision tree, regardless of priority." The current implementation (`optimizer.ts:185-194`) already does this — it checks the upper bound and returns `achievable` when open sets exist, without verifying joint feasibility with achieved specs.
+
+This change preserves those semantics. The UX contradiction the user reported ("Achievable but no path shows it") is fixed by making the top-K and per-set views actually find the path, not by tightening the status check.
+
+## Risks / Trade-offs
+
+- **Performance regression risk** → Mitigation: heuristic ordering should make typical case faster than today (saturates well before hard cap); performance smoke test verifies user's scenario completes in <5s for K=10 in worker
+- **Worker message volume** (50–500 small events per analysis) → Mitigation: each event <1KB; UI can coalesce with `requestAnimationFrame` if profiling shows main-thread pressure; defer
+- **Stable streaming order** depends on deterministic hash of `courseAssignments` → Mitigation: explicit tiebreaker test; document the hash function as part of the public contract
+- **Two views displaying inconsistent info briefly** during streaming (top-K shows HCR plan, per-set table cell still shows old ceiling for one beat) → Acceptable; both converge on the same data within a few hundred ms
+- **K=10 fixed** → User-facing limitation; if 10 isn't enough we can ship a follow-up making it configurable. Defer.
+
+## Migration Plan
+
+Single-PR change. No data migration. Steps:
+
+1. Land algorithm + worker + state changes; new "Top Plans" component starts hidden behind a feature flag (or simply absent from the layout) — user-facing UI is added in a sibling commit/PR
+2. Verify all existing decision-tree tests pass (with priority-tiebreak amendments)
+3. Verify regression test for user's scenario passes
+4. Add Top Plans component to layout
+5. Browser-verify both views update progressively
+6. Bump version (`1.3.0`), CHANGELOG entry, ship
+
+Rollback: revert. The change is internal to the decision-tree module and worker protocol; no persistent state to migrate back.
+
+## Open Questions
+
+- **UI layout** for the Top Plans panel — handled in a follow-up brainstorm focused on UX
+- **`MAX_TREE_ITERATIONS = 10000` / `SATURATION_LIMIT = 500`** — initial values; may need tuning after browser-side measurement on representative inputs
+- **Worker message coalescing** — defer until profiling shows it's needed