- Skip already-completed model-day pairs in create_job()
- Return warnings for skipped simulations
- Raise error if all simulations are already completed
- Update create_job() return type from str to Dict[str, Any]
- Update all callers to handle new dict return type
- Add comprehensive test coverage for duplicate detection
- Log warnings when simulations are skipped
When a Docker container is shutdown and restarted, jobs with status
'pending', 'downloading_data', or 'running' remained in the database,
preventing new jobs from starting due to concurrency control checks.
This commit adds automatic cleanup of stale jobs during FastAPI startup:
- New cleanup_stale_jobs() method in JobManager (api/job_manager.py:702-779)
- Integrated into FastAPI lifespan startup (api/main.py:164-168)
- Intelligent status determination based on completion percentage:
- 'partial' if any model-days completed (preserves progress data)
- 'failed' if no progress made
- Detailed error messages with original status and completion counts
- Marks incomplete job_details as 'failed' with clear error messages
- Deployment-aware: skips cleanup in DEV mode when DB is reset
- Comprehensive logging at warning level for visibility
Testing:
- 6 new unit tests covering all cleanup scenarios (451-609)
- All 30 existing job_manager tests still pass
- Tests verify pending, running, downloading_data, partial progress,
no stale jobs, and multiple stale jobs scenarios
Resolves issue where container restarts left stale jobs blocking the
can_start_new_job() concurrency check.