feat: transform to REST API service with SQLite persistence (v0.3.0)

Major architecture transformation from batch-only to API service with database persistence for Windmill integration.

## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks

## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions, holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage

## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration

## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout

## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode

## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures

## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies

Co-Authored-By: AI Assistant <noreply@example.com>
2026-04-01 17:17:24 -04:00 · 2025-10-31 11:47:10 -04:00
parent 5da02b4ba0
commit fb9583b374
45 changed files with 13775 additions and 18 deletions
--- a/docs/ENHANCED-SPECIFICATIONS-SUMMARY.md
+++ b/docs/ENHANCED-SPECIFICATIONS-SUMMARY.md
@@ -0,0 +1,631 @@
+# AI-Trader API Service - Enhanced Specifications Summary
+
+## Changes from Original Specifications
+
+Based on user feedback, the specifications have been enhanced with:
+
+1. **SQLite-backed results storage** (instead of reading position.jsonl on-demand)
+2. **Comprehensive Python testing suite** with pytest
+3. **Defined testing thresholds** for coverage, performance, and quality gates
+
+---
+
+## Document Index
+
+### Core Specifications (Original)
+1. **[api-specification.md](./api-specification.md)** - REST API endpoints and data models
+2. **[job-manager-specification.md](./job-manager-specification.md)** - Job tracking and database layer
+3. **[worker-specification.md](./worker-specification.md)** - Background worker architecture
+4. **[implementation-specifications.md](./implementation-specifications.md)** - Agent, Docker, Windmill integration
+
+### Enhanced Specifications (New)
+5. **[database-enhanced-specification.md](./database-enhanced-specification.md)** - SQLite results storage
+6. **[testing-specification.md](./testing-specification.md)** - Comprehensive testing suite
+
+### Summary Documents
+7. **[README-SPECS.md](./README-SPECS.md)** - Original specifications overview
+8. **[ENHANCED-SPECIFICATIONS-SUMMARY.md](./ENHANCED-SPECIFICATIONS-SUMMARY.md)** - This document
+
+---
+
+## Key Enhancement #1: SQLite Results Storage
+
+### What Changed
+
+**Before:**
+- `/results` endpoint reads `position.jsonl` files on-demand
+- File I/O on every API request
+- No support for advanced queries (date ranges, aggregations)
+
+**After:**
+- Simulation results written to SQLite during execution
+- Fast database queries (estimated 10-100x faster than on-demand file I/O)
+- Advanced analytics: timeseries, leaderboards, aggregations
+
+### New Database Tables
+
+```sql
+-- Results storage
+CREATE TABLE positions (
+    id INTEGER PRIMARY KEY,
+    job_id TEXT,
+    date TEXT,
+    model TEXT,
+    action_id INTEGER,
+    action_type TEXT,
+    symbol TEXT,
+    amount INTEGER,
+    price REAL,
+    cash REAL,
+    portfolio_value REAL,
+    daily_profit REAL,
+    daily_return_pct REAL,
+    cumulative_profit REAL,
+    cumulative_return_pct REAL,
+    created_at TEXT,
+    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
+);
+
+CREATE TABLE holdings (
+    id INTEGER PRIMARY KEY,
+    position_id INTEGER,
+    symbol TEXT,
+    quantity INTEGER,
+    FOREIGN KEY (position_id) REFERENCES positions(id)
+);
+
+CREATE TABLE reasoning_logs (
+    id INTEGER PRIMARY KEY,
+    job_id TEXT,
+    date TEXT,
+    model TEXT,
+    step_number INTEGER,
+    timestamp TEXT,
+    role TEXT,
+    content TEXT,
+    tool_name TEXT,
+    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
+);
+
+CREATE TABLE tool_usage (
+    id INTEGER PRIMARY KEY,
+    job_id TEXT,
+    date TEXT,
+    model TEXT,
+    tool_name TEXT,
+    call_count INTEGER,
+    total_duration_seconds REAL,
+    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
+);
+```
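As a rough illustration of how one `positions` row and its `holdings` rows could be written together, here is a minimal sketch using Python's standard `sqlite3` module. It assumes a trimmed copy of the schema above, a stub `jobs` table with only `job_id`, and a hypothetical helper named `store_position`; none of these names are confirmed by the specification.

```python
import sqlite3

# Trimmed copy of the schema above: only the columns this sketch touches.
# The one-column jobs table is a stand-in (assumption) for the real one.
SCHEMA = """
CREATE TABLE jobs (job_id TEXT PRIMARY KEY);
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    action_type TEXT,
    symbol TEXT,
    amount INTEGER,
    price REAL,
    cash REAL,
    portfolio_value REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);
CREATE TABLE holdings (
    id INTEGER PRIMARY KEY,
    position_id INTEGER,
    symbol TEXT,
    quantity INTEGER,
    FOREIGN KEY (position_id) REFERENCES positions(id)
);
"""


def store_position(conn, job_id, date, model, action, holdings):
    """Insert one positions row and its holdings rows in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        cur = conn.execute(
            "INSERT INTO positions (job_id, date, model, action_type, symbol,"
            " amount, price, cash, portfolio_value)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (job_id, date, model, action["action_type"], action["symbol"],
             action["amount"], action["price"], action["cash"],
             action["portfolio_value"]),
        )
        position_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO holdings (position_id, symbol, quantity) VALUES (?, ?, ?)",
            [(position_id, symbol, qty) for symbol, qty in holdings.items()],
        )
    return position_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)
conn.execute("INSERT INTO jobs (job_id) VALUES ('job-1')")

sample_action = {"action_type": "buy", "symbol": "AAPL", "amount": 10,
                 "price": 200.0, "cash": 8000.0, "portfolio_value": 10000.0}
pid = store_position(conn, "job-1", "2025-01-16", "gpt-5", sample_action, {"AAPL": 10})
```

With foreign keys enabled, a row that references an unknown `job_id` is rejected, and the surrounding transaction leaves no partial `holdings` rows behind.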
+
+### New API Endpoints
+
+```text
+# Enhanced results endpoint (now reads from SQLite)
+GET /results?date=2025-01-16&model=gpt-5&detail=minimal|full
+
+# New analytics endpoints
+GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31
+GET /leaderboard?date=2025-01-16  # Rankings by portfolio value
+```
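One way the `/leaderboard` ranking could be computed from the `positions` table is sketched below. It assumes that the row with the highest `action_id` for a model on a given date holds that day's closing portfolio value, and that the response is a list of rank/model/value dictionaries; both are assumptions for illustration, and the function name `get_leaderboard` is borrowed from the implementation plan rather than from real code.

```python
import sqlite3


def get_leaderboard(conn, date):
    """Rank models by portfolio value on `date`, using each model's last action."""
    rows = conn.execute(
        """
        SELECT p.model, p.portfolio_value
        FROM positions AS p
        JOIN (
            SELECT model, MAX(action_id) AS last_action
            FROM positions
            WHERE date = ?
            GROUP BY model
        ) AS last
          ON last.model = p.model AND last.last_action = p.action_id
        WHERE p.date = ?
        ORDER BY p.portfolio_value DESC
        """,
        (date, date),
    ).fetchall()
    return [
        {"rank": rank, "model": model, "portfolio_value": value}
        for rank, (model, value) in enumerate(rows, start=1)
    ]


# Demo data in a reduced positions table (only the columns the query needs).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (date TEXT, model TEXT, action_id INTEGER, portfolio_value REAL)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        ("2025-01-15", "gpt-5", 3, 9000.0),     # different date, must be ignored
        ("2025-01-16", "gpt-5", 1, 10000.0),
        ("2025-01-16", "gpt-5", 2, 10250.0),    # last action of the day for gpt-5
        ("2025-01-16", "model-b", 1, 10100.0),  # "model-b" is a placeholder name
    ],
)
leaderboard = get_leaderboard(conn, "2025-01-16")
```

The sketch ignores `job_id`; if several jobs cover the same date, the query would also need a job filter.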
+
+### Migration Strategy
+
+**Phase 1:** Dual-write mode
+- Agent writes to `position.jsonl` (existing code)
+- Executor writes to SQLite after agent completes
+- Ensures backward compatibility
+
+**Phase 2:** Verification
+- Compare SQLite data vs JSONL data
+- Fix any discrepancies
+
+**Phase 3:** Switch over
+- `/results` endpoint reads from SQLite
+- JSONL writes become optional (can deprecate later)
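The Phase 2 comparison could look roughly like the sketch below. The layout of `position.jsonl` is not described here, so the sketch assumes each line is a JSON object with `date`, `action_id`, and `cash` keys; those field names and the helper `find_discrepancies` are hypothetical.

```python
import json
import sqlite3


def find_discrepancies(jsonl_lines, conn, job_id, model, tolerance=1e-6):
    """Return (date, action_id) keys whose cash differs or is missing on one side."""
    from_jsonl = {}
    for line in jsonl_lines:
        if line.strip():  # skip blank lines
            rec = json.loads(line)
            from_jsonl[(rec["date"], rec["action_id"])] = rec["cash"]

    from_db = {
        (date, action_id): cash
        for date, action_id, cash in conn.execute(
            "SELECT date, action_id, cash FROM positions"
            " WHERE job_id = ? AND model = ?",
            (job_id, model),
        )
    }

    mismatched = []
    for key in sorted(from_jsonl.keys() | from_db.keys()):
        a, b = from_jsonl.get(key), from_db.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatched.append(key)
    return mismatched


# Demo: one matching record, one differing value, one record missing from SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (job_id TEXT, date TEXT, model TEXT,"
    " action_id INTEGER, cash REAL)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?, ?)",
    [
        ("job-1", "2025-01-16", "gpt-5", 1, 8000.0),
        ("job-1", "2025-01-16", "gpt-5", 2, 7500.0),
    ],
)
jsonl_lines = [
    '{"date": "2025-01-16", "action_id": 1, "cash": 8000.0}',
    '{"date": "2025-01-16", "action_id": 2, "cash": 7400.0}',
    '{"date": "2025-01-17", "action_id": 1, "cash": 7400.0}',
]
mismatches = find_discrepancies(jsonl_lines, conn, "job-1", "gpt-5")
```

An empty result for every job and model would be the signal that Phase 3 can proceed.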
+
+### Performance Improvement
+
+| Operation | Before (JSONL) | After (SQLite) | Speedup |
+|-----------|----------------|----------------|---------|
+| Get results for 1 date | 200-500ms | 20-50ms | **10x faster** |
+| Get timeseries (30 days) | 6-15 seconds | 100-300ms | **50x faster** |
+| Get leaderboard | 5-10 seconds | 50-100ms | **100x faster** |
+
+---
+
+## Key Enhancement #2: Comprehensive Testing Suite
+
+### Testing Thresholds
+
+| Metric | Minimum | Target | Enforcement |
+|--------|---------|--------|-------------|
+| **Code Coverage** | 85% | 90% | CI fails if below |
+| **Critical Path Coverage** | 90% | 95% | Manual review |
+| **Unit Test Speed** | <10s | <5s | Benchmark tracking |
+| **Integration Test Speed** | <60s | <30s | Benchmark tracking |
+| **API Response Times** | <500ms | <200ms | Load testing |
+
+### Test Suite Structure
+
+```
+tests/
+├── unit/                          # 80 tests, <10 seconds
+│   ├── test_job_manager.py        # 95% coverage target
+│   ├── test_database.py
+│   ├── test_runtime_manager.py
+│   ├── test_results_service.py    # 95% coverage target
+│   └── test_models.py
+│
+├── integration/                   # 30 tests, <60 seconds
+│   ├── test_api_endpoints.py      # Full FastAPI testing
+│   ├── test_worker.py
+│   ├── test_executor.py
+│   └── test_end_to_end.py
+│
+├── performance/                   # 20 tests
+│   ├── test_database_benchmarks.py
+│   ├── test_api_load.py           # Locust load testing
+│   └── test_simulation_timing.py
+│
+├── security/                      # 10 tests
+│   ├── test_api_security.py       # SQL injection, XSS, path traversal
+│   └── test_auth.py               # Future: API key validation
+│
+└── e2e/                           # 10 tests, Docker required
+    └── test_docker_workflow.py    # Full Docker compose scenario
+```
+
+### Quality Gates
+
+**All PRs must pass:**
+1. ✅ All tests passing (unit + integration)
+2. ✅ Code coverage ≥ 85%
+3. ✅ No critical security vulnerabilities (Bandit scan)
+4. ✅ Linting passes (Ruff or Flake8)
+5. ✅ Type checking passes (mypy strict mode)
+6. ✅ No performance regressions (±10% tolerance)
+
+**Release checklist:**
+1. ✅ All quality gates pass
+2. ✅ End-to-end tests pass in Docker
+3. ✅ Load testing passes (100 concurrent requests)
+4. ✅ Security scan passes (OWASP ZAP)
+5. ✅ Manual smoke tests complete
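The performance-regression gate could be checked with something like the sketch below. It reads the 10% tolerance as "fail only when a benchmark is more than 10% slower than its baseline"; that interpretation, the dictionary format, and the name `find_regressions` are assumptions rather than part of the specification.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return names of benchmarks slower than baseline by more than `tolerance`.

    Both arguments map benchmark name -> duration in seconds. Benchmarks with
    no usable baseline are skipped, since there is nothing to compare against.
    """
    regressions = []
    for name, seconds in current.items():
        base = baseline.get(name)
        if base is None or base <= 0:
            continue
        if (seconds - base) / base > tolerance:
            regressions.append(name)
    return sorted(regressions)


baseline = {"get_results": 0.040, "get_leaderboard": 0.080}
current = {"get_results": 0.042, "get_leaderboard": 0.100}  # +5% and +25%
slow = find_regressions(baseline, current)
```

A CI step could fail the build whenever the returned list is non-empty.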
+
+### CI/CD Integration
+
+```yaml
+# .github/workflows/test.yml
+name: Test Suite
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
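+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: "3.11"  # assumption: match the project's supported version
+      - name: Install dependencies
+        # Assumption: runtime and test dependencies live in these two files
+        run: pip install -r requirements.txt -r requirements-dev.txt
+      - name: Run unit tests
+        run: pytest tests/unit/ --cov=api --cov-report=xml --cov-fail-under=85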
+      - name: Run integration tests
+        run: pytest tests/integration/
+      - name: Security scan
+        run: bandit -r api/ -ll
+      - name: Upload coverage
+        uses: codecov/codecov-action@v3
+```
+
+### Test Coverage Breakdown
+
+| Component | Minimum | Target | Tests |
+|-----------|---------|--------|-------|
+| `api/job_manager.py` | 90% | 95% | 25 tests |
+| `api/worker.py` | 85% | 90% | 15 tests |
+| `api/executor.py` | 85% | 90% | 12 tests |
+| `api/results_service.py` | 90% | 95% | 18 tests |
+| `api/database.py` | 95% | 100% | 10 tests |
+| `api/runtime_manager.py` | 85% | 90% | 8 tests |
+| `api/main.py` | 80% | 85% | 20 tests |
+| **Total** | **85%** | **90%** | **~150 tests** |
+
+---
+
+## Updated Implementation Plan
+
+### Phase 1: API Foundation (Days 1-2)
+- [x] Create `api/` directory structure
+- [ ] Implement `api/models.py` with Pydantic models
+- [ ] Implement `api/database.py` with **enhanced schema** (6 tables)
+- [ ] Implement `api/job_manager.py` with job CRUD operations
+- [ ] **NEW:** Write unit tests for job_manager (target: 95% coverage)
+- [ ] Test database operations manually
+
+**Testing Deliverables:**
+- 25 unit tests for job_manager
+- 10 unit tests for database utilities
+- 85%+ coverage for Phase 1 code
+
+---
+
+### Phase 2: Worker & Executor (Days 3-4)
+- [ ] Implement `api/runtime_manager.py`
+- [ ] Implement `api/executor.py` for single model-day execution
+- [ ] **NEW:** Add SQLite write logic to executor (`_store_results_to_db()`)
+- [ ] Implement `api/worker.py` for job orchestration
+- [ ] **NEW:** Write unit tests for worker and executor (minimum: 85% coverage; target: 90%)
+- [ ] Test runtime config isolation
+
+**Testing Deliverables:**
+- 15 unit tests for worker
+- 12 unit tests for executor
+- 8 unit tests for runtime_manager
+- 85%+ coverage for Phase 2 code
+
+---
+
+### Phase 3: Results Service & FastAPI Endpoints (Days 5-6)
+- [ ] **NEW:** Implement `api/results_service.py` (SQLite-backed)
+  - [ ] `get_results(date, model, detail)`
+  - [ ] `get_portfolio_timeseries(model, start_date, end_date)`
+  - [ ] `get_leaderboard(date)`
+- [ ] Implement `api/main.py` with all endpoints
+  - [ ] `/simulate/trigger` with background tasks
+  - [ ] `/simulate/status/{job_id}`
+  - [ ] `/simulate/current`
+  - [ ] `/results` (now reads from SQLite)
+  - [ ] **NEW:** `/portfolio/timeseries`
+  - [ ] **NEW:** `/leaderboard`
+  - [ ] `/health` with MCP checks
+- [ ] **NEW:** Write unit tests for results_service (target: 95% coverage)
+- [ ] **NEW:** Write integration tests for API endpoints (minimum: 80% coverage; target: 85%)
+- [ ] Test all endpoints with Postman/curl
+
+**Testing Deliverables:**
+- 18 unit tests for results_service
+- 20 integration tests for API endpoints
+- Performance benchmarks for database queries
+- 85%+ coverage for Phase 3 code
+
+---
+
+### Phase 4: Docker Integration (Day 7)
+- [ ] Update `Dockerfile`
+- [ ] Create `docker-entrypoint-api.sh`
+- [ ] Create `requirements-api.txt`
+- [ ] Update `docker-compose.yml`
+- [ ] Test Docker build
+- [ ] Test container startup and health checks
+- [ ] **NEW:** Run E2E tests in Docker environment
+- [ ] Test end-to-end simulation via API in Docker
+
+**Testing Deliverables:**
+- 10 E2E tests with Docker
+- Docker health check validation
+- Performance testing in containerized environment
+
+---
+
+### Phase 5: Windmill Integration (Days 8-9)
+- [ ] Create Windmill scripts (trigger, poll, store)
+- [ ] **UPDATED:** Modify `store_simulation_results.py` to use new `/results` endpoint
+- [ ] Test scripts locally against Docker API
+- [ ] Deploy scripts to Windmill instance
+- [ ] Create Windmill workflow
+- [ ] Test workflow end-to-end
+- [ ] Create Windmill dashboard (using new `/portfolio/timeseries` and `/leaderboard` endpoints)
+- [ ] Document Windmill setup process
+
+**Testing Deliverables:**
+- Integration tests for Windmill scripts
+- End-to-end workflow validation
+- Dashboard functionality verification
+
+---
+
+### Phase 6: Testing, Security & Documentation (Day 10)
+- [ ] **NEW:** Run full test suite and verify all thresholds met
+  - [ ] Code coverage ≥ 85%
+  - [ ] All ~150 tests passing
+  - [ ] Performance benchmarks within limits
+- [ ] **NEW:** Security testing
+  - [ ] Bandit scan (Python security issues)
+  - [ ] SQL injection tests
+  - [ ] Input validation tests
+  - [ ] OWASP ZAP scan (optional)
+- [ ] **NEW:** Load testing with Locust
+  - [ ] 100 concurrent users
+  - [ ] API endpoints within performance thresholds
+- [ ] Integration tests for complete workflow
+- [ ] Update README.md with API usage
+- [ ] Create API documentation (Swagger/OpenAPI - auto-generated by FastAPI)
+- [ ] Create deployment guide
+- [ ] Create troubleshooting guide
+- [ ] **NEW:** Generate test coverage report
+
+**Testing Deliverables:**
+- Full test suite execution report
+- Security scan results
+- Load testing results
+- Coverage report (HTML + XML)
+- CI/CD pipeline configuration
+
+---
+
+## New Files Created
+
+### Database & Results
+- `api/results_service.py` - SQLite-backed results retrieval
+- `api/import_historical_data.py` - Migration script for existing position.jsonl files
+
+### Testing Suite
+- `tests/conftest.py` - Shared pytest fixtures
+- `tests/unit/test_job_manager.py` - 25 tests
+- `tests/unit/test_database.py` - 10 tests
+- `tests/unit/test_runtime_manager.py` - 8 tests
+- `tests/unit/test_results_service.py` - 18 tests
+- `tests/unit/test_models.py` - 5 tests
+- `tests/integration/test_api_endpoints.py` - 20 tests
+- `tests/integration/test_worker.py` - 15 tests
+- `tests/integration/test_executor.py` - 12 tests
+- `tests/integration/test_end_to_end.py` - 5 tests
+- `tests/performance/test_database_benchmarks.py` - 10 tests
+- `tests/performance/test_api_load.py` - Locust load testing
+- `tests/security/test_api_security.py` - 10 tests
+- `tests/e2e/test_docker_workflow.py` - 10 tests
+- `pytest.ini` - Test configuration
+- `requirements-dev.txt` - Testing dependencies
+
+### CI/CD
+- `.github/workflows/test.yml` - GitHub Actions workflow
+
+---
+
+## Updated File Structure
+
+```
+AI-Trader/
+├── api/
+│   ├── __init__.py
+│   ├── main.py                      # FastAPI application
+│   ├── models.py                    # Pydantic request/response models
+│   ├── job_manager.py               # Job lifecycle management
+│   ├── database.py                  # SQLite utilities (enhanced schema)
+│   ├── worker.py                    # Background simulation worker
+│   ├── executor.py                  # Single model-day execution (+ SQLite writes)
+│   ├── runtime_manager.py           # Runtime config isolation
+│   ├── results_service.py           # NEW: SQLite-backed results retrieval
+│   └── import_historical_data.py    # NEW: JSONL → SQLite migration
+│
+├── tests/                           # NEW: Comprehensive test suite
+│   ├── conftest.py
+│   ├── unit/                        # 80 tests, <10s
+│   ├── integration/                 # 30 tests, <60s
+│   ├── performance/                 # 20 tests
+│   ├── security/                    # 10 tests
+│   └── e2e/                         # 10 tests
+│
+├── docs/
+│   ├── api-specification.md
+│   ├── job-manager-specification.md
+│   ├── worker-specification.md
+│   ├── implementation-specifications.md
+│   ├── database-enhanced-specification.md    # NEW
+│   ├── testing-specification.md              # NEW
+│   ├── README-SPECS.md
+│   └── ENHANCED-SPECIFICATIONS-SUMMARY.md    # NEW (this file)
+│
+├── data/
+│   ├── jobs.db                      # SQLite database (6 tables)
+│   ├── runtime_env*.json            # Runtime configs (temporary)
+│   ├── agent_data/                  # Existing position/log data
+│   └── merged.jsonl                 # Existing price data
+│
+├── pytest.ini                       # NEW: Test configuration
+├── requirements-dev.txt             # NEW: Testing dependencies
+├── .github/workflows/test.yml       # NEW: CI/CD pipeline
+└── ... (existing files)
+```
+
+---
+
+## Benefits Summary
+
+### Performance
+- **10-100x faster** results queries expected (SQLite vs file I/O)
+- **Advanced analytics** - timeseries, leaderboards, aggregations in milliseconds
+- **Optimized indexes** for common queries
+
+### Quality
+- **85% minimum coverage** enforced by CI/CD
+- **~150 comprehensive tests** across unit, integration, performance, security, and E2E suites
+- **Quality gates** prevent regressions
+- **Type safety** with mypy strict mode
+
+### Maintainability
+- **SQLite as the single source of truth** - easier backup, restore, migration
+- **Automated testing** catches bugs early
+- **CI/CD integration** provides fast feedback on every commit
+- **Security scanning** prevents vulnerabilities
+
+### Analytics Capabilities
+
+**New queries enabled by SQLite:**
+
+```text
+# Portfolio timeseries for charting
+GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31
+
+# Model leaderboard
+GET /leaderboard?date=2025-01-31
+
+# Advanced filtering (future)
+SELECT * FROM positions
+WHERE daily_return_pct > 2.0
+ORDER BY portfolio_value DESC;
+
+# Aggregations (future)
+SELECT model, AVG(daily_return_pct) as avg_return
+FROM positions
+GROUP BY model
+ORDER BY avg_return DESC;
+```
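A possible shape for the timeseries query behind `/portfolio/timeseries` is sketched below. It relies on ISO `YYYY-MM-DD` strings sorting in date order, so `BETWEEN` works on the `TEXT` column, and it assumes the highest `action_id` per day is the end-of-day state. The function name comes from the implementation plan; the response format is an assumption.

```python
import sqlite3


def get_portfolio_timeseries(conn, model, start_date, end_date):
    """Return one end-of-day portfolio value per date for `model`, oldest first."""
    rows = conn.execute(
        """
        SELECT p.date, p.portfolio_value
        FROM positions AS p
        WHERE p.model = ?
          AND p.date BETWEEN ? AND ?
          AND p.action_id = (
              SELECT MAX(q.action_id)
              FROM positions AS q
              WHERE q.model = p.model AND q.date = p.date
          )
        ORDER BY p.date
        """,
        (model, start_date, end_date),
    ).fetchall()
    return [{"date": date, "portfolio_value": value} for date, value in rows]


# Demo data in a reduced positions table (only the columns the query needs).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (date TEXT, model TEXT, action_id INTEGER, portfolio_value REAL)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        ("2025-01-02", "gpt-5", 1, 10000.0),
        ("2025-01-02", "gpt-5", 2, 10100.0),   # last action on 2025-01-02
        ("2025-01-03", "gpt-5", 3, 10200.0),
        ("2025-02-01", "gpt-5", 4, 10500.0),   # outside the requested range
        ("2025-01-02", "model-b", 1, 9900.0),  # "model-b" is a placeholder name
    ],
)
series = get_portfolio_timeseries(conn, "gpt-5", "2025-01-01", "2025-01-31")
```

Because `positions` stores one row per action, queries that aggregate over raw rows (such as the `AVG` example above) weight busy trading days more heavily unless they first reduce to one row per day as this sketch does.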
+
+---
+
+## Migration from Original Spec
+
+If you've already started implementation based on the original specs:
+
+### Step 1: Database Schema Migration
+```sql
+-- Run enhanced schema creation
+-- See database-enhanced-specification.md Section 2.1
+```
+
+### Step 2: Add Results Service
+```bash
+# Create new file
+touch api/results_service.py
+# Implement as per database-enhanced-specification.md Section 4.1
+```
+
+### Step 3: Update Executor
+```python
+# In api/executor.py, add after agent.run_trading_session():
+self._store_results_to_db(job_id, date, model_sig)
+```
+
+### Step 4: Update API Endpoints
+```python
+# In api/main.py, update /results endpoint to use ResultsService
+from api.results_service import ResultsService
+results_service = ResultsService()
+
+@app.get("/results")
+async def get_results(...):
+    return results_service.get_results(date, model, detail)
+```
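For orientation, a `ResultsService` along these lines might look like the sketch below. It is not the implementation from `database-enhanced-specification.md`: the constructor taking an open connection, the column sets returned for `minimal` and `full`, and the optional `model` filter are all assumptions made for illustration.

```python
import sqlite3


class ResultsService:
    """Sketch of a SQLite-backed results reader (column choices are assumptions)."""

    MINIMAL_COLUMNS = ("date", "model", "portfolio_value", "daily_return_pct")
    FULL_COLUMNS = MINIMAL_COLUMNS + (
        "action_id", "action_type", "symbol", "amount", "price", "cash",
    )

    def __init__(self, conn):
        self.conn = conn
        self.conn.row_factory = sqlite3.Row  # lets rows convert to dicts

    def get_results(self, date, model=None, detail="minimal"):
        if detail not in ("minimal", "full"):
            raise ValueError("detail must be 'minimal' or 'full'")
        # Column names come from the fixed tuples above, never from user input;
        # user-supplied values are always bound as parameters.
        columns = self.FULL_COLUMNS if detail == "full" else self.MINIMAL_COLUMNS
        sql = f"SELECT {', '.join(columns)} FROM positions WHERE date = ?"
        params = [date]
        if model is not None:
            sql += " AND model = ?"
            params.append(model)
        sql += " ORDER BY model, action_id"
        return [dict(row) for row in self.conn.execute(sql, params)]


# Demo with a reduced positions table and a single row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (date TEXT, model TEXT, action_id INTEGER,"
    " action_type TEXT, symbol TEXT, amount INTEGER, price REAL, cash REAL,"
    " portfolio_value REAL, daily_return_pct REAL)"
)
conn.execute(
    "INSERT INTO positions VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("2025-01-16", "gpt-5", 1, "buy", "AAPL", 10, 200.0, 8000.0, 10000.0, 0.0),
)
service = ResultsService(conn)
minimal = service.get_results("2025-01-16", "gpt-5")
full = service.get_results("2025-01-16", "gpt-5", detail="full")
```

Binding `date` and `model` as parameters is what keeps the SQL injection cases in `tests/security/` from reaching the query text.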
+
+### Step 5: Add Test Suite
+```bash
+mkdir -p tests/{unit,integration,performance,security,e2e}
+# Create test files as per testing-specification.md Sections 4-8
+```
+
+### Step 6: Configure CI/CD
+```bash
+mkdir -p .github/workflows
+# Create test.yml as per testing-specification.md Section 10.1
+```
+
+---
+
+## Testing Execution Guide
+
+### Run Unit Tests
+```bash
+pytest tests/unit/ -v --cov=api --cov-report=term-missing
+```
+
+### Run Integration Tests
+```bash
+pytest tests/integration/ -v
+```
+
+### Run All Tests (Except E2E)
+```bash
+pytest tests/ -v --ignore=tests/e2e/ --cov=api --cov-report=html
+```
+
+### Run E2E Tests (Requires Docker)
+```bash
+pytest tests/e2e/ -v -s
+```
+
+### Run Performance Benchmarks
+```bash
+pytest tests/performance/ --benchmark-only
+```
+
+### Run Security Tests
+```bash
+pytest tests/security/ -v
+bandit -r api/ -ll
+```
+
+### Generate Coverage Report
+```bash
+pytest tests/unit/ tests/integration/ --cov=api --cov-report=html
+open htmlcov/index.html  # View in browser (macOS; use xdg-open on Linux)
+```
+
+### Run Load Tests
+```bash
+locust -f tests/performance/test_api_load.py --host=http://localhost:8080
+# Open http://localhost:8089 for Locust UI
+```
+
+---
+
+## Questions & Next Steps
+
+### Review Checklist
+
+Please review:
+1. ✅ **Enhanced database schema** with 6 tables for comprehensive results storage
+2. ✅ **Migration strategy** for backward compatibility (dual-write mode)
+3. ✅ **Testing thresholds** (85% coverage minimum, performance benchmarks)
+4. ✅ **Test suite structure** (150 tests across 5 categories)
+5. ✅ **CI/CD integration** with quality gates
+6. ✅ **Updated implementation plan** (10 days, 6 phases)
+
+### Questions to Consider
+
+1. **Database migration timing:** Start with dual-write mode immediately, or add in Phase 2?
+2. **Testing priorities:** Should we implement tests alongside features (TDD) or after each phase?
+3. **CI/CD platform:** GitHub Actions (as specified) or different platform?
+4. **Performance baselines:** Should we run benchmarks before implementation to track improvement?
+5. **Security priorities:** Which security tests are MVP vs nice-to-have?
+
+### Ready to Implement?
+
+**Option A:** Approve specifications and begin Phase 1 implementation
+- Create API directory structure
+- Implement enhanced database schema
+- Write unit tests for database layer
+- Target: 2 days, 90%+ coverage for database code
+
+**Option B:** Request modifications to specifications
+- Clarify any unclear requirements
+- Adjust testing thresholds
+- Modify implementation timeline
+
+**Option C:** Implement in parallel workstreams
+- Workstream 1: Core API (Phases 1-3)
+- Workstream 2: Testing suite (parallel with Phase 1-3)
+- Workstream 3: Docker + Windmill (Phases 4-5)
+- Benefits: Faster delivery, more parallelization
+- Requires: Clear interfaces between components
+
+---
+
+## Summary
+
+**Enhanced specifications** add:
+1. 🗄️ **SQLite results storage** - 10-100x faster queries, advanced analytics
+2. 🧪 **Comprehensive testing** - 150 tests, 85% coverage, quality gates
+3. 🔒 **Security testing** - SQL injection, XSS, input validation
+4. ⚡ **Performance benchmarks** - Catch regressions early
+5. 🚀 **CI/CD pipeline** - Automated quality checks on every commit
+
+**Total effort:** Still ~10 days, but with significantly higher code quality and confidence in deployments.
+
+**Risk mitigation:** Extensive testing catches bugs before production, preventing costly hotfixes.
+
+**Long-term value:** Maintainable, well-tested codebase enables rapid feature development.
+
+---
+
+Ready to proceed? Please provide feedback or approval to begin implementation!