Major architecture transformation from batch-only to API service with
database persistence for Windmill integration.
## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks
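For a caller such as Windmill, the trigger-and-poll flow against these endpoints can be sketched with a few small helpers. Note this is an illustrative sketch only: the trigger payload fields (`start_date`, `end_date`, `models`) are assumptions, not the final request contract.

```python
# Hypothetical client helpers for the endpoints above.
# Payload field names are illustrative assumptions, not the final contract.
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # API_PORT defaults to 8080

def trigger_payload(start_date: str, end_date: str, models: list) -> str:
    """JSON body for POST /simulate/trigger (field names assumed)."""
    return json.dumps({"start_date": start_date, "end_date": end_date, "models": models})

def status_url(job_id: str) -> str:
    """URL for GET /simulate/status/{job_id}."""
    return f"{BASE_URL}/simulate/status/{job_id}"

def results_url(**filters: str) -> str:
    """URL for GET /results with optional job_id / date / model filters."""
    return f"{BASE_URL}/results?{urlencode(filters)}"
```

A caller would POST `trigger_payload(...)`, then poll `status_url(job_id)` until the job completes, and finally fetch `results_url(date=..., model=...)`.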
## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions,
holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage
## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration
## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout
## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode
## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures
## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies
Co-Authored-By: AI Assistant <noreply@example.com>
# AI-Trader API Service - Enhanced Specifications Summary

## Changes from Original Specifications

Based on user feedback, the specifications have been enhanced with:
- SQLite-backed results storage (instead of reading position.jsonl on-demand)
- Comprehensive Python testing suite with pytest
- Defined testing thresholds for coverage, performance, and quality gates
## Document Index

### Core Specifications (Original)
- api-specification.md - REST API endpoints and data models
- job-manager-specification.md - Job tracking and database layer
- worker-specification.md - Background worker architecture
- implementation-specifications.md - Agent, Docker, Windmill integration
### Enhanced Specifications (New)
- database-enhanced-specification.md - SQLite results storage
- testing-specification.md - Comprehensive testing suite
### Summary Documents
- README-SPECS.md - Original specifications overview
- ENHANCED-SPECIFICATIONS-SUMMARY.md - This document
## Key Enhancement #1: SQLite Results Storage

### What Changed

**Before:**
- `/results` endpoint reads `position.jsonl` files on-demand
- File I/O on every API request
- No support for advanced queries (date ranges, aggregations)

**After:**
- Simulation results written to SQLite during execution
- Fast database queries (10-100x faster than file I/O)
- Advanced analytics: timeseries, leaderboards, aggregations
### New Database Tables

```sql
-- Results storage
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    action_id INTEGER,
    action_type TEXT,
    symbol TEXT,
    amount INTEGER,
    price REAL,
    cash REAL,
    portfolio_value REAL,
    daily_profit REAL,
    daily_return_pct REAL,
    cumulative_profit REAL,
    cumulative_return_pct REAL,
    created_at TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);

CREATE TABLE holdings (
    id INTEGER PRIMARY KEY,
    position_id INTEGER,
    symbol TEXT,
    quantity INTEGER,
    FOREIGN KEY (position_id) REFERENCES positions(id)
);

CREATE TABLE reasoning_logs (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    step_number INTEGER,
    timestamp TEXT,
    role TEXT,
    content TEXT,
    tool_name TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);

CREATE TABLE tool_usage (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    tool_name TEXT,
    call_count INTEGER,
    total_duration_seconds REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);
```
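One runtime detail worth noting: SQLite only enforces these foreign keys when `PRAGMA foreign_keys` is switched on for the connection (it is off by default). A minimal sketch against a trimmed-down version of the schema above; the `jobs` table here is a one-column stand-in for the real jobs table defined in the job-manager spec:

```python
# Demo of FK enforcement on a trimmed version of the schema above.
# The jobs table is a stand-in; its real columns live in the jobs spec.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite!
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY);
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    model TEXT,
    portfolio_value REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);
""")
conn.execute("INSERT INTO jobs VALUES ('job-1')")
conn.execute("INSERT INTO positions (job_id, model, portfolio_value) "
             "VALUES ('job-1', 'gpt-5', 10500.0)")

try:
    # Fails: 'job-404' does not exist in jobs
    conn.execute("INSERT INTO positions (job_id, model, portfolio_value) "
                 "VALUES ('job-404', 'gpt-5', 0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```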
### New API Endpoints

```
# Enhanced results endpoint (now reads from SQLite)
GET /results?date=2025-01-16&model=gpt-5&detail=minimal|full

# New analytics endpoints
GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31
GET /leaderboard?date=2025-01-16   # Rankings by portfolio value
```
### Migration Strategy

**Phase 1: Dual-write mode**
- Agent writes to `position.jsonl` (existing code)
- Executor writes to SQLite after the agent completes
- Ensures backward compatibility

**Phase 2: Verification**
- Compare SQLite data vs JSONL data
- Fix any discrepancies

**Phase 3: Switch over**
- `/results` endpoint reads from SQLite
- JSONL writes become optional (can be deprecated later)
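The dual-write step can be sketched as a single function that appends to the existing JSONL file and mirrors the same record into SQLite. The function name, record keys, and reduced column set here are assumptions for illustration, not the final interface:

```python
# Dual-write sketch: append to JSONL (existing path) and mirror into SQLite.
# store_position and the record keys are illustrative assumptions.
import json
import sqlite3
import tempfile

def store_position(jsonl_path: str, conn: sqlite3.Connection, record: dict) -> None:
    """Write one position record to both stores (Phase 1 dual-write)."""
    # 1) Existing path: append one JSON object per line
    with open(jsonl_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    # 2) New path: mirror the same record into SQLite
    conn.execute(
        "INSERT INTO positions (job_id, date, model, portfolio_value) VALUES (?, ?, ?, ?)",
        (record["job_id"], record["date"], record["model"], record["portfolio_value"]),
    )
    conn.commit()

# Demo against an in-memory DB and a throwaway file
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (job_id TEXT, date TEXT, model TEXT, portfolio_value REAL)")
demo_path = tempfile.mkstemp(suffix=".jsonl")[1]
store_position(demo_path, conn, {"job_id": "j1", "date": "2025-01-16",
                                 "model": "gpt-5", "portfolio_value": 10000.0})
```

Because both writes come from one record, Phase 2 verification reduces to diffing the JSONL lines against `SELECT`s over `positions`.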
### Performance Improvement
| Operation | Before (JSONL) | After (SQLite) | Speedup |
|---|---|---|---|
| Get results for 1 date | 200-500ms | 20-50ms | 10x faster |
| Get timeseries (30 days) | 6-15 seconds | 100-300ms | 50x faster |
| Get leaderboard | 5-10 seconds | 50-100ms | 100x faster |
## Key Enhancement #2: Comprehensive Testing Suite

### Testing Thresholds
| Metric | Minimum | Target | Enforcement |
|---|---|---|---|
| Code Coverage | 85% | 90% | CI fails if below |
| Critical Path Coverage | 90% | 95% | Manual review |
| Unit Test Speed | <10s | <5s | Benchmark tracking |
| Integration Test Speed | <60s | <30s | Benchmark tracking |
| API Response Times | <500ms | <200ms | Load testing |
### Test Suite Structure

```
tests/
├── unit/                           # 80 tests, <10 seconds
│   ├── test_job_manager.py         # 95% coverage target
│   ├── test_database.py
│   ├── test_runtime_manager.py
│   ├── test_results_service.py     # 95% coverage target
│   └── test_models.py
│
├── integration/                    # 30 tests, <60 seconds
│   ├── test_api_endpoints.py       # Full FastAPI testing
│   ├── test_worker.py
│   ├── test_executor.py
│   └── test_end_to_end.py
│
├── performance/                    # 20 tests
│   ├── test_database_benchmarks.py
│   ├── test_api_load.py            # Locust load testing
│   └── test_simulation_timing.py
│
├── security/                       # 10 tests
│   ├── test_api_security.py        # SQL injection, XSS, path traversal
│   └── test_auth.py                # Future: API key validation
│
└── e2e/                            # 10 tests, Docker required
    └── test_docker_workflow.py     # Full Docker compose scenario
```
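As a taste of the security category, here is a self-contained sketch of the kind of injection test `tests/security/test_api_security.py` might contain. It assumes the API layer binds user input with parameterized queries throughout; the table and function names are illustrative only:

```python
# Illustrative security-style test: parameterized queries treat hostile
# input as a literal string. Names here are assumptions for the sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (date TEXT, model TEXT)")
conn.execute("INSERT INTO positions VALUES ('2025-01-16', 'gpt-5')")

def query_results(model: str):
    # Parameterized: user input is bound, never interpolated into the SQL text
    return conn.execute("SELECT date FROM positions WHERE model = ?", (model,)).fetchall()

def test_sql_injection_is_inert():
    malicious = "gpt-5' OR '1'='1"
    assert query_results(malicious) == []           # treated as a literal, matches nothing
    assert query_results("gpt-5") == [("2025-01-16",)]

test_sql_injection_is_inert()  # pytest would collect this automatically
```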
### Quality Gates
All PRs must pass:
- ✅ All tests passing (unit + integration)
- ✅ Code coverage ≥ 85%
- ✅ No critical security vulnerabilities (Bandit scan)
- ✅ Linting passes (Ruff or Flake8)
- ✅ Type checking passes (mypy strict mode)
- ✅ No performance regressions (±10% tolerance)
Release checklist:
- ✅ All quality gates pass
- ✅ End-to-end tests pass in Docker
- ✅ Load testing passes (100 concurrent requests)
- ✅ Security scan passes (OWASP ZAP)
- ✅ Manual smoke tests complete
### CI/CD Integration

```yaml
# .github/workflows/test.yml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run unit tests
        run: pytest tests/unit/ --cov=api --cov-fail-under=85
      - name: Run integration tests
        run: pytest tests/integration/
      - name: Security scan
        run: bandit -r api/ -ll
      - name: Upload coverage
        uses: codecov/codecov-action@v3
```
### Test Coverage Breakdown

| Component | Minimum | Target | Tests |
|---|---|---|---|
| `api/job_manager.py` | 90% | 95% | 25 tests |
| `api/worker.py` | 85% | 90% | 15 tests |
| `api/executor.py` | 85% | 90% | 12 tests |
| `api/results_service.py` | 90% | 95% | 18 tests |
| `api/database.py` | 95% | 100% | 10 tests |
| `api/runtime_manager.py` | 85% | 90% | 8 tests |
| `api/main.py` | 80% | 85% | 20 tests |
| **Total** | 85% | 90% | ~150 tests |
## Updated Implementation Plan

### Phase 1: API Foundation (Days 1-2)

- Create `api/` directory structure
- Implement `api/models.py` with Pydantic models
- Implement `api/database.py` with enhanced schema (6 tables)
- Implement `api/job_manager.py` with job CRUD operations
- NEW: Write unit tests for job_manager (target: 95% coverage)
- Test database operations manually
Testing Deliverables:
- 25 unit tests for job_manager
- 10 unit tests for database utilities
- 85%+ coverage for Phase 1 code
### Phase 2: Worker & Executor (Days 3-4)

- Implement `api/runtime_manager.py`
- Implement `api/executor.py` for single model-day execution
- NEW: Add SQLite write logic to the executor (`_store_results_to_db()`)
- Implement `api/worker.py` for job orchestration
- NEW: Write unit tests for worker and executor (target: 85% coverage)
- Test runtime config isolation
Testing Deliverables:
- 15 unit tests for worker
- 12 unit tests for executor
- 8 unit tests for runtime_manager
- 85%+ coverage for Phase 2 code
### Phase 3: Results Service & FastAPI Endpoints (Days 5-6)

- NEW: Implement `api/results_service.py` (SQLite-backed)
  - `get_results(date, model, detail)`
  - `get_portfolio_timeseries(model, start_date, end_date)`
  - `get_leaderboard(date)`
- Implement `api/main.py` with all endpoints
  - `/simulate/trigger` with background tasks
  - `/simulate/status/{job_id}`
  - `/simulate/current`
  - `/results` (now reads from SQLite)
  - NEW: `/portfolio/timeseries`
  - NEW: `/leaderboard`
  - `/health` with MCP checks
- NEW: Write unit tests for results_service (target: 95% coverage)
- NEW: Write integration tests for API endpoints (target: 80% coverage)
- Test all endpoints with Postman/curl
Testing Deliverables:
- 18 unit tests for results_service
- 20 integration tests for API endpoints
- Performance benchmarks for database queries
- 85%+ coverage for Phase 3 code
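To make the Phase 3 shape concrete, here is a minimal sketch of what `api/results_service.py` could look like. The method names come from the plan above; everything else (single shared connection, reduced column set, no `detail` handling) is an assumption for the sketch, not the final implementation:

```python
# Sketch of the SQLite-backed results service. Method names follow the
# plan above; internals are illustrative assumptions.
import sqlite3
from typing import Optional

class ResultsService:
    def __init__(self, db_path: str = "data/jobs.db"):
        # One connection keeps the sketch short; the real service would
        # manage per-request connections for FastAPI's threaded workers.
        self.conn = sqlite3.connect(db_path)
        self.conn.row_factory = sqlite3.Row

    def get_results(self, date: str, model: Optional[str] = None):
        sql, params = "SELECT * FROM positions WHERE date = ?", [date]
        if model:
            sql += " AND model = ?"
            params.append(model)
        return [dict(r) for r in self.conn.execute(sql, params)]

    def get_leaderboard(self, date: str):
        sql = ("SELECT model, MAX(portfolio_value) AS portfolio_value "
               "FROM positions WHERE date = ? "
               "GROUP BY model ORDER BY portfolio_value DESC")
        return [dict(r) for r in self.conn.execute(sql, (date,))]

# Demo with an in-memory database
svc = ResultsService(":memory:")
svc.conn.execute("CREATE TABLE positions (date TEXT, model TEXT, portfolio_value REAL)")
svc.conn.executemany("INSERT INTO positions VALUES (?, ?, ?)",
                     [("2025-01-16", "gpt-5", 10500.0), ("2025-01-16", "claude", 10200.0)])
```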
### Phase 4: Docker Integration (Day 7)

- Update `Dockerfile`
- Create `docker-entrypoint-api.sh`
- Create `requirements-api.txt`
- Update `docker-compose.yml`
- Test Docker build
- Test container startup and health checks
- NEW: Run E2E tests in the Docker environment
- Test end-to-end simulation via API in Docker
Testing Deliverables:
- 10 E2E tests with Docker
- Docker health check validation
- Performance testing in containerized environment
### Phase 5: Windmill Integration (Days 8-9)

- Create Windmill scripts (trigger, poll, store)
- UPDATED: Modify `store_simulation_results.py` to use the new `/results` endpoint
- Test scripts locally against the Docker API
- Deploy scripts to the Windmill instance
- Create Windmill workflow
- Test workflow end-to-end
- Create Windmill dashboard (using the new `/portfolio/timeseries` and `/leaderboard` endpoints)
- Document Windmill setup process
Testing Deliverables:
- Integration tests for Windmill scripts
- End-to-end workflow validation
- Dashboard functionality verification
### Phase 6: Testing, Security & Documentation (Day 10)

- NEW: Run full test suite and verify all thresholds are met
  - Code coverage ≥ 85%
  - All ~150 tests passing
  - Performance benchmarks within limits
- NEW: Security testing
  - Bandit scan (Python security issues)
  - SQL injection tests
  - Input validation tests
  - OWASP ZAP scan (optional)
- NEW: Load testing with Locust
  - 100 concurrent users
  - API endpoints within performance thresholds
- Integration tests for complete workflow
- Update README.md with API usage
- Create API documentation (Swagger/OpenAPI - auto-generated by FastAPI)
- Create deployment guide
- Create troubleshooting guide
- NEW: Generate test coverage report
Testing Deliverables:
- Full test suite execution report
- Security scan results
- Load testing results
- Coverage report (HTML + XML)
- CI/CD pipeline configuration
## New Files Created

### Database & Results

- `api/results_service.py` - SQLite-backed results retrieval
- `api/import_historical_data.py` - Migration script for existing position.jsonl files

### Testing Suite

- `tests/conftest.py` - Shared pytest fixtures
- `tests/unit/test_job_manager.py` - 25 tests
- `tests/unit/test_database.py` - 10 tests
- `tests/unit/test_runtime_manager.py` - 8 tests
- `tests/unit/test_results_service.py` - 18 tests
- `tests/unit/test_models.py` - 5 tests
- `tests/integration/test_api_endpoints.py` - 20 tests
- `tests/integration/test_worker.py` - 15 tests
- `tests/integration/test_executor.py` - 12 tests
- `tests/integration/test_end_to_end.py` - 5 tests
- `tests/performance/test_database_benchmarks.py` - 10 tests
- `tests/performance/test_api_load.py` - Locust load testing
- `tests/security/test_api_security.py` - 10 tests
- `tests/e2e/test_docker_workflow.py` - 10 tests
- `pytest.ini` - Test configuration
- `requirements-dev.txt` - Testing dependencies

### CI/CD

- `.github/workflows/test.yml` - GitHub Actions workflow
## Updated File Structure

```
AI-Trader/
├── api/
│   ├── __init__.py
│   ├── main.py                    # FastAPI application
│   ├── models.py                  # Pydantic request/response models
│   ├── job_manager.py             # Job lifecycle management
│   ├── database.py                # SQLite utilities (enhanced schema)
│   ├── worker.py                  # Background simulation worker
│   ├── executor.py                # Single model-day execution (+ SQLite writes)
│   ├── runtime_manager.py         # Runtime config isolation
│   ├── results_service.py         # NEW: SQLite-backed results retrieval
│   └── import_historical_data.py  # NEW: JSONL → SQLite migration
│
├── tests/                         # NEW: Comprehensive test suite
│   ├── conftest.py
│   ├── unit/                      # 80 tests, <10s
│   ├── integration/               # 30 tests, <60s
│   ├── performance/               # 20 tests
│   ├── security/                  # 10 tests
│   └── e2e/                       # 10 tests
│
├── docs/
│   ├── api-specification.md
│   ├── job-manager-specification.md
│   ├── worker-specification.md
│   ├── implementation-specifications.md
│   ├── database-enhanced-specification.md   # NEW
│   ├── testing-specification.md             # NEW
│   ├── README-SPECS.md
│   └── ENHANCED-SPECIFICATIONS-SUMMARY.md   # NEW (this file)
│
├── data/
│   ├── jobs.db                    # SQLite database (6 tables)
│   ├── runtime_env*.json          # Runtime configs (temporary)
│   ├── agent_data/                # Existing position/log data
│   └── merged.jsonl               # Existing price data
│
├── pytest.ini                     # NEW: Test configuration
├── requirements-dev.txt           # NEW: Testing dependencies
├── .github/workflows/test.yml     # NEW: CI/CD pipeline
└── ... (existing files)
```
## Benefits Summary

### Performance
- 10-100x faster results queries (SQLite vs file I/O)
- Advanced analytics - timeseries, leaderboards, aggregations in milliseconds
- Optimized indexes for common queries
### Quality
- 85% minimum coverage enforced by CI/CD
- 150 comprehensive tests across unit, integration, performance, security
- Quality gates prevent regressions
- Type safety with mypy strict mode
### Maintainability
- SQLite single source of truth - easier backup, restore, migration
- Automated testing catches bugs early
- CI/CD integration provides fast feedback on every commit
- Security scanning prevents vulnerabilities
### Analytics Capabilities

New queries enabled by SQLite:
```
# Portfolio timeseries for charting
GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31

# Model leaderboard
GET /leaderboard?date=2025-01-31
```

```sql
-- Advanced filtering (future)
SELECT * FROM positions
WHERE daily_return_pct > 2.0
ORDER BY portfolio_value DESC;

-- Aggregations (future)
SELECT model, AVG(daily_return_pct) AS avg_return
FROM positions
GROUP BY model
ORDER BY avg_return DESC;
```
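The aggregation query above can be driven from Python just as easily. A small self-contained sketch, using the `positions` columns from the schema earlier in this document and toy data:

```python
# Average daily return per model, computed from a toy positions table.
# Table/column names follow the schema sketched earlier; data is fabricated
# purely for the demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (model TEXT, daily_return_pct REAL)")
conn.executemany("INSERT INTO positions VALUES (?, ?)", [
    ("gpt-5", 2.0), ("gpt-5", 4.0), ("claude", 1.0), ("claude", 3.0),
])

avg_returns = conn.execute("""
    SELECT model, AVG(daily_return_pct) AS avg_return
    FROM positions
    GROUP BY model
    ORDER BY avg_return DESC
""").fetchall()
print(avg_returns)  # [('gpt-5', 3.0), ('claude', 2.0)]
```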
## Migration from Original Spec

If you've already started implementation based on the original specs:

### Step 1: Database Schema Migration

```sql
-- Run enhanced schema creation
-- See database-enhanced-specification.md Section 2.1
```

### Step 2: Add Results Service

```bash
# Create new file
touch api/results_service.py
# Implement as per database-enhanced-specification.md Section 4.1
```

### Step 3: Update Executor

```python
# In api/executor.py, add after agent.run_trading_session():
self._store_results_to_db(job_id, date, model_sig)
```

### Step 4: Update API Endpoints

```python
# In api/main.py, update the /results endpoint to use ResultsService
from api.results_service import ResultsService

results_service = ResultsService()

@app.get("/results")
async def get_results(...):
    return results_service.get_results(date, model, detail)
```

### Step 5: Add Test Suite

```bash
mkdir -p tests/{unit,integration,performance,security,e2e}
# Create test files as per testing-specification.md Sections 4-8
```

### Step 6: Configure CI/CD

```bash
mkdir -p .github/workflows
# Create test.yml as per testing-specification.md Section 10.1
```
## Testing Execution Guide

### Run Unit Tests

```bash
pytest tests/unit/ -v --cov=api --cov-report=term-missing
```

### Run Integration Tests

```bash
pytest tests/integration/ -v
```

### Run All Tests (Except E2E)

```bash
pytest tests/ -v --ignore=tests/e2e/ --cov=api --cov-report=html
```

### Run E2E Tests (Requires Docker)

```bash
pytest tests/e2e/ -v -s
```

### Run Performance Benchmarks

```bash
pytest tests/performance/ --benchmark-only
```

### Run Security Tests

```bash
pytest tests/security/ -v
bandit -r api/ -ll
```

### Generate Coverage Report

```bash
pytest tests/unit/ tests/integration/ --cov=api --cov-report=html
open htmlcov/index.html  # View in browser
```

### Run Load Tests

```bash
locust -f tests/performance/test_api_load.py --host=http://localhost:8080
# Open http://localhost:8089 for the Locust UI
```
## Questions & Next Steps

### Review Checklist

Please review:
- ✅ Enhanced database schema with 6 tables for comprehensive results storage
- ✅ Migration strategy for backward compatibility (dual-write mode)
- ✅ Testing thresholds (85% coverage minimum, performance benchmarks)
- ✅ Test suite structure (150 tests across 5 categories)
- ✅ CI/CD integration with quality gates
- ✅ Updated implementation plan (10 days, 6 phases)
### Questions to Consider
- Database migration timing: Start with dual-write mode immediately, or add in Phase 2?
- Testing priorities: Should we implement tests alongside features (TDD) or after each phase?
- CI/CD platform: GitHub Actions (as specified) or different platform?
- Performance baselines: Should we run benchmarks before implementation to track improvement?
- Security priorities: Which security tests are MVP vs nice-to-have?
### Ready to Implement?
Option A: Approve specifications and begin Phase 1 implementation
- Create API directory structure
- Implement enhanced database schema
- Write unit tests for database layer
- Target: 2 days, 90%+ coverage for database code
Option B: Request modifications to specifications
- Clarify any unclear requirements
- Adjust testing thresholds
- Modify implementation timeline
Option C: Implement in parallel workstreams
- Workstream 1: Core API (Phases 1-3)
- Workstream 2: Testing suite (parallel with Phase 1-3)
- Workstream 3: Docker + Windmill (Phases 4-5)
- Benefits: Faster delivery, more parallelization
- Requires: Clear interfaces between components
## Summary

Enhanced specifications add:
- 🗄️ SQLite results storage - 10-100x faster queries, advanced analytics
- 🧪 Comprehensive testing - 150 tests, 85% coverage, quality gates
- 🔒 Security testing - SQL injection, XSS, input validation
- ⚡ Performance benchmarks - Catch regressions early
- 🚀 CI/CD pipeline - Automated quality checks on every commit
Total effort: Still ~10 days, but with significantly higher code quality and confidence in deployments.
Risk mitigation: Extensive testing catches bugs before production, preventing costly hotfixes.
Long-term value: Maintainable, well-tested codebase enables rapid feature development.
Ready to proceed? Please provide feedback or approval to begin implementation!