AI-Trader/docs/ENHANCED-SPECIFICATIONS-SUMMARY.md
Bill fb9583b374 feat: transform to REST API service with SQLite persistence (v0.3.0)
Major architecture transformation from batch-only to API service with
database persistence for Windmill integration.

## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks

## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions,
  holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage

## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration

## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout

## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode

## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures

## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies

Co-Authored-By: AI Assistant <noreply@example.com>
2025-10-31 11:47:10 -04:00

AI-Trader API Service - Enhanced Specifications Summary

Changes from Original Specifications

Based on user feedback, the specifications have been enhanced with:

  1. SQLite-backed results storage (instead of reading position.jsonl on-demand)
  2. Comprehensive Python testing suite with pytest
  3. Defined testing thresholds for coverage, performance, and quality gates

Document Index

Core Specifications (Original)

  1. api-specification.md - REST API endpoints and data models
  2. job-manager-specification.md - Job tracking and database layer
  3. worker-specification.md - Background worker architecture
  4. implementation-specifications.md - Agent, Docker, Windmill integration

Enhanced Specifications (New)

  1. database-enhanced-specification.md - SQLite results storage
  2. testing-specification.md - Comprehensive testing suite

Summary Documents

  1. README-SPECS.md - Original specifications overview
  2. ENHANCED-SPECIFICATIONS-SUMMARY.md - This document

Key Enhancement #1: SQLite Results Storage

What Changed

Before:

  • /results endpoint reads position.jsonl files on-demand
  • File I/O on every API request
  • No support for advanced queries (date ranges, aggregations)

After:

  • Simulation results written to SQLite during execution
  • Fast database queries (10-100x faster than file I/O)
  • Advanced analytics: timeseries, leaderboards, aggregations

New Database Tables

-- Results storage
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    action_id INTEGER,
    action_type TEXT,
    symbol TEXT,
    amount INTEGER,
    price REAL,
    cash REAL,
    portfolio_value REAL,
    daily_profit REAL,
    daily_return_pct REAL,
    cumulative_profit REAL,
    cumulative_return_pct REAL,
    created_at TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);

CREATE TABLE holdings (
    id INTEGER PRIMARY KEY,
    position_id INTEGER,
    symbol TEXT,
    quantity INTEGER,
    FOREIGN KEY (position_id) REFERENCES positions(id)
);

CREATE TABLE reasoning_logs (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    step_number INTEGER,
    timestamp TEXT,
    role TEXT,
    content TEXT,
    tool_name TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);

CREATE TABLE tool_usage (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    tool_name TEXT,
    call_count INTEGER,
    total_duration_seconds REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);
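
One detail worth keeping in mind with this schema: SQLite only enforces FOREIGN KEY clauses when `PRAGMA foreign_keys = ON` is set on each connection. The sketch below illustrates this with a trimmed-down schema. The `jobs` table definition, the `open_database` helper, and the `orphan_insert_is_rejected` helper are assumptions made for illustration, not part of the specification.

```python
import sqlite3

# Trimmed-down schema for illustration only. The `jobs` table definition is an
# assumption (it is not shown in this summary); `positions` and `holdings`
# keep just a few of the columns listed above.
SCHEMA = """
CREATE TABLE jobs (job_id TEXT PRIMARY KEY);
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    portfolio_value REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);
CREATE TABLE holdings (
    id INTEGER PRIMARY KEY,
    position_id INTEGER,
    symbol TEXT,
    quantity INTEGER,
    FOREIGN KEY (position_id) REFERENCES positions(id)
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign-key enforcement enabled (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # SQLite ignores FOREIGN KEY clauses unless this pragma is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def orphan_insert_is_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if inserting a position for an unknown job fails."""
    try:
        conn.execute(
            "INSERT INTO positions (job_id, date, model, portfolio_value) "
            "VALUES (?, ?, ?, ?)",
            ("no-such-job", "2025-01-16", "gpt-5", 10000.0),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Without the pragma, the same orphan insert would succeed silently, which is easy to miss in tests that reuse a single connection.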

New API Endpoints

# Enhanced results endpoint (now reads from SQLite)
GET /results?date=2025-01-16&model=gpt-5&detail=minimal|full

# New analytics endpoints
GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31
GET /leaderboard?date=2025-01-16  # Rankings by portfolio value

Migration Strategy

Phase 1: Dual-write mode

  • Agent writes to position.jsonl (existing code)
  • Executor writes to SQLite after agent completes
  • Ensures backward compatibility

Phase 2: Verification

  • Compare SQLite data vs JSONL data
  • Fix any discrepancies

Phase 3: Switch over

  • /results endpoint reads from SQLite
  • JSONL writes become optional (can deprecate later)
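
The Phase 2 comparison could look roughly like the following. The layout of `position.jsonl` is not described in this summary, so the record keys used here (`date`, `action_id`, `cash`) are purely hypothetical, as is the `find_discrepancies` name. Treat it as a sketch of the approach rather than a working migration check.

```python
import json
import sqlite3
from typing import Iterable


def find_discrepancies(
    jsonl_lines: Iterable[str],
    conn: sqlite3.Connection,
    job_id: str,
    model: str,
) -> dict:
    """Compare JSONL records with SQLite rows for one job and model."""
    jsonl_records = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        # Hypothetical keys: adjust to the real position.jsonl layout.
        jsonl_records.add(
            (record["date"], record["action_id"], round(record["cash"], 2))
        )

    db_records = {
        (date, action_id, round(cash, 2))
        for date, action_id, cash in conn.execute(
            "SELECT date, action_id, cash FROM positions "
            "WHERE job_id = ? AND model = ?",
            (job_id, model),
        )
    }

    return {
        "missing_in_sqlite": sorted(jsonl_records - db_records),
        "missing_in_jsonl": sorted(db_records - jsonl_records),
    }
```

An empty report for every job and model would be one reasonable signal that Phase 3 can begin.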

Performance Improvement

| Operation | Before (JSONL) | After (SQLite) | Speedup |
|---|---|---|---|
| Get results for 1 date | 200-500ms | 20-50ms | 10x faster |
| Get timeseries (30 days) | 6-15 seconds | 100-300ms | 50x faster |
| Get leaderboard | 5-10 seconds | 50-100ms | 100x faster |

Key Enhancement #2: Comprehensive Testing Suite

Testing Thresholds

| Metric | Minimum | Target | Enforcement |
|---|---|---|---|
| Code Coverage | 85% | 90% | CI fails if below |
| Critical Path Coverage | 90% | 95% | Manual review |
| Unit Test Speed | <10s | <5s | Benchmark tracking |
| Integration Test Speed | <60s | <30s | Benchmark tracking |
| API Response Times | <500ms | <200ms | Load testing |

Test Suite Structure

tests/
├── unit/                          # 80 tests, <10 seconds
│   ├── test_job_manager.py        # 95% coverage target
│   ├── test_database.py
│   ├── test_runtime_manager.py
│   ├── test_results_service.py    # 95% coverage target
│   └── test_models.py
│
├── integration/                   # 30 tests, <60 seconds
│   ├── test_api_endpoints.py      # Full FastAPI testing
│   ├── test_worker.py
│   ├── test_executor.py
│   └── test_end_to_end.py
│
├── performance/                   # 20 tests
│   ├── test_database_benchmarks.py
│   ├── test_api_load.py           # Locust load testing
│   └── test_simulation_timing.py
│
├── security/                      # 10 tests
│   ├── test_api_security.py       # SQL injection, XSS, path traversal
│   └── test_auth.py               # Future: API key validation
│
└── e2e/                           # 10 tests, Docker required
    └── test_docker_workflow.py    # Full Docker compose scenario
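
To give a flavour of the SQL injection checks planned for tests/security/test_api_security.py, here is a minimal pytest-style sketch. It tests a hypothetical `query_positions_by_model` helper directly against an in-memory database, not the real API, and the table is reduced to three columns for brevity.

```python
import sqlite3


def query_positions_by_model(conn: sqlite3.Connection, model: str) -> list[tuple]:
    """Hypothetical query helper: the filter value is bound, never interpolated."""
    return conn.execute(
        "SELECT date, model, portfolio_value FROM positions WHERE model = ?",
        (model,),
    ).fetchall()


def test_model_filter_resists_sql_injection() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE positions (date TEXT, model TEXT, portfolio_value REAL)"
    )
    conn.execute("INSERT INTO positions VALUES ('2025-01-16', 'gpt-5', 10000.0)")

    # With string interpolation this input would match every row; as a bound
    # parameter it is compared literally and matches nothing.
    malicious = "gpt-5' OR '1'='1"
    assert query_positions_by_model(conn, malicious) == []

    # A legitimate value still returns the expected row.
    assert len(query_positions_by_model(conn, "gpt-5")) == 1
```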

Quality Gates

All PRs must pass:

  1. All tests passing (unit + integration)
  2. Code coverage ≥ 85%
  3. No critical security vulnerabilities (Bandit scan)
  4. Linting passes (Ruff or Flake8)
  5. Type checking passes (mypy strict mode)
  6. No performance regressions (±10% tolerance)

Release checklist:

  1. All quality gates pass
  2. End-to-end tests pass in Docker
  3. Load testing passes (100 concurrent requests)
  4. Security scan passes (OWASP ZAP)
  5. Manual smoke tests complete
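
The "no performance regressions (±10% tolerance)" gate could be expressed as a small comparison against a stored baseline. The sketch below assumes that only slowdowns count as regressions and that timings are measured in seconds. The function name and the idea of a stored baseline are illustrative assumptions.

```python
def is_regression(
    baseline_seconds: float,
    current_seconds: float,
    tolerance: float = 0.10,
) -> bool:
    """Return True if the current timing exceeds the baseline by more than the tolerance."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    # A run is flagged only when it is slower than baseline * (1 + tolerance).
    return current_seconds > baseline_seconds * (1 + tolerance)
```

For example, with a 100ms baseline a 105ms run would pass the gate while a 120ms run would be flagged.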

CI/CD Integration

# .github/workflows/test.yml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
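      - uses: actions/checkout@v3
      # Assumed setup: adjust the Python version and requirements files to the project
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements-api.txt -r requirements-dev.txt
      - name: Run unit tests
        run: pytest tests/unit/ --cov=api --cov-report=xml --cov-fail-under=85
      - name: Run integration tests
        run: pytest tests/integration/
      - name: Security scan
        run: bandit -r api/ -ll
      - name: Upload coverage
        uses: codecov/codecov-action@v3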

Test Coverage Breakdown

| Component | Minimum | Target | Tests |
|---|---|---|---|
| api/job_manager.py | 90% | 95% | 25 tests |
| api/worker.py | 85% | 90% | 15 tests |
| api/executor.py | 85% | 90% | 12 tests |
| api/results_service.py | 90% | 95% | 18 tests |
| api/database.py | 95% | 100% | 10 tests |
| api/runtime_manager.py | 85% | 90% | 8 tests |
| api/main.py | 80% | 85% | 20 tests |
| Total | 85% | 90% | ~150 tests |

Updated Implementation Plan

Phase 1: API Foundation (Days 1-2)

  • Create api/ directory structure
  • Implement api/models.py with Pydantic models
  • Implement api/database.py with enhanced schema (6 tables)
  • Implement api/job_manager.py with job CRUD operations
  • NEW: Write unit tests for job_manager (target: 95% coverage)
  • Test database operations manually

Testing Deliverables:

  • 25 unit tests for job_manager
  • 10 unit tests for database utilities
  • 85%+ coverage for Phase 1 code

Phase 2: Worker & Executor (Days 3-4)

  • Implement api/runtime_manager.py
  • Implement api/executor.py for single model-day execution
  • NEW: Add SQLite write logic to executor (_store_results_to_db())
  • Implement api/worker.py for job orchestration
  • NEW: Write unit tests for worker and executor (target: 85% coverage)
  • Test runtime config isolation

Testing Deliverables:

  • 15 unit tests for worker
  • 12 unit tests for executor
  • 8 unit tests for runtime_manager
  • 85%+ coverage for Phase 2 code
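
Runtime config isolation, as tested in this phase, could be approached by giving each model-day run its own uniquely named config file and removing it afterwards. The sketch below is one possible shape. The `isolated_runtime_config` name and the config contents are hypothetical; only the `runtime_env*.json` naming pattern comes from the file structure in this document.

```python
import json
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def isolated_runtime_config(data_dir: Path, settings: dict) -> Iterator[Path]:
    """Write settings to a unique runtime_env_*.json file and clean it up on exit."""
    data_dir.mkdir(parents=True, exist_ok=True)
    # A random suffix keeps parallel model runs from sharing a config file.
    path = data_dir / f"runtime_env_{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(settings))
    try:
        yield path
    finally:
        # Remove the temporary config even if the run raised an exception.
        path.unlink(missing_ok=True)
```

A worker could wrap each model-day execution in `with isolated_runtime_config(...) as config_path:` and hand `config_path` to the agent, so that no two runs read or overwrite the same file.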

Phase 3: Results Service & FastAPI Endpoints (Days 5-6)

  • NEW: Implement api/results_service.py (SQLite-backed)
    • get_results(date, model, detail)
    • get_portfolio_timeseries(model, start_date, end_date)
    • get_leaderboard(date)
  • Implement api/main.py with all endpoints
    • /simulate/trigger with background tasks
    • /simulate/status/{job_id}
    • /simulate/current
    • /results (now reads from SQLite)
    • NEW: /portfolio/timeseries
    • NEW: /leaderboard
    • /health with MCP checks
  • NEW: Write unit tests for results_service (target: 95% coverage)
  • NEW: Write integration tests for API endpoints (target: 80% coverage)
  • Test all endpoints with Postman/curl

Testing Deliverables:

  • 18 unit tests for results_service
  • 20 integration tests for API endpoints
  • Performance benchmarks for database queries
  • 85%+ coverage for Phase 3 code
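
A minimal sketch of how `get_results(date, model, detail)` might read from SQLite is shown below. What "minimal" and "full" actually return is not defined in this summary, so the split used here (portfolio value only versus cash, action id, and holdings) is an assumption, and the function takes an explicit connection for ease of testing.

```python
import sqlite3


def get_results(
    conn: sqlite3.Connection,
    date: str,
    model: str,
    detail: str = "minimal",
) -> list[dict]:
    """Return stored positions for one date and model (illustrative only)."""
    if detail not in ("minimal", "full"):
        raise ValueError("detail must be 'minimal' or 'full'")

    rows = conn.execute(
        "SELECT id, date, model, action_id, cash, portfolio_value "
        "FROM positions WHERE date = ? AND model = ? ORDER BY action_id",
        (date, model),
    ).fetchall()

    results = []
    for position_id, row_date, row_model, action_id, cash, value in rows:
        item = {"date": row_date, "model": row_model, "portfolio_value": value}
        if detail == "full":
            # Assumed "full" payload: add cash, action id, and per-symbol holdings.
            item["action_id"] = action_id
            item["cash"] = cash
            item["holdings"] = dict(
                conn.execute(
                    "SELECT symbol, quantity FROM holdings "
                    "WHERE position_id = ? ORDER BY symbol",
                    (position_id,),
                ).fetchall()
            )
        results.append(item)
    return results
```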

Phase 4: Docker Integration (Day 7)

  • Update Dockerfile
  • Create docker-entrypoint-api.sh
  • Create requirements-api.txt
  • Update docker-compose.yml
  • Test Docker build
  • Test container startup and health checks
  • NEW: Run E2E tests in Docker environment
  • Test end-to-end simulation via API in Docker

Testing Deliverables:

  • 10 E2E tests with Docker
  • Docker health check validation
  • Performance testing in containerized environment

Phase 5: Windmill Integration (Days 8-9)

  • Create Windmill scripts (trigger, poll, store)
  • UPDATED: Modify store_simulation_results.py to use new /results endpoint
  • Test scripts locally against Docker API
  • Deploy scripts to Windmill instance
  • Create Windmill workflow
  • Test workflow end-to-end
  • Create Windmill dashboard (using new /portfolio/timeseries and /leaderboard endpoints)
  • Document Windmill setup process

Testing Deliverables:

  • Integration tests for Windmill scripts
  • End-to-end workflow validation
  • Dashboard functionality verification

Phase 6: Testing, Security & Documentation (Day 10)

  • NEW: Run full test suite and verify all thresholds met
    • Code coverage ≥ 85%
    • All ~150 tests passing
    • Performance benchmarks within limits
  • NEW: Security testing
    • Bandit scan (Python security issues)
    • SQL injection tests
    • Input validation tests
    • OWASP ZAP scan (optional)
  • NEW: Load testing with Locust
    • 100 concurrent users
    • API endpoints within performance thresholds
  • Integration tests for complete workflow
  • Update README.md with API usage
  • Create API documentation (Swagger/OpenAPI - auto-generated by FastAPI)
  • Create deployment guide
  • Create troubleshooting guide
  • NEW: Generate test coverage report

Testing Deliverables:

  • Full test suite execution report
  • Security scan results
  • Load testing results
  • Coverage report (HTML + XML)
  • CI/CD pipeline configuration

New Files Created

Database & Results

  • api/results_service.py - SQLite-backed results retrieval
  • api/import_historical_data.py - Migration script for existing position.jsonl files

Testing Suite

  • tests/conftest.py - Shared pytest fixtures
  • tests/unit/test_job_manager.py - 25 tests
  • tests/unit/test_database.py - 10 tests
  • tests/unit/test_runtime_manager.py - 8 tests
  • tests/unit/test_results_service.py - 18 tests
  • tests/unit/test_models.py - 5 tests
  • tests/integration/test_api_endpoints.py - 20 tests
  • tests/integration/test_worker.py - 15 tests
  • tests/integration/test_executor.py - 12 tests
  • tests/integration/test_end_to_end.py - 5 tests
  • tests/performance/test_database_benchmarks.py - 10 tests
  • tests/performance/test_api_load.py - Locust load testing
  • tests/security/test_api_security.py - 10 tests
  • tests/e2e/test_docker_workflow.py - 10 tests
  • pytest.ini - Test configuration
  • requirements-dev.txt - Testing dependencies

CI/CD

  • .github/workflows/test.yml - GitHub Actions workflow

Updated File Structure

AI-Trader/
├── api/
│   ├── __init__.py
│   ├── main.py                      # FastAPI application
│   ├── models.py                    # Pydantic request/response models
│   ├── job_manager.py               # Job lifecycle management
│   ├── database.py                  # SQLite utilities (enhanced schema)
│   ├── worker.py                    # Background simulation worker
│   ├── executor.py                  # Single model-day execution (+ SQLite writes)
│   ├── runtime_manager.py           # Runtime config isolation
│   ├── results_service.py           # NEW: SQLite-backed results retrieval
│   └── import_historical_data.py    # NEW: JSONL → SQLite migration
│
├── tests/                           # NEW: Comprehensive test suite
│   ├── conftest.py
│   ├── unit/                        # 80 tests, <10s
│   ├── integration/                 # 30 tests, <60s
│   ├── performance/                 # 20 tests
│   ├── security/                    # 10 tests
│   └── e2e/                         # 10 tests
│
├── docs/
│   ├── api-specification.md
│   ├── job-manager-specification.md
│   ├── worker-specification.md
│   ├── implementation-specifications.md
│   ├── database-enhanced-specification.md    # NEW
│   ├── testing-specification.md              # NEW
│   ├── README-SPECS.md
│   └── ENHANCED-SPECIFICATIONS-SUMMARY.md    # NEW (this file)
│
├── data/
│   ├── jobs.db                      # SQLite database (6 tables)
│   ├── runtime_env*.json            # Runtime configs (temporary)
│   ├── agent_data/                  # Existing position/log data
│   └── merged.jsonl                 # Existing price data
│
├── pytest.ini                       # NEW: Test configuration
├── requirements-dev.txt             # NEW: Testing dependencies
├── .github/workflows/test.yml       # NEW: CI/CD pipeline
└── ... (existing files)

Benefits Summary

Performance

  • 10-100x faster results queries (SQLite vs file I/O)
  • Advanced analytics - timeseries, leaderboards, aggregations in milliseconds
  • Optimized indexes for common queries

Quality

  • 85% minimum coverage enforced by CI/CD
  • 150 comprehensive tests across unit, integration, performance, security
  • Quality gates prevent regressions
  • Type safety with mypy strict mode

Maintainability

  • SQLite as the single source of truth - easier backup, restore, migration
  • Automated testing catches bugs early
  • CI/CD integration provides fast feedback on every commit
  • Security scanning prevents vulnerabilities

Analytics Capabilities

New queries enabled by SQLite:

# Portfolio timeseries for charting
GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31

# Model leaderboard
GET /leaderboard?date=2025-01-31

-- Advanced filtering (future)
SELECT * FROM positions
WHERE daily_return_pct > 2.0
ORDER BY portfolio_value DESC;

-- Aggregations (future)
SELECT model, AVG(daily_return_pct) AS avg_return
FROM positions
GROUP BY model
ORDER BY avg_return DESC;

Migration from Original Spec

If you've already started implementation based on original specs:

Step 1: Database Schema Migration

-- Run enhanced schema creation
-- See database-enhanced-specification.md Section 2.1

Step 2: Add Results Service

# Create new file
touch api/results_service.py
# Implement as per database-enhanced-specification.md Section 4.1

Step 3: Update Executor

# In api/executor.py, add after agent.run_trading_session():
self._store_results_to_db(job_id, date, model_sig)
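
The body of `_store_results_to_db()` is defined in database-enhanced-specification.md and is not reproduced here. As a rough illustration of the write path only, the standalone sketch below stores one position and its holdings in a single transaction. The `store_position` name, its parameters, and the reduced column set are assumptions.

```python
import sqlite3


def store_position(
    conn: sqlite3.Connection,
    job_id: str,
    date: str,
    model: str,
    action_id: int,
    cash: float,
    holdings: dict[str, int],
) -> int:
    """Write one position row plus its holdings atomically; return the position id."""
    # `with conn` commits on success and rolls back if any statement raises,
    # so a position is never stored without its holdings.
    with conn:
        cursor = conn.execute(
            "INSERT INTO positions (job_id, date, model, action_id, cash) "
            "VALUES (?, ?, ?, ?, ?)",
            (job_id, date, model, action_id, cash),
        )
        position_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO holdings (position_id, symbol, quantity) VALUES (?, ?, ?)",
            [(position_id, symbol, qty) for symbol, qty in holdings.items()],
        )
    return position_id
```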

Step 4: Update API Endpoints

# In api/main.py, update /results endpoint to use ResultsService
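from typing import Optional

from api.results_service import ResultsService

results_service = ResultsService()

@app.get("/results")
async def get_results(
    date: Optional[str] = None,
    model: Optional[str] = None,
    detail: str = "minimal",  # assumed default; "full" is the other documented value
):
    return results_service.get_results(date, model, detail)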

Step 5: Add Test Suite

mkdir -p tests/{unit,integration,performance,security,e2e}
# Create test files as per testing-specification.md Sections 4-8

Step 6: Configure CI/CD

mkdir -p .github/workflows
# Create test.yml as per testing-specification.md Section 10.1

Testing Execution Guide

Run Unit Tests

pytest tests/unit/ -v --cov=api --cov-report=term-missing

Run Integration Tests

pytest tests/integration/ -v

Run All Tests (Except E2E)

pytest tests/ -v --ignore=tests/e2e/ --cov=api --cov-report=html

Run E2E Tests (Requires Docker)

pytest tests/e2e/ -v -s

Run Performance Benchmarks

pytest tests/performance/ --benchmark-only

Run Security Tests

pytest tests/security/ -v
bandit -r api/ -ll

Generate Coverage Report

pytest tests/unit/ tests/integration/ --cov=api --cov-report=html
open htmlcov/index.html  # View in browser (macOS; use xdg-open on Linux)

Run Load Tests

locust -f tests/performance/test_api_load.py --host=http://localhost:8080
# Open http://localhost:8089 for Locust UI

Questions & Next Steps

Review Checklist

Please review:

  1. Enhanced database schema with 6 tables for comprehensive results storage
  2. Migration strategy for backward compatibility (dual-write mode)
  3. Testing thresholds (85% coverage minimum, performance benchmarks)
  4. Test suite structure (150 tests across 5 categories)
  5. CI/CD integration with quality gates
  6. Updated implementation plan (10 days, 6 phases)

Questions to Consider

  1. Database migration timing: Start with dual-write mode immediately, or add in Phase 2?
  2. Testing priorities: Should we implement tests alongside features (TDD) or after each phase?
  3. CI/CD platform: GitHub Actions (as specified) or different platform?
  4. Performance baselines: Should we run benchmarks before implementation to track improvement?
  5. Security priorities: Which security tests are MVP vs nice-to-have?

Ready to Implement?

Option A: Approve specifications and begin Phase 1 implementation

  • Create API directory structure
  • Implement enhanced database schema
  • Write unit tests for database layer
  • Target: 2 days, 90%+ coverage for database code

Option B: Request modifications to specifications

  • Clarify any unclear requirements
  • Adjust testing thresholds
  • Modify implementation timeline

Option C: Implement in parallel workstreams

  • Workstream 1: Core API (Phases 1-3)
  • Workstream 2: Testing suite (parallel with Phase 1-3)
  • Workstream 3: Docker + Windmill (Phases 4-5)
  • Benefits: Faster delivery, more parallelization
  • Requires: Clear interfaces between components

Summary

Enhanced specifications add:

  1. 🗄️ SQLite results storage - 10-100x faster queries, advanced analytics
  2. 🧪 Comprehensive testing - 150 tests, 85% coverage, quality gates
  3. 🔒 Security testing - SQL injection, XSS, input validation
  4. Performance benchmarks - Catch regressions early
  5. 🚀 CI/CD pipeline - Automated quality checks on every commit

Total effort: Still ~10 days, but with significantly higher code quality and confidence in deployments.

Risk mitigation: Extensive testing catches bugs before production, preventing costly hotfixes.

Long-term value: Maintainable, well-tested codebase enables rapid feature development.


Ready to proceed? Please provide feedback or approval to begin implementation!