AI-Trader/docs/README-SPECS.md
Bill fb9583b374 feat: transform to REST API service with SQLite persistence (v0.3.0)
Major architecture transformation from batch-only to API service with
database persistence for Windmill integration.

## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks

## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions,
  holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage

## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration

## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout

## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode

## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures

## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies

Co-Authored-By: AI Assistant <noreply@example.com>
2025-10-31 11:47:10 -04:00


AI-Trader API Service - Technical Specifications Summary

Overview

This directory contains comprehensive technical specifications for transforming the AI-Trader batch simulation system into an API service compatible with Windmill automation.

Specification Documents

1. API Specification

Purpose: Defines all API endpoints, request/response formats, and data models

Key Contents:

  • 5 REST Endpoints:
    • POST /simulate/trigger - Queue catch-up simulation job
    • GET /simulate/status/{job_id} - Poll job progress
    • GET /simulate/current - Get latest job
    • GET /results - Retrieve simulation results (minimal/full detail)
    • GET /health - Service health check
  • Pydantic Models for type-safe request/response handling
  • Error Handling strategies and HTTP status codes
  • SQLite Schema for jobs and job_details tables
  • Configuration Management via environment variables

Status Codes: 200 OK, 202 Accepted, 400 Bad Request, 404 Not Found, 409 Conflict, 503 Service Unavailable


2. Job Manager Specification

Purpose: Details the job tracking and database layer

Key Contents:

  • SQLite Database Schema:
    • jobs table - High-level job metadata
    • job_details table - Per model-day execution tracking
  • JobManager Class Interface:
    • create_job() - Create new simulation job
    • get_job() - Retrieve job by ID
    • update_job_status() - State transitions (pending → running → completed/partial/failed)
    • get_job_progress() - Detailed progress metrics
    • can_start_new_job() - Concurrency control
  • State Machine: Job status transitions and business logic
  • Concurrency Control: Single-job execution enforcement
  • Testing Strategy: Unit tests with temporary databases

Key Feature: Independent model execution - one model's failure doesn't block others (results in "partial" status)
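
The two tables and the single-job concurrency check can be sketched with the standard-library sqlite3 module. This is only a minimal illustration: the column names, the method signatures, and the rule that a "pending" or "running" job blocks new jobs are assumptions, not the schema defined in the Job Manager specification.

```python
import sqlite3
import uuid

# Hypothetical minimal schema; the real tables carry more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE IF NOT EXISTS job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL REFERENCES jobs(job_id) ON DELETE CASCADE,
    model TEXT NOT NULL,
    date TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
"""


class JobManager:
    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        # SQLite enforces foreign keys only when asked, per connection.
        self.conn.execute("PRAGMA foreign_keys = ON")
        self.conn.executescript(SCHEMA)

    def can_start_new_job(self) -> bool:
        """Allow a new job only if none is pending or running (assumed rule)."""
        row = self.conn.execute(
            "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
        ).fetchone()
        return row[0] == 0

    def create_job(self, models, dates) -> str:
        """Insert one jobs row plus one job_details row per model-day."""
        if not self.can_start_new_job():
            raise RuntimeError("another job is already pending or running")
        job_id = str(uuid.uuid4())
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO jobs (job_id) VALUES (?)", (job_id,))
            self.conn.executemany(
                "INSERT INTO job_details (job_id, model, date) VALUES (?, ?, ?)",
                [(job_id, model, date) for date in dates for model in models],
            )
        return job_id

    def update_job_status(self, job_id: str, status: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE jobs SET status = ? WHERE job_id = ?", (status, job_id)
            )

    def get_job(self, job_id: str):
        row = self.conn.execute(
            "SELECT job_id, status FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        return None if row is None else {"job_id": row[0], "status": row[1]}
```

Passing a temporary file path (or ":memory:") as db_path is one way to get the temporary databases mentioned under Testing Strategy.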


3. Background Worker Specification

Purpose: Defines async job execution architecture

Key Contents:

  • Execution Pattern: Date-sequential, Model-parallel
    • All models for Date 1 run in parallel
    • Date 2 starts only after all models finish Date 1
    • Ensures position.jsonl integrity (no concurrent writes to the same model's file)
  • SimulationWorker Class:
    • Orchestrates job execution
    • Manages date sequencing
    • Handles job-level errors
  • ModelDayExecutor Class:
    • Executes single model-day simulation
    • Updates job_detail status
    • Isolates runtime configuration
  • RuntimeConfigManager:
    • Creates temporary runtime_env_{job_id}_{model}_{date}.json files
    • Prevents state collisions between concurrent models
    • Cleans up after execution
  • Error Handling: Graceful failure (models continue despite peer failures)
  • Logging: Structured JSON logging with job/model/date context

Performance: 3 models × 5 days = ~7-15 minutes (vs. ~22-45 minutes sequential)
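
The date-sequential, model-parallel pattern can be sketched with a thread pool. This is an illustration under assumptions rather than the SimulationWorker itself: run_job and the run_model_day callable are hypothetical names, and threads are only one possible concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(dates, models, run_model_day):
    """Run every model for each date: dates in order, models concurrently.

    run_model_day(model, date) is a hypothetical callable standing in for
    the per-model-day executor. Returns {(model, date): status}.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        for date in dates:  # dates are processed strictly one after another
            futures = {
                pool.submit(run_model_day, model, date): model for model in models
            }
            # Collecting every result acts as a barrier before the next date.
            for future, model in futures.items():
                try:
                    future.result()
                    results[(model, date)] = "completed"
                except Exception:
                    # A failing model is recorded; its peers keep running.
                    results[(model, date)] = "failed"
    return results
```

Because the loop waits for all futures of one date before submitting the next, no model can start Date 2 while another is still working on Date 1.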


4. Implementation Specification

Purpose: Complete implementation guide covering Agent, Docker, and Windmill

Key Contents:

Part 1: BaseAgent Refactoring

  • Analysis: The existing run_trading_session() is already compatible with API mode
  • Required Changes: None. The existing code works as-is
  • Worker Integration: Calls agent.run_trading_session(date) directly

Part 2: Docker Configuration

  • Modified Dockerfile: Adds FastAPI dependencies, new entrypoint
  • docker-entrypoint-api.sh: Starts MCP services → launches uvicorn
  • Health Checks: Verifies MCP services and database connectivity
  • Volume Mounts: ./data, ./configs for persistence

Part 3: Windmill Integration

  • Flow 1: trigger_simulation.ts - Daily cron triggers API
  • Flow 2: poll_simulation_status.ts - Polls every 5 min until complete
  • Flow 3: store_simulation_results.py - Stores results in Windmill DB
  • Dashboard: Charts and tables showing portfolio performance
  • Workflow Orchestration: Complete YAML workflow definition
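
The polling step of Flow 2 amounts to a loop that stops on a terminal status. The following Python sketch is not the Windmill script (which is TypeScript); poll_until_done, its parameters, and the max_polls cap are assumptions, and fetch_status stands in for a call to GET /simulate/status/{job_id}.

```python
import time

# Terminal states taken from the job state machine in this document.
TERMINAL_STATUSES = {"completed", "partial", "failed"}


def poll_until_done(fetch_status, interval_seconds=300, max_polls=100, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status.

    fetch_status is a hypothetical zero-argument callable returning the
    job's status string. The default interval of 300 seconds matches the
    5-minute polling cadence described above.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

Injecting sleep makes the loop testable without waiting; in production the default time.sleep applies.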

Part 4: File Structure

  • New api/ directory with 7 modules
  • New windmill/ directory with scripts and dashboard
  • New docs/ directory (this folder)
  • data/jobs.db for job tracking

Part 5: Implementation Checklist

10-day implementation plan broken into 6 phases


Architecture Highlights

Request Flow

1. Windmill → POST /simulate/trigger
2. API creates job in SQLite (status: pending)
3. API queues BackgroundTask
4. API returns 202 Accepted with job_id
   ↓
5. Worker starts (status: running)
6. For each date sequentially:
     For each model in parallel:
       - Create isolated runtime config
       - Execute agent.run_trading_session(date)
       - Update job_detail status
7. Worker finishes (status: completed/partial/failed)
   ↓
8. Windmill polls GET /simulate/status/{job_id}
9. When complete: Windmill calls GET /results?date=X
10. Windmill stores results in internal DB
11. Windmill dashboard displays performance
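
The "isolated runtime config" in step 6 can be pictured as a context manager that writes a per-model-day JSON file and removes it afterwards. This is a sketch under assumptions: isolated_runtime_config is a hypothetical helper, the underscore-separated file name and the directory parameter are illustrative, and the keys stored in the file are not specified here.

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def isolated_runtime_config(job_id, model, date, values, directory=None):
    """Write a runtime config unique to (job_id, model, date), then clean up."""
    directory = directory or tempfile.gettempdir()
    # Assumed naming scheme: one file per job, model, and date.
    path = os.path.join(directory, f"runtime_env_{job_id}_{model}_{date}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(values, fh)
    try:
        yield path
    finally:
        # Remove the file even if the simulation raised.
        if os.path.exists(path):
            os.remove(path)
```

Each parallel model then reads and writes only its own file, so concurrent models cannot overwrite one another's state.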

Data Flow

Input: configs/default_config.json
       ↓
API: Calculates date_range (last position → today)
       ↓
Worker: Executes simulations
       ↓
Output: data/agent_data/{model}/position/position.jsonl
        data/agent_data/{model}/log/{date}/log.jsonl
        data/jobs.db (job tracking)
       ↓
API: Reads position.jsonl + calculates P&L
       ↓
Windmill: Stores in internal DB → Dashboard visualization
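
The catch-up date_range calculation (last position → today) might look like the following. This is a sketch, not the API's actual logic: pending_dates is a hypothetical name, and both the inclusion of today and the option to skip weekends are assumptions (market holidays are ignored entirely).

```python
from datetime import date, timedelta


def pending_dates(last_position: date, today: date, skip_weekends: bool = True):
    """Return the dates after last_position, up to and including today."""
    days = []
    current = last_position + timedelta(days=1)
    while current <= today:
        # weekday() is 5 for Saturday and 6 for Sunday.
        if not (skip_weekends and current.weekday() >= 5):
            days.append(current)
        current += timedelta(days=1)
    return days
```

If the last recorded position is already dated today, the list is empty and there is nothing to simulate.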

Key Design Decisions

1. Pattern B: Lazy On-Demand Processing

  • Chosen: Windmill controls simulation timing via API calls
  • Benefit: Centralized scheduling in Windmill
  • Tradeoff: The first Windmill call of the day triggers a long-running job

2. SQLite vs. PostgreSQL

  • Chosen: SQLite for MVP
  • Rationale: Low concurrency (1 job at a time), simple deployment
  • Future: PostgreSQL for production with multiple concurrent jobs

3. Date-Sequential, Model-Parallel Execution

  • Chosen: Dates run sequentially, models run in parallel per date
  • Rationale: Prevents position.jsonl race conditions, faster than fully sequential
  • Performance: Up to ~3x faster than fully sequential with 3 models in parallel (about two-thirds less wall-clock time)

4. Independent Model Failures

  • Chosen: One model's failure doesn't block others
  • Benefit: Partial results better than no results
  • Implementation: Job status becomes "partial" if any model fails
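
One way to derive the final job status from the per-model-day results is shown below. The rule is an assumption: the text above says only that a failure leads to "partial", so treating "every model-day failed" (and an empty job) as "failed" is a guess at the intended behavior.

```python
def final_job_status(detail_statuses):
    """Collapse job_detail statuses into one job status (assumed rule)."""
    statuses = list(detail_statuses)
    if statuses and all(s == "completed" for s in statuses):
        return "completed"
    if any(s == "completed" for s in statuses):
        # Some model-days succeeded, at least one did not.
        return "partial"
    return "failed"
```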

5. Minimal BaseAgent Changes

  • Chosen: No modifications to agent code
  • Rationale: The existing run_trading_session() already serves as a suitable API interface
  • Benefit: Maintains backward compatibility with batch mode

Implementation Prerequisites

Required Environment Variables

OPENAI_API_BASE=...
OPENAI_API_KEY=...
ALPHAADVANTAGE_API_KEY=...
JINA_API_KEY=...
RUNTIME_ENV_PATH=/app/data/runtime_env.json
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003
API_HOST=0.0.0.0
API_PORT=8080
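
Reading the API settings from the environment could be done along these lines. ApiSettings and load_api_settings are hypothetical names; the fallback values simply mirror the example values listed above, and the port range check is an added assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSettings:
    host: str
    port: int


def load_api_settings(env=None) -> ApiSettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    port = int(env.get("API_PORT", "8080"))
    if not 1 <= port <= 65535:
        raise ValueError(f"API_PORT out of range: {port}")
    return ApiSettings(host=env.get("API_HOST", "0.0.0.0"), port=port)
```

Accepting a plain mapping keeps the function easy to test without touching the real process environment.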

Required Python Packages (new)

fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3

Docker Requirements

  • Docker Engine 20.10+
  • Docker Compose 2.0+
  • 2GB RAM minimum for container
  • 10GB disk space for data

Windmill Requirements

  • Windmill instance (self-hosted or cloud)
  • Network access from Windmill to AI-Trader API
  • Windmill CLI for deployment (optional)

Testing Strategy

Unit Tests

  • tests/test_job_manager.py - Database operations
  • tests/test_worker.py - Job execution logic
  • tests/test_executor.py - Model-day execution

Integration Tests

  • tests/test_api_endpoints.py - FastAPI endpoint behavior
  • tests/test_end_to_end.py - Full workflow (trigger → execute → retrieve)

Manual Testing

  • Docker container startup
  • Health check endpoint
  • Windmill workflow execution
  • Dashboard visualization

Performance Expectations

Single Model-Day Execution

  • Duration: 30-60 seconds (varies by AI model latency)
  • Bottlenecks: AI API calls, MCP tool latency

Multi-Model Job

  • Example: 3 models × 5 days = 15 model-days
  • Parallel Execution: ~7-15 minutes
  • Sequential Execution: ~22-45 minutes
  • Speedup: ~3x (number of models)
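
The arithmetic behind these estimates can be written out as a small function. It assumes ideal parallelism (all models for a date finish in the time of one model-day) and no orchestration overhead; estimate_minutes and its parameter names are illustrative.

```python
def estimate_minutes(models: int, days: int, seconds_per_model_day: float):
    """Return (sequential_minutes, parallel_minutes) for a job."""
    # Sequential: every model-day runs back to back.
    sequential = models * days * seconds_per_model_day / 60
    # Model-parallel: each date costs roughly one model-day of wall time.
    parallel = days * seconds_per_model_day / 60
    return sequential, parallel
```

The quoted 7-15 and 22-45 minute ranges correspond to about 90 to 180 seconds per model-day in this formula, a figure back-derived here rather than stated in the specification. Plugging in 30 to 60 seconds instead yields 2.5-5 minutes parallel and 7.5-15 minutes sequential. In both cases the ratio is 3, the number of models.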

API Response Times

  • /simulate/trigger: < 1 second (just queues job)
  • /simulate/status: < 100ms (SQLite query)
  • /results?detail=minimal: < 500ms (file read + JSON parsing)
  • /results?detail=full: < 2 seconds (parse log files)

Security Considerations

MVP Security

  • Network Isolation: Docker network (no public exposure)
  • No Authentication: Assumes the Windmill → API connection runs over a trusted network

Future Enhancements

  • API key authentication (X-API-Key header)
  • Rate limiting per client
  • HTTPS/TLS encryption
  • Input sanitization for path traversal prevention
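
As an illustration of the path traversal item, a config_path from a request could be confined to the configs/ directory before it is opened. This is a sketch of one common approach, not a planned implementation; resolve_config_path and the base_dir default are assumptions.

```python
from pathlib import Path


def resolve_config_path(config_path: str, base_dir: str = "configs") -> Path:
    """Resolve config_path and reject anything outside base_dir."""
    base = Path(base_dir).resolve()
    candidate = Path(config_path)
    if not candidate.is_absolute():
        # Requests use paths like "configs/default_config.json",
        # i.e. relative to the directory that contains base_dir.
        candidate = base.parent / candidate
    candidate = candidate.resolve()  # collapses ".." segments and symlinks
    if base not in candidate.parents:
        raise ValueError(f"config_path escapes {base_dir}/: {config_path}")
    return candidate
```

Resolving before checking matters: a string prefix test alone would accept "configs/../secrets.json".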

Deployment Steps

1. Build Docker Image

docker-compose build

2. Start API Service

docker-compose up -d

3. Verify Health

curl http://localhost:8080/health

4. Test Trigger

curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{"config_path": "configs/default_config.json"}'
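
The same trigger call can be issued from Python using only the standard library. build_trigger_request is a hypothetical helper that constructs the request without sending it; the endpoint, header, and JSON body are taken from the curl command above.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, config_path: str) -> urllib.request.Request:
    """Build the POST /simulate/trigger request (not yet sent)."""
    body = json.dumps({"config_path": config_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/simulate/trigger",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen(); per the request flow, a successful trigger is expected to return 202 Accepted with a job_id.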

5. Deploy Windmill Scripts

wmill script push windmill/trigger_simulation.ts
wmill script push windmill/poll_simulation_status.ts
wmill script push windmill/store_simulation_results.py

6. Create Windmill Workflow

  • Import windmill/daily_simulation_workflow.yaml
  • Configure resource ai_trader_api with API URL
  • Set cron schedule (daily 6 AM)

7. Create Windmill Dashboard

  • Import windmill/dashboard.json
  • Verify data visualization

Troubleshooting Guide

Issue: Health check fails

Symptoms: curl http://localhost:8080/health returns 503

Possible Causes:

  1. MCP services not running
  2. Database file permission error
  3. API server not started

Solutions:

# Check MCP services
docker-compose exec ai-trader curl http://localhost:8000/health

# Check API logs
docker-compose logs -f ai-trader

# Restart container
docker-compose restart

Issue: Job stuck in "running" status

Symptoms: Job never completes, status remains "running"

Possible Causes:

  1. Agent execution crashed
  2. Model API timeout
  3. Worker process died

Solutions:

# Check job details for error messages
curl http://localhost:8080/simulate/status/{job_id}

# Check container logs
docker-compose logs -f ai-trader

# Restart the API; stale jobs are marked as failed on startup
docker-compose restart
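
The startup cleanup mentioned in the last comment could be as simple as one UPDATE statement. This sketch assumes a jobs table with a status column and treats only "running" jobs as stale; fail_stale_jobs is a hypothetical name and the actual recovery logic may differ.

```python
import sqlite3


def fail_stale_jobs(conn: sqlite3.Connection) -> int:
    """Mark jobs left in 'running' by a previous process as failed.

    Returns the number of jobs that were updated.
    """
    with conn:  # commit the update atomically
        cursor = conn.execute(
            "UPDATE jobs SET status = 'failed' WHERE status = 'running'"
        )
    return cursor.rowcount
```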

Issue: Windmill can't reach API

Symptoms: Connection refused from Windmill scripts

Solutions:

  • Verify that Windmill and AI-Trader are on the same Docker network
  • Check firewall rules
  • Use container name (ai-trader) instead of localhost in Windmill resource
  • Verify API_PORT environment variable

Migration from Batch Mode

For Users Currently Running Batch Mode

Option 1: Dual Mode (Recommended)

  • Keep existing main.py for manual testing
  • Add new API mode for production automation
  • Use different config files for each mode

Option 2: API-Only

  • Replace batch execution entirely
  • All simulations via API calls
  • More consistent with production workflow

Migration Checklist

  • Backup existing data/ directory
  • Update .env with API configuration
  • Test API mode in separate environment first
  • Gradually migrate Windmill workflows
  • Monitor logs for errors
  • Validate results match batch mode output

Next Steps

  1. Review Specifications

    • Read all 4 specification documents
    • Ask clarifying questions
    • Approve design before implementation
  2. Implementation Phase 1 (Days 1-2)

    • Set up api/ directory structure
    • Implement database and job_manager
    • Write unit tests
  3. Implementation Phase 2 (Days 3-4)

    • Implement worker and executor
    • Test with mock agents
  4. Implementation Phase 3 (Days 5-6)

    • Implement FastAPI endpoints
    • Test with Postman/curl
  5. Implementation Phase 4 (Day 7)

    • Docker integration
    • End-to-end testing
  6. Implementation Phase 5 (Days 8-9)

    • Windmill integration
    • Dashboard creation
  7. Implementation Phase 6 (Day 10)

    • Final testing
    • Documentation

Questions or Feedback?

Please review all specifications and provide feedback on:

  1. API endpoint design
  2. Database schema
  3. Execution pattern (date-sequential, model-parallel)
  4. Error handling approach
  5. Windmill integration workflow
  6. Any concerns or suggested improvements

Ready to proceed with implementation? Confirm approval of specifications to begin Phase 1.