Mirror of https://github.com/Xe138/AI-Trader.git (synced 2026-04-02 01:27:24 -04:00)
Major architecture transformation from a batch-only tool to an API service with database persistence for Windmill integration.
## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks
## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions,
holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage
## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration
## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout
## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode
## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures
## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies
# AI-Trader API Service - Technical Specification

## 1. API Endpoints Specification

### 1.1 POST /simulate/trigger

**Purpose:** Trigger a catch-up simulation from the last completed date to the most recent trading day.

**Request:**

```http
POST /simulate/trigger HTTP/1.1
Content-Type: application/json
```

**Response (202 Accepted):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "accepted",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "created_at": "2025-01-20T14:30:00Z",
  "message": "Simulation job queued successfully"
}
```

**Response (200 OK - Job Already Running):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 3,
    "failed": 0,
    "current": {
      "date": "2025-01-17",
      "model": "gpt-5"
    }
  },
  "created_at": "2025-01-20T14:25:00Z",
  "message": "Simulation already in progress"
}
```

**Response (200 OK - Already Up To Date):**

```json
{
  "status": "current",
  "message": "Simulation already up-to-date",
  "last_simulation_date": "2025-01-20",
  "next_trading_day": "2025-01-21"
}
```

**Response (409 Conflict):**

```json
{
  "error": "conflict",
  "message": "Different simulation already running",
  "current_job_id": "previous-job-uuid",
  "current_date_range": ["2025-01-10", "2025-01-15"]
}
```

**Business Logic:**
1. Load configuration from `config_path` (or default)
2. Determine last completed date from each model's `position.jsonl`
3. Calculate date range: `max(last_dates) + 1 day` → `most_recent_trading_day`
4. Filter for weekdays only (Monday-Friday)
5. If date_range is empty, return "already up-to-date"
6. Check for existing jobs with same date range → return existing job
7. Check for running jobs with different date range → return 409
8. Create new job in SQLite with status=`pending`
9. Queue background task to execute simulation
10. Return 202 with job details
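Steps 3-5 reduce to a small piece of date arithmetic. A minimal sketch (the function name and signature are illustrative, not from the codebase):

```python
from datetime import date, timedelta

def catch_up_date_range(last_dates: list[date], today: date) -> list[str]:
    """From the day after the latest completed date up to the most recent
    trading day, keeping weekdays only (Monday-Friday)."""
    current = max(last_dates) + timedelta(days=1)
    days: list[str] = []
    while current <= today:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(current.isoformat())
        current += timedelta(days=1)
    return days  # an empty list means "already up-to-date"
```

With `last_dates = [date(2025, 1, 15)]` and `today = date(2025, 1, 20)`, this yields the `["2025-01-16", "2025-01-17", "2025-01-20"]` range shown in the example responses above (the weekend of the 18th-19th is skipped).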

---

### 1.2 GET /simulate/status/{job_id}

**Purpose:** Poll the status and progress of a simulation job.

**Request:**

```http
GET /simulate/status/550e8400-e29b-41d4-a716-446655440000 HTTP/1.1
```

**Response (200 OK - Running):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 3,
    "failed": 0,
    "current": {
      "date": "2025-01-17",
      "model": "gpt-5"
    },
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 42.1},
      {"date": "2025-01-17", "model": "gpt-5", "status": "running", "duration_seconds": null}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "updated_at": "2025-01-20T14:27:15Z"
}
```

**Response (200 OK - Completed):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 6,
    "failed": 0,
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 42.1},
      {"date": "2025-01-17", "model": "gpt-5", "status": "completed", "duration_seconds": 40.3},
      {"date": "2025-01-20", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 43.8},
      {"date": "2025-01-20", "model": "gpt-5", "status": "completed", "duration_seconds": 39.1}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "completed_at": "2025-01-20T14:29:45Z",
  "total_duration_seconds": 285.0
}
```

**Response (200 OK - Partial Failure):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "partial",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 4,
    "failed": 2,
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "failed", "error": "MCP service timeout after 3 retries", "duration_seconds": null},
      {"date": "2025-01-17", "model": "gpt-5", "status": "completed", "duration_seconds": 40.3},
      {"date": "2025-01-20", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 43.8},
      {"date": "2025-01-20", "model": "gpt-5", "status": "failed", "error": "AI model API timeout", "duration_seconds": null}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "completed_at": "2025-01-20T14:29:45Z"
}
```

**Response (404 Not Found):**

```json
{
  "error": "not_found",
  "message": "Job not found",
  "job_id": "invalid-job-id"
}
```

**Business Logic:**
1. Query SQLite jobs table for job_id
2. If not found, return 404
3. Return job metadata + progress from job_details table
4. Status transitions: `pending` → `running` → `completed`/`partial`/`failed`
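The terminal state can be derived by rolling up the job_details rows. A sketch of one plausible rollup rule (the exact precedence is an implementation choice, not specified above):

```python
def rollup_status(details: list[dict]) -> str:
    """Derive a job's status from its model-day rows: any pending/running
    row keeps the job 'running'; otherwise all-failed -> 'failed',
    some-failed -> 'partial', none-failed -> 'completed'."""
    states = [d["status"] for d in details]
    if any(s in ("pending", "running") for s in states):
        return "running"
    failed = states.count("failed")
    if failed == 0:
        return "completed"
    if failed == len(states):
        return "failed"
    return "partial"
```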

---

### 1.3 GET /simulate/current

**Purpose:** Get the most recent simulation job (for Windmill to discover job_id).

**Request:**

```http
GET /simulate/current HTTP/1.1
```

**Response (200 OK):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 4,
    "completed": 2,
    "failed": 0
  },
  "created_at": "2025-01-20T14:25:00Z"
}
```

**Response (404 Not Found):**

```json
{
  "error": "not_found",
  "message": "No simulation jobs found"
}
```

**Business Logic:**
1. Query SQLite: `SELECT * FROM jobs ORDER BY created_at DESC LIMIT 1`
2. Return job details with progress summary

---

### 1.4 GET /results

**Purpose:** Retrieve simulation results for a specific date and model.

**Request:**

```http
GET /results?date=2025-01-15&model=gpt-5&detail=minimal HTTP/1.1
```

**Query Parameters:**
- `date` (required): Trading date in YYYY-MM-DD format
- `model` (optional): Model signature (if omitted, returns all models)
- `detail` (optional): Response detail level
  - `minimal` (default): Positions + daily P&L
  - `full`: + trade history + AI reasoning logs + tool usage stats

**Response (200 OK - minimal):**

```json
{
  "date": "2025-01-15",
  "results": [
    {
      "model": "gpt-5",
      "positions": {
        "AAPL": 10,
        "MSFT": 5,
        "NVDA": 0,
        "CASH": 8500.00
      },
      "daily_pnl": {
        "profit": 150.50,
        "return_pct": 1.5,
        "portfolio_value": 10150.50
      }
    }
  ]
}
```

**Response (200 OK - full):**

```json
{
  "date": "2025-01-15",
  "results": [
    {
      "model": "gpt-5",
      "positions": {
        "AAPL": 10,
        "MSFT": 5,
        "CASH": 8500.00
      },
      "daily_pnl": {
        "profit": 150.50,
        "return_pct": 1.5,
        "portfolio_value": 10150.50
      },
      "trades": [
        {
          "id": 1,
          "action": "buy",
          "symbol": "AAPL",
          "amount": 10,
          "price": 255.88,
          "total": 2558.80
        }
      ],
      "ai_reasoning": {
        "total_steps": 15,
        "stop_signal_received": true,
        "reasoning_summary": "Market analysis indicated strong buy signal for AAPL...",
        "tool_usage": {
          "search": 3,
          "get_price": 5,
          "math": 2,
          "trade": 1
        }
      },
      "log_file_path": "data/agent_data/gpt-5/log/2025-01-15/log.jsonl"
    }
  ]
}
```

**Response (400 Bad Request):**

```json
{
  "error": "invalid_date",
  "message": "Date must be in YYYY-MM-DD format"
}
```

**Response (404 Not Found):**

```json
{
  "error": "no_data",
  "message": "No simulation data found for date 2025-01-15 and model gpt-5"
}
```

**Business Logic:**
1. Validate date format
2. Read `position.jsonl` for specified model(s) and date
3. For `detail=minimal`: Return positions + calculate daily P&L
4. For `detail=full`:
   - Parse `log.jsonl` to extract reasoning summary
   - Count tool usage from log messages
   - Extract trades from position file
5. Return aggregated results
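The tool-usage count in step 4 can be sketched as a single pass over the `log.jsonl` records. Note the `tool` field name here is an assumption about the log record shape, not a documented format:

```python
import json
from collections import Counter

def count_tool_usage(log_lines: list[str]) -> dict[str, int]:
    """Tally tool invocations from JSONL log records.
    Assumes each record may carry a 'tool' field naming the tool called."""
    counts: Counter = Counter()
    for line in log_lines:
        record = json.loads(line)
        tool = record.get("tool")  # hypothetical field; adapt to the real schema
        if tool:
            counts[tool] += 1
    return dict(counts)
```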

---

### 1.5 GET /health

**Purpose:** Health check endpoint for Docker and monitoring.

**Request:**

```http
GET /health HTTP/1.1
```

**Response (200 OK):**

```json
{
  "status": "healthy",
  "timestamp": "2025-01-20T14:30:00Z",
  "services": {
    "mcp_math": {"status": "up", "url": "http://localhost:8000/mcp"},
    "mcp_search": {"status": "up", "url": "http://localhost:8001/mcp"},
    "mcp_trade": {"status": "up", "url": "http://localhost:8002/mcp"},
    "mcp_getprice": {"status": "up", "url": "http://localhost:8003/mcp"}
  },
  "storage": {
    "data_directory": "/app/data",
    "writable": true,
    "free_space_mb": 15234
  },
  "database": {
    "status": "connected",
    "path": "/app/data/jobs.db"
  }
}
```

**Response (503 Service Unavailable):**

```json
{
  "status": "unhealthy",
  "timestamp": "2025-01-20T14:30:00Z",
  "services": {
    "mcp_math": {"status": "down", "url": "http://localhost:8000/mcp", "error": "Connection refused"},
    "mcp_search": {"status": "up", "url": "http://localhost:8001/mcp"},
    "mcp_trade": {"status": "up", "url": "http://localhost:8002/mcp"},
    "mcp_getprice": {"status": "up", "url": "http://localhost:8003/mcp"}
  },
  "storage": {
    "data_directory": "/app/data",
    "writable": true
  },
  "database": {
    "status": "connected"
  }
}
```

---

## 2. Data Models

### 2.1 SQLite Schema

**Table: jobs**

```sql
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT NOT NULL,  -- JSON array of dates
    models TEXT NOT NULL,      -- JSON array of model signatures
    created_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    total_duration_seconds REAL,
    error TEXT
);

CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created_at ON jobs(created_at DESC);
```

**Table: job_details**

```sql
CREATE TABLE job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    started_at TEXT,
    completed_at TEXT,
    duration_seconds REAL,
    error TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

CREATE INDEX idx_job_details_job_id ON job_details(job_id);
CREATE INDEX idx_job_details_status ON job_details(status);
```
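One detail worth noting for the cascade deletes: SQLite ships with foreign-key enforcement disabled, so every connection must opt in or `ON DELETE CASCADE` is silently ignored. A sketch of a connection helper:

```python
import sqlite3

def connect(db_path: str) -> sqlite3.Connection:
    """Open the jobs database with foreign-key enforcement enabled.
    The pragma is per-connection, so it must be set on every connect."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")
    return conn
```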

### 2.2 Pydantic Models

**Request Models:**

```python
from pydantic import BaseModel, Field
from typing import Optional, Literal


class TriggerSimulationRequest(BaseModel):
    config_path: Optional[str] = Field(default="configs/default_config.json", description="Path to configuration file")


class ResultsQueryParams(BaseModel):
    date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$", description="Date in YYYY-MM-DD format")
    model: Optional[str] = Field(None, description="Model signature filter")
    detail: Literal["minimal", "full"] = Field(default="minimal", description="Response detail level")
```

**Response Models:**

```python
class JobProgress(BaseModel):
    total_model_days: int
    completed: int
    failed: int
    current: Optional[dict] = None  # {"date": str, "model": str}
    details: Optional[list] = None  # List of JobDetailResponse


class TriggerSimulationResponse(BaseModel):
    job_id: str
    status: str
    date_range: list[str]
    models: list[str]
    created_at: str
    message: str
    progress: Optional[JobProgress] = None


class JobStatusResponse(BaseModel):
    job_id: str
    status: str
    date_range: list[str]
    models: list[str]
    progress: JobProgress
    created_at: str
    updated_at: Optional[str] = None
    completed_at: Optional[str] = None
    total_duration_seconds: Optional[float] = None


class DailyPnL(BaseModel):
    profit: float
    return_pct: float
    portfolio_value: float


class Trade(BaseModel):
    id: int
    action: str
    symbol: str
    amount: int
    price: Optional[float] = None
    total: Optional[float] = None


class AIReasoning(BaseModel):
    total_steps: int
    stop_signal_received: bool
    reasoning_summary: str
    tool_usage: dict[str, int]


class ModelResult(BaseModel):
    model: str
    positions: dict[str, float]
    daily_pnl: DailyPnL
    trades: Optional[list[Trade]] = None
    ai_reasoning: Optional[AIReasoning] = None
    log_file_path: Optional[str] = None


class ResultsResponse(BaseModel):
    date: str
    results: list[ModelResult]
```

---

## 3. Configuration Management

### 3.1 Environment Variables

Required environment variables remain the same as batch mode:

```bash
# OpenAI API Configuration
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=sk-...

# Alpha Vantage API
ALPHAADVANTAGE_API_KEY=...

# Jina Search API
JINA_API_KEY=...

# Runtime Config Path (now shared by API and worker)
RUNTIME_ENV_PATH=/app/data/runtime_env.json

# MCP Service Ports
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003

# API Server Configuration
API_HOST=0.0.0.0
API_PORT=8080

# Job Configuration
MAX_CONCURRENT_JOBS=1  # Only one simulation job at a time
```

### 3.2 Runtime State Management

**Challenge:** Multiple model-days running concurrently need isolated `runtime_env.json` state.

**Solution:** Per-job runtime config files
- `runtime_env_base.json` - Template
- `runtime_env_{job_id}_{model}_{date}.json` - Job-specific runtime config
- Worker passes custom `RUNTIME_ENV_PATH` to each simulation execution

**Modified `write_config_value()` and `get_config_value()`:**
- Accept optional `runtime_path` parameter
- Worker manages lifecycle: create → use → cleanup
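The create → use → cleanup lifecycle can be wrapped in a context manager so the per-job file is always removed, even on failure. The file naming follows the template convention above, but the helper itself is an illustrative sketch:

```python
import json
import os
from contextlib import contextmanager

@contextmanager
def job_runtime_config(base_path: str, job_id: str, model: str, date: str):
    """Copy the base template to a job-specific runtime config, yield its
    path for the simulation run, then remove it (create -> use -> cleanup)."""
    with open(base_path) as f:
        template = json.load(f)
    path = os.path.join(
        os.path.dirname(base_path),
        f"runtime_env_{job_id}_{model}_{date}.json",
    )
    with open(path, "w") as f:
        json.dump(template, f)
    try:
        yield path  # worker sets RUNTIME_ENV_PATH to this for the execution
    finally:
        os.remove(path)
```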

---

## 4. Error Handling

### 4.1 Error Response Format

All errors follow this structure:

```json
{
  "error": "error_code",
  "message": "Human-readable error description",
  "details": {
    // Optional additional context
  }
}
```

### 4.2 HTTP Status Codes

- `200 OK` - Successful request
- `202 Accepted` - Job queued successfully
- `400 Bad Request` - Invalid input parameters
- `404 Not Found` - Resource not found (job, results)
- `409 Conflict` - Concurrent job conflict
- `500 Internal Server Error` - Unexpected server error
- `503 Service Unavailable` - Health check failed

### 4.3 Retry Strategy for Workers

Models run independently; the failure of one model does not block the others:

```python
async def run_model_day(job_id: str, date: str, model_config: dict):
    model = model_config["signature"]  # assumed key holding the model signature
    try:
        # Execute simulation for this model-day
        await agent.run_trading_session(date)
        update_job_detail_status(job_id, date, model, "completed")
    except Exception as e:
        # Log the error, mark this model-day as failed, continue with the next
        update_job_detail_status(job_id, date, model, "failed", error=str(e))
        # Do NOT re-raise - let other models continue
```

---

## 5. Concurrency & Locking

### 5.1 Job Execution Policy

**Rule:** Maximum 1 running job at a time (configurable via `MAX_CONCURRENT_JOBS`)

**Enforcement:**

```python
def can_start_new_job() -> bool:
    running_jobs = db.query(
        "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
    ).fetchone()[0]
    return running_jobs < MAX_CONCURRENT_JOBS
```

### 5.2 Position File Concurrency

**Challenge:** Multiple model-days writing to the same model's `position.jsonl`

**Solution:** Sequential execution per model

```python
# For each date in date_range:
#     For each model in parallel:         ← Models run in parallel
#         Execute model-day sequentially  ← Dates for same model run sequentially
```

**Execution Pattern:**

```
Date 2025-01-16:
  - Model A (running)
  - Model B (running)
  - Model C (running)

Date 2025-01-17:  ← Starts only after all models finish 2025-01-16
  - Model A (running)
  - Model B (running)
  - Model C (running)
```

**Rationale:**
- Models write to different position files → No conflict
- Same model's dates run sequentially → No race condition on `position.jsonl`
- Date-level parallelism across models → Faster overall execution
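This date-sequential, model-parallel pattern maps directly onto `asyncio.gather`. A sketch, with the per-model-day coroutine passed in (its signature is simplified relative to section 4.3's `run_model_day`):

```python
import asyncio

async def run_job(date_range: list[str], models: list[str], run_model_day) -> None:
    """All models trade one date concurrently; the next date starts only
    after every model has finished the previous one."""
    for day in date_range:
        await asyncio.gather(*(run_model_day(day, model) for model in models))
```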

---

## 6. Performance Considerations

### 6.1 Execution Time Estimates

Based on the current implementation:
- Single model-day: ~30-60 seconds (depends on AI model latency + tool calls)
- 3 models × 5 days = 15 model-days; with model-parallel, date-sequential execution the wall-clock time is bounded by the 5 date slots ≈ 2.5-5 minutes (≈ 7.5-15 minutes if every model-day ran sequentially)

### 6.2 Timeout Configuration

**API Request Timeout:**
- `/simulate/trigger`: 10 seconds (just queues the job)
- `/simulate/status`: 5 seconds (reads from DB)
- `/results`: 30 seconds (file I/O + parsing)

**Worker Timeout:**
- Per model-day: 5 minutes (inherited from `max_retries` × `base_delay`)
- Entire job: No timeout (job runs until all model-days complete or fail)

### 6.3 Optimization Opportunities (Future)

1. **Results caching:** Store computed daily_pnl in SQLite to avoid recomputation
2. **Parallel date execution:** If position file locking is implemented, run dates in parallel
3. **Streaming responses:** For `/simulate/status`, use SSE to push updates instead of polling

---

## 7. Logging & Observability

### 7.1 Structured Logging

All API logs use JSON format:

```json
{
  "timestamp": "2025-01-20T14:30:00Z",
  "level": "INFO",
  "logger": "api.worker",
  "message": "Starting simulation for model-day",
  "job_id": "550e8400-...",
  "date": "2025-01-16",
  "model": "gpt-5"
}
```
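A minimal `logging.Formatter` producing this shape; the field handling is a sketch, with the job-scoped fields supplied through the `extra=` argument of the logging call:

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object; job-scoped fields
    (job_id, date, model) are merged in when present on the record."""
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in ("job_id", "date", "model"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)
```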

### 7.2 Log Levels

- `DEBUG` - Detailed execution flow (tool calls, price fetches)
- `INFO` - Job lifecycle events (created, started, completed)
- `WARNING` - Recoverable errors (retry attempts)
- `ERROR` - Model-day failures (logged but job continues)
- `CRITICAL` - System failures (MCP services down, DB corruption)

### 7.3 Audit Trail

All job state transitions are logged to `api_audit.log`:

```json
{
  "timestamp": "2025-01-20T14:30:00Z",
  "event": "job_created",
  "job_id": "550e8400-...",
  "user": "windmill-service",  // Future: from auth header
  "details": {"date_range": [...], "models": [...]}
}
```

---

## 8. Security Considerations

### 8.1 Authentication (Future)

For MVP, the API relies on network isolation (Docker network). Future enhancements:
- API key authentication via header: `X-API-Key: <token>`
- JWT tokens for Windmill integration
- Rate limiting per API key

### 8.2 Input Validation

- All date parameters validated with regex: `^\d{4}-\d{2}-\d{2}$`
- Config paths restricted to `configs/` directory (prevent path traversal)
- Model signatures sanitized (alphanumeric + hyphens only)
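The path-traversal restriction can be enforced by resolving the requested path and checking containment. A sketch (the helper name is illustrative):

```python
from pathlib import Path

def safe_config_path(requested: str, root: str = "configs") -> Path:
    """Resolve `requested` under the configs/ root and reject anything
    (e.g. '../.env' or an absolute path) that escapes it."""
    root_dir = Path(root).resolve()
    candidate = (root_dir / requested).resolve()
    if root_dir != candidate and root_dir not in candidate.parents:
        raise ValueError(f"path escapes {root}/: {requested}")
    return candidate
```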

### 8.3 File Access Controls

- Results API only reads from `data/agent_data/` directory
- Config API only reads from `configs/` directory
- No arbitrary file read via API parameters

---

## 9. Deployment Configuration

### 9.1 Docker Compose

```yaml
version: '3.8'

services:
  ai-trader-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
    env_file:
      - .env
    environment:
      - MODE=api
      - API_PORT=8080
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
```

### 9.2 Dockerfile Modifications

```dockerfile
# ... existing layers ...

# Install API dependencies
COPY requirements-api.txt /app/
RUN pip install --no-cache-dir -r requirements-api.txt

# Copy API application code
COPY api/ /app/api/

# Copy entrypoint script
COPY docker-entrypoint.sh /app/
RUN chmod +x /app/docker-entrypoint.sh

EXPOSE 8080

CMD ["/app/docker-entrypoint.sh"]
```
### 9.3 Entrypoint Script

```bash
#!/bin/bash
set -e

echo "Starting MCP services..."
cd /app/agent_tools
python start_mcp_services.py &
MCP_PID=$!

# Cleanup on exit: register the trap before the long-running foreground
# process, otherwise it would never be installed while the server runs
trap "kill $MCP_PID 2>/dev/null || true" EXIT

echo "Waiting for MCP services to be ready..."
sleep 10

echo "Starting API server..."
cd /app
uvicorn api.main:app --host ${API_HOST:-0.0.0.0} --port ${API_PORT:-8080} --workers 1
```

---

## 10. API Versioning (Future)

For v2 and beyond:
- URL prefix: `/api/v1/simulate/trigger`, `/api/v2/simulate/trigger`
- Header-based: `Accept: application/vnd.ai-trader.v1+json`

MVP uses unversioned endpoints (implied v1).

---

## Next Steps

After reviewing this specification, we'll proceed to:
1. **Component 2:** Job Manager & SQLite Schema Implementation
2. **Component 3:** Background Worker Architecture
3. **Component 4:** BaseAgent Refactoring for Single-Day Execution
4. **Component 5:** Docker & Deployment Configuration
5. **Component 6:** Windmill Integration Flows

Please review this API specification and provide feedback or approval to continue.