mirror of
https://github.com/Xe138/AI-Trader.git
synced 2026-04-01 17:17:24 -04:00
feat: transform to REST API service with SQLite persistence (v0.3.0)
Major architecture transformation from batch-only to API service with
database persistence for Windmill integration.
## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks
## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions,
holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage
## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration
## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout
## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode
## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures
## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies
Co-Authored-By: AI Assistant <noreply@example.com>
837
docs/api-specification.md
Normal file
@@ -0,0 +1,837 @@
# AI-Trader API Service - Technical Specification

## 1. API Endpoints Specification

### 1.1 POST /simulate/trigger

**Purpose:** Trigger a catch-up simulation from the last completed date to the most recent trading day.

**Request:**
```http
POST /simulate/trigger HTTP/1.1
Content-Type: application/json

```

**Response (202 Accepted):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "accepted",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "created_at": "2025-01-20T14:30:00Z",
  "message": "Simulation job queued successfully"
}
```

**Response (200 OK - Job Already Running):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 3,
    "failed": 0,
    "current": {
      "date": "2025-01-17",
      "model": "gpt-5"
    }
  },
  "created_at": "2025-01-20T14:25:00Z",
  "message": "Simulation already in progress"
}
```

**Response (200 OK - Already Up To Date):**
```json
{
  "status": "current",
  "message": "Simulation already up-to-date",
  "last_simulation_date": "2025-01-20",
  "next_trading_day": "2025-01-21"
}
```

**Response (409 Conflict):**
```json
{
  "error": "conflict",
  "message": "Different simulation already running",
  "current_job_id": "previous-job-uuid",
  "current_date_range": ["2025-01-10", "2025-01-15"]
}
```

**Business Logic:**
1. Load configuration from `config_path` (or default)
2. Determine last completed date from each model's `position.jsonl`
3. Calculate date range: `max(last_dates) + 1 day` → `most_recent_trading_day`
4. Filter for weekdays only (Monday-Friday)
5. If date_range is empty, return "already up-to-date"
6. Check for existing jobs with same date range → return existing job
7. Check for running jobs with different date range → return 409
8. Create new job in SQLite with status=`pending`
9. Queue background task to execute simulation
10. Return 202 with job details
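
Steps 3-5 above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function name `catch_up_dates` is hypothetical, and like the spec it filters on weekdays only, so market holidays are not excluded.

```python
from datetime import date, timedelta


def catch_up_dates(last_completed: list[date], most_recent_day: date) -> list[str]:
    """Weekdays after the latest completed date, up to and including most_recent_day."""
    current = max(last_completed) + timedelta(days=1)
    dates = []
    while current <= most_recent_day:
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            dates.append(current.isoformat())
        current += timedelta(days=1)
    return dates


if __name__ == "__main__":
    # The weekend of 2025-01-18/19 is skipped.
    print(catch_up_dates([date(2025, 1, 15), date(2025, 1, 14)], date(2025, 1, 20)))
    # → ['2025-01-16', '2025-01-17', '2025-01-20']
```

An empty list corresponds to the "already up-to-date" response in step 5.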

---

### 1.2 GET /simulate/status/{job_id}

**Purpose:** Poll the status and progress of a simulation job.

**Request:**
```http
GET /simulate/status/550e8400-e29b-41d4-a716-446655440000 HTTP/1.1
```

**Response (200 OK - Running):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 3,
    "failed": 0,
    "current": {
      "date": "2025-01-17",
      "model": "gpt-5"
    },
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 42.1},
      {"date": "2025-01-17", "model": "gpt-5", "status": "running", "duration_seconds": null}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "updated_at": "2025-01-20T14:27:15Z"
}
```

**Response (200 OK - Completed):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 6,
    "failed": 0,
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 42.1},
      {"date": "2025-01-17", "model": "gpt-5", "status": "completed", "duration_seconds": 40.3},
      {"date": "2025-01-20", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 43.8},
      {"date": "2025-01-20", "model": "gpt-5", "status": "completed", "duration_seconds": 39.1}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "completed_at": "2025-01-20T14:29:45Z",
  "total_duration_seconds": 285.0
}
```

**Response (200 OK - Partial Failure):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "partial",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 4,
    "failed": 2,
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "failed", "error": "MCP service timeout after 3 retries", "duration_seconds": null},
      {"date": "2025-01-17", "model": "gpt-5", "status": "completed", "duration_seconds": 40.3},
      {"date": "2025-01-20", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 43.8},
      {"date": "2025-01-20", "model": "gpt-5", "status": "failed", "error": "AI model API timeout", "duration_seconds": null}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "completed_at": "2025-01-20T14:29:45Z"
}
```

**Response (404 Not Found):**
```json
{
  "error": "not_found",
  "message": "Job not found",
  "job_id": "invalid-job-id"
}
```

**Business Logic:**
1. Query SQLite jobs table for job_id
2. If not found, return 404
3. Return job metadata + progress from job_details table
4. Status transitions: `pending` → `running` → `completed`/`partial`/`failed`
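
One way to derive the job-level status from the per-model-day rows is sketched below. The spec does not spell out the rule, so the mapping here is an assumption: all rows completed gives `completed`, all rows failed gives `failed`, a finished mix gives `partial`. The function name `derive_job_status` is hypothetical.

```python
def derive_job_status(detail_statuses: list[str]) -> str:
    """Collapse per-model-day statuses into one job-level status (assumed rule)."""
    if not detail_statuses or all(s == "pending" for s in detail_statuses):
        return "pending"
    if any(s in ("pending", "running") for s in detail_statuses):
        return "running"  # some model-days are still outstanding
    failed = sum(s == "failed" for s in detail_statuses)
    if failed == 0:
        return "completed"
    if failed == len(detail_statuses):
        return "failed"
    return "partial"


if __name__ == "__main__":
    # Mirrors the "Partial Failure" example: 4 completed, 2 failed.
    print(derive_job_status(["completed"] * 4 + ["failed"] * 2))  # → partial
```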

---

### 1.3 GET /simulate/current

**Purpose:** Get the most recent simulation job (for Windmill to discover job_id).

**Request:**
```http
GET /simulate/current HTTP/1.1
```

**Response (200 OK):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 4,
    "completed": 2,
    "failed": 0
  },
  "created_at": "2025-01-20T14:25:00Z"
}
```

**Response (404 Not Found):**
```json
{
  "error": "not_found",
  "message": "No simulation jobs found"
}
```

**Business Logic:**
1. Query SQLite: `SELECT * FROM jobs ORDER BY created_at DESC LIMIT 1`
2. Return job details with progress summary

---

### 1.4 GET /results

**Purpose:** Retrieve simulation results for a specific date and model.

**Request:**
```http
GET /results?date=2025-01-15&model=gpt-5&detail=minimal HTTP/1.1
```

**Query Parameters:**
- `date` (required): Trading date in YYYY-MM-DD format
- `model` (optional): Model signature (if omitted, returns all models)
- `detail` (optional): Response detail level
  - `minimal` (default): Positions + daily P&L
  - `full`: + trade history + AI reasoning logs + tool usage stats

**Response (200 OK - minimal):**
```json
{
  "date": "2025-01-15",
  "results": [
    {
      "model": "gpt-5",
      "positions": {
        "AAPL": 10,
        "MSFT": 5,
        "NVDA": 0,
        "CASH": 8500.00
      },
      "daily_pnl": {
        "profit": 150.50,
        "return_pct": 1.5,
        "portfolio_value": 10150.50
      }
    }
  ]
}
```

**Response (200 OK - full):**
```json
{
  "date": "2025-01-15",
  "results": [
    {
      "model": "gpt-5",
      "positions": {
        "AAPL": 10,
        "MSFT": 5,
        "CASH": 8500.00
      },
      "daily_pnl": {
        "profit": 150.50,
        "return_pct": 1.5,
        "portfolio_value": 10150.50
      },
      "trades": [
        {
          "id": 1,
          "action": "buy",
          "symbol": "AAPL",
          "amount": 10,
          "price": 255.88,
          "total": 2558.80
        }
      ],
      "ai_reasoning": {
        "total_steps": 15,
        "stop_signal_received": true,
        "reasoning_summary": "Market analysis indicated strong buy signal for AAPL...",
        "tool_usage": {
          "search": 3,
          "get_price": 5,
          "math": 2,
          "trade": 1
        }
      },
      "log_file_path": "data/agent_data/gpt-5/log/2025-01-15/log.jsonl"
    }
  ]
}
```

**Response (400 Bad Request):**
```json
{
  "error": "invalid_date",
  "message": "Date must be in YYYY-MM-DD format"
}
```

**Response (404 Not Found):**
```json
{
  "error": "no_data",
  "message": "No simulation data found for date 2025-01-15 and model gpt-5"
}
```

**Business Logic:**
1. Validate date format
2. Read `position.jsonl` for specified model(s) and date
3. For `detail=minimal`: Return positions + calculate daily P&L
4. For `detail=full`:
   - Parse `log.jsonl` to extract reasoning summary
   - Count tool usage from log messages
   - Extract trades from position file
5. Return aggregated results
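
The daily P&L calculation in step 3 is not defined in this specification. A minimal sketch, assuming profit is measured against the previous trading day's portfolio value and that closing prices are available as a plain dict, might look like this. Both function names and the `prices` argument are hypothetical.

```python
def portfolio_value(positions: dict[str, float], prices: dict[str, float]) -> float:
    """Cash plus the market value of every non-cash holding."""
    cash = positions.get("CASH", 0.0)
    holdings = sum(qty * prices[sym] for sym, qty in positions.items() if sym != "CASH")
    return cash + holdings


def daily_pnl(previous_value: float, current_value: float) -> dict[str, float]:
    """Profit and percentage return relative to the previous day's value."""
    if previous_value <= 0:
        raise ValueError("previous_value must be positive")
    profit = current_value - previous_value
    return {
        "profit": round(profit, 2),
        "return_pct": round(profit / previous_value * 100, 2),
        "portfolio_value": round(current_value, 2),
    }


if __name__ == "__main__":
    value = portfolio_value({"AAPL": 10, "CASH": 8500.0}, {"AAPL": 165.0})
    print(daily_pnl(10000.0, value))
```

The field names in the returned dict match the `daily_pnl` object in the responses above; the figures in the demo are made up.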

---

### 1.5 GET /health

**Purpose:** Health check endpoint for Docker and monitoring.

**Request:**
```http
GET /health HTTP/1.1
```

**Response (200 OK):**
```json
{
  "status": "healthy",
  "timestamp": "2025-01-20T14:30:00Z",
  "services": {
    "mcp_math": {"status": "up", "url": "http://localhost:8000/mcp"},
    "mcp_search": {"status": "up", "url": "http://localhost:8001/mcp"},
    "mcp_trade": {"status": "up", "url": "http://localhost:8002/mcp"},
    "mcp_getprice": {"status": "up", "url": "http://localhost:8003/mcp"}
  },
  "storage": {
    "data_directory": "/app/data",
    "writable": true,
    "free_space_mb": 15234
  },
  "database": {
    "status": "connected",
    "path": "/app/data/jobs.db"
  }
}
```

**Response (503 Service Unavailable):**
```json
{
  "status": "unhealthy",
  "timestamp": "2025-01-20T14:30:00Z",
  "services": {
    "mcp_math": {"status": "down", "url": "http://localhost:8000/mcp", "error": "Connection refused"},
    "mcp_search": {"status": "up", "url": "http://localhost:8001/mcp"},
    "mcp_trade": {"status": "up", "url": "http://localhost:8002/mcp"},
    "mcp_getprice": {"status": "up", "url": "http://localhost:8003/mcp"}
  },
  "storage": {
    "data_directory": "/app/data",
    "writable": true
  },
  "database": {
    "status": "connected"
  }
}
```
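
The two responses suggest that a single failing dependency turns the whole report unhealthy. A sketch of that aggregation follows, assuming the individual probes have already run. The function name `build_health_report` and the `"disconnected"` value are hypothetical; the spec only shows `"connected"`.

```python
from datetime import datetime, timezone


def build_health_report(
    services: dict[str, dict], storage_writable: bool, db_connected: bool
) -> tuple[int, dict]:
    """Return (HTTP status code, response body) for the health endpoint."""
    all_up = all(svc.get("status") == "up" for svc in services.values())
    healthy = all_up and storage_writable and db_connected
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "services": services,
        "storage": {"writable": storage_writable},
        # "disconnected" is an assumed value for the failure case
        "database": {"status": "connected" if db_connected else "disconnected"},
    }
    return (200 if healthy else 503), body
```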

---

## 2. Data Models

### 2.1 SQLite Schema

**Table: jobs**
```sql
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT NOT NULL,  -- JSON array of dates
    models TEXT NOT NULL,      -- JSON array of model signatures
    created_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    total_duration_seconds REAL,
    error TEXT
);

CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created_at ON jobs(created_at DESC);
```

**Table: job_details**
```sql
CREATE TABLE job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    started_at TEXT,
    completed_at TEXT,
    duration_seconds REAL,
    error TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

CREATE INDEX idx_job_details_job_id ON job_details(job_id);
CREATE INDEX idx_job_details_status ON job_details(status);
```
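
Note that SQLite only enforces foreign keys, and therefore `ON DELETE CASCADE`, on connections that have run `PRAGMA foreign_keys = ON`. The sketch below demonstrates this with a trimmed-down copy of the two tables and Python's standard `sqlite3` module. The helper names `open_db` and `count_details` are hypothetical.

```python
import sqlite3

# Trimmed-down versions of the two tables above, enough to show the cascade.
SCHEMA = """
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    status TEXT NOT NULL
);
CREATE TABLE job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    status TEXT NOT NULL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Must be set per connection; without it the cascade silently does nothing.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def count_details(conn: sqlite3.Connection, job_id: str) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM job_details WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO jobs VALUES ('job-1', 'running')")
    conn.execute(
        "INSERT INTO job_details (job_id, date, model, status) "
        "VALUES ('job-1', '2025-01-16', 'gpt-5', 'completed')"
    )
    conn.execute("DELETE FROM jobs WHERE job_id = 'job-1'")
    print(count_details(conn, "job-1"))  # → 0
```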

### 2.2 Pydantic Models

**Request Models:**
```python
from pydantic import BaseModel, Field
from typing import Optional, Literal

class TriggerSimulationRequest(BaseModel):
    config_path: Optional[str] = Field(default="configs/default_config.json", description="Path to configuration file")

class ResultsQueryParams(BaseModel):
    date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$", description="Date in YYYY-MM-DD format")
    model: Optional[str] = Field(None, description="Model signature filter")
    detail: Literal["minimal", "full"] = Field(default="minimal", description="Response detail level")
```

**Response Models:**
```python
class JobProgress(BaseModel):
    total_model_days: int
    completed: int
    failed: int
    current: Optional[dict] = None  # {"date": str, "model": str}
    details: Optional[list] = None  # List of JobDetailResponse

class TriggerSimulationResponse(BaseModel):
    job_id: str
    status: str
    date_range: list[str]
    models: list[str]
    created_at: str
    message: str
    progress: Optional[JobProgress] = None

class JobStatusResponse(BaseModel):
    job_id: str
    status: str
    date_range: list[str]
    models: list[str]
    progress: JobProgress
    created_at: str
    updated_at: Optional[str] = None
    completed_at: Optional[str] = None
    total_duration_seconds: Optional[float] = None

class DailyPnL(BaseModel):
    profit: float
    return_pct: float
    portfolio_value: float

class Trade(BaseModel):
    id: int
    action: str
    symbol: str
    amount: int
    price: Optional[float] = None
    total: Optional[float] = None

class AIReasoning(BaseModel):
    total_steps: int
    stop_signal_received: bool
    reasoning_summary: str
    tool_usage: dict[str, int]

class ModelResult(BaseModel):
    model: str
    positions: dict[str, float]
    daily_pnl: DailyPnL
    trades: Optional[list[Trade]] = None
    ai_reasoning: Optional[AIReasoning] = None
    log_file_path: Optional[str] = None

class ResultsResponse(BaseModel):
    date: str
    results: list[ModelResult]
```

---

## 3. Configuration Management

### 3.1 Environment Variables

Required environment variables remain the same as batch mode:
```bash
# OpenAI API Configuration
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=sk-...

# Alpha Vantage API
ALPHAADVANTAGE_API_KEY=...

# Jina Search API
JINA_API_KEY=...

# Runtime Config Path (now shared by API and worker)
RUNTIME_ENV_PATH=/app/data/runtime_env.json

# MCP Service Ports
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003

# API Server Configuration
API_HOST=0.0.0.0
API_PORT=8080

# Job Configuration
MAX_CONCURRENT_JOBS=1  # Only one simulation job at a time
```

### 3.2 Runtime State Management

**Challenge:** Multiple model-days running concurrently need isolated `runtime_env.json` state.

**Solution:** Per-job runtime config files
- `runtime_env_base.json` - Template
- `runtime_env_{job_id}_{model}_{date}.json` - Job-specific runtime config
- Worker passes custom `RUNTIME_ENV_PATH` to each simulation execution

**Modified `write_config_value()` and `get_config_value()`:**
- Accept optional `runtime_path` parameter
- Worker manages lifecycle: create → use → cleanup
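
The create → use → cleanup lifecycle maps naturally onto a context manager. The following is a sketch under stated assumptions, not the project's implementation: `isolated_runtime_config` is a hypothetical helper, the two accessor functions are simplified stand-ins that take `runtime_path` as a required argument, and the `TODAY_DATE` key in the demo is illustrative only.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_runtime_config(data_dir: Path, job_id: str, model: str, date: str, base: dict):
    """Create a per-model-day runtime config, yield its path, remove it afterwards."""
    path = data_dir / f"runtime_env_{job_id}_{model}_{date}.json"
    path.write_text(json.dumps(base))  # start from the template contents
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)  # cleanup even if the model-day fails


def write_config_value(key: str, value, runtime_path: Path) -> None:
    config = json.loads(runtime_path.read_text())
    config[key] = value
    runtime_path.write_text(json.dumps(config))


def get_config_value(key: str, runtime_path: Path, default=None):
    return json.loads(runtime_path.read_text()).get(key, default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with isolated_runtime_config(Path(tmp), "job-1", "gpt-5", "2025-01-16", {}) as path:
            write_config_value("TODAY_DATE", "2025-01-16", path)
            print(path.name, get_config_value("TODAY_DATE", path))
```

Because each model-day gets its own file, concurrent model-days never read or write each other's state.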

---

## 4. Error Handling

### 4.1 Error Response Format

All errors follow this structure:
```json
{
  "error": "error_code",
  "message": "Human-readable error description",
  "details": {
    // Optional additional context
  }
}
```

### 4.2 HTTP Status Codes

- `200 OK` - Successful request
- `202 Accepted` - Job queued successfully
- `400 Bad Request` - Invalid input parameters
- `404 Not Found` - Resource not found (job, results)
- `409 Conflict` - Concurrent job conflict
- `500 Internal Server Error` - Unexpected server error
- `503 Service Unavailable` - Health check failed

### 4.3 Retry Strategy for Workers

Models run independently - failure of one model doesn't block others:
```python
async def run_model_day(job_id: str, date: str, model_config: dict, agent) -> None:
    # `agent` is supplied by the caller; the "signature" key is an assumed
    # name for the model identifier inside model_config
    model = model_config["signature"]
    try:
        # Execute simulation for this model-day
        await agent.run_trading_session(date)
        update_job_detail_status(job_id, date, model, "completed")
    except Exception as e:
        # Log error, update status to failed, continue with next model-day
        update_job_detail_status(job_id, date, model, "failed", error=str(e))
        # Do NOT raise - let other models continue
```

---

## 5. Concurrency & Locking

### 5.1 Job Execution Policy

**Rule:** Maximum 1 running job at a time (configurable via `MAX_CONCURRENT_JOBS`)

**Enforcement:**
```python
def can_start_new_job() -> bool:
    running_jobs = db.query(
        "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
    ).fetchone()[0]
    return running_jobs < MAX_CONCURRENT_JOBS
```

### 5.2 Position File Concurrency

**Challenge:** Multiple model-days writing to same model's `position.jsonl`

**Solution:** Sequential execution per model
```python
# For each date in date_range:
#     For each model in parallel:          ← Models run in parallel
#         Execute model-day sequentially   ← Dates for same model run sequentially
```

**Execution Pattern:**
```
Date 2025-01-16:
  - Model A (running)
  - Model B (running)
  - Model C (running)

Date 2025-01-17:  ← Starts only after all models finish 2025-01-16
  - Model A (running)
  - Model B (running)
  - Model C (running)
```

**Rationale:**
- Models write to different position files → No conflict
- Same model's dates run sequentially → No race condition on position.jsonl
- Date-level parallelism across models → Faster overall execution

---

## 6. Performance Considerations

### 6.1 Execution Time Estimates

Based on current implementation:
- Single model-day: ~30-60 seconds (depends on AI model latency + tool calls)
- 3 models × 5 days = 15 model-days ≈ 7.5-15 minutes if run one after another; with the 3 models of each date running in parallel, the 5 sequential dates take roughly 2.5-5 minutes
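
The arithmetic behind such estimates can be sketched in a few lines. The helper below is hypothetical and assumes every model-day takes the same time and that all models of a date really do run concurrently.

```python
def estimate_minutes(models: int, days: int, seconds_per_model_day: float) -> tuple[float, float]:
    """Return (fully sequential, date-sequential with parallel models) in minutes."""
    sequential = models * days * seconds_per_model_day / 60
    model_parallel = days * seconds_per_model_day / 60  # one model-day per date on the critical path
    return sequential, model_parallel


if __name__ == "__main__":
    print(estimate_minutes(3, 5, 30))  # → (7.5, 2.5)
    print(estimate_minutes(3, 5, 60))  # → (15.0, 5.0)
```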

### 6.2 Timeout Configuration

**API Request Timeout:**
- `/simulate/trigger`: 10 seconds (just queue job)
- `/simulate/status`: 5 seconds (read from DB)
- `/results`: 30 seconds (file I/O + parsing)

**Worker Timeout:**
- Per model-day: 5 minutes (inherited from `max_retries` × `base_delay`)
- Entire job: No timeout (job runs until all model-days complete or fail)

### 6.3 Optimization Opportunities (Future)

1. **Results caching:** Store computed daily_pnl in SQLite to avoid recomputation
2. **Parallel date execution:** If position file locking is implemented, run dates in parallel
3. **Streaming responses:** For `/simulate/status`, use SSE to push updates instead of polling

---

## 7. Logging & Observability

### 7.1 Structured Logging

All API logs use JSON format:
```json
{
  "timestamp": "2025-01-20T14:30:00Z",
  "level": "INFO",
  "logger": "api.worker",
  "message": "Starting simulation for model-day",
  "job_id": "550e8400-...",
  "date": "2025-01-16",
  "model": "gpt-5"
}
```
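
A log line of this shape can be produced with the standard `logging` module and a custom formatter. The class below is a sketch, not the project's logger: the name `JsonFormatter` is hypothetical, and the context fields are assumed to arrive through the `extra=` argument of the logging call.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with optional job context."""

    CONTEXT_FIELDS = ("job_id", "date", "model")

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):  # set via logger.info(..., extra={...})
                entry[field] = getattr(record, field)
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("api.worker")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "Starting simulation for model-day",
        extra={"job_id": "550e8400-...", "date": "2025-01-16", "model": "gpt-5"},
    )
```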

### 7.2 Log Levels

- `DEBUG` - Detailed execution flow (tool calls, price fetches)
- `INFO` - Job lifecycle events (created, started, completed)
- `WARNING` - Recoverable errors (retry attempts)
- `ERROR` - Model-day failures (logged but job continues)
- `CRITICAL` - System failures (MCP services down, DB corruption)

### 7.3 Audit Trail

All job state transitions logged to `api_audit.log`:
```json
{
  "timestamp": "2025-01-20T14:30:00Z",
  "event": "job_created",
  "job_id": "550e8400-...",
  "user": "windmill-service",  // Future: from auth header
  "details": {"date_range": [...], "models": [...]}
}
```

---

## 8. Security Considerations

### 8.1 Authentication (Future)

For MVP, API relies on network isolation (Docker network). Future enhancements:
- API key authentication via header: `X-API-Key: <token>`
- JWT tokens for Windmill integration
- Rate limiting per API key

### 8.2 Input Validation

- All date parameters validated with regex: `^\d{4}-\d{2}-\d{2}$`
- Config paths restricted to `configs/` directory (prevent path traversal)
- Model signatures sanitized (alphanumeric characters, hyphens, and periods only, so that signatures such as `claude-3.7-sonnet` are accepted)
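
The date regex only checks the shape of the string, so a value such as `2025-13-45` still passes it. The sketch below pairs the regex with a calendar check and shows one possible way to confine config paths. Both function names are hypothetical, and resolving relative paths against the current working directory is an assumption.

```python
import re
from datetime import date
from pathlib import Path

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_trading_date(value: str) -> date:
    """Accept only well-formed, real calendar dates."""
    if not DATE_PATTERN.match(value):
        raise ValueError("Date must be in YYYY-MM-DD format")
    return date.fromisoformat(value)  # raises ValueError for e.g. 2025-13-45


def resolve_config_path(config_path: str, configs_dir: Path = Path("configs")) -> Path:
    """Resolve config_path and refuse anything outside configs_dir."""
    base = configs_dir.resolve()
    candidate = Path(config_path).resolve()  # collapses any ".." segments
    if base not in candidate.parents:
        raise ValueError("Config path must be inside the configs/ directory")
    return candidate


if __name__ == "__main__":
    print(parse_trading_date("2025-01-15"))
    print(resolve_config_path("configs/default_config.json").name)
```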

### 8.3 File Access Controls

- Results API only reads from `data/agent_data/` directory
- Config API only reads from `configs/` directory
- No arbitrary file read via API parameters

---

## 9. Deployment Configuration

### 9.1 Docker Compose

```yaml
version: '3.8'

services:
  ai-trader-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
    env_file:
      - .env
    environment:
      - MODE=api
      - API_PORT=8080
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
```

### 9.2 Dockerfile Modifications

```dockerfile
# ... existing layers ...

# Install API dependencies
COPY requirements-api.txt /app/
RUN pip install --no-cache-dir -r requirements-api.txt

# Copy API application code
COPY api/ /app/api/

# Copy entrypoint script
COPY docker-entrypoint.sh /app/
RUN chmod +x /app/docker-entrypoint.sh

EXPOSE 8080

CMD ["/app/docker-entrypoint.sh"]
```

### 9.3 Entrypoint Script

```bash
#!/bin/bash
set -e

echo "Starting MCP services..."
cd /app/agent_tools
python start_mcp_services.py &
MCP_PID=$!

# Register cleanup before starting the blocking API server; a trap placed
# after uvicorn would not be registered until the server had already exited
trap 'kill "$MCP_PID" 2>/dev/null || true' EXIT

echo "Waiting for MCP services to be ready..."
sleep 10

echo "Starting API server..."
cd /app
uvicorn api.main:app --host "${API_HOST:-0.0.0.0}" --port "${API_PORT:-8080}" --workers 1
```

---

## 10. API Versioning (Future)

For v2 and beyond:
- URL prefix: `/api/v1/simulate/trigger`, `/api/v2/simulate/trigger`
- Header-based: `Accept: application/vnd.ai-trader.v1+json`

MVP uses unversioned endpoints (implied v1).

---

## Next Steps

After reviewing this specification, we'll proceed to:
1. **Component 2:** Job Manager & SQLite Schema Implementation
2. **Component 3:** Background Worker Architecture
3. **Component 4:** BaseAgent Refactoring for Single-Day Execution
4. **Component 5:** Docker & Deployment Configuration
5. **Component 6:** Windmill Integration Flows

Please review this API specification and provide feedback or approval to continue.