mirror of
https://github.com/Xe138/AI-Trader.git
synced 2026-04-01 17:17:24 -04:00
docs: restructure documentation for improved clarity and navigation
Reorganize documentation into user-focused, developer-focused, and deployment-focused sections.

**New structure:**
- Root: README.md (streamlined), QUICK_START.md, API_REFERENCE.md
- docs/user-guide/: configuration, API usage, integrations, troubleshooting
- docs/developer/: contributing, development setup, testing, architecture
- docs/deployment/: Docker deployment, production checklist, monitoring
- docs/reference/: environment variables, MCP tools, data formats

**Changes:**
- Streamline README.md from 831 to 469 lines
- Create QUICK_START.md for 5-minute onboarding
- Create API_REFERENCE.md as single source of truth for API
- Remove 9 outdated specification docs (v0.2.0 API design)
- Remove DOCKER_API.md (content consolidated into new structure)
- Remove docs/plans/ directory with old design documents
- Update CLAUDE.md with documentation structure guide
- Remove orchestration-specific references

**Benefits:**
- Clear entry points for different audiences
- No content duplication
- Better discoverability through logical hierarchy
- All content reflects current v0.3.0 API
739
API_REFERENCE.md
Normal file
@@ -0,0 +1,739 @@
# AI-Trader API Reference

Complete reference for the AI-Trader REST API service.

**Base URL:** `http://localhost:8080` (default)

**API Version:** 1.0.0

---

## Endpoints

### POST /simulate/trigger

Trigger a new simulation job for a specified date range and models.

**Request Body:**

```json
{
  "start_date": "2025-01-16",
  "end_date": "2025-01-17",
  "models": ["gpt-4", "claude-3.7-sonnet"]
}
```

**Parameters:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `start_date` | string | Yes | Start date in YYYY-MM-DD format |
| `end_date` | string | No | End date in YYYY-MM-DD format. If omitted, simulates a single day (uses `start_date`) |
| `models` | array[string] | No | Model signatures to run. If omitted, uses all enabled models from the server config |

**Response (200 OK):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "total_model_days": 4,
  "message": "Simulation job created with 2 trading dates"
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `job_id` | string | Unique UUID for this simulation job |
| `status` | string | Job status: `pending`, `running`, `completed`, `partial`, or `failed` |
| `total_model_days` | integer | Total number of model-day combinations to execute |
| `message` | string | Human-readable status message |

**Error Responses:**

**400 Bad Request** - Invalid parameters or validation failure
```json
{
  "detail": "Invalid date format: 2025-1-16. Expected YYYY-MM-DD"
}
```

**400 Bad Request** - Another job is already running
```json
{
  "detail": "Another simulation job is already running or pending. Please wait for it to complete."
}
```

**500 Internal Server Error** - Server configuration issue
```json
{
  "detail": "Server configuration file not found: configs/default_config.json"
}
```

**503 Service Unavailable** - Price data download failed
```json
{
  "detail": "Failed to download any price data. Check ALPHAADVANTAGE_API_KEY."
}
```

**Validation Rules:**

- **Date format:** Must be YYYY-MM-DD
- **Date validity:** Must be valid calendar dates
- **Date order:** `start_date` must be <= `end_date`
- **Future dates:** Cannot simulate future dates (must be <= today)
- **Date range limit:** Maximum 30 days (configurable via `MAX_SIMULATION_DAYS`)
- **Model signatures:** Must match models defined in server configuration
- **Concurrency:** Only one simulation job can run at a time

**Behavior:**

1. Validates date range and parameters
2. Determines which models to run (from request or server config)
3. Checks for missing price data in date range
4. Downloads missing data if `AUTO_DOWNLOAD_PRICE_DATA=true` (default)
5. Identifies trading dates with complete price data (all symbols available)
6. Creates job in database with status `pending`
7. Starts background worker thread
8. Returns immediately with job ID

**Examples:**

Single day, single model:
```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2025-01-16",
    "models": ["gpt-4"]
  }'
```

Date range, all enabled models:
```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2025-01-16",
    "end_date": "2025-01-20"
  }'
```

---
### GET /simulate/status/{job_id}

Get status and progress of a simulation job.

**URL Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `job_id` | string | Job UUID from trigger response |

**Response (200 OK):**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "progress": {
    "total_model_days": 4,
    "completed": 2,
    "failed": 0,
    "pending": 2
  },
  "date_range": ["2025-01-16", "2025-01-17"],
  "models": ["gpt-4", "claude-3.7-sonnet"],
  "created_at": "2025-01-16T10:00:00Z",
  "started_at": "2025-01-16T10:00:05Z",
  "completed_at": null,
  "total_duration_seconds": null,
  "error": null,
  "details": [
    {
      "model_signature": "gpt-4",
      "trading_date": "2025-01-16",
      "status": "completed",
      "start_time": "2025-01-16T10:00:05Z",
      "end_time": "2025-01-16T10:05:23Z",
      "duration_seconds": 318.5,
      "error": null
    },
    {
      "model_signature": "claude-3.7-sonnet",
      "trading_date": "2025-01-16",
      "status": "completed",
      "start_time": "2025-01-16T10:05:24Z",
      "end_time": "2025-01-16T10:10:12Z",
      "duration_seconds": 288.0,
      "error": null
    },
    {
      "model_signature": "gpt-4",
      "trading_date": "2025-01-17",
      "status": "running",
      "start_time": "2025-01-16T10:10:13Z",
      "end_time": null,
      "duration_seconds": null,
      "error": null
    },
    {
      "model_signature": "claude-3.7-sonnet",
      "trading_date": "2025-01-17",
      "status": "pending",
      "start_time": null,
      "end_time": null,
      "duration_seconds": null,
      "error": null
    }
  ]
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `job_id` | string | Job UUID |
| `status` | string | Overall job status |
| `progress` | object | Progress summary |
| `progress.total_model_days` | integer | Total model-day combinations |
| `progress.completed` | integer | Successfully completed model-days |
| `progress.failed` | integer | Failed model-days |
| `progress.pending` | integer | Not yet started model-days |
| `date_range` | array[string] | Trading dates in this job |
| `models` | array[string] | Model signatures in this job |
| `created_at` | string | ISO 8601 timestamp when job was created |
| `started_at` | string | ISO 8601 timestamp when execution began |
| `completed_at` | string | ISO 8601 timestamp when job finished |
| `total_duration_seconds` | float | Total execution time in seconds |
| `error` | string | Error message if job failed |
| `details` | array[object] | Per model-day execution details |

**Job Status Values:**

| Status | Description |
|--------|-------------|
| `pending` | Job created, waiting to start |
| `running` | Job currently executing |
| `completed` | All model-days completed successfully |
| `partial` | Some model-days completed, some failed |
| `failed` | All model-days failed |

**Model-Day Status Values:**

| Status | Description |
|--------|-------------|
| `pending` | Not started yet |
| `running` | Currently executing |
| `completed` | Finished successfully |
| `failed` | Execution failed (see `error` field) |

**Error Response:**

**404 Not Found** - Job doesn't exist
```json
{
  "detail": "Job 550e8400-e29b-41d4-a716-446655440000 not found"
}
```

**Example:**

```bash
curl http://localhost:8080/simulate/status/550e8400-e29b-41d4-a716-446655440000
```

**Polling Recommendation:**

Poll every 10-30 seconds until `status` is `completed`, `partial`, or `failed`.

---
### GET /results

Query simulation results with optional filters.

**Query Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `job_id` | string | No | Filter by job UUID |
| `date` | string | No | Filter by trading date (YYYY-MM-DD) |
| `model` | string | No | Filter by model signature |

**Response (200 OK):**

```json
{
  "results": [
    {
      "id": 1,
      "job_id": "550e8400-e29b-41d4-a716-446655440000",
      "date": "2025-01-16",
      "model": "gpt-4",
      "action_id": 1,
      "action_type": "buy",
      "symbol": "AAPL",
      "amount": 10,
      "price": 250.50,
      "cash": 7495.00,
      "portfolio_value": 10000.00,
      "daily_profit": 0.00,
      "daily_return_pct": 0.00,
      "created_at": "2025-01-16T10:05:23Z",
      "holdings": [
        {"symbol": "AAPL", "quantity": 10},
        {"symbol": "CASH", "quantity": 7495.00}
      ]
    },
    {
      "id": 2,
      "job_id": "550e8400-e29b-41d4-a716-446655440000",
      "date": "2025-01-16",
      "model": "gpt-4",
      "action_id": 2,
      "action_type": "buy",
      "symbol": "MSFT",
      "amount": 5,
      "price": 380.20,
      "cash": 5594.00,
      "portfolio_value": 10105.00,
      "daily_profit": 105.00,
      "daily_return_pct": 1.05,
      "created_at": "2025-01-16T10:05:23Z",
      "holdings": [
        {"symbol": "AAPL", "quantity": 10},
        {"symbol": "MSFT", "quantity": 5},
        {"symbol": "CASH", "quantity": 5594.00}
      ]
    }
  ],
  "count": 2
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `results` | array[object] | Array of position records |
| `count` | integer | Number of results returned |

**Position Record Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `id` | integer | Unique position record ID |
| `job_id` | string | Job UUID this belongs to |
| `date` | string | Trading date (YYYY-MM-DD) |
| `model` | string | Model signature |
| `action_id` | integer | Action sequence number (1, 2, 3...) for this model-day |
| `action_type` | string | Action taken: `buy`, `sell`, or `hold` |
| `symbol` | string | Stock symbol traded (or null for `hold`) |
| `amount` | integer | Quantity traded (or null for `hold`) |
| `price` | float | Price per share (or null for `hold`) |
| `cash` | float | Cash balance after this action |
| `portfolio_value` | float | Total portfolio value (cash + holdings) |
| `daily_profit` | float | Profit/loss for this trading day |
| `daily_return_pct` | float | Return percentage for this day |
| `created_at` | string | ISO 8601 timestamp when recorded |
| `holdings` | array[object] | Current holdings after this action |

**Holdings Object:**

| Field | Type | Description |
|-------|------|-------------|
| `symbol` | string | Stock symbol or "CASH" |
| `quantity` | float | Shares owned (or cash amount) |

**Examples:**

All results for a specific job:
```bash
curl "http://localhost:8080/results?job_id=550e8400-e29b-41d4-a716-446655440000"
```

Results for a specific date:
```bash
curl "http://localhost:8080/results?date=2025-01-16"
```

Results for a specific model:
```bash
curl "http://localhost:8080/results?model=gpt-4"
```

Combine filters:
```bash
curl "http://localhost:8080/results?job_id=550e8400-e29b-41d4-a716-446655440000&date=2025-01-16&model=gpt-4"
```
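Because each record is a running snapshot, the record with the highest `action_id` for a given model and date carries that model-day's end-of-day cash and portfolio value. A small client-side helper (a sketch for post-processing, not part of the API) can reduce a `/results` payload to those final snapshots:

```python
def final_snapshots(results):
    """Keep only the highest-action_id record per (model, date).

    Each /results record is a running snapshot, so the last action of a
    model-day reflects the end-of-day portfolio state.
    """
    latest = {}
    for record in results:
        key = (record["model"], record["date"])
        if key not in latest or record["action_id"] > latest[key]["action_id"]:
            latest[key] = record
    return latest

# With the two gpt-4 records documented above (fields trimmed), the
# action_id=2 record wins, giving the 10105.00 end-of-day value.
rows = [
    {"model": "gpt-4", "date": "2025-01-16", "action_id": 1, "portfolio_value": 10000.00},
    {"model": "gpt-4", "date": "2025-01-16", "action_id": 2, "portfolio_value": 10105.00},
]
print(final_snapshots(rows)[("gpt-4", "2025-01-16")]["portfolio_value"])
```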
---
### GET /health

Health check endpoint for monitoring and orchestration services.

**Response (200 OK):**

```json
{
  "status": "healthy",
  "database": "connected",
  "timestamp": "2025-01-16T10:00:00Z"
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `status` | string | Overall service health: `healthy` or `unhealthy` |
| `database` | string | Database connection status: `connected` or `disconnected` |
| `timestamp` | string | ISO 8601 timestamp of health check |

**Example:**

```bash
curl http://localhost:8080/health
```

**Usage:**

- Docker health checks: `HEALTHCHECK CMD curl -f http://localhost:8080/health`
- Monitoring systems: Poll every 30-60 seconds
- Orchestration services: Verify availability before triggering simulations
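The Docker health-check usage above can also be expressed in a Compose file. A minimal sketch — the service name, image tag, and intervals here are illustrative assumptions, not taken from this project's compose file:

```yaml
services:
  ai-trader-api:             # service name assumed; match your compose file
    image: ai-trader:latest  # assumed image tag
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s          # matches the 30-60s polling guidance
      timeout: 5s
      retries: 3
      start_period: 15s      # allow the MCP services to come up first
```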
---
## Common Workflows

### Trigger and Monitor a Simulation

1. **Trigger simulation:**
   ```bash
   RESPONSE=$(curl -X POST http://localhost:8080/simulate/trigger \
     -H "Content-Type: application/json" \
     -d '{"start_date": "2025-01-16", "end_date": "2025-01-17", "models": ["gpt-4"]}')

   JOB_ID=$(echo "$RESPONSE" | jq -r '.job_id')
   echo "Job ID: $JOB_ID"
   ```

2. **Poll for completion:**
   ```bash
   while true; do
     STATUS=$(curl -s http://localhost:8080/simulate/status/$JOB_ID | jq -r '.status')
     echo "Status: $STATUS"

     if [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "partial" ]] || [[ "$STATUS" == "failed" ]]; then
       break
     fi

     sleep 10
   done
   ```

3. **Retrieve results:**
   ```bash
   curl "http://localhost:8080/results?job_id=$JOB_ID" | jq '.'
   ```

### Scheduled Daily Simulations

Use a scheduler (cron, Airflow, etc.) to trigger simulations:

```bash
#!/bin/bash
# daily_simulation.sh

# Calculate yesterday's date (GNU date; on macOS use: date -v-1d +%Y-%m-%d)
DATE=$(date -d "yesterday" +%Y-%m-%d)

# Trigger simulation
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d "{\"start_date\": \"$DATE\", \"models\": [\"gpt-4\"]}"
```

Add to crontab:
```
0 6 * * * /path/to/daily_simulation.sh
```

---
## Error Handling

All endpoints return consistent error responses with HTTP status codes and detail messages.

### Common Error Codes

| Code | Meaning | Common Causes |
|------|---------|---------------|
| 400 | Bad Request | Invalid date format, invalid parameters, concurrent job running |
| 404 | Not Found | Job ID doesn't exist |
| 500 | Internal Server Error | Server misconfiguration, missing config file |
| 503 | Service Unavailable | Price data download failed, database unavailable |

### Error Response Format

```json
{
  "detail": "Human-readable error message"
}
```

### Retry Recommendations

- **400 errors:** Fix the request parameters; don't retry
- **404 errors:** Verify the job ID; don't retry
- **500 errors:** Check server logs and investigate before retrying
- **503 errors:** Retry with exponential backoff (wait 1s, 2s, 4s, etc.)
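The 503 backoff recommendation can be sketched as a small, transport-agnostic helper. This is an illustrative pattern, not part of the API; the function name and defaults are assumptions:

```python
import time

def call_with_backoff(do_request, max_retries=4, sleep=time.sleep):
    """Retry a request on 503 with exponential backoff (1s, 2s, 4s, ...).

    do_request is any zero-argument callable returning (status_code, body);
    sleep is injectable so the schedule can be tested without waiting.
    """
    for attempt in range(max_retries + 1):
        status, body = do_request()
        if status != 503:           # only 503s are worth retrying
            return status, body
        if attempt < max_retries:
            sleep(2 ** attempt)     # 1, 2, 4, 8 seconds
    return status, body
```

For example, `do_request` could POST to `/simulate/trigger` with `requests` and return `(response.status_code, response.json())`; 400 and 404 responses pass through immediately, as recommended above.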
---
## Rate Limits and Constraints

### Concurrency

- **Maximum concurrent jobs:** 1 (configurable via `MAX_CONCURRENT_JOBS`)
- **Attempting to start a second job returns:** 400 Bad Request

### Date Range Limits

- **Maximum date range:** 30 days (configurable via `MAX_SIMULATION_DAYS`)
- **Attempting a longer range returns:** 400 Bad Request

### Price Data

- **Alpha Vantage API rate limit:** 5 requests/minute (free tier), 75 requests/minute (premium)
- **Automatic download:** Enabled by default (`AUTO_DOWNLOAD_PRICE_DATA=true`)
- **Behavior when rate limited:** Partial data is downloaded and the simulation continues with the available dates

---
## Data Persistence

All simulation data is stored in a SQLite database at `data/jobs.db`.

### Database Tables

- **jobs** - Job metadata and status
- **job_details** - Per model-day execution details
- **positions** - Trading position records
- **holdings** - Portfolio holdings breakdown
- **reasoning_logs** - AI decision reasoning (if enabled)
- **tool_usage** - MCP tool usage statistics
- **price_data** - Historical price data cache
- **price_coverage** - Data availability tracking

### Data Retention

- Job data persists indefinitely by default
- Results can be queried at any time after job completion
- Manual cleanup: delete rows from the `jobs` table (deletes cascade to related tables)
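Manual cleanup can be sketched with Python's built-in `sqlite3` module. Note that SQLite only honors `ON DELETE CASCADE` when the `foreign_keys` pragma is enabled on the connection; the helper name and the assumption that `jobs` is keyed by `job_id` are illustrative, not verified against this project's schema:

```python
import sqlite3

def delete_job(db_path, job_id):
    """Delete one job; related rows cascade if the schema declares
    ON DELETE CASCADE foreign keys (assumed here). Returns rows deleted."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # cascades need this pragma
        deleted = conn.execute(
            "DELETE FROM jobs WHERE job_id = ?", (job_id,)
        ).rowcount
        conn.commit()
        return deleted
    finally:
        conn.close()
```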
---
## Configuration

API behavior is controlled via environment variables and a server configuration file.

### Environment Variables

See [docs/reference/environment-variables.md](docs/reference/environment-variables.md) for the complete reference.

**Key variables:**

- `API_PORT` - API server port (default: 8080)
- `MAX_CONCURRENT_JOBS` - Maximum concurrent simulations (default: 1)
- `MAX_SIMULATION_DAYS` - Maximum date range (default: 30)
- `AUTO_DOWNLOAD_PRICE_DATA` - Auto-download missing data (default: true)
- `ALPHAADVANTAGE_API_KEY` - Alpha Vantage API key (required)

### Server Configuration File

The server loads model definitions from a configuration file (default: `configs/default_config.json`).

**Example config:**
```json
{
  "models": [
    {
      "name": "GPT-4",
      "basemodel": "openai/gpt-4",
      "signature": "gpt-4",
      "enabled": true
    },
    {
      "name": "Claude 3.7 Sonnet",
      "basemodel": "anthropic/claude-3.7-sonnet",
      "signature": "claude-3.7-sonnet",
      "enabled": true
    }
  ],
  "agent_config": {
    "max_steps": 30,
    "initial_cash": 10000.0
  }
}
```

**Model fields:**

- `signature` - Unique identifier used in API requests
- `enabled` - Whether the model runs when no models are specified in the request
- `basemodel` - Model identifier for the AI provider
- `openai_base_url` - Optional custom API endpoint
- `openai_api_key` - Optional model-specific API key

---
## OpenAPI / Swagger Documentation

Interactive API documentation is available at:

- Swagger UI: `http://localhost:8080/docs`
- ReDoc: `http://localhost:8080/redoc`
- OpenAPI JSON: `http://localhost:8080/openapi.json`

---
## Client Libraries

### Python

```python
import requests
import time

class AITraderClient:
    def __init__(self, base_url="http://localhost:8080"):
        self.base_url = base_url

    def trigger_simulation(self, start_date, end_date=None, models=None):
        """Trigger a simulation job."""
        payload = {"start_date": start_date}
        if end_date:
            payload["end_date"] = end_date
        if models:
            payload["models"] = models

        response = requests.post(
            f"{self.base_url}/simulate/trigger",
            json=payload
        )
        response.raise_for_status()
        return response.json()

    def get_status(self, job_id):
        """Get job status."""
        response = requests.get(f"{self.base_url}/simulate/status/{job_id}")
        response.raise_for_status()
        return response.json()

    def wait_for_completion(self, job_id, poll_interval=10):
        """Poll until job completes."""
        while True:
            status = self.get_status(job_id)
            if status["status"] in ["completed", "partial", "failed"]:
                return status
            time.sleep(poll_interval)

    def get_results(self, job_id=None, date=None, model=None):
        """Query results with optional filters."""
        params = {}
        if job_id:
            params["job_id"] = job_id
        if date:
            params["date"] = date
        if model:
            params["model"] = model

        response = requests.get(f"{self.base_url}/results", params=params)
        response.raise_for_status()
        return response.json()

# Usage
client = AITraderClient()
job = client.trigger_simulation("2025-01-16", models=["gpt-4"])
result = client.wait_for_completion(job["job_id"])
results = client.get_results(job_id=job["job_id"])
```
### TypeScript/JavaScript

```typescript
class AITraderClient {
  constructor(private baseUrl: string = "http://localhost:8080") {}

  async triggerSimulation(
    startDate: string,
    endDate?: string,
    models?: string[]
  ) {
    const body: any = { start_date: startDate };
    if (endDate) body.end_date = endDate;
    if (models) body.models = models;

    const response = await fetch(`${this.baseUrl}/simulate/trigger`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body)
    });

    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }

  async getStatus(jobId: string) {
    const response = await fetch(
      `${this.baseUrl}/simulate/status/${jobId}`
    );
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }

  async waitForCompletion(jobId: string, pollInterval: number = 10000) {
    while (true) {
      const status = await this.getStatus(jobId);
      if (["completed", "partial", "failed"].includes(status.status)) {
        return status;
      }
      await new Promise(resolve => setTimeout(resolve, pollInterval));
    }
  }

  async getResults(filters: {
    jobId?: string;
    date?: string;
    model?: string;
  } = {}) {
    const params = new URLSearchParams();
    if (filters.jobId) params.set("job_id", filters.jobId);
    if (filters.date) params.set("date", filters.date);
    if (filters.model) params.set("model", filters.model);

    const response = await fetch(
      `${this.baseUrl}/results?${params.toString()}`
    );
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }
}

// Usage
const client = new AITraderClient();
const job = await client.triggerSimulation("2025-01-16", undefined, ["gpt-4"]);
const result = await client.waitForCompletion(job.job_id);
const results = await client.getResults({ jobId: job.job_id });
```
42
CLAUDE.md
@@ -303,6 +303,48 @@ When modifying agent behavior or adding tools:
4. Verify position updates in `position/position.jsonl`
5. Use `main.sh` only for full end-to-end testing

See [docs/developer/testing.md](docs/developer/testing.md) for the complete testing guide.

## Documentation Structure

The project uses a well-organized documentation structure:

### Root Level (User-facing)
- **README.md** - Project overview, quick start, API overview
- **QUICK_START.md** - 5-minute getting started guide
- **API_REFERENCE.md** - Complete API endpoint documentation
- **CHANGELOG.md** - Release notes and version history
- **TESTING_GUIDE.md** - Testing and validation procedures

### docs/user-guide/
- `configuration.md` - Environment setup and model configuration
- `using-the-api.md` - Common workflows and best practices
- `integration-examples.md` - Python, TypeScript, automation examples
- `troubleshooting.md` - Common issues and solutions

### docs/developer/
- `CONTRIBUTING.md` - Contribution guidelines
- `development-setup.md` - Local development without Docker
- `testing.md` - Running tests and validation
- `architecture.md` - System design and components
- `database-schema.md` - SQLite table reference
- `adding-models.md` - How to add custom AI models

### docs/deployment/
- `docker-deployment.md` - Production Docker setup
- `production-checklist.md` - Pre-deployment verification
- `monitoring.md` - Health checks, logging, metrics
- `scaling.md` - Multiple instances and load balancing

### docs/reference/
- `environment-variables.md` - Configuration reference
- `mcp-tools.md` - Trading tool documentation
- `data-formats.md` - File formats and schemas

### docs/ (Maintainer docs)
- `DOCKER.md` - Docker deployment details
- `RELEASING.md` - Release process for maintainers

## Common Issues

**MCP Services Not Running:**
@@ -1,6 +0,0 @@
We provide QR codes for joining the HKUDS discussion groups on WeChat and Feishu.

You can join by scanning the QR codes below:

<img src="https://github.com/HKUDS/.github/blob/main/profile/QR.png" alt="WeChat QR Code" width="400"/>
347
DOCKER_API.md
@@ -1,347 +0,0 @@
|
||||
# Docker API Server Deployment
|
||||
|
||||
This guide explains how to run AI-Trader as a persistent REST API server using Docker for Windmill.dev integration.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Environment Setup
|
||||
|
||||
```bash
|
||||
# Copy environment template
|
||||
cp .env.example .env
|
||||
|
||||
# Edit .env and add your API keys:
|
||||
# - OPENAI_API_KEY
|
||||
# - ALPHAADVANTAGE_API_KEY
|
||||
# - JINA_API_KEY
|
||||
```
|
||||
|
||||
### 2. Start API Server
|
||||
|
||||
```bash
|
||||
# Start in API mode (default)
|
||||
docker-compose up -d ai-trader-api
|
||||
|
||||
# View logs
|
||||
docker-compose logs -f ai-trader-api
|
||||
|
||||
# Check health
|
||||
curl http://localhost:8080/health
|
||||
```
|
||||
|
||||
### 3. Test API Endpoints
|
||||
|
||||
```bash
|
||||
# Health check
|
||||
curl http://localhost:8080/health
|
||||
|
||||
# Trigger simulation
|
||||
curl -X POST http://localhost:8080/simulate/trigger \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"config_path": "/app/configs/default_config.json",
|
||||
"date_range": ["2025-01-16", "2025-01-17"],
|
||||
"models": ["gpt-4"]
|
||||
}'
|
||||
|
||||
# Check job status (replace JOB_ID)
|
||||
curl http://localhost:8080/simulate/status/JOB_ID
|
||||
|
||||
# Query results
|
||||
curl http://localhost:8080/results?date=2025-01-16
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Two Deployment Modes
|
||||
|
||||
**API Server Mode** (Windmill integration):
|
||||
- REST API on port 8080
|
||||
- Background job execution
|
||||
- Persistent SQLite database
|
||||
- Continuous uptime with health checks
|
||||
- Start with: `docker-compose up -d ai-trader-api`
|
||||
|
||||
**Batch Mode** (one-time simulation):
|
||||
- Command-line execution
|
||||
- Runs to completion then exits
|
||||
- Config file driven
|
||||
- Start with: `docker-compose --profile batch up ai-trader-batch`
|
||||
|
||||
### Port Configuration
|
||||
|
||||
| Service | Internal Port | Default Host Port | Environment Variable |
|
||||
|---------|--------------|-------------------|---------------------|
|
||||
| API Server | 8080 | 8080 | `API_PORT` |
|
||||
| Math MCP | 8000 | 8000 | `MATH_HTTP_PORT` |
|
||||
| Search MCP | 8001 | 8001 | `SEARCH_HTTP_PORT` |
|
||||
| Trade MCP | 8002 | 8002 | `TRADE_HTTP_PORT` |
|
| Price MCP | 8003 | 8003 | `GETPRICE_HTTP_PORT` |
| Web Dashboard | 8888 | 8888 | `WEB_HTTP_PORT` |

## API Endpoints

### POST /simulate/trigger

Trigger a new simulation job.

**Request:**

```json
{
  "config_path": "/app/configs/default_config.json",
  "date_range": ["2025-01-16", "2025-01-17"],
  "models": ["gpt-4", "claude-3.7-sonnet"]
}
```

**Response:**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "total_model_days": 4,
  "message": "Simulation job created and started"
}
```

### GET /simulate/status/{job_id}

Get job progress and status.

**Response:**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "progress": {
    "total_model_days": 4,
    "completed": 2,
    "failed": 0,
    "pending": 2
  },
  "date_range": ["2025-01-16", "2025-01-17"],
  "models": ["gpt-4", "claude-3.7-sonnet"],
  "created_at": "2025-01-16T10:00:00Z",
  "details": [
    {
      "date": "2025-01-16",
      "model": "gpt-4",
      "status": "completed",
      "started_at": "2025-01-16T10:00:05Z",
      "completed_at": "2025-01-16T10:05:23Z",
      "duration_seconds": 318.5
    }
  ]
}
```

### GET /results

Query simulation results with optional filters.

**Parameters:**

- `job_id` (optional): Filter by job UUID
- `date` (optional): Filter by trading date (YYYY-MM-DD)
- `model` (optional): Filter by model signature

**Response:**

```json
{
  "results": [
    {
      "id": 1,
      "job_id": "550e8400-e29b-41d4-a716-446655440000",
      "date": "2025-01-16",
      "model": "gpt-4",
      "action_id": 1,
      "action_type": "buy",
      "symbol": "AAPL",
      "amount": 10,
      "price": 250.50,
      "cash": 7495.00,
      "portfolio_value": 10000.00,
      "daily_profit": 0.00,
      "daily_return_pct": 0.00,
      "holdings": [
        {"symbol": "AAPL", "quantity": 10},
        {"symbol": "CASH", "quantity": 7495.00}
      ]
    }
  ],
  "count": 1
}
```
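Clients typically post-process this payload; as a minimal sketch (not part of the API service itself), the rows can be grouped per model and reduced to an action count plus the latest reported portfolio value. Field names follow the response shown above; the sample payload is illustrative.

```python
def summarize_results(payload):
    """Group /results rows by model and keep the latest portfolio value."""
    summary = {}
    for row in payload.get("results", []):
        model = row["model"]
        entry = summary.setdefault(model, {"actions": 0, "portfolio_value": None})
        entry["actions"] += 1
        # Rows arrive in insertion order, so the last row wins.
        entry["portfolio_value"] = row["portfolio_value"]
    return summary

# Illustrative payload shaped like the /results response above
payload = {
    "results": [
        {"model": "gpt-4", "action_type": "buy", "symbol": "AAPL",
         "portfolio_value": 10000.00},
        {"model": "gpt-4", "action_type": "sell", "symbol": "AAPL",
         "portfolio_value": 10120.50},
    ],
    "count": 2,
}
print(summarize_results(payload))
```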

### GET /health

Service health check.

**Response:**

```json
{
  "status": "healthy",
  "database": "connected",
  "timestamp": "2025-01-16T10:00:00Z"
}
```

## Volume Mounts

Data persists across container restarts via volume mounts:

```yaml
volumes:
  - ./data:/app/data        # SQLite database, price data
  - ./logs:/app/logs        # Application logs
  - ./configs:/app/configs  # Configuration files
```

**Key files:**

- `/app/data/jobs.db` - SQLite database with job history and results
- `/app/data/merged.jsonl` - Cached price data (fetched on first run)
- `/app/logs/` - Application and MCP service logs

## Configuration

### Custom Config File

Place config files in the `./configs/` directory:

```json
{
  "agent_type": "BaseAgent",
  "date_range": {
    "init_date": "2025-01-01",
    "end_date": "2025-01-31"
  },
  "models": [
    {
      "name": "GPT-4",
      "basemodel": "gpt-4",
      "signature": "gpt-4",
      "enabled": true
    }
  ],
  "agent_config": {
    "max_steps": 30,
    "initial_cash": 10000.0
  }
}
```

Reference it in API calls as `/app/configs/your_config.json`.

## Troubleshooting

### Check Container Status

```bash
docker-compose ps
docker-compose logs ai-trader-api
```

### Health Check Failing

```bash
# Check if services started
docker exec ai-trader-api ps aux

# Test internal health
docker exec ai-trader-api curl http://localhost:8080/health

# Check MCP services
docker exec ai-trader-api curl http://localhost:8000/health
```

### Database Issues

```bash
# View database
docker exec ai-trader-api sqlite3 data/jobs.db ".tables"

# Reset database (WARNING: deletes all data)
rm ./data/jobs.db
docker-compose restart ai-trader-api
```

### Port Conflicts

If ports are already in use, edit `.env`:

```bash
API_PORT=9080  # Change to available port
```

## Windmill Integration

Example Windmill workflow step:

```python
import httpx


def trigger_simulation(
    api_url: str,
    config_path: str,
    start_date: str,
    end_date: str,
    models: list[str]
):
    """Trigger AI trading simulation via API."""
    response = httpx.post(
        f"{api_url}/simulate/trigger",
        json={
            "config_path": config_path,
            "date_range": [start_date, end_date],
            "models": models
        },
        timeout=30.0
    )
    response.raise_for_status()
    return response.json()


def check_status(api_url: str, job_id: str):
    """Check simulation job status."""
    response = httpx.get(
        f"{api_url}/simulate/status/{job_id}",
        timeout=10.0
    )
    response.raise_for_status()
    return response.json()
```
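A workflow usually polls `check_status` until the job leaves the `pending`/`running` states. A minimal sketch of that loop follows; the status fetcher is injected as a callable so the loop stays easy to test, and the function names and polling budget are illustrative, not part of the API.

```python
import time


def wait_for_completion(fetch_status, poll_interval=0.0, max_polls=100):
    """Poll a status fetcher until the job reaches a terminal state."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") not in ("pending", "running"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish within the polling budget")


# Stubbed fetcher standing in for: lambda: check_status(api_url, job_id)
responses = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed"},
])
final = wait_for_completion(lambda: next(responses))
print(final["status"])  # → completed
```

In a real workflow the `poll_interval` would be several seconds to avoid hammering the API.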

## Production Deployment

### Use Docker Hub Image

```yaml
# docker-compose.yml
services:
  ai-trader-api:
    image: ghcr.io/xe138/ai-trader:latest
    # ... rest of config
```

### Build Locally

```yaml
# docker-compose.yml
services:
  ai-trader-api:
    build: .
    # ... rest of config
```

### Environment Security

- Never commit `.env` to version control
- Use secrets management in production (Docker secrets, Kubernetes secrets, etc.)
- Rotate API keys regularly
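As one option, Docker Compose file-based secrets keep keys out of both the image and the plain environment. The sketch below is illustrative only: the secret name and file path are assumptions, and the application would need to support reading the key from the `*_FILE` path rather than the plain variable.

```yaml
# Sketch: file-based secrets with Docker Compose (names are illustrative)
services:
  ai-trader-api:
    image: ghcr.io/xe138/ai-trader:latest
    secrets:
      - openai_api_key
    environment:
      # Assumes the app can read the key from this file path
      OPENAI_API_KEY_FILE: /run/secrets/openai_api_key

secrets:
  openai_api_key:
    file: ./secrets/openai_api_key.txt
```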

## Monitoring

### Prometheus Metrics (Future)

A metrics endpoint is planned: `GET /metrics`

### Log Aggregation

- Container logs: `docker-compose logs -f`
- Application logs: `./logs/api.log`
- MCP service logs: `./logs/mcp_*.log`

## Scaling Considerations

- Single-job concurrency enforced by database lock
- For parallel simulations, deploy multiple instances with separate databases
- Consider a load balancer for high-availability setups
- Database size grows with the number of simulations (plan for cleanup/archival)
373
QUICK_START.md
Normal file
@@ -0,0 +1,373 @@
# Quick Start Guide

Get AI-Trader running in under 5 minutes using Docker.

---

## Prerequisites

- **Docker** and **Docker Compose** installed
  - [Install Docker Desktop](https://www.docker.com/products/docker-desktop/) (includes both)
- **API Keys:**
  - OpenAI API key ([get one here](https://platform.openai.com/api-keys))
  - Alpha Vantage API key ([free tier](https://www.alphavantage.co/support/#api-key))
  - Jina AI API key ([free tier](https://jina.ai/))
- **System Requirements:**
  - 2GB free disk space
  - Internet connection

---

## Step 1: Clone Repository

```bash
git clone https://github.com/Xe138/AI-Trader.git
cd AI-Trader
```

---

## Step 2: Configure Environment

Create a `.env` file with your API keys:

```bash
cp .env.example .env
```

Edit `.env` and add your keys:

```bash
# Required API Keys
OPENAI_API_KEY=sk-your-openai-key-here
ALPHAADVANTAGE_API_KEY=your-alpha-vantage-key-here
JINA_API_KEY=your-jina-key-here

# Optional: Custom OpenAI endpoint
# OPENAI_API_BASE=https://api.openai.com/v1

# Optional: API server port (default: 8080)
# API_PORT=8080
```

**Save the file.**

---

## Step 3: Start the API Server

```bash
docker-compose up -d
```

This will:
- Build the Docker image (~5-10 minutes first time)
- Start the AI-Trader API service
- Start internal MCP services (math, search, trade, price)
- Initialize the SQLite database

**Wait for startup:**

```bash
# View logs
docker logs -f ai-trader

# Wait for this message:
# "Application startup complete"
# Press Ctrl+C to stop viewing logs
```

---

## Step 4: Verify Service is Running

```bash
curl http://localhost:8080/health
```

**Expected response:**

```json
{
  "status": "healthy",
  "database": "connected",
  "timestamp": "2025-01-16T10:00:00Z"
}
```

If you see `"status": "healthy"`, you're ready!

---

## Step 5: Run Your First Simulation

Trigger a simulation for a single day with GPT-4:

```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2025-01-16",
    "models": ["gpt-4"]
  }'
```

**Response:**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "total_model_days": 1,
  "message": "Simulation job created with 1 trading dates"
}
```

**Save the `job_id`** - you'll need it to check status.

---

## Step 6: Monitor Progress

```bash
# Replace with your job_id from Step 5
JOB_ID="550e8400-e29b-41d4-a716-446655440000"

curl http://localhost:8080/simulate/status/$JOB_ID
```

**While running:**

```json
{
  "job_id": "550e8400-...",
  "status": "running",
  "progress": {
    "total_model_days": 1,
    "completed": 0,
    "failed": 0,
    "pending": 1
  },
  ...
}
```

**When complete:**

```json
{
  "job_id": "550e8400-...",
  "status": "completed",
  "progress": {
    "total_model_days": 1,
    "completed": 1,
    "failed": 0,
    "pending": 0
  },
  ...
}
```

**Typical execution time:** 2-5 minutes for a single model-day.

---

## Step 7: View Results

```bash
curl "http://localhost:8080/results?job_id=$JOB_ID" | jq '.'
```

**Example output:**

```json
{
  "results": [
    {
      "id": 1,
      "job_id": "550e8400-...",
      "date": "2025-01-16",
      "model": "gpt-4",
      "action_type": "buy",
      "symbol": "AAPL",
      "amount": 10,
      "price": 250.50,
      "cash": 7495.00,
      "portfolio_value": 10000.00,
      "daily_profit": 0.00,
      "holdings": [
        {"symbol": "AAPL", "quantity": 10},
        {"symbol": "CASH", "quantity": 7495.00}
      ]
    }
  ],
  "count": 1
}
```

You can see:
- What the AI decided to buy/sell
- Portfolio value and cash balance
- All current holdings

---

## Success! What's Next?

### Run Multiple Days

```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2025-01-16",
    "end_date": "2025-01-20"
  }'
```

This simulates every trading day in the range (weekdays only).
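The expansion from a calendar range into trading days can be sketched by skipping weekends (illustrative only; the service also checks price-data availability and its own calendar):

```python
from datetime import date, timedelta


def weekday_trading_dates(start: str, end: str) -> list[str]:
    """List ISO dates between start and end (inclusive), weekdays only."""
    current = date.fromisoformat(start)
    stop = date.fromisoformat(end)
    dates = []
    while current <= stop:
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            dates.append(current.isoformat())
        current += timedelta(days=1)
    return dates


print(weekday_trading_dates("2025-01-16", "2025-01-20"))
# → ['2025-01-16', '2025-01-17', '2025-01-20']
```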

### Run Multiple Models

```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2025-01-16",
    "models": ["gpt-4", "claude-3.7-sonnet"]
  }'
```

**Note:** Models must be defined and enabled in `configs/default_config.json`.

### Query Specific Results

```bash
# All results for a specific date
curl "http://localhost:8080/results?date=2025-01-16"

# All results for a specific model
curl "http://localhost:8080/results?model=gpt-4"

# Combine filters
curl "http://localhost:8080/results?date=2025-01-16&model=gpt-4"
```

---

## Troubleshooting

### Service won't start

```bash
# Check logs
docker logs ai-trader

# Common issues:
# - Missing API keys in .env
# - Port 8080 already in use
# - Docker not running
```

**Fix port conflicts:**

Edit `.env` and change `API_PORT`:

```bash
API_PORT=8889
```

Then restart:

```bash
docker-compose down
docker-compose up -d
```

### Health check returns error

```bash
# Check if container is running
docker ps | grep ai-trader

# Restart service
docker-compose restart

# Check for errors in logs
docker logs ai-trader | grep -i error
```

### Job stays "pending"

The simulation might still be downloading price data on first run.

```bash
# Watch logs in real-time
docker logs -f ai-trader

# Look for messages like:
# "Downloading missing price data..."
# "Starting simulation for model-day..."
```

First run can take 10-15 minutes while downloading historical price data.

### "No trading dates with complete price data"

This means price data is missing for the requested date range.

**Solution 1:** Try a different date range (recent dates work best).

**Solution 2:** Manually download price data:

```bash
docker exec -it ai-trader bash
cd data
python get_daily_price.py
python merge_jsonl.py
exit
```

---

## Common Commands

```bash
# View logs
docker logs -f ai-trader

# Stop service
docker-compose down

# Start service
docker-compose up -d

# Restart service
docker-compose restart

# Check health
curl http://localhost:8080/health

# Access container shell
docker exec -it ai-trader bash

# View database
docker exec -it ai-trader sqlite3 /app/data/jobs.db
```

---

## Next Steps

- **Full API Reference:** [API_REFERENCE.md](API_REFERENCE.md)
- **Configuration Guide:** [docs/user-guide/configuration.md](docs/user-guide/configuration.md)
- **Integration Examples:** [docs/user-guide/integration-examples.md](docs/user-guide/integration-examples.md)
- **Troubleshooting:** [docs/user-guide/troubleshooting.md](docs/user-guide/troubleshooting.md)

---

## Need Help?

- Check [docs/user-guide/troubleshooting.md](docs/user-guide/troubleshooting.md)
- Review logs: `docker logs ai-trader`
- Open an issue: [GitHub Issues](https://github.com/Xe138/AI-Trader/issues)
12
ROADMAP.md
@@ -66,6 +66,18 @@ This document outlines planned features and improvements for the AI-Trader project
- Chart library (Plotly.js, Chart.js, or Recharts)
- Served alongside API (single container deployment)

#### Development Infrastructure
- **Migration to uv Package Manager** - Modern Python package management
  - Replace pip with uv for dependency management
  - Create pyproject.toml with project metadata and dependencies
  - Update Dockerfile to use uv for faster, more reliable builds
  - Update development documentation and workflows
  - Benefits:
    - 10-100x faster dependency resolution and installation
    - Better dependency locking and reproducibility
    - Unified tool for virtual environments and package management
    - Drop-in pip replacement with improved UX

## Contributing

We welcome contributions to any of these planned features! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

@@ -1,631 +0,0 @@
# AI-Trader API Service - Enhanced Specifications Summary

## Changes from Original Specifications

Based on user feedback, the specifications have been enhanced with:

1. **SQLite-backed results storage** (instead of reading position.jsonl on-demand)
2. **Comprehensive Python testing suite** with pytest
3. **Defined testing thresholds** for coverage, performance, and quality gates

---

## Document Index

### Core Specifications (Original)
1. **[api-specification.md](./api-specification.md)** - REST API endpoints and data models
2. **[job-manager-specification.md](./job-manager-specification.md)** - Job tracking and database layer
3. **[worker-specification.md](./worker-specification.md)** - Background worker architecture
4. **[implementation-specifications.md](./implementation-specifications.md)** - Agent, Docker, Windmill integration

### Enhanced Specifications (New)
5. **[database-enhanced-specification.md](./database-enhanced-specification.md)** - SQLite results storage
6. **[testing-specification.md](./testing-specification.md)** - Comprehensive testing suite

### Summary Documents
7. **[README-SPECS.md](./README-SPECS.md)** - Original specifications overview
8. **[ENHANCED-SPECIFICATIONS-SUMMARY.md](./ENHANCED-SPECIFICATIONS-SUMMARY.md)** - This document

---

## Key Enhancement #1: SQLite Results Storage

### What Changed

**Before:**
- `/results` endpoint reads `position.jsonl` files on-demand
- File I/O on every API request
- No support for advanced queries (date ranges, aggregations)

**After:**
- Simulation results written to SQLite during execution
- Fast database queries (10-100x faster than file I/O)
- Advanced analytics: timeseries, leaderboards, aggregations

### New Database Tables

```sql
-- Results storage
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    action_id INTEGER,
    action_type TEXT,
    symbol TEXT,
    amount INTEGER,
    price REAL,
    cash REAL,
    portfolio_value REAL,
    daily_profit REAL,
    daily_return_pct REAL,
    cumulative_profit REAL,
    cumulative_return_pct REAL,
    created_at TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);

CREATE TABLE holdings (
    id INTEGER PRIMARY KEY,
    position_id INTEGER,
    symbol TEXT,
    quantity INTEGER,
    FOREIGN KEY (position_id) REFERENCES positions(id)
);

CREATE TABLE reasoning_logs (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    step_number INTEGER,
    timestamp TEXT,
    role TEXT,
    content TEXT,
    tool_name TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);

CREATE TABLE tool_usage (
    id INTEGER PRIMARY KEY,
    job_id TEXT,
    date TEXT,
    model TEXT,
    tool_name TEXT,
    call_count INTEGER,
    total_duration_seconds REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id)
);
```
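To make the write path concrete, here is a hedged sketch of how an executor step might insert one position row plus its holdings into this schema. It is not the project's actual `_store_results_to_db()` implementation: the `jobs` table and foreign keys are omitted for brevity, an in-memory database stands in for `data/jobs.db`, and the sample values are illustrative.

```python
import sqlite3

# Trimmed copy of the positions/holdings tables from the schema above
SCHEMA = """
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    job_id TEXT, date TEXT, model TEXT,
    action_id INTEGER, action_type TEXT, symbol TEXT, amount INTEGER,
    price REAL, cash REAL, portfolio_value REAL,
    daily_profit REAL, daily_return_pct REAL,
    cumulative_profit REAL, cumulative_return_pct REAL,
    created_at TEXT
);
CREATE TABLE holdings (
    id INTEGER PRIMARY KEY,
    position_id INTEGER,
    symbol TEXT, quantity INTEGER,
    FOREIGN KEY (position_id) REFERENCES positions(id)
);
"""

conn = sqlite3.connect(":memory:")  # a real deployment would open data/jobs.db
conn.executescript(SCHEMA)

cur = conn.execute(
    "INSERT INTO positions (job_id, date, model, action_type, symbol, amount,"
    " price, cash, portfolio_value) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("550e8400", "2025-01-16", "gpt-4", "buy", "AAPL", 10, 250.50, 7495.0, 10000.0),
)
position_id = cur.lastrowid  # link holdings rows to the new position
conn.executemany(
    "INSERT INTO holdings (position_id, symbol, quantity) VALUES (?, ?, ?)",
    [(position_id, "AAPL", 10), (position_id, "CASH", 7495)],
)
conn.commit()

rows = conn.execute(
    "SELECT symbol, quantity FROM holdings WHERE position_id = ?", (position_id,)
).fetchall()
print(rows)  # → [('AAPL', 10), ('CASH', 7495)]
```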

### New API Endpoints

```
# Enhanced results endpoint (now reads from SQLite)
GET /results?date=2025-01-16&model=gpt-5&detail=minimal|full

# New analytics endpoints
GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31
GET /leaderboard?date=2025-01-16  # Rankings by portfolio value
```

### Migration Strategy

**Phase 1:** Dual-write mode
- Agent writes to `position.jsonl` (existing code)
- Executor writes to SQLite after agent completes
- Ensures backward compatibility

**Phase 2:** Verification
- Compare SQLite data vs JSONL data
- Fix any discrepancies

**Phase 3:** Switch over
- `/results` endpoint reads from SQLite
- JSONL writes become optional (can deprecate later)
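Phase 2's verification can be as simple as indexing both stores by their shared keys and reporting records present in one but not the other. A hedged sketch follows; the key fields mirror the positions schema, but the helper name and sample data are illustrative, and a real check would read from `jobs.db` and the JSONL files.

```python
def diff_rows(jsonl_rows, sqlite_rows, keys=("date", "model", "action_id")):
    """Report records present in one store but missing from the other."""
    def index(rows):
        return {tuple(r[k] for k in keys): r for r in rows}

    a, b = index(jsonl_rows), index(sqlite_rows)
    return {
        "missing_in_sqlite": sorted(set(a) - set(b)),
        "missing_in_jsonl": sorted(set(b) - set(a)),
    }


# Illustrative records from the two stores
jsonl = [{"date": "2025-01-16", "model": "gpt-4", "action_id": 1}]
sqlite_data = [{"date": "2025-01-16", "model": "gpt-4", "action_id": 1},
               {"date": "2025-01-16", "model": "gpt-4", "action_id": 2}]
print(diff_rows(jsonl, sqlite_data))
```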

### Performance Improvement

| Operation | Before (JSONL) | After (SQLite) | Speedup |
|-----------|----------------|----------------|---------|
| Get results for 1 date | 200-500ms | 20-50ms | **10x faster** |
| Get timeseries (30 days) | 6-15 seconds | 100-300ms | **50x faster** |
| Get leaderboard | 5-10 seconds | 50-100ms | **100x faster** |

---

## Key Enhancement #2: Comprehensive Testing Suite

### Testing Thresholds

| Metric | Minimum | Target | Enforcement |
|--------|---------|--------|-------------|
| **Code Coverage** | 85% | 90% | CI fails if below |
| **Critical Path Coverage** | 90% | 95% | Manual review |
| **Unit Test Speed** | <10s | <5s | Benchmark tracking |
| **Integration Test Speed** | <60s | <30s | Benchmark tracking |
| **API Response Times** | <500ms | <200ms | Load testing |

### Test Suite Structure

```
tests/
├── unit/                            # 80 tests, <10 seconds
│   ├── test_job_manager.py          # 95% coverage target
│   ├── test_database.py
│   ├── test_runtime_manager.py
│   ├── test_results_service.py      # 95% coverage target
│   └── test_models.py
│
├── integration/                     # 30 tests, <60 seconds
│   ├── test_api_endpoints.py        # Full FastAPI testing
│   ├── test_worker.py
│   ├── test_executor.py
│   └── test_end_to_end.py
│
├── performance/                     # 20 tests
│   ├── test_database_benchmarks.py
│   ├── test_api_load.py             # Locust load testing
│   └── test_simulation_timing.py
│
├── security/                        # 10 tests
│   ├── test_api_security.py         # SQL injection, XSS, path traversal
│   └── test_auth.py                 # Future: API key validation
│
└── e2e/                             # 10 tests, Docker required
    └── test_docker_workflow.py      # Full Docker compose scenario
```

### Quality Gates

**All PRs must pass:**
1. ✅ All tests passing (unit + integration)
2. ✅ Code coverage ≥ 85%
3. ✅ No critical security vulnerabilities (Bandit scan)
4. ✅ Linting passes (Ruff or Flake8)
5. ✅ Type checking passes (mypy strict mode)
6. ✅ No performance regressions (±10% tolerance)

**Release checklist:**
1. ✅ All quality gates pass
2. ✅ End-to-end tests pass in Docker
3. ✅ Load testing passes (100 concurrent requests)
4. ✅ Security scan passes (OWASP ZAP)
5. ✅ Manual smoke tests complete

### CI/CD Integration

```yaml
# .github/workflows/test.yml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run unit tests
        run: pytest tests/unit/ --cov=api --cov-fail-under=85
      - name: Run integration tests
        run: pytest tests/integration/
      - name: Security scan
        run: bandit -r api/ -ll
      - name: Upload coverage
        uses: codecov/codecov-action@v3
```

### Test Coverage Breakdown

| Component | Minimum | Target | Tests |
|-----------|---------|--------|-------|
| `api/job_manager.py` | 90% | 95% | 25 tests |
| `api/worker.py` | 85% | 90% | 15 tests |
| `api/executor.py` | 85% | 90% | 12 tests |
| `api/results_service.py` | 90% | 95% | 18 tests |
| `api/database.py` | 95% | 100% | 10 tests |
| `api/runtime_manager.py` | 85% | 90% | 8 tests |
| `api/main.py` | 80% | 85% | 20 tests |
| **Total** | **85%** | **90%** | **~150 tests** |

---
|
||||
|
||||
## Updated Implementation Plan
|
||||
|
||||
### Phase 1: API Foundation (Days 1-2)
|
||||
- [x] Create `api/` directory structure
|
||||
- [ ] Implement `api/models.py` with Pydantic models
|
||||
- [ ] Implement `api/database.py` with **enhanced schema** (6 tables)
|
||||
- [ ] Implement `api/job_manager.py` with job CRUD operations
|
||||
- [ ] **NEW:** Write unit tests for job_manager (target: 95% coverage)
|
||||
- [ ] Test database operations manually
|
||||
|
||||
**Testing Deliverables:**
|
||||
- 25 unit tests for job_manager
|
||||
- 10 unit tests for database utilities
|
||||
- 85%+ coverage for Phase 1 code
|
||||
|
||||
---
|
||||
|
||||
### Phase 2: Worker & Executor (Days 3-4)
|
||||
- [ ] Implement `api/runtime_manager.py`
|
||||
- [ ] Implement `api/executor.py` for single model-day execution
|
||||
- [ ] **NEW:** Add SQLite write logic to executor (`_store_results_to_db()`)
|
||||
- [ ] Implement `api/worker.py` for job orchestration
|
||||
- [ ] **NEW:** Write unit tests for worker and executor (target: 85% coverage)
|
||||
- [ ] Test runtime config isolation
|
||||
|
||||
**Testing Deliverables:**
|
||||
- 15 unit tests for worker
|
||||
- 12 unit tests for executor
|
||||
- 8 unit tests for runtime_manager
|
||||
- 85%+ coverage for Phase 2 code
|
||||
|
||||
---
|
||||
|
||||
### Phase 3: Results Service & FastAPI Endpoints (Days 5-6)
|
||||
- [ ] **NEW:** Implement `api/results_service.py` (SQLite-backed)
|
||||
- [ ] `get_results(date, model, detail)`
|
||||
- [ ] `get_portfolio_timeseries(model, start_date, end_date)`
|
||||
- [ ] `get_leaderboard(date)`
|
||||
- [ ] Implement `api/main.py` with all endpoints
|
||||
- [ ] `/simulate/trigger` with background tasks
|
||||
- [ ] `/simulate/status/{job_id}`
|
||||
- [ ] `/simulate/current`
|
||||
- [ ] `/results` (now reads from SQLite)
|
||||
- [ ] **NEW:** `/portfolio/timeseries`
|
||||
- [ ] **NEW:** `/leaderboard`
|
||||
- [ ] `/health` with MCP checks
|
||||
- [ ] **NEW:** Write unit tests for results_service (target: 95% coverage)
|
||||
- [ ] **NEW:** Write integration tests for API endpoints (target: 80% coverage)
|
||||
- [ ] Test all endpoints with Postman/curl
|
||||
|
||||
**Testing Deliverables:**
|
||||
- 18 unit tests for results_service
|
||||
- 20 integration tests for API endpoints
|
||||
- Performance benchmarks for database queries
|
||||
- 85%+ coverage for Phase 3 code
|
||||
|
||||
---
|
||||
|
||||
### Phase 4: Docker Integration (Day 7)
|
||||
- [ ] Update `Dockerfile`
|
||||
- [ ] Create `docker-entrypoint-api.sh`
|
||||
- [ ] Create `requirements-api.txt`
|
||||
- [ ] Update `docker-compose.yml`
|
||||
- [ ] Test Docker build
|
||||
- [ ] Test container startup and health checks
|
||||
- [ ] **NEW:** Run E2E tests in Docker environment
|
||||
- [ ] Test end-to-end simulation via API in Docker
|
||||
|
||||
**Testing Deliverables:**
|
||||
- 10 E2E tests with Docker
|
||||
- Docker health check validation
|
||||
- Performance testing in containerized environment
|
||||
|
||||
---
|
||||
|
||||
### Phase 5: Windmill Integration (Days 8-9)
|
||||
- [ ] Create Windmill scripts (trigger, poll, store)
|
||||
- [ ] **UPDATED:** Modify `store_simulation_results.py` to use new `/results` endpoint
|
||||
- [ ] Test scripts locally against Docker API
|
||||
- [ ] Deploy scripts to Windmill instance
|
||||
- [ ] Create Windmill workflow
|
||||
- [ ] Test workflow end-to-end
|
||||
- [ ] Create Windmill dashboard (using new `/portfolio/timeseries` and `/leaderboard` endpoints)
|
||||
- [ ] Document Windmill setup process
|
||||
|
||||
**Testing Deliverables:**
|
||||
- Integration tests for Windmill scripts
|
||||
- End-to-end workflow validation
|
||||
- Dashboard functionality verification
|
||||
|
||||
---
|
||||
|
||||
### Phase 6: Testing, Security & Documentation (Day 10)
|
||||
- [ ] **NEW:** Run full test suite and verify all thresholds met
|
||||
- [ ] Code coverage ≥ 85%
|
||||
- [ ] All ~150 tests passing
|
||||
- [ ] Performance benchmarks within limits
|
||||
- [ ] **NEW:** Security testing
|
||||
- [ ] Bandit scan (Python security issues)
|
||||
- [ ] SQL injection tests
|
||||
- [ ] Input validation tests
|
||||
- [ ] OWASP ZAP scan (optional)
|
||||
- [ ] **NEW:** Load testing with Locust
|
||||
- [ ] 100 concurrent users
|
||||
- [ ] API endpoints within performance thresholds
|
||||
- [ ] Integration tests for complete workflow
|
||||
- [ ] Update README.md with API usage
|
||||
- [ ] Create API documentation (Swagger/OpenAPI - auto-generated by FastAPI)
|
||||
- [ ] Create deployment guide
|
||||
- [ ] Create troubleshooting guide
|
||||
- [ ] **NEW:** Generate test coverage report
|
||||
|
||||
**Testing Deliverables:**
|
||||
- Full test suite execution report
|
||||
- Security scan results
|
||||
- Load testing results
|
||||
- Coverage report (HTML + XML)
|
||||
- CI/CD pipeline configuration
|
||||
|
||||
---
|
||||
|
||||
## New Files Created
|
||||
|
||||
### Database & Results
|
||||
- `api/results_service.py` - SQLite-backed results retrieval
|
||||
- `api/import_historical_data.py` - Migration script for existing position.jsonl files
|
||||
|
||||
### Testing Suite
|
||||
- `tests/conftest.py` - Shared pytest fixtures
|
||||
- `tests/unit/test_job_manager.py` - 25 tests
|
||||
- `tests/unit/test_database.py` - 10 tests
|
||||
- `tests/unit/test_runtime_manager.py` - 8 tests
|
||||
- `tests/unit/test_results_service.py` - 18 tests
|
||||
- `tests/unit/test_models.py` - 5 tests
|
||||
- `tests/integration/test_api_endpoints.py` - 20 tests
|
||||
- `tests/integration/test_worker.py` - 15 tests
|
||||
- `tests/integration/test_executor.py` - 12 tests
|
||||
- `tests/integration/test_end_to_end.py` - 5 tests
|
||||
- `tests/performance/test_database_benchmarks.py` - 10 tests
|
||||
- `tests/performance/test_api_load.py` - Locust load testing
|
||||
- `tests/security/test_api_security.py` - 10 tests
|
||||
- `tests/e2e/test_docker_workflow.py` - 10 tests
|
||||
- `pytest.ini` - Test configuration
|
||||
- `requirements-dev.txt` - Testing dependencies
|
||||
|
||||
### CI/CD
|
||||
- `.github/workflows/test.yml` - GitHub Actions workflow
|
||||
|
||||
---
|
||||
|
||||
## Updated File Structure
|
||||
|
||||
```
|
||||
AI-Trader/
|
||||
├── api/
|
||||
│ ├── __init__.py
|
||||
│ ├── main.py # FastAPI application
|
||||
│ ├── models.py # Pydantic request/response models
|
||||
│ ├── job_manager.py # Job lifecycle management
|
||||
│ ├── database.py # SQLite utilities (enhanced schema)
|
||||
│ ├── worker.py # Background simulation worker
|
||||
│ ├── executor.py # Single model-day execution (+ SQLite writes)
|
||||
│ ├── runtime_manager.py # Runtime config isolation
|
||||
│ ├── results_service.py # NEW: SQLite-backed results retrieval
|
||||
│ └── import_historical_data.py # NEW: JSONL → SQLite migration
|
||||
│
|
||||
├── tests/ # NEW: Comprehensive test suite
|
||||
│ ├── conftest.py
|
||||
│ ├── unit/ # 80 tests, <10s
|
||||
│ ├── integration/ # 30 tests, <60s
|
||||
│ ├── performance/ # 20 tests
|
||||
│ ├── security/ # 10 tests
|
||||
│ └── e2e/ # 10 tests
|
||||
│
|
||||
├── docs/
|
||||
│ ├── api-specification.md
|
||||
│ ├── job-manager-specification.md
|
||||
│ ├── worker-specification.md
|
||||
│ ├── implementation-specifications.md
|
||||
│ ├── database-enhanced-specification.md # NEW
|
||||
│ ├── testing-specification.md # NEW
|
||||
│ ├── README-SPECS.md
|
||||
│ └── ENHANCED-SPECIFICATIONS-SUMMARY.md # NEW (this file)
|
||||
│
|
||||
├── data/
|
||||
│ ├── jobs.db # SQLite database (6 tables)
|
||||
│ ├── runtime_env*.json # Runtime configs (temporary)
|
||||
│ ├── agent_data/ # Existing position/log data
|
||||
│ └── merged.jsonl # Existing price data
|
||||
│
|
||||
├── pytest.ini # NEW: Test configuration
|
||||
├── requirements-dev.txt # NEW: Testing dependencies
|
||||
├── .github/workflows/test.yml # NEW: CI/CD pipeline
|
||||
└── ... (existing files)
|
||||
```
|
||||
|
||||
---

## Benefits Summary

### Performance
- **10-100x faster** results queries (SQLite vs file I/O)
- **Advanced analytics** - timeseries, leaderboards, aggregations in milliseconds
- **Optimized indexes** for common queries

### Quality
- **85% minimum coverage** enforced by CI/CD
- **150 comprehensive tests** across unit, integration, performance, security
- **Quality gates** prevent regressions
- **Type safety** with mypy strict mode

### Maintainability
- **SQLite single source of truth** - easier backup, restore, migration
- **Automated testing** catches bugs early
- **CI/CD integration** provides fast feedback on every commit
- **Security scanning** prevents vulnerabilities

### Analytics Capabilities

**New queries enabled by SQLite:**

```
# Portfolio timeseries for charting
GET /portfolio/timeseries?model=gpt-5&start_date=2025-01-01&end_date=2025-01-31

# Model leaderboard
GET /leaderboard?date=2025-01-31

# Advanced filtering (future)
SELECT * FROM positions
WHERE daily_return_pct > 2.0
ORDER BY portfolio_value DESC;

# Aggregations (future)
SELECT model, AVG(daily_return_pct) as avg_return
FROM positions
GROUP BY model
ORDER BY avg_return DESC;
```

---
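The aggregation query above can be exercised directly with Python's built-in `sqlite3`. This sketch assumes a simplified `positions` table containing only the columns the query touches; the real schema has more fields:

```python
import sqlite3

# In-memory database with a simplified positions table (assumed columns only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (model TEXT, daily_return_pct REAL, portfolio_value REAL)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?)",
    [
        ("gpt-5", 1.5, 10150.50),
        ("gpt-5", -0.3, 10120.05),
        ("claude-3.7-sonnet", 2.1, 10210.00),
        ("claude-3.7-sonnet", 0.4, 10250.84),
    ],
)

# Same aggregation as the "future" leaderboard query above
rows = conn.execute(
    "SELECT model, AVG(daily_return_pct) AS avg_return "
    "FROM positions GROUP BY model ORDER BY avg_return DESC"
).fetchall()
for model, avg_return in rows:
    print(model, round(avg_return, 2))
```

Because the aggregation runs inside SQLite rather than over JSONL files, it stays fast even as the positions history grows.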
## Migration from Original Spec

If you've already started implementation based on the original specs:

### Step 1: Database Schema Migration
```sql
-- Run enhanced schema creation
-- See database-enhanced-specification.md Section 2.1
```

### Step 2: Add Results Service
```bash
# Create new file
touch api/results_service.py
# Implement as per database-enhanced-specification.md Section 4.1
```

### Step 3: Update Executor
```python
# In api/executor.py, add after agent.run_trading_session():
self._store_results_to_db(job_id, date, model_sig)
```

### Step 4: Update API Endpoints
```python
# In api/main.py, update /results endpoint to use ResultsService
from api.results_service import ResultsService
results_service = ResultsService()

@app.get("/results")
async def get_results(...):
    return results_service.get_results(date, model, detail)
```

### Step 5: Add Test Suite
```bash
mkdir -p tests/{unit,integration,performance,security,e2e}
# Create test files as per testing-specification.md Sections 4-8
```

### Step 6: Configure CI/CD
```bash
mkdir -p .github/workflows
# Create test.yml as per testing-specification.md Section 10.1
```

---
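The `_store_results_to_db` helper referenced in Step 3 is not defined in this summary. A minimal sketch of what it might do, assuming the simplified `positions` columns used by the analytics examples (the real column set lives in database-enhanced-specification.md):

```python
import json
import sqlite3

def store_results_to_db(db_path: str, model: str, date: str, position_file: str) -> None:
    """Copy the latest position record for one model-day into SQLite.

    The positions schema here is an assumption based on the analytics
    examples (model, date, return_pct, portfolio_value).
    """
    with open(position_file) as f:
        records = [json.loads(line) for line in f if line.strip()]
    latest = records[-1]  # position.jsonl is append-only; last line is current

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS positions "
        "(model TEXT, date TEXT, daily_return_pct REAL, portfolio_value REAL)"
    )
    conn.execute(
        "INSERT INTO positions VALUES (?, ?, ?, ?)",
        (model, date, latest.get("return_pct"), latest.get("portfolio_value")),
    )
    conn.commit()
    conn.close()
```

In dual-write mode the executor would call this after writing `position.jsonl`, so the file remains the fallback while SQLite serves queries.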
## Testing Execution Guide

### Run Unit Tests
```bash
pytest tests/unit/ -v --cov=api --cov-report=term-missing
```

### Run Integration Tests
```bash
pytest tests/integration/ -v
```

### Run All Tests (Except E2E)
```bash
pytest tests/ -v --ignore=tests/e2e/ --cov=api --cov-report=html
```

### Run E2E Tests (Requires Docker)
```bash
pytest tests/e2e/ -v -s
```

### Run Performance Benchmarks
```bash
pytest tests/performance/ --benchmark-only
```

### Run Security Tests
```bash
pytest tests/security/ -v
bandit -r api/ -ll
```

### Generate Coverage Report
```bash
pytest tests/unit/ tests/integration/ --cov=api --cov-report=html
open htmlcov/index.html  # View in browser
```

### Run Load Tests
```bash
locust -f tests/performance/test_api_load.py --host=http://localhost:8080
# Open http://localhost:8089 for Locust UI
```

---
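The commands above assume test markers and coverage defaults wired up through `pytest.ini`. A minimal sketch of what that file might contain; the exact options in the repository may differ:

```ini
[pytest]
testpaths = tests
addopts = --strict-markers --cov-fail-under=85
markers =
    performance: benchmark tests
    security: security tests
    e2e: end-to-end tests requiring Docker
```

With `--cov-fail-under=85`, any local or CI run enforces the same minimum coverage threshold as the quality gates.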
## Questions & Next Steps

### Review Checklist

Please review:
1. ✅ **Enhanced database schema** with 6 tables for comprehensive results storage
2. ✅ **Migration strategy** for backward compatibility (dual-write mode)
3. ✅ **Testing thresholds** (85% coverage minimum, performance benchmarks)
4. ✅ **Test suite structure** (150 tests across 5 categories)
5. ✅ **CI/CD integration** with quality gates
6. ✅ **Updated implementation plan** (10 days, 6 phases)

### Questions to Consider

1. **Database migration timing:** Start with dual-write mode immediately, or add it in Phase 2?
2. **Testing priorities:** Should we implement tests alongside features (TDD) or after each phase?
3. **CI/CD platform:** GitHub Actions (as specified) or a different platform?
4. **Performance baselines:** Should we run benchmarks before implementation to track improvement?
5. **Security priorities:** Which security tests are MVP vs nice-to-have?

### Ready to Implement?

**Option A:** Approve specifications and begin Phase 1 implementation
- Create API directory structure
- Implement enhanced database schema
- Write unit tests for database layer
- Target: 2 days, 90%+ coverage for database code

**Option B:** Request modifications to specifications
- Clarify any unclear requirements
- Adjust testing thresholds
- Modify implementation timeline

**Option C:** Implement in parallel workstreams
- Workstream 1: Core API (Phases 1-3)
- Workstream 2: Testing suite (parallel with Phases 1-3)
- Workstream 3: Docker + Windmill (Phases 4-5)
- Benefits: Faster delivery, more parallelization
- Requires: Clear interfaces between components

---
## Summary

**Enhanced specifications** add:
1. 🗄️ **SQLite results storage** - 10-100x faster queries, advanced analytics
2. 🧪 **Comprehensive testing** - 150 tests, 85% coverage, quality gates
3. 🔒 **Security testing** - SQL injection, XSS, input validation
4. ⚡ **Performance benchmarks** - Catch regressions early
5. 🚀 **CI/CD pipeline** - Automated quality checks on every commit

**Total effort:** Still ~10 days, but with significantly higher code quality and confidence in deployments.

**Risk mitigation:** Extensive testing catches bugs before production, preventing costly hotfixes.

**Long-term value:** A maintainable, well-tested codebase enables rapid feature development.

---

Ready to proceed? Please provide feedback or approval to begin implementation!
@@ -1,436 +0,0 @@

# AI-Trader API Service - Technical Specifications Summary

## Overview

This directory contains comprehensive technical specifications for transforming the AI-Trader batch simulation system into an API service compatible with Windmill automation.

## Specification Documents

### 1. [API Specification](./api-specification.md)
**Purpose:** Defines all API endpoints, request/response formats, and data models

**Key Contents:**
- **5 REST Endpoints:**
  - `POST /simulate/trigger` - Queue catch-up simulation job
  - `GET /simulate/status/{job_id}` - Poll job progress
  - `GET /simulate/current` - Get latest job
  - `GET /results` - Retrieve simulation results (minimal/full detail)
  - `GET /health` - Service health check
- **Pydantic Models** for type-safe request/response handling
- **Error Handling** strategies and HTTP status codes
- **SQLite Schema** for jobs and job_details tables
- **Configuration Management** via environment variables

**Status Codes:** 200 OK, 202 Accepted, 400 Bad Request, 404 Not Found, 409 Conflict, 503 Service Unavailable

---
### 2. [Job Manager Specification](./job-manager-specification.md)
**Purpose:** Details the job tracking and database layer

**Key Contents:**
- **SQLite Database Schema:**
  - `jobs` table - High-level job metadata
  - `job_details` table - Per model-day execution tracking
- **JobManager Class Interface:**
  - `create_job()` - Create new simulation job
  - `get_job()` - Retrieve job by ID
  - `update_job_status()` - State transitions (pending → running → completed/partial/failed)
  - `get_job_progress()` - Detailed progress metrics
  - `can_start_new_job()` - Concurrency control
- **State Machine:** Job status transitions and business logic
- **Concurrency Control:** Single-job execution enforcement
- **Testing Strategy:** Unit tests with temporary databases

**Key Feature:** Independent model execution - one model's failure doesn't block others (results in "partial" status)

---
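The interface listed above can be sketched in a few lines over `sqlite3`; the method signatures and column set here are assumptions for illustration, not the spec's final definitions:

```python
import json
import sqlite3
import uuid

class JobManager:
    """Sketch of the JobManager interface; signatures and columns are assumed."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, "
            "status TEXT, date_range TEXT, models TEXT)"
        )

    def create_job(self, date_range: list, models: list) -> str:
        job_id = str(uuid.uuid4())
        self.conn.execute(
            "INSERT INTO jobs VALUES (?, 'pending', ?, ?)",
            (job_id, json.dumps(date_range), json.dumps(models)),
        )
        return job_id

    def get_job(self, job_id: str):
        return self.conn.execute(
            "SELECT * FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()

    def update_job_status(self, job_id: str, status: str) -> None:
        self.conn.execute(
            "UPDATE jobs SET status = ? WHERE job_id = ?", (status, job_id)
        )

    def can_start_new_job(self) -> bool:
        # Single-job concurrency: nothing may be pending or running
        (count,) = self.conn.execute(
            "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
        ).fetchone()
        return count == 0
```

The `can_start_new_job()` check is what enforces the single-job execution rule: the API refuses a new trigger while any job row is still pending or running.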
### 3. [Background Worker Specification](./worker-specification.md)
**Purpose:** Defines async job execution architecture

**Key Contents:**
- **Execution Pattern:** Date-sequential, Model-parallel
  - All models for Date 1 run in parallel
  - Date 2 starts only after all models finish Date 1
  - Ensures position.jsonl integrity (no concurrent writes)
- **SimulationWorker Class:**
  - Orchestrates job execution
  - Manages date sequencing
  - Handles job-level errors
- **ModelDayExecutor Class:**
  - Executes single model-day simulation
  - Updates job_detail status
  - Isolates runtime configuration
- **RuntimeConfigManager:**
  - Creates temporary runtime_env_{job_id}_{model}_{date}.json files
  - Prevents state collisions between concurrent models
  - Cleans up after execution
- **Error Handling:** Graceful failure (models continue despite peer failures)
- **Logging:** Structured JSON logging with job/model/date context

**Performance:** 3 models × 5 days = ~7-15 minutes (vs. ~22-45 minutes sequential)

---
### 4. [Implementation Specification](./implementation-specifications.md)
**Purpose:** Complete implementation guide covering Agent, Docker, and Windmill

**Key Contents:**

#### Part 1: BaseAgent Refactoring
- **Analysis:** Existing `run_trading_session()` already compatible with API mode
- **Required Changes:** ✅ NONE! Existing code works as-is
- **Worker Integration:** Calls `agent.run_trading_session(date)` directly

#### Part 2: Docker Configuration
- **Modified Dockerfile:** Adds FastAPI dependencies, new entrypoint
- **docker-entrypoint-api.sh:** Starts MCP services → launches uvicorn
- **Health Checks:** Verifies MCP services and database connectivity
- **Volume Mounts:** `./data`, `./configs` for persistence

#### Part 3: Windmill Integration
- **Flow 1: trigger_simulation.ts** - Daily cron triggers API
- **Flow 2: poll_simulation_status.ts** - Polls every 5 min until complete
- **Flow 3: store_simulation_results.py** - Stores results in Windmill DB
- **Dashboard:** Charts and tables showing portfolio performance
- **Workflow Orchestration:** Complete YAML workflow definition

#### Part 4: File Structure
- New `api/` directory with 7 modules
- New `windmill/` directory with scripts and dashboard
- New `docs/` directory (this folder)
- `data/jobs.db` for job tracking

#### Part 5: Implementation Checklist
10-day implementation plan broken into 6 phases

---
## Architecture Highlights

### Request Flow

```
1. Windmill → POST /simulate/trigger
2. API creates job in SQLite (status: pending)
3. API queues BackgroundTask
4. API returns 202 Accepted with job_id
        ↓
5. Worker starts (status: running)
6. For each date sequentially:
     For each model in parallel:
       - Create isolated runtime config
       - Execute agent.run_trading_session(date)
       - Update job_detail status
7. Worker finishes (status: completed/partial/failed)
        ↓
8. Windmill polls GET /simulate/status/{job_id}
9. When complete: Windmill calls GET /results?date=X
10. Windmill stores results in internal DB
11. Windmill dashboard displays performance
```

### Data Flow

```
Input:   configs/default_config.json
           ↓
API:     Calculates date_range (last position → today)
           ↓
Worker:  Executes simulations
           ↓
Output:  data/agent_data/{model}/position/position.jsonl
         data/agent_data/{model}/log/{date}/log.jsonl
         data/jobs.db (job tracking)
           ↓
API:     Reads position.jsonl + calculates P&L
           ↓
Windmill: Stores in internal DB → Dashboard visualization
```

---
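Steps 2-4 of the request flow (create the job row, queue the worker, return immediately) can be sketched without the web framework. This illustration uses a plain thread where the real service would hand the callable to FastAPI's `BackgroundTasks`; names are illustrative:

```python
import sqlite3
import threading
import uuid

def trigger_simulation(conn: sqlite3.Connection, run_job) -> dict:
    """Create a pending job row, queue the worker, and return a 202-style body."""
    job_id = str(uuid.uuid4())
    conn.execute("INSERT INTO jobs (job_id, status) VALUES (?, 'pending')", (job_id,))
    conn.commit()
    # FastAPI's BackgroundTasks plays this role in the real service
    threading.Thread(target=run_job, args=(job_id,), daemon=True).start()
    return {"job_id": job_id, "status": "accepted"}

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT)")

done = threading.Event()
body = trigger_simulation(conn, lambda jid: done.set())
done.wait(timeout=5)
print(body["status"])  # → accepted
```

The key property is that the caller gets its `job_id` back in under a second while the long-running simulation proceeds out of band, exactly as steps 5-7 describe.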
## Key Design Decisions

### 1. Pattern B: Lazy On-Demand Processing
- **Chosen:** Windmill controls simulation timing via API calls
- **Benefit:** Centralized scheduling in Windmill
- **Tradeoff:** First Windmill call of the day triggers a long-running job

### 2. SQLite vs. PostgreSQL
- **Chosen:** SQLite for MVP
- **Rationale:** Low concurrency (1 job at a time), simple deployment
- **Future:** PostgreSQL for production with multiple concurrent jobs

### 3. Date-Sequential, Model-Parallel Execution
- **Chosen:** Dates run sequentially, models run in parallel per date
- **Rationale:** Prevents position.jsonl race conditions, faster than fully sequential
- **Performance:** ~50% faster than sequential (3 models in parallel)

### 4. Independent Model Failures
- **Chosen:** One model's failure doesn't block others
- **Benefit:** Partial results are better than no results
- **Implementation:** Job status becomes "partial" if any model fails

### 5. Minimal BaseAgent Changes
- **Chosen:** No modifications to agent code
- **Rationale:** Existing `run_trading_session()` already serves as the API interface
- **Benefit:** Maintains backward compatibility with batch mode

---
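Decision 3's execution pattern can be sketched with `asyncio`: each date's models are gathered in parallel, and the loop advances to the next date only when the whole batch has finished. The `run_model_day` body is a stand-in for the real `agent.run_trading_session(date)` call:

```python
import asyncio

async def run_model_day(model: str, date: str, results: list) -> None:
    await asyncio.sleep(0)  # stands in for agent.run_trading_session(date)
    results.append((date, model))

async def run_job(dates: list, models: list) -> list:
    """Dates sequentially; all models for one date in parallel."""
    results = []
    for date in dates:
        # gather() returns only after every model finishes this date,
        # so no two dates ever write position.jsonl concurrently
        await asyncio.gather(*(run_model_day(m, date, results) for m in models))
    return results

results = asyncio.run(
    run_job(["2025-01-16", "2025-01-17"], ["gpt-5", "claude-3.7-sonnet"])
)
```

Because the barrier sits between dates rather than between models, the wall-clock time per date is bounded by the slowest model, not the sum of all models.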
## Implementation Prerequisites

### Required Environment Variables
```bash
OPENAI_API_BASE=...
OPENAI_API_KEY=...
ALPHAADVANTAGE_API_KEY=...
JINA_API_KEY=...
RUNTIME_ENV_PATH=/app/data/runtime_env.json
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003
API_HOST=0.0.0.0
API_PORT=8080
```

### Required Python Packages (new)
```
fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3
```

### Docker Requirements
- Docker Engine 20.10+
- Docker Compose 2.0+
- 2GB RAM minimum for container
- 10GB disk space for data

### Windmill Requirements
- Windmill instance (self-hosted or cloud)
- Network access from Windmill to AI-Trader API
- Windmill CLI for deployment (optional)

---
## Testing Strategy

### Unit Tests
- `tests/test_job_manager.py` - Database operations
- `tests/test_worker.py` - Job execution logic
- `tests/test_executor.py` - Model-day execution

### Integration Tests
- `tests/test_api_endpoints.py` - FastAPI endpoint behavior
- `tests/test_end_to_end.py` - Full workflow (trigger → execute → retrieve)

### Manual Testing
- Docker container startup
- Health check endpoint
- Windmill workflow execution
- Dashboard visualization

---
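The unit tests for the database layer follow the temporary-database strategy mentioned earlier: each test gets a throwaway SQLite file so tests never touch `data/jobs.db`. A minimal sketch, with the schema reduced to the columns the assertion needs:

```python
import os
import sqlite3
import tempfile

def make_temp_db() -> str:
    """Create a throwaway jobs database, as each unit test would."""
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT)")
    conn.commit()
    conn.close()
    return path

def test_create_job():
    db = make_temp_db()
    conn = sqlite3.connect(db)
    conn.execute("INSERT INTO jobs VALUES ('job-1', 'pending')")
    status = conn.execute(
        "SELECT status FROM jobs WHERE job_id = 'job-1'"
    ).fetchone()[0]
    assert status == "pending"

test_create_job()  # also runnable without pytest
```

Under pytest, `make_temp_db` would typically become a fixture (e.g. built on `tmp_path`) so cleanup is automatic.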
## Performance Expectations

### Single Model-Day Execution
- **Duration:** 30-60 seconds (varies by AI model latency)
- **Bottlenecks:** AI API calls, MCP tool latency

### Multi-Model Job
- **Example:** 3 models × 5 days = 15 model-days
- **Parallel Execution:** ~7-15 minutes
- **Sequential Execution:** ~22-45 minutes
- **Speedup:** ~3x (number of models)

### API Response Times
- `/simulate/trigger`: < 1 second (just queues job)
- `/simulate/status`: < 100ms (SQLite query)
- `/results?detail=minimal`: < 500ms (file read + JSON parsing)
- `/results?detail=full`: < 2 seconds (parse log files)

---
## Security Considerations

### MVP Security
- **Network Isolation:** Docker network (no public exposure)
- **No Authentication:** Assumes the Windmill → API link is a trusted network

### Future Enhancements
- API key authentication (`X-API-Key` header)
- Rate limiting per client
- HTTPS/TLS encryption
- Input sanitization for path traversal prevention

---
## Deployment Steps

### 1. Build Docker Image
```bash
docker-compose build
```

### 2. Start API Service
```bash
docker-compose up -d
```

### 3. Verify Health
```bash
curl http://localhost:8080/health
```

### 4. Test Trigger
```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{"config_path": "configs/default_config.json"}'
```

### 5. Deploy Windmill Scripts
```bash
wmill script push windmill/trigger_simulation.ts
wmill script push windmill/poll_simulation_status.ts
wmill script push windmill/store_simulation_results.py
```

### 6. Create Windmill Workflow
- Import `windmill/daily_simulation_workflow.yaml`
- Configure resource `ai_trader_api` with API URL
- Set cron schedule (daily 6 AM)

### 7. Create Windmill Dashboard
- Import `windmill/dashboard.json`
- Verify data visualization

---
## Troubleshooting Guide

### Issue: Health check fails
**Symptoms:** `curl http://localhost:8080/health` returns 503

**Possible Causes:**
1. MCP services not running
2. Database file permission error
3. API server not started

**Solutions:**
```bash
# Check MCP services
docker-compose exec ai-trader curl http://localhost:8000/health

# Check API logs
docker-compose logs -f ai-trader

# Restart container
docker-compose restart
```

### Issue: Job stuck in "running" status
**Symptoms:** Job never completes, status remains "running"

**Possible Causes:**
1. Agent execution crashed
2. Model API timeout
3. Worker process died

**Solutions:**
```bash
# Check job details for error messages
curl http://localhost:8080/simulate/status/{job_id}

# Check container logs
docker-compose logs -f ai-trader

# If API restarted, stale jobs are marked as failed on startup
docker-compose restart
```

### Issue: Windmill can't reach API
**Symptoms:** Connection refused from Windmill scripts

**Solutions:**
- Verify Windmill and AI-Trader are on the same Docker network
- Check firewall rules
- Use the container name (ai-trader) instead of localhost in the Windmill resource
- Verify the API_PORT environment variable

---
## Migration from Batch Mode

### For Users Currently Running Batch Mode

**Option 1: Dual Mode (Recommended)**
- Keep existing `main.py` for manual testing
- Add new API mode for production automation
- Use different config files for each mode

**Option 2: API-Only**
- Replace batch execution entirely
- All simulations via API calls
- More consistent with production workflow

### Migration Checklist
- [ ] Backup existing `data/` directory
- [ ] Update `.env` with API configuration
- [ ] Test API mode in a separate environment first
- [ ] Gradually migrate Windmill workflows
- [ ] Monitor logs for errors
- [ ] Validate results match batch mode output

---
## Next Steps

1. **Review Specifications**
   - Read all 4 specification documents
   - Ask clarifying questions
   - Approve design before implementation

2. **Implementation Phase 1** (Days 1-2)
   - Set up `api/` directory structure
   - Implement database and job_manager
   - Write unit tests

3. **Implementation Phase 2** (Days 3-4)
   - Implement worker and executor
   - Test with mock agents

4. **Implementation Phase 3** (Days 5-6)
   - Implement FastAPI endpoints
   - Test with Postman/curl

5. **Implementation Phase 4** (Day 7)
   - Docker integration
   - End-to-end testing

6. **Implementation Phase 5** (Days 8-9)
   - Windmill integration
   - Dashboard creation

7. **Implementation Phase 6** (Day 10)
   - Final testing
   - Documentation

---
## Questions or Feedback?

Please review all specifications and provide feedback on:
1. API endpoint design
2. Database schema
3. Execution pattern (date-sequential, model-parallel)
4. Error handling approach
5. Windmill integration workflow
6. Any concerns or suggested improvements

**Ready to proceed with implementation?** Confirm approval of specifications to begin Phase 1.
@@ -1,837 +0,0 @@

# AI-Trader API Service - Technical Specification

## 1. API Endpoints Specification

### 1.1 POST /simulate/trigger

**Purpose:** Trigger a catch-up simulation from the last completed date to the most recent trading day.

**Request:**
```http
POST /simulate/trigger HTTP/1.1
Content-Type: application/json

```

**Response (202 Accepted):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "accepted",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "created_at": "2025-01-20T14:30:00Z",
  "message": "Simulation job queued successfully"
}
```

**Response (200 OK - Job Already Running):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 3,
    "failed": 0,
    "current": {
      "date": "2025-01-17",
      "model": "gpt-5"
    }
  },
  "created_at": "2025-01-20T14:25:00Z",
  "message": "Simulation already in progress"
}
```

**Response (200 OK - Already Up To Date):**
```json
{
  "status": "current",
  "message": "Simulation already up-to-date",
  "last_simulation_date": "2025-01-20",
  "next_trading_day": "2025-01-21"
}
```

**Response (409 Conflict):**
```json
{
  "error": "conflict",
  "message": "Different simulation already running",
  "current_job_id": "previous-job-uuid",
  "current_date_range": ["2025-01-10", "2025-01-15"]
}
```

**Business Logic:**
1. Load configuration from `config_path` (or default)
2. Determine last completed date from each model's `position.jsonl`
3. Calculate date range: `max(last_dates) + 1 day` → `most_recent_trading_day`
4. Filter for weekdays only (Monday-Friday)
5. If date_range is empty, return "already up-to-date"
6. Check for existing jobs with same date range → return existing job
7. Check for running jobs with different date range → return 409
8. Create new job in SQLite with status=`pending`
9. Queue background task to execute simulation
10. Return 202 with job details

---
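Steps 3-4 of the business logic (every day after the last completed date, weekdays only) can be sketched as:

```python
from datetime import date, timedelta

def catch_up_range(last_completed: date, today: date) -> list:
    """Weekday dates from the day after last_completed through today."""
    days = []
    d = last_completed + timedelta(days=1)
    while d <= today:
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            days.append(d.isoformat())
        d += timedelta(days=1)
    return days

print(catch_up_range(date(2025, 1, 15), date(2025, 1, 20)))
# → ['2025-01-16', '2025-01-17', '2025-01-20']
```

The example reproduces the `date_range` used throughout the responses above (the 18th and 19th fall on a weekend). Note this matches the spec's weekday-only rule; it does not exclude market holidays.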
### 1.2 GET /simulate/status/{job_id}

**Purpose:** Poll the status and progress of a simulation job.

**Request:**
```http
GET /simulate/status/550e8400-e29b-41d4-a716-446655440000 HTTP/1.1
```

**Response (200 OK - Running):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 3,
    "failed": 0,
    "current": {
      "date": "2025-01-17",
      "model": "gpt-5"
    },
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 42.1},
      {"date": "2025-01-17", "model": "gpt-5", "status": "running", "duration_seconds": null}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "updated_at": "2025-01-20T14:27:15Z"
}
```

**Response (200 OK - Completed):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 6,
    "failed": 0,
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 42.1},
      {"date": "2025-01-17", "model": "gpt-5", "status": "completed", "duration_seconds": 40.3},
      {"date": "2025-01-20", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 43.8},
      {"date": "2025-01-20", "model": "gpt-5", "status": "completed", "duration_seconds": 39.1}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "completed_at": "2025-01-20T14:29:45Z",
  "total_duration_seconds": 285.0
}
```

**Response (200 OK - Partial Failure):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "partial",
  "date_range": ["2025-01-16", "2025-01-17", "2025-01-20"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 6,
    "completed": 4,
    "failed": 2,
    "details": [
      {"date": "2025-01-16", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 45.2},
      {"date": "2025-01-16", "model": "gpt-5", "status": "completed", "duration_seconds": 38.7},
      {"date": "2025-01-17", "model": "claude-3.7-sonnet", "status": "failed", "error": "MCP service timeout after 3 retries", "duration_seconds": null},
      {"date": "2025-01-17", "model": "gpt-5", "status": "completed", "duration_seconds": 40.3},
      {"date": "2025-01-20", "model": "claude-3.7-sonnet", "status": "completed", "duration_seconds": 43.8},
      {"date": "2025-01-20", "model": "gpt-5", "status": "failed", "error": "AI model API timeout", "duration_seconds": null}
    ]
  },
  "created_at": "2025-01-20T14:25:00Z",
  "completed_at": "2025-01-20T14:29:45Z"
}
```

**Response (404 Not Found):**
```json
{
  "error": "not_found",
  "message": "Job not found",
  "job_id": "invalid-job-id"
}
```

**Business Logic:**
1. Query SQLite jobs table for job_id
2. If not found, return 404
3. Return job metadata + progress from job_details table
4. Status transitions: `pending` → `running` → `completed`/`partial`/`failed`

---
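A caller such as a Windmill flow polls this endpoint until the job leaves the non-terminal states. A minimal client-side sketch, where the `fetch_status` callable stands in for the HTTP GET:

```python
import time

TERMINAL = {"completed", "partial", "failed"}

def wait_for_job(fetch_status, job_id: str, interval: float = 0.0,
                 max_polls: int = 100) -> dict:
    """Poll until the job reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status(job_id)  # e.g. GET /simulate/status/{job_id}
        if status["status"] in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")

# Fake server for illustration: running twice, then completed
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed"},
])
final = wait_for_job(lambda job_id: next(responses), "550e8400")
print(final["status"])  # → completed
```

In production the interval would be minutes rather than zero (the Windmill flow described earlier polls every 5 minutes), and the poll budget bounds how long a stuck job can block the workflow.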
### 1.3 GET /simulate/current

**Purpose:** Get the most recent simulation job (for Windmill to discover the job_id).

**Request:**
```http
GET /simulate/current HTTP/1.1
```

**Response (200 OK):**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "date_range": ["2025-01-16", "2025-01-17"],
  "models": ["claude-3.7-sonnet", "gpt-5"],
  "progress": {
    "total_model_days": 4,
    "completed": 2,
    "failed": 0
  },
  "created_at": "2025-01-20T14:25:00Z"
}
```

**Response (404 Not Found):**
```json
{
  "error": "not_found",
  "message": "No simulation jobs found"
}
```

**Business Logic:**
1. Query SQLite: `SELECT * FROM jobs ORDER BY created_at DESC LIMIT 1`
2. Return job details with progress summary

---
### 1.4 GET /results
|
||||
|
||||
**Purpose:** Retrieve simulation results for a specific date and model.
|
||||
|
||||
**Request:**
|
||||
```http
|
||||
GET /results?date=2025-01-15&model=gpt-5&detail=minimal HTTP/1.1
|
||||
```
|
||||
|
||||
**Query Parameters:**
|
||||
- `date` (required): Trading date in YYYY-MM-DD format
|
||||
- `model` (optional): Model signature (if omitted, returns all models)
|
||||
- `detail` (optional): Response detail level
|
||||
- `minimal` (default): Positions + daily P&L
|
||||
- `full`: + trade history + AI reasoning logs + tool usage stats
|
||||
|
||||
**Response (200 OK - minimal):**
|
||||
```json
|
||||
{
|
||||
"date": "2025-01-15",
|
||||
"results": [
|
||||
{
|
||||
"model": "gpt-5",
|
||||
"positions": {
|
||||
"AAPL": 10,
|
||||
"MSFT": 5,
|
||||
"NVDA": 0,
|
||||
"CASH": 8500.00
|
||||
},
|
||||
"daily_pnl": {
|
||||
"profit": 150.50,
|
||||
"return_pct": 1.5,
|
||||
"portfolio_value": 10150.50
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Response (200 OK - full):**
|
||||
```json
|
||||
{
|
||||
"date": "2025-01-15",
|
||||
"results": [
|
||||
{
|
||||
"model": "gpt-5",
|
||||
"positions": {
|
||||
"AAPL": 10,
|
||||
"MSFT": 5,
|
||||
"CASH": 8500.00
|
||||
},
|
||||
"daily_pnl": {
|
||||
"profit": 150.50,
|
||||
"return_pct": 1.5,
|
||||
"portfolio_value": 10150.50
|
||||
},
|
||||
"trades": [
|
||||
{
|
||||
"id": 1,
|
||||
"action": "buy",
|
||||
"symbol": "AAPL",
|
||||
"amount": 10,
|
||||
"price": 255.88,
|
||||
"total": 2558.80
|
||||
}
|
||||
],
|
||||
"ai_reasoning": {
|
||||
"total_steps": 15,
|
||||
"stop_signal_received": true,
|
||||
"reasoning_summary": "Market analysis indicated strong buy signal for AAPL...",
|
||||
"tool_usage": {
|
||||
"search": 3,
|
||||
"get_price": 5,
|
||||
"math": 2,
|
||||
"trade": 1
|
||||
}
|
||||
},
|
||||
"log_file_path": "data/agent_data/gpt-5/log/2025-01-15/log.jsonl"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Response (400 Bad Request):**
|
||||
```json
|
||||
{
|
||||
"error": "invalid_date",
|
||||
"message": "Date must be in YYYY-MM-DD format"
|
||||
}
|
||||
```
|
||||
|
||||
**Response (404 Not Found):**
|
||||
```json
|
||||
{
|
||||
"error": "no_data",
|
||||
"message": "No simulation data found for date 2025-01-15 and model gpt-5"
|
||||
}
|
||||
```
|
||||
|
||||
**Business Logic:**

1. Validate date format
2. Read `position.jsonl` for specified model(s) and date
3. For `detail=minimal`: Return positions + calculate daily P&L
4. For `detail=full`:
   - Parse `log.jsonl` to extract reasoning summary
   - Count tool usage from log messages
   - Extract trades from position file
5. Return aggregated results

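The tool-usage count in step 4 can be tallied directly from the log file. A minimal sketch, assuming `log.jsonl` holds one JSON object per line and that tool messages carry `role` and `tool_name` fields (the exact field names are an assumption, not confirmed by this spec):

```python
import json
from collections import Counter
from pathlib import Path

def count_tool_usage(log_path: Path) -> dict:
    """Tally tool calls from a log.jsonl file (one JSON object per line)."""
    counts = Counter()
    for line in log_path.read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumed shape: tool messages have role == "tool" and a "tool_name" field.
        if record.get("role") == "tool" and record.get("tool_name"):
            counts[record["tool_name"]] += 1
    return dict(counts)
```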
---

### 1.5 GET /health

**Purpose:** Health check endpoint for Docker and monitoring.

**Request:**

```http
GET /health HTTP/1.1
```

**Response (200 OK):**

```json
{
  "status": "healthy",
  "timestamp": "2025-01-20T14:30:00Z",
  "services": {
    "mcp_math": {"status": "up", "url": "http://localhost:8000/mcp"},
    "mcp_search": {"status": "up", "url": "http://localhost:8001/mcp"},
    "mcp_trade": {"status": "up", "url": "http://localhost:8002/mcp"},
    "mcp_getprice": {"status": "up", "url": "http://localhost:8003/mcp"}
  },
  "storage": {
    "data_directory": "/app/data",
    "writable": true,
    "free_space_mb": 15234
  },
  "database": {
    "status": "connected",
    "path": "/app/data/jobs.db"
  }
}
```

**Response (503 Service Unavailable):**

```json
{
  "status": "unhealthy",
  "timestamp": "2025-01-20T14:30:00Z",
  "services": {
    "mcp_math": {"status": "down", "url": "http://localhost:8000/mcp", "error": "Connection refused"},
    "mcp_search": {"status": "up", "url": "http://localhost:8001/mcp"},
    "mcp_trade": {"status": "up", "url": "http://localhost:8002/mcp"},
    "mcp_getprice": {"status": "up", "url": "http://localhost:8003/mcp"}
  },
  "storage": {
    "data_directory": "/app/data",
    "writable": true
  },
  "database": {
    "status": "connected"
  }
}
```

---

## 2. Data Models

### 2.1 SQLite Schema

**Table: jobs**

```sql
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT NOT NULL,             -- JSON array of dates
    models TEXT NOT NULL,                 -- JSON array of model signatures
    created_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    total_duration_seconds REAL,
    error TEXT
);

CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created_at ON jobs(created_at DESC);
```

**Table: job_details**

```sql
CREATE TABLE job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    started_at TEXT,
    completed_at TEXT,
    duration_seconds REAL,
    error TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

CREATE INDEX idx_job_details_job_id ON job_details(job_id);
CREATE INDEX idx_job_details_status ON job_details(status);
```

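The schema above can be applied at startup with the standard-library `sqlite3` module. A sketch (the `init_db` helper name is hypothetical); note that foreign-key enforcement must be enabled per connection for the `ON DELETE CASCADE` clauses to take effect:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT NOT NULL,
    models TEXT NOT NULL,
    created_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    total_duration_seconds REAL,
    error TEXT
);
CREATE TABLE IF NOT EXISTS job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    started_at TEXT,
    completed_at TEXT,
    duration_seconds REAL,
    error TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
    conn.executescript(SCHEMA)
    return conn
```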
### 2.2 Pydantic Models

**Request Models:**

```python
from pydantic import BaseModel, Field
from typing import Optional, Literal

class TriggerSimulationRequest(BaseModel):
    config_path: Optional[str] = Field(
        default="configs/default_config.json",
        description="Path to configuration file",
    )

class ResultsQueryParams(BaseModel):
    date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$", description="Date in YYYY-MM-DD format")
    model: Optional[str] = Field(None, description="Model signature filter")
    detail: Literal["minimal", "full"] = Field(default="minimal", description="Response detail level")
```

**Response Models:**

```python
class JobProgress(BaseModel):
    total_model_days: int
    completed: int
    failed: int
    current: Optional[dict] = None  # {"date": str, "model": str}
    details: Optional[list] = None  # List of JobDetailResponse

class TriggerSimulationResponse(BaseModel):
    job_id: str
    status: str
    date_range: list[str]
    models: list[str]
    created_at: str
    message: str
    progress: Optional[JobProgress] = None

class JobStatusResponse(BaseModel):
    job_id: str
    status: str
    date_range: list[str]
    models: list[str]
    progress: JobProgress
    created_at: str
    updated_at: Optional[str] = None
    completed_at: Optional[str] = None
    total_duration_seconds: Optional[float] = None

class DailyPnL(BaseModel):
    profit: float
    return_pct: float
    portfolio_value: float

class Trade(BaseModel):
    id: int
    action: str
    symbol: str
    amount: int
    price: Optional[float] = None
    total: Optional[float] = None

class AIReasoning(BaseModel):
    total_steps: int
    stop_signal_received: bool
    reasoning_summary: str
    tool_usage: dict[str, int]

class ModelResult(BaseModel):
    model: str
    positions: dict[str, float]
    daily_pnl: DailyPnL
    trades: Optional[list[Trade]] = None
    ai_reasoning: Optional[AIReasoning] = None
    log_file_path: Optional[str] = None

class ResultsResponse(BaseModel):
    date: str
    results: list[ModelResult]
```

---

## 3. Configuration Management

### 3.1 Environment Variables

Required environment variables remain the same as batch mode:

```bash
# OpenAI API Configuration
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=sk-...

# Alpha Vantage API
ALPHAADVANTAGE_API_KEY=...

# Jina Search API
JINA_API_KEY=...

# Runtime Config Path (now shared by API and worker)
RUNTIME_ENV_PATH=/app/data/runtime_env.json

# MCP Service Ports
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003

# API Server Configuration
API_HOST=0.0.0.0
API_PORT=8080

# Job Configuration
MAX_CONCURRENT_JOBS=1  # Only one simulation job at a time
```

### 3.2 Runtime State Management

**Challenge:** Multiple model-days running concurrently need isolated `runtime_env.json` state.

**Solution:** Per-job runtime config files

- `runtime_env_base.json` - Template
- `runtime_env_{job_id}_{model}_{date}.json` - Job-specific runtime config
- Worker passes custom `RUNTIME_ENV_PATH` to each simulation execution

**Modified `write_config_value()` and `get_config_value()`:**

- Accept optional `runtime_path` parameter
- Worker manages lifecycle: create → use → cleanup

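The create step of that lifecycle can be sketched as a small helper that copies the template to a job-specific path. This is an illustrative sketch; the `make_runtime_env_path` name and the plain file-copy semantics are assumptions, not part of the spec:

```python
import shutil
from pathlib import Path

def make_runtime_env_path(base: Path, job_id: str, model: str, date: str) -> Path:
    """Derive an isolated runtime config for one model-day (hypothetical helper).

    Copies runtime_env_base.json to runtime_env_{job_id}_{model}_{date}.json
    so concurrent model-days never share mutable state.
    """
    target = base.parent / f"runtime_env_{job_id}_{model}_{date}.json"
    shutil.copyfile(base, target)
    return target
```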
---

## 4. Error Handling

### 4.1 Error Response Format

All errors follow this structure:

```json
{
  "error": "error_code",
  "message": "Human-readable error description",
  "details": {
    // Optional additional context
  }
}
```

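A tiny helper keeps every endpoint's error payload in this shape (the `error_response` name is hypothetical):

```python
from typing import Optional

def error_response(error: str, message: str, details: Optional[dict] = None) -> dict:
    """Build an error payload matching the documented structure."""
    body = {"error": error, "message": message}
    if details is not None:
        body["details"] = details  # optional additional context
    return body
```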
### 4.2 HTTP Status Codes

- `200 OK` - Successful request
- `202 Accepted` - Job queued successfully
- `400 Bad Request` - Invalid input parameters
- `404 Not Found` - Resource not found (job, results)
- `409 Conflict` - Concurrent job conflict
- `500 Internal Server Error` - Unexpected server error
- `503 Service Unavailable` - Health check failed

### 4.3 Retry Strategy for Workers

Models run independently - failure of one model doesn't block others:

```python
async def run_model_day(job_id: str, date: str, model_config: dict):
    model = model_config["signature"]  # model identifier taken from the config
    try:
        # Execute simulation for this model-day
        await agent.run_trading_session(date)
        update_job_detail_status(job_id, date, model, "completed")
    except Exception as e:
        # Log error, update status to failed, continue with next model-day
        update_job_detail_status(job_id, date, model, "failed", error=str(e))
        # Do NOT raise - let other models continue
```

---

## 5. Concurrency & Locking

### 5.1 Job Execution Policy

**Rule:** Maximum 1 running job at a time (configurable via `MAX_CONCURRENT_JOBS`)

**Enforcement:**

```python
def can_start_new_job() -> bool:
    running_jobs = db.query(
        "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
    ).fetchone()[0]
    return running_jobs < MAX_CONCURRENT_JOBS
```

### 5.2 Position File Concurrency

**Challenge:** Multiple model-days writing to the same model's `position.jsonl`

**Solution:** Sequential execution per model

```python
# For each date in date_range:
#     For each model in parallel:   ← Models run in parallel
#         Execute model-day         ← Dates for the same model run sequentially
```

**Execution Pattern:**

```
Date 2025-01-16:
  - Model A (running)
  - Model B (running)
  - Model C (running)

Date 2025-01-17:   ← Starts only after all models finish 2025-01-16
  - Model A (running)
  - Model B (running)
  - Model C (running)
```

**Rationale:**

- Models write to different position files → No conflict
- Same model's dates run sequentially → No race condition on position.jsonl
- Date-level parallelism across models → Faster overall execution

---

## 6. Performance Considerations

### 6.1 Execution Time Estimates

Based on the current implementation:

- Single model-day: ~30-60 seconds (depends on AI model latency + tool calls)
- 3 models × 5 days = 15 model-days; with models running in parallel per date (see §5.2), wall-clock time ≈ 5 × (30-60 s) ≈ 2.5-5 minutes, versus 7.5-15 minutes fully sequential

### 6.2 Timeout Configuration

**API Request Timeout:**

- `/simulate/trigger`: 10 seconds (just queue job)
- `/simulate/status`: 5 seconds (read from DB)
- `/results`: 30 seconds (file I/O + parsing)

**Worker Timeout:**

- Per model-day: 5 minutes (inherited from `max_retries` × `base_delay`)
- Entire job: No timeout (job runs until all model-days complete or fail)

### 6.3 Optimization Opportunities (Future)

1. **Results caching:** Store computed daily_pnl in SQLite to avoid recomputation
2. **Parallel date execution:** If position file locking is implemented, run dates in parallel
3. **Streaming responses:** For `/simulate/status`, use SSE to push updates instead of polling

---

## 7. Logging & Observability

### 7.1 Structured Logging

All API logs use JSON format:

```json
{
  "timestamp": "2025-01-20T14:30:00Z",
  "level": "INFO",
  "logger": "api.worker",
  "message": "Starting simulation for model-day",
  "job_id": "550e8400-...",
  "date": "2025-01-16",
  "model": "gpt-5"
}
```

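One way to produce such records with the standard `logging` module is a custom formatter. A sketch; the context fields (`job_id`, `date`, `model`) are assumed to arrive via the `extra=` argument of the logging calls:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render log records as single-line JSON matching the format above."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Context passed as logger.info("...", extra={"job_id": ..., ...})
        for key in ("job_id", "date", "model"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```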
### 7.2 Log Levels

- `DEBUG` - Detailed execution flow (tool calls, price fetches)
- `INFO` - Job lifecycle events (created, started, completed)
- `WARNING` - Recoverable errors (retry attempts)
- `ERROR` - Model-day failures (logged but job continues)
- `CRITICAL` - System failures (MCP services down, DB corruption)

### 7.3 Audit Trail

All job state transitions logged to `api_audit.log`:

```json
{
  "timestamp": "2025-01-20T14:30:00Z",
  "event": "job_created",
  "job_id": "550e8400-...",
  "user": "windmill-service", // Future: from auth header
  "details": {"date_range": [...], "models": [...]}
}
```

---

## 8. Security Considerations

### 8.1 Authentication (Future)

For MVP, the API relies on network isolation (Docker network). Future enhancements:

- API key authentication via header: `X-API-Key: <token>`
- JWT tokens for Windmill integration
- Rate limiting per API key

### 8.2 Input Validation

- All date parameters validated with regex: `^\d{4}-\d{2}-\d{2}$`
- Config paths restricted to `configs/` directory (prevent path traversal)
- Model signatures sanitized (alphanumeric + hyphens only)

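The path-traversal restriction can be enforced by resolving the supplied path and checking containment. A sketch (the helper name is hypothetical; `Path.is_relative_to` requires Python 3.9+):

```python
from pathlib import Path

CONFIG_ROOT = Path("configs").resolve()

def safe_config_path(user_value: str) -> Path:
    """Resolve a user-supplied config path, rejecting escapes from configs/."""
    candidate = (CONFIG_ROOT / user_value).resolve()
    if not candidate.is_relative_to(CONFIG_ROOT):
        raise ValueError("config_path must stay inside the configs/ directory")
    return candidate
```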
### 8.3 File Access Controls

- Results API only reads from `data/agent_data/` directory
- Config API only reads from `configs/` directory
- No arbitrary file read via API parameters

---

## 9. Deployment Configuration

### 9.1 Docker Compose

```yaml
version: '3.8'

services:
  ai-trader-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
    env_file:
      - .env
    environment:
      - MODE=api
      - API_PORT=8080
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
```

### 9.2 Dockerfile Modifications

```dockerfile
# ... existing layers ...

# Install API dependencies
COPY requirements-api.txt /app/
RUN pip install --no-cache-dir -r requirements-api.txt

# Copy API application code
COPY api/ /app/api/

# Copy entrypoint script
COPY docker-entrypoint.sh /app/
RUN chmod +x /app/docker-entrypoint.sh

EXPOSE 8080

CMD ["/app/docker-entrypoint.sh"]
```

### 9.3 Entrypoint Script

```bash
#!/bin/bash
set -e

echo "Starting MCP services..."
cd /app/agent_tools
python start_mcp_services.py &
MCP_PID=$!

# Register cleanup before the blocking server call so MCP services
# are stopped whenever the script exits
trap "kill $MCP_PID 2>/dev/null || true" EXIT

echo "Waiting for MCP services to be ready..."
sleep 10

echo "Starting API server..."
cd /app
uvicorn api.main:app --host ${API_HOST:-0.0.0.0} --port ${API_PORT:-8080} --workers 1
```

---

## 10. API Versioning (Future)

For v2 and beyond:

- URL prefix: `/api/v1/simulate/trigger`, `/api/v2/simulate/trigger`
- Header-based: `Accept: application/vnd.ai-trader.v1+json`

MVP uses unversioned endpoints (implied v1).

---

## Next Steps

After reviewing this specification, we'll proceed to:

1. **Component 2:** Job Manager & SQLite Schema Implementation
2. **Component 3:** Background Worker Architecture
3. **Component 4:** BaseAgent Refactoring for Single-Day Execution
4. **Component 5:** Docker & Deployment Configuration
5. **Component 6:** Windmill Integration Flows

Please review this API specification and provide feedback or approval to continue.
@@ -1,911 +0,0 @@
# Enhanced Database Specification - Results Storage in SQLite

## 1. Overview

**Change from Original Spec:** Instead of reading `position.jsonl` on-demand, simulation results are written to SQLite during execution for faster retrieval and queryability.

**Benefits:**

- **Faster `/results` endpoint** - No file I/O on every request
- **Advanced querying** - Filter by date range, model, performance metrics
- **Aggregations** - Portfolio timeseries, leaderboards, statistics
- **Data integrity** - Single source of truth with ACID guarantees
- **Backup/restore** - Single database file instead of scattered JSONL files

**Tradeoff:** Additional database writes during simulation (minimal performance impact)

---

## 2. Enhanced Database Schema

### 2.1 Complete Table Structure

```sql
-- Job tracking tables (from original spec)
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT NOT NULL,
    models TEXT NOT NULL,
    created_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    total_duration_seconds REAL,
    error TEXT
);

CREATE TABLE IF NOT EXISTS job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    started_at TEXT,
    completed_at TEXT,
    duration_seconds REAL,
    error TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

-- NEW: Simulation results storage
CREATE TABLE IF NOT EXISTS positions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    action_id INTEGER NOT NULL,           -- Sequence number within that day
    action_type TEXT CHECK(action_type IN ('buy', 'sell', 'no_trade')),
    symbol TEXT,
    amount INTEGER,
    price REAL,
    cash REAL NOT NULL,
    portfolio_value REAL NOT NULL,
    daily_profit REAL,
    daily_return_pct REAL,
    cumulative_profit REAL,
    cumulative_return_pct REAL,
    created_at TEXT NOT NULL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS holdings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    position_id INTEGER NOT NULL,
    symbol TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    FOREIGN KEY (position_id) REFERENCES positions(id) ON DELETE CASCADE
);

-- NEW: AI reasoning logs (optional - for detail=full)
CREATE TABLE IF NOT EXISTS reasoning_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    step_number INTEGER NOT NULL,
    timestamp TEXT NOT NULL,
    role TEXT CHECK(role IN ('user', 'assistant', 'tool')),
    content TEXT,
    tool_name TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

-- NEW: Tool usage statistics
CREATE TABLE IF NOT EXISTS tool_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,
    model TEXT NOT NULL,
    tool_name TEXT NOT NULL,
    call_count INTEGER NOT NULL DEFAULT 1,
    total_duration_seconds REAL,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs(status);
CREATE INDEX IF NOT EXISTS idx_jobs_created_at ON jobs(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_job_details_job_id ON job_details(job_id);
CREATE INDEX IF NOT EXISTS idx_job_details_status ON job_details(status);
CREATE UNIQUE INDEX IF NOT EXISTS idx_job_details_unique ON job_details(job_id, date, model);

CREATE INDEX IF NOT EXISTS idx_positions_job_id ON positions(job_id);
CREATE INDEX IF NOT EXISTS idx_positions_date ON positions(date);
CREATE INDEX IF NOT EXISTS idx_positions_model ON positions(model);
CREATE INDEX IF NOT EXISTS idx_positions_date_model ON positions(date, model);
CREATE UNIQUE INDEX IF NOT EXISTS idx_positions_unique ON positions(job_id, date, model, action_id);

CREATE INDEX IF NOT EXISTS idx_holdings_position_id ON holdings(position_id);
CREATE INDEX IF NOT EXISTS idx_holdings_symbol ON holdings(symbol);

CREATE INDEX IF NOT EXISTS idx_reasoning_logs_job_date_model ON reasoning_logs(job_id, date, model);
CREATE INDEX IF NOT EXISTS idx_tool_usage_job_date_model ON tool_usage(job_id, date, model);
```

---

### 2.2 Table Relationships

```
jobs (1) ──┬──> (N) job_details
           │
           ├──> (N) positions ──> (N) holdings
           │
           ├──> (N) reasoning_logs
           │
           └──> (N) tool_usage
```

---

### 2.3 Data Examples

#### positions table

```
id | job_id     | date       | model | action_id | action_type | symbol | amount | price  | cash    | portfolio_value | daily_profit | daily_return_pct | cumulative_profit | cumulative_return_pct | created_at
---|------------|------------|-------|-----------|-------------|--------|--------|--------|---------|-----------------|--------------|------------------|-------------------|-----------------------|----------------------
1  | abc-123... | 2025-01-16 | gpt-5 | 0         | no_trade    | NULL   | NULL   | NULL   | 10000.0 | 10000.0         | 0.0          | 0.0              | 0.0               | 0.0                   | 2025-01-16T09:30:00Z
2  | abc-123... | 2025-01-16 | gpt-5 | 1         | buy         | AAPL   | 10     | 255.88 | 7441.2  | 10000.0         | 0.0          | 0.0              | 0.0               | 0.0                   | 2025-01-16T09:35:12Z
3  | abc-123... | 2025-01-17 | gpt-5 | 0         | no_trade    | NULL   | NULL   | NULL   | 7441.2  | 10150.5         | 150.5        | 1.51             | 150.5             | 1.51                  | 2025-01-17T09:30:00Z
4  | abc-123... | 2025-01-17 | gpt-5 | 1         | sell        | AAPL   | 5      | 262.24 | 8752.4  | 10150.5         | 150.5        | 1.51             | 150.5             | 1.51                  | 2025-01-17T09:42:38Z
```

#### holdings table

```
id | position_id | symbol | quantity
---|-------------|--------|----------
1  | 2           | AAPL   | 10
2  | 3           | AAPL   | 10
3  | 4           | AAPL   | 5
```

#### tool_usage table

```
id | job_id     | date       | model | tool_name | call_count | total_duration_seconds
---|------------|------------|-------|-----------|------------|-----------------------
1  | abc-123... | 2025-01-16 | gpt-5 | get_price | 5          | 2.3
2  | abc-123... | 2025-01-16 | gpt-5 | search    | 3          | 12.7
3  | abc-123... | 2025-01-16 | gpt-5 | trade     | 1          | 0.8
4  | abc-123... | 2025-01-16 | gpt-5 | math      | 2          | 0.1
```

---

## 3. Data Migration from position.jsonl

### 3.1 Migration Strategy

**During execution:** Write to BOTH SQLite AND position.jsonl for backward compatibility

**Migration path:**

1. **Phase 1:** Dual-write mode (write to both SQLite and JSONL)
2. **Phase 2:** Verify SQLite data matches JSONL
3. **Phase 3:** Switch `/results` endpoint to read from SQLite
4. **Phase 4:** (Optional) Deprecate JSONL writes

**Import existing data:** One-time migration script to populate SQLite from existing position.jsonl files

---

### 3.2 Import Script

```python
# api/import_historical_data.py

import json
import sqlite3
from pathlib import Path
from datetime import datetime
from api.database import get_db_connection


def import_position_jsonl(
    model_signature: str,
    position_file: Path,
    job_id: str = "historical-import"
) -> int:
    """
    Import existing position.jsonl data into SQLite.

    Args:
        model_signature: Model signature (e.g., "gpt-5")
        position_file: Path to position.jsonl
        job_id: Job ID to associate with (use "historical-import" for existing data)

    Returns:
        Number of records imported
    """
    conn = get_db_connection()
    cursor = conn.cursor()

    imported_count = 0
    initial_cash = 10000.0

    with open(position_file, 'r') as f:
        for line in f:
            if not line.strip():
                continue

            record = json.loads(line)
            date = record['date']
            action_id = record['id']
            action = record.get('this_action', {})
            positions = record.get('positions', {})

            # Extract action details
            action_type = action.get('action', 'no_trade')
            symbol = action.get('symbol', None)
            amount = action.get('amount', None)
            price = None  # Not stored in original position.jsonl

            # Extract holdings
            cash = positions.get('CASH', 0.0)
            holdings = {k: v for k, v in positions.items() if k != 'CASH' and v > 0}

            # Calculate portfolio value (approximate - need price data)
            portfolio_value = cash  # Base value

            # Calculate profits (need previous record)
            daily_profit = 0.0
            daily_return_pct = 0.0
            cumulative_profit = cash - initial_cash  # Simplified
            cumulative_return_pct = (cumulative_profit / initial_cash) * 100

            # Insert position record
            cursor.execute("""
                INSERT INTO positions (
                    job_id, date, model, action_id, action_type, symbol, amount, price,
                    cash, portfolio_value, daily_profit, daily_return_pct,
                    cumulative_profit, cumulative_return_pct, created_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                job_id, date, model_signature, action_id, action_type, symbol, amount, price,
                cash, portfolio_value, daily_profit, daily_return_pct,
                cumulative_profit, cumulative_return_pct, datetime.utcnow().isoformat() + "Z"
            ))

            position_id = cursor.lastrowid

            # Insert holdings
            for sym, qty in holdings.items():
                cursor.execute("""
                    INSERT INTO holdings (position_id, symbol, quantity)
                    VALUES (?, ?, ?)
                """, (position_id, sym, qty))

            imported_count += 1

    conn.commit()
    conn.close()

    return imported_count


def import_all_historical_data(base_path: Path = Path("data/agent_data")) -> dict:
    """
    Import all existing position.jsonl files from data/agent_data/.

    Returns:
        Summary dict with import counts per model
    """
    summary = {}

    for model_dir in base_path.iterdir():
        if not model_dir.is_dir():
            continue

        model_signature = model_dir.name
        position_file = model_dir / "position" / "position.jsonl"

        if not position_file.exists():
            continue

        print(f"Importing {model_signature}...")
        count = import_position_jsonl(model_signature, position_file)
        summary[model_signature] = count
        print(f"  Imported {count} records")

    return summary


if __name__ == "__main__":
    print("Starting historical data import...")
    summary = import_all_historical_data()
    print(f"\nImport complete: {summary}")
    print(f"Total records: {sum(summary.values())}")
```

---

## 4. Updated Results Service

### 4.1 ResultsService Class

```python
# api/results_service.py

from typing import List, Dict, Optional
from datetime import datetime
from api.database import get_db_connection

class ResultsService:
    """
    Service for retrieving simulation results from SQLite.

    Replaces on-demand reading of position.jsonl files.
    """

    def __init__(self, db_path: str = "data/jobs.db"):
        self.db_path = db_path

    def get_results(
        self,
        date: str,
        model: Optional[str] = None,
        detail: str = "minimal"
    ) -> Dict:
        """
        Get simulation results for specified date and model(s).

        Args:
            date: Trading date (YYYY-MM-DD)
            model: Optional model signature filter
            detail: "minimal" or "full"

        Returns:
            {
                "date": str,
                "results": [
                    {
                        "model": str,
                        "positions": {...},
                        "daily_pnl": {...},
                        "trades": [...],       # if detail=full
                        "ai_reasoning": {...}  # if detail=full
                    }
                ]
            }
        """
        conn = get_db_connection(self.db_path)

        # Get all models for this date (or specific model)
        if model:
            models = [model]
        else:
            cursor = conn.cursor()
            cursor.execute("""
                SELECT DISTINCT model FROM positions WHERE date = ?
            """, (date,))
            models = [row[0] for row in cursor.fetchall()]

        results = []

        for mdl in models:
            result = self._get_model_result(conn, date, mdl, detail)
            if result:
                results.append(result)

        conn.close()

        return {
            "date": date,
            "results": results
        }

    def _get_model_result(
        self,
        conn,
        date: str,
        model: str,
        detail: str
    ) -> Optional[Dict]:
        """Get result for single model on single date"""
        cursor = conn.cursor()

        # Get latest position for this date (highest action_id)
        cursor.execute("""
            SELECT
                cash, portfolio_value, daily_profit, daily_return_pct,
                cumulative_profit, cumulative_return_pct
            FROM positions
            WHERE date = ? AND model = ?
            ORDER BY action_id DESC
            LIMIT 1
        """, (date, model))

        row = cursor.fetchone()
        if not row:
            return None

        cash, portfolio_value, daily_profit, daily_return_pct, cumulative_profit, cumulative_return_pct = row

        # Get holdings for latest position
        cursor.execute("""
            SELECT h.symbol, h.quantity
            FROM holdings h
            JOIN positions p ON h.position_id = p.id
            WHERE p.date = ? AND p.model = ?
            ORDER BY p.action_id DESC
            LIMIT 100  -- One position worth of holdings
        """, (date, model))

        holdings = {row[0]: row[1] for row in cursor.fetchall()}
        holdings['CASH'] = cash

        result = {
            "model": model,
            "positions": holdings,
            "daily_pnl": {
                "profit": daily_profit,
                "return_pct": daily_return_pct,
                "portfolio_value": portfolio_value
            },
            "cumulative_pnl": {
                "profit": cumulative_profit,
                "return_pct": cumulative_return_pct
            }
        }

        # Add full details if requested
        if detail == "full":
            result["trades"] = self._get_trades(cursor, date, model)
            result["ai_reasoning"] = self._get_reasoning(cursor, date, model)
            result["tool_usage"] = self._get_tool_usage(cursor, date, model)

        return result

    def _get_trades(self, cursor, date: str, model: str) -> List[Dict]:
        """Get all trades executed on this date"""
        cursor.execute("""
            SELECT action_id, action_type, symbol, amount, price
            FROM positions
            WHERE date = ? AND model = ? AND action_type IN ('buy', 'sell')
            ORDER BY action_id
        """, (date, model))

        trades = []
        for row in cursor.fetchall():
            trades.append({
                "id": row[0],
                "action": row[1],
                "symbol": row[2],
                "amount": row[3],
                "price": row[4],
                "total": row[3] * row[4] if row[3] and row[4] else None
            })

        return trades

    def _get_reasoning(self, cursor, date: str, model: str) -> Dict:
        """Get AI reasoning summary"""
        cursor.execute("""
            SELECT COUNT(*) as total_steps,
                   COUNT(CASE WHEN role = 'assistant' THEN 1 END) as assistant_messages,
                   COUNT(CASE WHEN role = 'tool' THEN 1 END) as tool_messages
            FROM reasoning_logs
            WHERE date = ? AND model = ?
        """, (date, model))

        row = cursor.fetchone()
        total_steps = row[0] if row else 0

        # Get reasoning summary (last assistant message with FINISH_SIGNAL)
        cursor.execute("""
            SELECT content FROM reasoning_logs
            WHERE date = ? AND model = ? AND role = 'assistant'
              AND content LIKE '%<FINISH_SIGNAL>%'
            ORDER BY step_number DESC
            LIMIT 1
        """, (date, model))

        row = cursor.fetchone()
        reasoning_summary = row[0] if row else "No reasoning summary available"

        return {
            "total_steps": total_steps,
            "stop_signal_received": "<FINISH_SIGNAL>" in reasoning_summary,
            "reasoning_summary": reasoning_summary[:500]  # Truncate for brevity
        }

    def _get_tool_usage(self, cursor, date: str, model: str) -> Dict[str, int]:
        """Get tool usage counts"""
        cursor.execute("""
            SELECT tool_name, call_count
            FROM tool_usage
            WHERE date = ? AND model = ?
        """, (date, model))

        return {row[0]: row[1] for row in cursor.fetchall()}

    def get_portfolio_timeseries(
        self,
        model: str,
        start_date: Optional[str] = None,
        end_date: Optional[str] = None
    ) -> List[Dict]:
        """
        Get portfolio value over time for a model.

        Returns:
            [
                {"date": "2025-01-16", "portfolio_value": 10000.0, "daily_return_pct": 0.0},
                {"date": "2025-01-17", "portfolio_value": 10150.5, "daily_return_pct": 1.51},
                ...
            ]
        """
        conn = get_db_connection(self.db_path)
        cursor = conn.cursor()

        query = """
            SELECT date, portfolio_value, daily_return_pct, cumulative_return_pct
            FROM (
                SELECT date, portfolio_value, daily_return_pct, cumulative_return_pct,
                       ROW_NUMBER() OVER (PARTITION BY date ORDER BY action_id DESC) as rn
|
||||
FROM positions
|
||||
WHERE model = ?
|
||||
)
|
||||
WHERE rn = 1
|
||||
"""
|
||||
|
||||
params = [model]
|
||||
|
||||
if start_date:
|
||||
query += " AND date >= ?"
|
||||
params.append(start_date)
|
||||
if end_date:
|
||||
query += " AND date <= ?"
|
||||
params.append(end_date)
|
||||
|
||||
query += " ORDER BY date ASC"
|
||||
|
||||
cursor.execute(query, params)
|
||||
|
||||
timeseries = []
|
||||
for row in cursor.fetchall():
|
||||
timeseries.append({
|
||||
"date": row[0],
|
||||
"portfolio_value": row[1],
|
||||
"daily_return_pct": row[2],
|
||||
"cumulative_return_pct": row[3]
|
||||
})
|
||||
|
||||
conn.close()
|
||||
return timeseries
|
||||
|
||||
def get_leaderboard(self, date: Optional[str] = None) -> List[Dict]:
|
||||
"""
|
||||
Get model performance leaderboard.
|
||||
|
||||
Args:
|
||||
date: Optional date filter (latest results if not specified)
|
||||
|
||||
Returns:
|
||||
[
|
||||
{"model": "gpt-5", "portfolio_value": 10500, "cumulative_return_pct": 5.0, "rank": 1},
|
||||
{"model": "claude-3.7-sonnet", "portfolio_value": 10300, "cumulative_return_pct": 3.0, "rank": 2},
|
||||
...
|
||||
]
|
||||
"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
if date:
|
||||
# Specific date leaderboard
|
||||
cursor.execute("""
|
||||
SELECT model, portfolio_value, cumulative_return_pct
|
||||
FROM (
|
||||
SELECT model, portfolio_value, cumulative_return_pct,
|
||||
ROW_NUMBER() OVER (PARTITION BY model ORDER BY action_id DESC) as rn
|
||||
FROM positions
|
||||
WHERE date = ?
|
||||
)
|
||||
WHERE rn = 1
|
||||
ORDER BY portfolio_value DESC
|
||||
""", (date,))
|
||||
else:
|
||||
# Latest results for each model
|
||||
cursor.execute("""
|
||||
SELECT model, portfolio_value, cumulative_return_pct
|
||||
FROM (
|
||||
SELECT model, portfolio_value, cumulative_return_pct,
|
||||
ROW_NUMBER() OVER (PARTITION BY model ORDER BY date DESC, action_id DESC) as rn
|
||||
FROM positions
|
||||
)
|
||||
WHERE rn = 1
|
||||
ORDER BY portfolio_value DESC
|
||||
""")
|
||||
|
||||
leaderboard = []
|
||||
rank = 1
|
||||
for row in cursor.fetchall():
|
||||
leaderboard.append({
|
||||
"rank": rank,
|
||||
"model": row[0],
|
||||
"portfolio_value": row[1],
|
||||
"cumulative_return_pct": row[2]
|
||||
})
|
||||
rank += 1
|
||||
|
||||
conn.close()
|
||||
return leaderboard
|
||||
```
|
||||
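
The ranking step in `get_leaderboard()` is an ordered enumeration over rows sorted by portfolio value. A minimal standalone sketch of that logic (the model names and values here are illustrative):

```python
def rank_models(rows):
    """Rank (model, portfolio_value, cumulative_return_pct) tuples, best first."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [
        {"rank": i, "model": m, "portfolio_value": pv, "cumulative_return_pct": ret}
        for i, (m, pv, ret) in enumerate(ordered, start=1)
    ]

board = rank_models([("gpt-4", 10300.0, 3.0), ("claude-3.7-sonnet", 10500.0, 5.0)])
print(board[0]["model"], board[0]["rank"])  # claude-3.7-sonnet 1
```
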

---

## 5. Updated Executor - Write to SQLite

```python
# api/executor.py (additions to existing code)

class ModelDayExecutor:
    # ... existing code ...

    async def run_model_day(
        self,
        job_id: str,
        date: str,
        model_config: Dict[str, Any],
        agent_class: type,
        config: Dict[str, Any]
    ) -> None:
        """Execute simulation for one model on one date."""

        # ... existing execution code ...

        try:
            # Execute trading session
            await agent.run_trading_session(date)

            # NEW: Extract and store results in SQLite
            self._store_results_to_db(job_id, date, model_sig)

            # Mark as completed
            self.job_manager.update_job_detail_status(
                job_id, date, model_sig, "completed"
            )

        except Exception as e:
            # ... error handling ...
            raise

    def _store_results_to_db(self, job_id: str, date: str, model: str) -> None:
        """
        Extract data from position.jsonl and log.jsonl, store in SQLite.

        This runs after agent.run_trading_session() completes.
        """
        import json
        from datetime import datetime
        from pathlib import Path

        from api.database import get_db_connection

        conn = get_db_connection()
        cursor = conn.cursor()

        # Read position.jsonl for this model
        position_file = Path(f"data/agent_data/{model}/position/position.jsonl")

        if not position_file.exists():
            logger.warning(f"Position file not found: {position_file}")
            return

        # Find records for this date
        with open(position_file, 'r') as f:
            for line in f:
                if not line.strip():
                    continue

                record = json.loads(line)
                if record['date'] != date:
                    continue  # Skip other dates

                # Extract fields
                action_id = record['id']
                action = record.get('this_action', {})
                positions = record.get('positions', {})

                action_type = action.get('action', 'no_trade')
                symbol = action.get('symbol')
                amount = action.get('amount')
                price = None  # TODO: Get from price data if needed

                cash = positions.get('CASH', 0.0)
                holdings = {k: v for k, v in positions.items() if k != 'CASH' and v > 0}

                # Calculate portfolio value (simplified - improve with actual prices)
                portfolio_value = cash  # + sum(holdings value)

                # Calculate daily P&L (compare to previous day's closing value)
                # TODO: Implement proper P&L calculation

                # Insert position
                cursor.execute("""
                    INSERT INTO positions (
                        job_id, date, model, action_id, action_type, symbol, amount, price,
                        cash, portfolio_value, daily_profit, daily_return_pct,
                        cumulative_profit, cumulative_return_pct, created_at
                    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                """, (
                    job_id, date, model, action_id, action_type, symbol, amount, price,
                    cash, portfolio_value, 0.0, 0.0,  # TODO: Calculate daily P&L
                    0.0, 0.0,  # TODO: Calculate cumulative P&L
                    datetime.utcnow().isoformat() + "Z"
                ))

                position_id = cursor.lastrowid

                # Insert holdings
                for sym, qty in holdings.items():
                    cursor.execute("""
                        INSERT INTO holdings (position_id, symbol, quantity)
                        VALUES (?, ?, ?)
                    """, (position_id, sym, qty))

        # Parse log.jsonl for reasoning (if detail=full is needed later)
        # TODO: Implement log parsing and storage in reasoning_logs table

        conn.commit()
        conn.close()

        logger.info(f"Stored results for {model} on {date} in SQLite")
```
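
For reference, a sketch of the record shape `_store_results_to_db()` expects from `position.jsonl`. Real records may carry additional fields; this hypothetical example only uses the keys the parser above reads:

```python
import json

# Hypothetical position.jsonl line using only the keys the parser reads
line = json.dumps({
    "id": 3,
    "date": "2025-01-16",
    "this_action": {"action": "buy", "symbol": "AAPL", "amount": 10},
    "positions": {"CASH": 7600.0, "AAPL": 10, "MSFT": 0},
})

record = json.loads(line)
positions = record.get("positions", {})
cash = positions.get("CASH", 0.0)
holdings = {k: v for k, v in positions.items() if k != "CASH" and v > 0}
print(cash, holdings)  # 7600.0 {'AAPL': 10}
```

Note that zero-quantity holdings (`MSFT` above) are dropped, matching the `v > 0` filter in the executor.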

---

## 6. Migration Path

### 6.1 Backward Compatibility

**Keep position.jsonl writes** so that existing tools and scripts continue working:

```python
# In agent/base_agent/base_agent.py - no changes needed;
# position.jsonl writing continues as normal.

# In api/executor.py - AFTER position.jsonl is written:
await agent.run_trading_session(date)               # Writes to position.jsonl
self._store_results_to_db(job_id, date, model_sig)  # Copies to SQLite
```

### 6.2 Gradual Migration

- **Week 1:** Deploy with dual-write (JSONL + SQLite)
- **Week 2:** Verify data consistency, fix any discrepancies
- **Week 3:** Switch `/results` endpoint to read from SQLite
- **Week 4:** (Optional) Remove JSONL writes
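
The Week 2 consistency check can be reduced to comparing per-(date, model) cash balances extracted from both stores. A sketch, assuming hypothetical loaders have already produced the two mappings:

```python
def find_discrepancies(jsonl_cash, sqlite_cash, tolerance=0.01):
    """Compare {(date, model): cash} mappings from JSONL files and SQLite."""
    issues = []
    for key in sorted(set(jsonl_cash) | set(sqlite_cash)):
        a, b = jsonl_cash.get(key), sqlite_cash.get(key)
        if a is None or b is None:
            issues.append((key, "missing", a, b))
        elif abs(a - b) > tolerance:
            issues.append((key, "mismatch", a, b))
    return issues

jsonl = {("2025-01-16", "gpt-4"): 7600.0, ("2025-01-17", "gpt-4"): 7400.0}
sqlite = {("2025-01-16", "gpt-4"): 7600.0}
print(find_discrepancies(jsonl, sqlite))  # [(('2025-01-17', 'gpt-4'), 'missing', 7400.0, None)]
```
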
---

## 7. Updated API Endpoints

### 7.1 Enhanced `/results` Endpoint

```python
# api/main.py

from api.results_service import ResultsService

results_service = ResultsService()

@app.get("/results")
async def get_results(
    date: str,
    model: Optional[str] = None,
    detail: str = "minimal"
):
    """Get simulation results from SQLite (fast!)."""
    # Validate date format
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        raise HTTPException(status_code=400, detail="Invalid date format (use YYYY-MM-DD)")

    results = results_service.get_results(date, model, detail)

    if not results["results"]:
        raise HTTPException(status_code=404, detail=f"No data found for date {date}")

    return results
```
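
The endpoint's date check can be factored into a small helper; note that `strptime` with `%Y-%m-%d` rejects impossible dates as well as wrong separators:

```python
from datetime import datetime

def is_valid_date(value: str) -> bool:
    """YYYY-MM-DD validation, matching the endpoint's check."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(is_valid_date("2025-01-16"))  # True
print(is_valid_date("2025-13-01"))  # False (no month 13)
print(is_valid_date("01/16/2025"))  # False (wrong separator)
```
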

### 7.2 New Endpoints for Advanced Queries

```python
@app.get("/portfolio/timeseries")
async def get_portfolio_timeseries(
    model: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None
):
    """Get portfolio value over time for a model."""
    timeseries = results_service.get_portfolio_timeseries(model, start_date, end_date)

    if not timeseries:
        raise HTTPException(status_code=404, detail=f"No data found for model {model}")

    return {
        "model": model,
        "timeseries": timeseries
    }


@app.get("/leaderboard")
async def get_leaderboard(date: Optional[str] = None):
    """Get model performance leaderboard."""
    leaderboard = results_service.get_leaderboard(date)

    return {
        "date": date or "latest",
        "leaderboard": leaderboard
    }
```

---

## 8. Database Maintenance

### 8.1 Cleanup Old Data

```python
# api/job_manager.py (add method)

def cleanup_old_data(self, days: int = 90) -> dict:
    """
    Delete jobs and associated data older than the specified number of days.

    Returns:
        Summary of deleted records
    """
    conn = get_db_connection(self.db_path)
    cursor = conn.cursor()

    cutoff_date = (datetime.utcnow() - timedelta(days=days)).isoformat() + "Z"

    # Count records before deletion
    cursor.execute("SELECT COUNT(*) FROM jobs WHERE created_at < ?", (cutoff_date,))
    jobs_to_delete = cursor.fetchone()[0]

    cursor.execute("""
        SELECT COUNT(*) FROM positions
        WHERE job_id IN (SELECT job_id FROM jobs WHERE created_at < ?)
    """, (cutoff_date,))
    positions_to_delete = cursor.fetchone()[0]

    # Delete (CASCADE will handle related tables)
    cursor.execute("DELETE FROM jobs WHERE created_at < ?", (cutoff_date,))

    conn.commit()
    conn.close()

    return {
        "cutoff_date": cutoff_date,
        "jobs_deleted": jobs_to_delete,
        "positions_deleted": positions_to_delete
    }
```
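
The `created_at < cutoff` comparison works because ISO-8601 timestamps with a fixed layout sort lexicographically, so a string comparison behaves like a chronological one. The cutoff computation in isolation:

```python
from datetime import datetime, timedelta

def cutoff_iso(now: datetime, days: int = 90) -> str:
    """Cutoff in the same ISO-8601 'Z' format stored in created_at."""
    return (now - timedelta(days=days)).isoformat() + "Z"

print(cutoff_iso(datetime(2025, 4, 1, 12, 0, 0), days=90))  # 2025-01-01T12:00:00Z
```
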

### 8.2 Vacuum Database

```python
def vacuum_database(self) -> None:
    """Reclaim disk space after deletes."""
    conn = get_db_connection(self.db_path)
    conn.execute("VACUUM")
    conn.close()
```

---

## Summary

**Enhanced database schema** with six tables:
- `jobs`, `job_details` (job tracking)
- `positions`, `holdings` (simulation results)
- `reasoning_logs`, `tool_usage` (AI details)

**Benefits:**
- ⚡ **10-100x faster** `/results` queries (no file I/O)
- 📊 **Advanced analytics** - timeseries, leaderboards, aggregations
- 🔒 **Data integrity** - ACID compliance, foreign keys
- 🗄️ **Single source of truth** - all data in one place

**Migration strategy:** Dual-write (JSONL + SQLite) for backward compatibility

**Next:** Comprehensive testing suite specification
docs/deployment/docker-deployment.md (new file, 95 lines)
@@ -0,0 +1,95 @@
# Docker Deployment

Production Docker deployment guide.

---

## Quick Deployment

```bash
git clone https://github.com/Xe138/AI-Trader.git
cd AI-Trader
cp .env.example .env
# Edit .env with API keys
docker-compose up -d
```

---

## Production Configuration

### Use Pre-built Image

```yaml
# docker-compose.yml
services:
  ai-trader:
    image: ghcr.io/xe138/ai-trader:latest
    # ... rest of config
```

### Build Locally

```yaml
# docker-compose.yml
services:
  ai-trader:
    build: .
    # ... rest of config
```

---

## Volume Persistence

Ensure data persists across restarts:

```yaml
volumes:
  - ./data:/app/data        # Required: database and cache
  - ./logs:/app/logs        # Recommended: application logs
  - ./configs:/app/configs  # Required: model configurations
```

---

## Environment Security

- Never commit `.env` to version control
- Use secrets management (Docker secrets, Kubernetes secrets)
- Rotate API keys regularly
- Restrict network access to the API port

---

## Health Checks

Docker automatically restarts unhealthy containers:

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```

---

## Monitoring

```bash
# Container status
docker ps

# Resource usage
docker stats ai-trader

# Logs
docker logs -f ai-trader
```

---

See [monitoring.md](monitoring.md) and [production-checklist.md](production-checklist.md) for related deployment documentation.
docs/deployment/monitoring.md (new file, 49 lines)
@@ -0,0 +1,49 @@
# Monitoring

Health checks, logging, and metrics.

---

## Health Checks

```bash
# Manual check
curl http://localhost:8080/health

# Automated monitoring (cron)
*/5 * * * * curl -f http://localhost:8080/health || echo "API down" | mail -s "Alert" admin@example.com
```

---

## Logging

```bash
# View logs
docker logs -f ai-trader

# Filter errors
docker logs ai-trader 2>&1 | grep -i error

# Export logs
docker logs ai-trader > ai-trader.log 2>&1
```

---

## Database Monitoring

```bash
# Database size
docker exec ai-trader du -h /app/data/jobs.db

# Job statistics
docker exec ai-trader sqlite3 /app/data/jobs.db \
  "SELECT status, COUNT(*) FROM jobs GROUP BY status;"
```
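
The same job statistics can be pulled from Python with the standard `sqlite3` module; a sketch against an in-memory stand-in for `jobs.db` (the real table has more columns):

```python
import sqlite3

# In-memory stand-in for /app/data/jobs.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?)",
                 [("j1", "completed"), ("j2", "completed"), ("j3", "failed")])

stats = dict(conn.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status"))
print(stats)  # {'completed': 2, 'failed': 1}
conn.close()
```
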

---

## Metrics (Future)

Prometheus metrics are planned for v0.4.0.
docs/deployment/production-checklist.md (new file, 50 lines)
@@ -0,0 +1,50 @@
# Production Deployment Checklist

Pre-deployment verification.

---

## Pre-Deployment

- [ ] API keys configured in `.env`
- [ ] Environment variables reviewed
- [ ] Model configuration validated
- [ ] Port availability confirmed
- [ ] Volume mounts configured
- [ ] Health checks enabled
- [ ] Restart policy set

---

## Testing

- [ ] `bash scripts/validate_docker_build.sh` passes
- [ ] `bash scripts/test_api_endpoints.sh` passes
- [ ] Health endpoint responds correctly
- [ ] Sample simulation completes successfully

---

## Monitoring

- [ ] Log aggregation configured
- [ ] Health check monitoring enabled
- [ ] Alerting configured for failures
- [ ] Database backup strategy defined

---

## Security

- [ ] API keys stored securely (not in code)
- [ ] `.env` excluded from version control
- [ ] Network access restricted
- [ ] SSL/TLS configured (if exposing publicly)

---

## Documentation

- [ ] Runbook created for operations team
- [ ] Escalation procedures documented
- [ ] Recovery procedures tested
docs/deployment/scaling.md (new file, 46 lines)
@@ -0,0 +1,46 @@
# Scaling

Running multiple instances and load balancing.

---

## Current Limitations

- Maximum 1 concurrent job per instance
- No built-in load balancing
- Single SQLite database per instance

---

## Multi-Instance Deployment

For parallel simulations, deploy multiple instances:

```yaml
# docker-compose.yml
services:
  ai-trader-1:
    image: ghcr.io/xe138/ai-trader:latest
    ports:
      - "8081:8080"
    volumes:
      - ./data1:/app/data

  ai-trader-2:
    image: ghcr.io/xe138/ai-trader:latest
    ports:
      - "8082:8080"
    volumes:
      - ./data2:/app/data
```

**Note:** Each instance needs its own database and data volumes.
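
Without a load balancer, clients have to spread work themselves. A minimal round-robin dispatcher sketch; the instance URLs assume the port mappings shown above, and a production version would also check each instance's `/health` before dispatching:

```python
from itertools import cycle

# Hypothetical instance URLs matching the compose file's port mappings
instances = cycle(["http://localhost:8081", "http://localhost:8082"])

def next_instance() -> str:
    """Pick the next instance in round-robin order."""
    return next(instances)

print(next_instance())  # http://localhost:8081
print(next_instance())  # http://localhost:8082
print(next_instance())  # http://localhost:8081
```
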

---

## Load Balancing (Future)

Planned for v0.4.0:
- Shared PostgreSQL database
- Job queue with multiple workers
- Horizontal scaling support
docs/developer/CONTRIBUTING.md (new file, 48 lines)
@@ -0,0 +1,48 @@
# Contributing to AI-Trader

Guidelines for contributing to the project.

---

## Development Setup

See [development-setup.md](development-setup.md).

---

## Pull Request Process

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Make changes
4. Run tests: `pytest tests/`
5. Update documentation
6. Commit: `git commit -m "Add feature: description"`
7. Push: `git push origin feature/my-feature`
8. Create a Pull Request

---

## Code Style

- Follow PEP 8 for Python
- Use type hints
- Add docstrings to public functions
- Keep functions focused and small

---

## Testing Requirements

- Unit tests for new functionality
- Integration tests for API changes
- Maintain test coverage >80%

---

## Documentation

- Update README.md for new features
- Add entries to CHANGELOG.md
- Update API_REFERENCE.md for endpoint changes
- Include examples in relevant guides
docs/developer/adding-models.md (new file, 69 lines)
@@ -0,0 +1,69 @@
# Adding Custom AI Models

How to add and configure custom AI models.

---

## Basic Setup

Edit `configs/default_config.json`:

```json
{
  "models": [
    {
      "name": "Your Model Name",
      "basemodel": "provider/model-id",
      "signature": "unique-identifier",
      "enabled": true
    }
  ]
}
```

---

## Examples

### OpenAI Models

```json
{
  "name": "GPT-4",
  "basemodel": "openai/gpt-4",
  "signature": "gpt-4",
  "enabled": true
}
```

### Anthropic Claude

```json
{
  "name": "Claude 3.7 Sonnet",
  "basemodel": "anthropic/claude-3.7-sonnet",
  "signature": "claude-3.7-sonnet",
  "enabled": true,
  "openai_base_url": "https://api.anthropic.com/v1",
  "openai_api_key": "your-anthropic-key"
}
```

### Via OpenRouter

```json
{
  "name": "DeepSeek",
  "basemodel": "deepseek/deepseek-chat",
  "signature": "deepseek",
  "enabled": true,
  "openai_base_url": "https://openrouter.ai/api/v1",
  "openai_api_key": "your-openrouter-key"
}
```
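
A quick sanity check before launching a simulation can catch malformed entries. A validation sketch for the fields shown above; the rules here are assumptions for illustration, not the loader's actual behavior:

```python
REQUIRED_FIELDS = {"name", "basemodel", "signature", "enabled"}

def validate_model_config(cfg: dict) -> list:
    """Return a list of problems with a model entry; empty means valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - cfg.keys())]
    if "enabled" in cfg and not isinstance(cfg["enabled"], bool):
        problems.append("'enabled' must be a boolean")
    if "basemodel" in cfg and "/" not in cfg["basemodel"]:
        problems.append("'basemodel' should look like 'provider/model-id'")
    return problems

print(validate_model_config({"name": "GPT-4", "basemodel": "openai/gpt-4",
                             "signature": "gpt-4", "enabled": True}))  # []
```
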

---

## Field Reference

See [docs/user-guide/configuration.md](../user-guide/configuration.md#model-configuration-fields) for complete field descriptions.
docs/developer/architecture.md (new file, 68 lines)
@@ -0,0 +1,68 @@
# Architecture

System design and component overview.

---

## Component Diagram

See README.md for the architecture diagram.

---

## Key Components

### FastAPI Server (`api/main.py`)
- REST API endpoints
- Request validation
- Response formatting

### Job Manager (`api/job_manager.py`)
- Job lifecycle management
- SQLite operations
- Concurrency control

### Simulation Worker (`api/simulation_worker.py`)
- Background job execution
- Date-sequential, model-parallel orchestration
- Error handling

### Model-Day Executor (`api/model_day_executor.py`)
- Single model-day execution
- Runtime config isolation
- Agent invocation

### Base Agent (`agent/base_agent/base_agent.py`)
- Trading session execution
- MCP tool integration
- Position management

### MCP Services (`agent_tools/`)
- Math, Search, Trade, Price tools
- Internal HTTP servers
- Localhost-only access

---

## Data Flow

1. API receives trigger request
2. Job Manager validates and creates the job
3. Worker starts background execution
4. For each date (sequential):
   - For each model (parallel):
     - Executor creates an isolated runtime config
     - Agent executes the trading session
     - Results are stored in the database
5. Job status updated
6. Results available via API
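
The date-sequential, model-parallel loop in step 4 can be sketched with `asyncio`; the function names here are illustrative, not the executor's real API:

```python
import asyncio

async def run_model_day(date: str, model: str, results: list) -> None:
    """Stand-in for one model-day execution."""
    await asyncio.sleep(0)  # real trading-session work happens here
    results.append((date, model))

async def run_job(dates, models):
    results = []
    for date in dates:  # dates are strictly sequential
        # all models for this date run concurrently
        await asyncio.gather(*(run_model_day(date, m, results) for m in models))
    return results

out = asyncio.run(run_job(["2025-01-16", "2025-01-17"], ["gpt-4", "claude"]))
print(len(out))  # 4
```

The outer `for` loop enforces that no model starts a new date until every model has finished the previous one, which is what keeps the anti-look-ahead guarantees intact.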

---

## Anti-Look-Ahead Controls

- `TODAY_DATE` in the runtime config limits data access
- Price queries filter by date
- Search results are filtered by publication date

See [CLAUDE.md](../../CLAUDE.md) for implementation details.
docs/developer/database-schema.md (new file, 94 lines)
@@ -0,0 +1,94 @@
# Database Schema

SQLite database schema reference.

---

## Tables

### jobs
Job metadata and overall status.

```sql
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT,  -- JSON array
    models TEXT,      -- JSON array
    created_at TEXT,
    started_at TEXT,
    completed_at TEXT,
    total_duration_seconds REAL,
    error TEXT
);
```

### job_details
Per model-day execution details.

```sql
CREATE TABLE job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT,
    model_signature TEXT,
    trading_date TEXT,
    status TEXT CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    start_time TEXT,
    end_time TEXT,
    duration_seconds REAL,
    error TEXT,
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);
```

### positions
Trading position records with P&L.

```sql
CREATE TABLE positions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT,
    date TEXT,
    model TEXT,
    action_id INTEGER,
    action_type TEXT,
    symbol TEXT,
    amount INTEGER,
    price REAL,
    cash REAL,
    portfolio_value REAL,
    daily_profit REAL,
    daily_return_pct REAL,
    created_at TEXT
);
```

### holdings
Portfolio holdings breakdown per position.

```sql
CREATE TABLE holdings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    position_id INTEGER,
    symbol TEXT,
    quantity REAL,
    FOREIGN KEY (position_id) REFERENCES positions(id) ON DELETE CASCADE
);
```
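
One detail worth noting: SQLite only enforces `ON DELETE CASCADE` when foreign keys are enabled on the connection. A minimal demonstration with trimmed-down versions of the two tables above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default; required for CASCADE
conn.execute("CREATE TABLE positions (id INTEGER PRIMARY KEY AUTOINCREMENT, symbol TEXT)")
conn.execute("""
    CREATE TABLE holdings (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        position_id INTEGER,
        symbol TEXT,
        quantity REAL,
        FOREIGN KEY (position_id) REFERENCES positions(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO positions (symbol) VALUES ('AAPL')")
conn.execute("INSERT INTO holdings (position_id, symbol, quantity) VALUES (1, 'AAPL', 10)")
conn.execute("DELETE FROM positions WHERE id = 1")  # cascades into holdings
print(conn.execute("SELECT COUNT(*) FROM holdings").fetchone()[0])  # 0
```
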

### price_data
Cached historical price data.

### price_coverage
Data availability tracking per symbol.

### reasoning_logs
AI decision reasoning (when enabled).

### tool_usage
MCP tool usage statistics.

---

See `api/database.py` for complete schema definitions.
docs/developer/development-setup.md (new file, 71 lines)
@@ -0,0 +1,71 @@
# Development Setup

Local development without Docker.

---

## Prerequisites

- Python 3.10+
- pip
- virtualenv

---

## Setup Steps

### 1. Clone Repository

```bash
git clone https://github.com/Xe138/AI-Trader.git
cd AI-Trader
```

### 2. Create Virtual Environment

```bash
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Configure Environment

```bash
cp .env.example .env
# Edit .env with your API keys
```

### 5. Start MCP Services

```bash
cd agent_tools
python start_mcp_services.py &
cd ..
```

### 6. Start API Server

```bash
python -m uvicorn api.main:app --reload --port 8080
```

---

## Running Tests

```bash
pytest tests/ -v
```

---

## Project Structure

See [CLAUDE.md](../../CLAUDE.md) for the complete project structure.
docs/developer/testing.md (new file, 64 lines)
@@ -0,0 +1,64 @@
# Testing Guide

Guide for testing AI-Trader during development.

---

## Automated Testing

### Docker Build Validation

```bash
chmod +x scripts/*.sh
bash scripts/validate_docker_build.sh
```

Validates:
- Docker installation
- Environment configuration
- Image build
- Container startup
- Health endpoint

### API Endpoint Testing

```bash
bash scripts/test_api_endpoints.sh
```

Tests all API endpoints with real simulations.

---

## Unit Tests

```bash
# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ -v --cov=api --cov-report=term-missing

# Specific test file
pytest tests/unit/test_job_manager.py -v
```

---

## Integration Tests

```bash
# Run integration tests only
pytest tests/integration/ -v

# Test with a real API server
docker-compose up -d
pytest tests/integration/test_api_endpoints.py -v
```

---

For detailed testing procedures, see the root [TESTING_GUIDE.md](../../TESTING_GUIDE.md).
@@ -1,873 +0,0 @@
# Implementation Specifications: Agent, Docker, and Windmill Integration
|
||||
|
||||
## Part 1: BaseAgent Refactoring
|
||||
|
||||
### 1.1 Current State Analysis
|
||||
|
||||
**Current `base_agent.py` structure:**
|
||||
- `run_date_range(init_date, end_date)` - Loops through all dates
|
||||
- `run_trading_session(today_date)` - Executes single day
|
||||
- `get_trading_dates()` - Calculates dates from position.jsonl

**What works well:**

- `run_trading_session()` is already isolated for single-day execution ✅
- Agent initialization is separate from execution ✅
- Position tracking via position.jsonl ✅

**What needs modification:**

- `runtime_env.json` management (move to RuntimeConfigManager)
- `get_trading_dates()` logic (move to API layer for date range calculation)

### 1.2 Required Changes

#### Change 1: No modifications needed to core execution logic

**Rationale:** `BaseAgent.run_trading_session(today_date)` already supports single-day execution. The worker will call this method directly.

```python
# Current code (already suitable for API mode):
async def run_trading_session(self, today_date: str) -> None:
    """Run single day trading session"""
    # This method is suitable as-is for the worker to call
```

**Action:** ✅ No changes needed

---

#### Change 2: Make runtime config path injectable

**Current issue:**
```python
# In base_agent.py, uses global config
from tools.general_tools import get_config_value, write_config_value
```

**Problem:** `get_config_value()` reads from `os.environ["RUNTIME_ENV_PATH"]`, which the worker will override per execution.

**Solution:** This already works. The worker sets `RUNTIME_ENV_PATH` before calling agent methods:

```python
# In executor.py
os.environ["RUNTIME_ENV_PATH"] = runtime_config_path
await agent.run_trading_session(date)
```

**Action:** ✅ No changes needed (env var override is sufficient)

---

#### Change 3: Optional - Separate agent initialization from date-range logic

**Current code in `main.py`:**
```python
# Creates agent
agent = AgentClass(...)
await agent.initialize()

# Runs all dates
await agent.run_date_range(INIT_DATE, END_DATE)
```

**For API mode:**
```python
# Worker creates agent
agent = AgentClass(...)
await agent.initialize()

# Worker calls run_trading_session directly for each date
for date in date_range:
    await agent.run_trading_session(date)
```

**Action:** ✅ The worker will not use the `run_date_range()` method. No changes needed to the agent.

---

### 1.3 Summary: BaseAgent Changes

**Result:** **NO CODE CHANGES REQUIRED** to `base_agent.py`!

The existing architecture is already compatible with the API worker pattern:
- `run_trading_session()` is the right interface
- Runtime config is managed via environment variables
- Position tracking works as-is

**Only change needed:** The worker must call `agent.register_agent()` if the position file doesn't exist (already handled by the `get_trading_dates()` logic).

---
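The sequence above — point the config helpers at this job's runtime file, register the agent on first run, then execute each trading day — can be sketched as a single worker routine. This is an illustrative sketch only: `run_job`, its parameters, and the file paths are assumptions, not part of the existing codebase; only `register_agent()` and `run_trading_session()` come from the spec.

```python
import os
from pathlib import Path

async def run_job(agent, dates, runtime_config_path, position_file):
    """Hypothetical worker routine tying together the steps from Part 1."""
    # Point the agent's config helpers at this job's isolated runtime file
    os.environ["RUNTIME_ENV_PATH"] = runtime_config_path

    # First-ever run for this agent: create its position.jsonl
    if not Path(position_file).exists():
        await agent.register_agent()

    # Run each trading day in sequence
    for date in dates:
        await agent.run_trading_session(date)
```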
## Part 2: Docker Configuration

### 2.1 Current Docker Setup

**Existing files:**
- `Dockerfile` - Multi-stage build for batch mode
- `docker-compose.yml` - Service definition
- `docker-entrypoint.sh` - Launches data fetch + main.py

### 2.2 Modified Dockerfile

```dockerfile
# Existing stages remain the same...
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt requirements-api.txt ./
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir -r requirements-api.txt

# Copy application code
COPY . /app

# Create data directories
RUN mkdir -p /app/data /app/configs

# Copy and set permissions for entrypoint
COPY docker-entrypoint-api.sh /app/
RUN chmod +x /app/docker-entrypoint-api.sh

# Expose API port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Run API service
CMD ["/app/docker-entrypoint-api.sh"]
```

### 2.3 New requirements-api.txt

```
fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3
pydantic-settings==2.1.0
python-multipart==0.0.6
```
### 2.4 New docker-entrypoint-api.sh

```bash
#!/bin/bash
set -e

echo "=================================="
echo "AI-Trader API Service Starting"
echo "=================================="

# Cleanup stale runtime configs from previous runs
echo "Cleaning up stale runtime configs..."
python3 -c "from api.runtime_manager import RuntimeConfigManager; RuntimeConfigManager().cleanup_all_runtime_configs()"

# Start MCP services in background
echo "Starting MCP services..."
cd /app/agent_tools
python3 start_mcp_services.py &
MCP_PID=$!

# Register cleanup now, before the long-running server starts.
# (The trap must be set before uvicorn runs, and uvicorn must not be
# exec'd, or the trap would never fire.)
trap "echo 'Shutting down...'; kill $MCP_PID 2>/dev/null || true" EXIT SIGTERM SIGINT

# Wait for MCP services to be ready
echo "Waiting for MCP services to initialize..."
sleep 10

# Verify MCP services are running
echo "Verifying MCP services..."
for port in ${MATH_HTTP_PORT:-8000} ${SEARCH_HTTP_PORT:-8001} ${TRADE_HTTP_PORT:-8002} ${GETPRICE_HTTP_PORT:-8003}; do
    if ! curl -f -s http://localhost:$port/health > /dev/null 2>&1; then
        echo "WARNING: MCP service on port $port not responding"
    else
        echo "✓ MCP service on port $port is healthy"
    fi
done

# Start API server
echo "Starting FastAPI server..."
cd /app

# Use environment variables for host and port
API_HOST=${API_HOST:-0.0.0.0}
API_PORT=${API_PORT:-8080}

echo "API will be available at http://${API_HOST}:${API_PORT}"
echo "=================================="

# Start uvicorn with a single worker (for simplicity in the MVP)
uvicorn api.main:app \
    --host ${API_HOST} \
    --port ${API_PORT} \
    --workers 1 \
    --log-level info
```
### 2.5 Updated docker-compose.yml

```yaml
version: '3.8'

services:
  ai-trader:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ai-trader-api
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
      - ./configs:/app/configs
      - ./logs:/app/logs
    env_file:
      - .env
    environment:
      - API_HOST=0.0.0.0
      - API_PORT=8080
      - RUNTIME_ENV_PATH=/app/data/runtime_env.json
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
    networks:
      - ai-trader-network

networks:
  ai-trader-network:
    driver: bridge
```
### 2.6 Environment Variables Reference

```bash
# .env file example for API mode

# OpenAI Configuration
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=sk-...

# API Keys
ALPHAADVANTAGE_API_KEY=your_alpha_vantage_key
JINA_API_KEY=your_jina_key

# MCP Service Ports
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003

# API Configuration
API_HOST=0.0.0.0
API_PORT=8080

# Runtime Config
RUNTIME_ENV_PATH=/app/data/runtime_env.json

# Job Configuration
MAX_CONCURRENT_JOBS=1
```
### 2.7 Docker Commands Reference

```bash
# Build image
docker-compose build

# Start service
docker-compose up

# Start in background
docker-compose up -d

# View logs
docker-compose logs -f

# Check health
docker-compose ps

# Stop service
docker-compose down

# Restart service
docker-compose restart

# Execute command in running container
docker-compose exec ai-trader python3 -c "from api.job_manager import JobManager; jm = JobManager(); print(jm.get_current_job())"

# Access container shell
docker-compose exec ai-trader bash
```

---
## Part 3: Windmill Integration

### 3.1 Windmill Overview

Windmill (windmill.dev) is a workflow automation platform that can:
- Schedule cron jobs
- Execute TypeScript/Python scripts
- Store state between runs
- Build UI dashboards

**Integration approach:**
1. Windmill cron job triggers the simulation daily
2. Windmill polls for job completion
3. Windmill retrieves results and stores them in its internal database
4. Windmill dashboard displays performance metrics

### 3.2 Flow 1: Daily Simulation Trigger

**File:** `windmill/trigger_simulation.ts`

```typescript
import { Resource } from "https://deno.land/x/windmill@v1.0.0/mod.ts";

export async function main(
  ai_trader_api: Resource<"ai_trader_api">
) {
  const apiUrl = ai_trader_api.base_url; // e.g., "http://ai-trader:8080"

  // Trigger simulation
  const response = await fetch(`${apiUrl}/simulate/trigger`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      config_path: "configs/default_config.json"
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status} ${response.statusText}`);
  }

  const data = await response.json();

  // Handle different response types
  if (data.status === "current") {
    console.log("Simulation already up-to-date");
    return {
      action: "skipped",
      message: data.message,
      last_date: data.last_simulation_date
    };
  }

  // Store job_id in Windmill state for the poller to pick up
  await Deno.writeTextFile(
    `/tmp/current_job_id.txt`,
    data.job_id
  );

  console.log(`Simulation triggered: ${data.job_id}`);
  console.log(`Date range: ${data.date_range.join(", ")}`);
  console.log(`Models: ${data.models.join(", ")}`);

  return {
    action: "triggered",
    job_id: data.job_id,
    date_range: data.date_range,
    models: data.models,
    status: data.status
  };
}
```

**Windmill Resource Configuration:**
```json
{
  "resource_type": "ai_trader_api",
  "base_url": "http://ai-trader:8080"
}
```

**Schedule:** Every day at 6:00 AM

---
### 3.3 Flow 2: Job Status Poller

**File:** `windmill/poll_simulation_status.ts`

```typescript
import { Resource } from "https://deno.land/x/windmill@v1.0.0/mod.ts";

export async function main(
  ai_trader_api: Resource<"ai_trader_api">,
  job_id?: string
) {
  const apiUrl = ai_trader_api.base_url;

  // Get job_id from the parameter or from the current-job file
  let jobId = job_id;
  if (!jobId) {
    try {
      jobId = await Deno.readTextFile("/tmp/current_job_id.txt");
    } catch {
      // No current job
      return {
        status: "no_job",
        message: "No active simulation job"
      };
    }
  }

  // Poll status
  const response = await fetch(`${apiUrl}/simulate/status/${jobId}`);

  if (!response.ok) {
    if (response.status === 404) {
      return {
        status: "not_found",
        message: "Job not found",
        job_id: jobId
      };
    }
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();

  console.log(`Job ${jobId}: ${data.status}`);
  console.log(`Progress: ${data.progress.completed}/${data.progress.total_model_days} model-days`);

  // If the job is complete, retrieve results
  if (data.status === "completed" || data.status === "partial") {
    console.log("Job finished, retrieving results...");

    const results = [];
    for (const date of data.date_range) {
      const resultsResponse = await fetch(
        `${apiUrl}/results?date=${date}&detail=minimal`
      );

      if (resultsResponse.ok) {
        const dateResults = await resultsResponse.json();
        results.push(dateResults);
      }
    }

    // Clean up the job_id file
    try {
      await Deno.remove("/tmp/current_job_id.txt");
    } catch {
      // Ignore
    }

    return {
      status: data.status,
      job_id: jobId,
      completed_at: data.completed_at,
      duration_seconds: data.total_duration_seconds,
      results: results
    };
  }

  // Job still running
  return {
    status: data.status,
    job_id: jobId,
    progress: data.progress,
    started_at: data.created_at
  };
}
```

**Schedule:** Every 5 minutes (skips when there is no active job)

---
### 3.4 Flow 3: Results Retrieval and Storage

**File:** `windmill/store_simulation_results.py`

```python
import wmill
from datetime import datetime

def main(
    job_results: dict,
    database: str = "simulation_results"
):
    """
    Store simulation results in Windmill's internal database.

    Args:
        job_results: Output from the poll_simulation_status flow
        database: Database name for storage
    """
    if job_results.get("status") not in ("completed", "partial"):
        return {"message": "Job not complete, skipping storage"}

    # Extract results
    job_id = job_results["job_id"]
    results = job_results.get("results", [])

    stored_count = 0

    for date_result in results:
        date = date_result["date"]

        for model_result in date_result["results"]:
            model = model_result["model"]
            positions = model_result["positions"]
            pnl = model_result["daily_pnl"]

            # Store in Windmill database
            record = {
                "job_id": job_id,
                "date": date,
                "model": model,
                "cash": positions.get("CASH", 0),
                "portfolio_value": pnl["portfolio_value"],
                "daily_profit": pnl["profit"],
                "daily_return_pct": pnl["return_pct"],
                "stored_at": datetime.utcnow().isoformat()
            }

            # Use Windmill's internal storage
            wmill.set_variable(
                path=f"{database}/{model}/{date}",
                value=record
            )

            stored_count += 1

    return {
        "stored_count": stored_count,
        "job_id": job_id,
        "message": f"Stored {stored_count} model-day results"
    }
```

---
### 3.5 Windmill Dashboard Example

**File:** `windmill/dashboard.json` (Windmill App Builder)

```json
{
  "grid": [
    {
      "type": "table",
      "id": "performance_table",
      "configuration": {
        "title": "Model Performance Summary",
        "data_source": {
          "type": "script",
          "path": "f/simulation_results/get_latest_performance"
        },
        "columns": [
          {"field": "model", "header": "Model"},
          {"field": "latest_date", "header": "Latest Date"},
          {"field": "portfolio_value", "header": "Portfolio Value"},
          {"field": "total_return_pct", "header": "Total Return %"},
          {"field": "daily_return_pct", "header": "Daily Return %"}
        ]
      }
    },
    {
      "type": "chart",
      "id": "portfolio_chart",
      "configuration": {
        "title": "Portfolio Value Over Time",
        "chart_type": "line",
        "data_source": {
          "type": "script",
          "path": "f/simulation_results/get_timeseries"
        },
        "x_axis": "date",
        "y_axis": "portfolio_value",
        "series": "model"
      }
    }
  ]
}
```

**Supporting Script:** `windmill/get_latest_performance.py`

```python
import wmill

def main(database: str = "simulation_results"):
    """Get the latest performance for each model"""

    # Query Windmill variables
    all_vars = wmill.list_variables(path_prefix=f"{database}/")

    # Group by model
    models = {}
    for var in all_vars:
        parts = var["path"].split("/")
        if len(parts) >= 3:
            model = parts[1]

            value = wmill.get_variable(var["path"])

            if model not in models:
                models[model] = []
            models[model].append(value)

    # Compute a summary for each model
    summary = []
    for model, records in models.items():
        # Sort by date, newest first
        records.sort(key=lambda x: x["date"], reverse=True)
        latest = records[0]

        # Calculate total return
        initial_value = 10000  # Initial cash
        total_return_pct = ((latest["portfolio_value"] - initial_value) / initial_value) * 100

        summary.append({
            "model": model,
            "latest_date": latest["date"],
            "portfolio_value": latest["portfolio_value"],
            "total_return_pct": round(total_return_pct, 2),
            "daily_return_pct": latest["daily_return_pct"]
        })

    return summary
```

---
### 3.6 Windmill Workflow Orchestration

**Main Workflow:** `windmill/daily_simulation_workflow.yaml`

```yaml
name: Daily AI Trader Simulation
description: Trigger simulation, poll status, and store results

triggers:
  - type: cron
    schedule: "0 6 * * *"  # Every day at 6 AM

steps:
  - id: trigger
    name: Trigger Simulation
    script: f/ai_trader/trigger_simulation
    outputs:
      - job_id
      - action

  - id: wait
    name: Wait for Job Start
    type: sleep
    duration: 10s

  - id: poll_loop
    name: Poll Until Complete
    type: loop
    max_iterations: 60  # Poll for up to 5 hours (60 × 5min)
    interval: 5m
    script: f/ai_trader/poll_simulation_status
    inputs:
      job_id: ${{ steps.trigger.outputs.job_id }}
    break_condition: |
      ${{ steps.poll_loop.outputs.status in ['completed', 'partial', 'failed'] }}

  - id: store_results
    name: Store Results in Database
    script: f/ai_trader/store_simulation_results
    inputs:
      job_results: ${{ steps.poll_loop.outputs }}
    condition: |
      ${{ steps.poll_loop.outputs.status in ['completed', 'partial'] }}

  - id: notify
    name: Send Notification
    type: email
    to: admin@example.com
    subject: "AI Trader Simulation Complete"
    body: |
      Simulation completed for ${{ steps.poll_loop.outputs.job_id }}
      Status: ${{ steps.poll_loop.outputs.status }}
      Duration: ${{ steps.poll_loop.outputs.duration_seconds }}s
```

---
### 3.7 Testing Windmill Integration Locally

**1. Start AI-Trader API:**
```bash
docker-compose up -d
```

**2. Test trigger endpoint:**
```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{"config_path": "configs/default_config.json"}'
```

**3. Test status polling:**
```bash
JOB_ID="<job_id_from_step_2>"
curl http://localhost:8080/simulate/status/$JOB_ID
```

**4. Test results retrieval:**
```bash
curl "http://localhost:8080/results?date=2025-01-16&model=gpt-5&detail=minimal"
```

**5. Deploy to Windmill:**
```bash
# Install Windmill CLI
npm install -g windmill-cli

# Log in to your Windmill instance
wmill login https://your-windmill-instance.com

# Deploy scripts
wmill script push windmill/trigger_simulation.ts
wmill script push windmill/poll_simulation_status.ts
wmill script push windmill/store_simulation_results.py

# Deploy workflow
wmill flow push windmill/daily_simulation_workflow.yaml
```

---
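Outside Windmill, the same poll-until-done pattern from step 3 can be scripted directly. A minimal illustrative client using only the Python standard library (the endpoint path follows this spec; the function name, interval, and timeout are arbitrary choices, not part of the API):

```python
import json
import time
import urllib.request

def wait_for_job(base_url: str, job_id: str,
                 interval: int = 30, timeout: int = 3600) -> dict:
    """Poll /simulate/status/{job_id} until the job leaves the active states."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(f"{base_url}/simulate/status/{job_id}") as resp:
            status = json.load(resp)
        # 'pending' and 'running' are the only non-terminal states in this spec
        if status["status"] not in ("pending", "running"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")
```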
## Part 4: Complete File Structure

After implementation, the project structure will be:

```
AI-Trader/
├── api/
│   ├── __init__.py
│   ├── main.py               # FastAPI application
│   ├── models.py             # Pydantic request/response models
│   ├── job_manager.py        # Job lifecycle management
│   ├── database.py           # SQLite utilities
│   ├── worker.py             # Background simulation worker
│   ├── executor.py           # Single model-day execution
│   └── runtime_manager.py    # Runtime config isolation
│
├── docs/
│   ├── api-specification.md
│   ├── job-manager-specification.md
│   ├── worker-specification.md
│   └── implementation-specifications.md
│
├── windmill/
│   ├── trigger_simulation.ts
│   ├── poll_simulation_status.ts
│   ├── store_simulation_results.py
│   ├── get_latest_performance.py
│   ├── daily_simulation_workflow.yaml
│   └── dashboard.json
│
├── agent/
│   └── base_agent/
│       └── base_agent.py     # NO CHANGES NEEDED
│
├── agent_tools/
│   └── ... (existing MCP tools)
│
├── data/
│   ├── jobs.db               # SQLite database (created automatically)
│   ├── runtime_env*.json     # Runtime configs (temporary)
│   ├── agent_data/           # Existing position/log data
│   └── merged.jsonl          # Existing price data
│
├── Dockerfile                # Updated for API mode
├── docker-compose.yml        # Updated service definition
├── docker-entrypoint-api.sh  # New API entrypoint
├── requirements-api.txt      # FastAPI dependencies
├── .env                      # Environment configuration
└── main.py                   # Existing (used by worker)
```

---
## Part 5: Implementation Checklist

### Phase 1: API Foundation (Days 1-2)
- [ ] Create `api/` directory structure
- [ ] Implement `api/models.py` with Pydantic models
- [ ] Implement `api/database.py` with SQLite utilities
- [ ] Implement `api/job_manager.py` with job CRUD operations
- [ ] Write unit tests for job_manager
- [ ] Test database operations manually

### Phase 2: Worker & Executor (Days 3-4)
- [ ] Implement `api/runtime_manager.py`
- [ ] Implement `api/executor.py` for single model-day execution
- [ ] Implement `api/worker.py` for job orchestration
- [ ] Test worker with mock agent
- [ ] Test runtime config isolation

### Phase 3: FastAPI Endpoints (Days 5-6)
- [ ] Implement `api/main.py` with all endpoints
- [ ] Implement `/simulate/trigger` with background tasks
- [ ] Implement `/simulate/status/{job_id}`
- [ ] Implement `/simulate/current`
- [ ] Implement `/results` with detail levels
- [ ] Implement `/health` with MCP checks
- [ ] Test all endpoints with Postman/curl

### Phase 4: Docker Integration (Day 7)
- [ ] Update `Dockerfile`
- [ ] Create `docker-entrypoint-api.sh`
- [ ] Create `requirements-api.txt`
- [ ] Update `docker-compose.yml`
- [ ] Test Docker build
- [ ] Test container startup and health checks
- [ ] Test end-to-end simulation via API in Docker

### Phase 5: Windmill Integration (Days 8-9)
- [ ] Create Windmill scripts (trigger, poll, store)
- [ ] Test scripts locally against Docker API
- [ ] Deploy scripts to Windmill instance
- [ ] Create Windmill workflow
- [ ] Test workflow end-to-end
- [ ] Create Windmill dashboard
- [ ] Document Windmill setup process

### Phase 6: Testing & Documentation (Day 10)
- [ ] Integration tests for the complete workflow
- [ ] Load testing (multiple concurrent requests)
- [ ] Error scenario testing (MCP down, API timeout)
- [ ] Update README.md with API usage
- [ ] Create API documentation (Swagger/OpenAPI)
- [ ] Create deployment guide
- [ ] Create troubleshooting guide

---

## Summary

This specification covers:

1. **BaseAgent Refactoring:** Minimal changes needed (the existing code is compatible)
2. **Docker Configuration:** API service mode with health checks and a proper entrypoint
3. **Windmill Integration:** Complete workflow automation with TypeScript/Python scripts
4. **File Structure:** Clear organization of the new API components
5. **Implementation Checklist:** Step-by-step plan for a 10-day implementation

**Total estimated implementation time:** 10 working days for the MVP

**Next Step:** Review all specifications (api-specification.md, job-manager-specification.md, worker-specification.md, and this document) and approve before beginning implementation.
# Job Manager & Database Specification

## 1. Overview

The Job Manager is responsible for:
1. **Job lifecycle management** - Creating, tracking, and updating job status
2. **Database operations** - SQLite CRUD operations for jobs and job_details
3. **Concurrency control** - Ensuring only one simulation runs at a time
4. **State persistence** - Maintaining job state across API restarts

---

## 2. Database Schema

### 2.1 SQLite Database Location

```
data/jobs.db
```

**Rationale:** Co-located with simulation data for easy volume mounting
### 2.2 Table: jobs

**Purpose:** Track high-level job metadata and status

```sql
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    config_path TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
    date_range TEXT NOT NULL,       -- JSON array: ["2025-01-16", "2025-01-17"]
    models TEXT NOT NULL,           -- JSON array: ["claude-3.7-sonnet", "gpt-5"]
    created_at TEXT NOT NULL,       -- ISO 8601: "2025-01-20T14:30:00Z"
    started_at TEXT,                -- When the first model-day started
    completed_at TEXT,              -- When the last model-day finished
    total_duration_seconds REAL,
    error TEXT                      -- Top-level error message if the job failed
);

-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs(status);
CREATE INDEX IF NOT EXISTS idx_jobs_created_at ON jobs(created_at DESC);
```

**Field Details:**
- `job_id`: UUID v4 (e.g., `550e8400-e29b-41d4-a716-446655440000`)
- `status`: Current job state
  - `pending`: Job created, not started yet
  - `running`: At least one model-day is executing
  - `completed`: All model-days succeeded
  - `partial`: Some model-days succeeded, some failed
  - `failed`: All model-days failed (rare edge case)
- `date_range`: JSON string for easy querying
- `models`: JSON string of enabled model signatures
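The status values above imply a small state machine: jobs move from `pending` to `running`, then to one of the three terminal states. An illustrative guard (not part of the spec's API; the names here are hypothetical) could enforce those transitions:

```python
# Legal transitions implied by the status list above.
JOB_TRANSITIONS = {
    "pending": {"running", "failed"},
    "running": {"completed", "partial", "failed"},
    # Terminal states: no further transitions allowed
    "completed": set(),
    "partial": set(),
    "failed": set(),
}

def check_transition(old: str, new: str) -> None:
    """Raise if a status update would violate the job state machine."""
    if new not in JOB_TRANSITIONS.get(old, set()):
        raise ValueError(f"illegal job transition: {old} -> {new}")
```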
### 2.3 Table: job_details

**Purpose:** Track individual model-day execution status

```sql
CREATE TABLE IF NOT EXISTS job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL,
    date TEXT NOT NULL,             -- "2025-01-16"
    model TEXT NOT NULL,            -- "gpt-5"
    status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
    started_at TEXT,
    completed_at TEXT,
    duration_seconds REAL,
    error TEXT,                     -- Error message if this model-day failed
    FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);

-- Indexes
CREATE INDEX IF NOT EXISTS idx_job_details_job_id ON job_details(job_id);
CREATE INDEX IF NOT EXISTS idx_job_details_status ON job_details(status);
CREATE UNIQUE INDEX IF NOT EXISTS idx_job_details_unique ON job_details(job_id, date, model);
```

**Field Details:**
- Each row represents one model-day (e.g., `gpt-5` on `2025-01-16`)
- The `UNIQUE INDEX` prevents duplicate execution entries
- `ON DELETE CASCADE` ensures orphaned records are cleaned up
|
||||
|
||||
**jobs table:**
|
||||
```
|
||||
job_id | config_path | status | date_range | models | created_at | started_at | completed_at | total_duration_seconds
|
||||
--------------------------------------|--------------------------|-----------|-----------------------------------|---------------------------------|----------------------|----------------------|----------------------|----------------------
|
||||
550e8400-e29b-41d4-a716-446655440000 | configs/default_config.json | completed | ["2025-01-16","2025-01-17"] | ["gpt-5","claude-3.7-sonnet"] | 2025-01-20T14:25:00Z | 2025-01-20T14:25:10Z | 2025-01-20T14:29:45Z | 275.3
|
||||
```
|
||||
|
||||
**job_details table:**
|
||||
```
|
||||
id | job_id | date | model | status | started_at | completed_at | duration_seconds | error
|
||||
---|--------------------------------------|------------|--------------------|-----------|----------------------|----------------------|------------------|------
|
||||
1 | 550e8400-e29b-41d4-a716-446655440000 | 2025-01-16 | gpt-5 | completed | 2025-01-20T14:25:10Z | 2025-01-20T14:25:48Z | 38.2 | NULL
|
||||
2 | 550e8400-e29b-41d4-a716-446655440000 | 2025-01-16 | claude-3.7-sonnet | completed | 2025-01-20T14:25:10Z | 2025-01-20T14:25:55Z | 45.1 | NULL
|
||||
3 | 550e8400-e29b-41d4-a716-446655440000 | 2025-01-17 | gpt-5 | completed | 2025-01-20T14:25:56Z | 2025-01-20T14:26:36Z | 40.0 | NULL
|
||||
4 | 550e8400-e29b-41d4-a716-446655440000 | 2025-01-17 | claude-3.7-sonnet | completed | 2025-01-20T14:25:56Z | 2025-01-20T14:26:42Z | 46.5 | NULL
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Job Manager Class

### 3.1 File Structure

```
api/
├── job_manager.py   # Core JobManager class
├── database.py      # SQLite connection and utilities
└── models.py        # Pydantic models
```
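The JobManager code below imports `get_db_connection` from `api/database.py`, which this spec does not show. A plausible minimal version is sketched here (an assumption, not the actual implementation); the `PRAGMA foreign_keys` line matters because SQLite otherwise ignores the `ON DELETE CASCADE` clause from section 2.3:

```python
import sqlite3

def get_db_connection(db_path: str) -> sqlite3.Connection:
    """Open a SQLite connection with the settings the schema relies on."""
    conn = sqlite3.connect(db_path)
    # Foreign-key enforcement is off by default and must be enabled
    # per-connection for ON DELETE CASCADE to take effect.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn
```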
### 3.2 JobManager Interface

```python
# api/job_manager.py

from datetime import datetime
from typing import Optional, List, Dict, Tuple
import uuid
import json
from api.database import get_db_connection

class JobManager:
    """Manages simulation job lifecycle and database operations"""

    def __init__(self, db_path: str = "data/jobs.db"):
        self.db_path = db_path
        self._initialize_database()

    def _initialize_database(self) -> None:
        """Create tables if they don't exist"""
        conn = get_db_connection(self.db_path)
        # Execute the CREATE TABLE statements from sections 2.2 and 2.3
        conn.close()

    # ========== Job Creation ==========

    def create_job(
        self,
        config_path: str,
        date_range: List[str],
        models: List[str]
    ) -> str:
        """
        Create a new simulation job.

        Args:
            config_path: Path to config file
            date_range: List of trading dates to simulate
            models: List of model signatures to run

        Returns:
            job_id: UUID of created job

        Raises:
            ValueError: If another job is already running
        """
        # 1. Check if any jobs are currently running
        if not self.can_start_new_job():
            raise ValueError("Another simulation job is already running")

        # 2. Generate job ID
        job_id = str(uuid.uuid4())

        # 3. Create job record
        conn = get_db_connection(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            INSERT INTO jobs (
                job_id, config_path, status, date_range, models, created_at
            ) VALUES (?, ?, ?, ?, ?, ?)
        """, (
            job_id,
            config_path,
            "pending",
            json.dumps(date_range),
            json.dumps(models),
            datetime.utcnow().isoformat() + "Z"
        ))

        # 4. Create job_details records for each model-day
        for date in date_range:
            for model in models:
                cursor.execute("""
                    INSERT INTO job_details (
                        job_id, date, model, status
                    ) VALUES (?, ?, ?, ?)
                """, (job_id, date, model, "pending"))

        conn.commit()
        conn.close()

        return job_id

    # ========== Job Retrieval ==========

    def get_job(self, job_id: str) -> Optional[Dict]:
        """
        Get job metadata by ID.

        Returns:
            Job dict with keys: job_id, config_path, status, date_range (list),
            models (list), created_at, started_at, completed_at, total_duration_seconds

        Returns None if the job is not found.
        """
        conn = get_db_connection(self.db_path)
        cursor = conn.cursor()

        cursor.execute("SELECT * FROM jobs WHERE job_id = ?", (job_id,))
        row = cursor.fetchone()
        conn.close()

        if row is None:
            return None

        return {
            "job_id": row[0],
            "config_path": row[1],
            "status": row[2],
            "date_range": json.loads(row[3]),
            "models": json.loads(row[4]),
            "created_at": row[5],
            "started_at": row[6],
            "completed_at": row[7],
            "total_duration_seconds": row[8],
            "error": row[9]
        }

    def get_current_job(self) -> Optional[Dict]:
        """Get the most recent job (for the /simulate/current endpoint)"""
        conn = get_db_connection(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            SELECT * FROM jobs
            ORDER BY created_at DESC
            LIMIT 1
        """)
        row = cursor.fetchone()
        conn.close()

        if row is None:
            return None

        return self._row_to_job_dict(row)

    def get_running_jobs(self) -> List[Dict]:
        """Get all running or pending jobs"""
        conn = get_db_connection(self.db_path)
        cursor = conn.cursor()

        cursor.execute("""
            SELECT * FROM jobs
            WHERE status IN ('pending', 'running')
            ORDER BY created_at DESC
        """)
        rows = cursor.fetchall()
        conn.close()

        return [self._row_to_job_dict(row) for row in rows]

    # ========== Job Status Updates ==========

    def update_job_status(
        self,
        job_id: str,
        status: str,
        error: Optional[str] = None
    ) -> None:
        """Update job status (pending → running → completed/partial/failed)"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
updates = {"status": status}
|
||||
|
||||
if status == "running" and self.get_job(job_id)["status"] == "pending":
|
||||
updates["started_at"] = datetime.utcnow().isoformat() + "Z"
|
||||
|
||||
if status in ("completed", "partial", "failed"):
|
||||
updates["completed_at"] = datetime.utcnow().isoformat() + "Z"
|
||||
# Calculate total duration
|
||||
job = self.get_job(job_id)
|
||||
if job["started_at"]:
|
||||
started = datetime.fromisoformat(job["started_at"].replace("Z", ""))
|
||||
completed = datetime.utcnow()
|
||||
updates["total_duration_seconds"] = (completed - started).total_seconds()
|
||||
|
||||
if error:
|
||||
updates["error"] = error
|
||||
|
||||
# Build dynamic UPDATE query
|
||||
set_clause = ", ".join([f"{k} = ?" for k in updates.keys()])
|
||||
values = list(updates.values()) + [job_id]
|
||||
|
||||
cursor.execute(f"""
|
||||
UPDATE jobs
|
||||
SET {set_clause}
|
||||
WHERE job_id = ?
|
||||
""", values)
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
def update_job_detail_status(
|
||||
self,
|
||||
job_id: str,
|
||||
date: str,
|
||||
model: str,
|
||||
status: str,
|
||||
error: Optional[str] = None
|
||||
) -> None:
|
||||
"""Update individual model-day status"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
updates = {"status": status}
|
||||
|
||||
# Get current detail status to determine if this is a status transition
|
||||
cursor.execute("""
|
||||
SELECT status, started_at FROM job_details
|
||||
WHERE job_id = ? AND date = ? AND model = ?
|
||||
""", (job_id, date, model))
|
||||
row = cursor.fetchone()
|
||||
|
||||
if row:
|
||||
current_status = row[0]
|
||||
|
||||
if status == "running" and current_status == "pending":
|
||||
updates["started_at"] = datetime.utcnow().isoformat() + "Z"
|
||||
|
||||
if status in ("completed", "failed"):
|
||||
updates["completed_at"] = datetime.utcnow().isoformat() + "Z"
|
||||
# Calculate duration if started_at exists
|
||||
if row[1]: # started_at
|
||||
started = datetime.fromisoformat(row[1].replace("Z", ""))
|
||||
completed = datetime.utcnow()
|
||||
updates["duration_seconds"] = (completed - started).total_seconds()
|
||||
|
||||
if error:
|
||||
updates["error"] = error
|
||||
|
||||
# Build UPDATE query
|
||||
set_clause = ", ".join([f"{k} = ?" for k in updates.keys()])
|
||||
values = list(updates.values()) + [job_id, date, model]
|
||||
|
||||
cursor.execute(f"""
|
||||
UPDATE job_details
|
||||
SET {set_clause}
|
||||
WHERE job_id = ? AND date = ? AND model = ?
|
||||
""", values)
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
# After updating detail, check if overall job status needs update
|
||||
self._update_job_status_from_details(job_id)
|
||||
|
||||
def _update_job_status_from_details(self, job_id: str) -> None:
|
||||
"""
|
||||
Recalculate job status based on job_details statuses.
|
||||
|
||||
Logic:
|
||||
- If any detail is 'running' → job is 'running'
|
||||
- If all details are 'completed' → job is 'completed'
|
||||
- If some details are 'completed' and some 'failed' → job is 'partial'
|
||||
- If all details are 'failed' → job is 'failed'
|
||||
- If all details are 'pending' → job is 'pending'
|
||||
"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
SELECT status, COUNT(*)
|
||||
FROM job_details
|
||||
WHERE job_id = ?
|
||||
GROUP BY status
|
||||
""", (job_id,))
|
||||
|
||||
status_counts = {row[0]: row[1] for row in cursor.fetchall()}
|
||||
conn.close()
|
||||
|
||||
# Determine overall job status
|
||||
if status_counts.get("running", 0) > 0:
|
||||
new_status = "running"
|
||||
elif status_counts.get("pending", 0) > 0:
|
||||
# Some details still pending, job is either pending or running
|
||||
current_job = self.get_job(job_id)
|
||||
new_status = current_job["status"] # Keep current status
|
||||
elif status_counts.get("failed", 0) > 0 and status_counts.get("completed", 0) > 0:
|
||||
new_status = "partial"
|
||||
elif status_counts.get("failed", 0) > 0:
|
||||
new_status = "failed"
|
||||
else:
|
||||
new_status = "completed"
|
||||
|
||||
self.update_job_status(job_id, new_status)
|
||||
|
||||
# ========== Job Progress ==========
|
||||
|
||||
def get_job_progress(self, job_id: str) -> Dict:
|
||||
"""
|
||||
Get detailed progress for a job.
|
||||
|
||||
Returns:
|
||||
{
|
||||
"total_model_days": int,
|
||||
"completed": int,
|
||||
"failed": int,
|
||||
"current": {"date": str, "model": str} | None,
|
||||
"details": [
|
||||
{"date": str, "model": str, "status": str, "duration_seconds": float | None, "error": str | None},
|
||||
...
|
||||
]
|
||||
}
|
||||
"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Get all details for this job
|
||||
cursor.execute("""
|
||||
SELECT date, model, status, started_at, completed_at, duration_seconds, error
|
||||
FROM job_details
|
||||
WHERE job_id = ?
|
||||
ORDER BY date ASC, model ASC
|
||||
""", (job_id,))
|
||||
|
||||
rows = cursor.fetchall()
|
||||
conn.close()
|
||||
|
||||
if not rows:
|
||||
return {
|
||||
"total_model_days": 0,
|
||||
"completed": 0,
|
||||
"failed": 0,
|
||||
"current": None,
|
||||
"details": []
|
||||
}
|
||||
|
||||
total = len(rows)
|
||||
completed = sum(1 for row in rows if row[2] == "completed")
|
||||
failed = sum(1 for row in rows if row[2] == "failed")
|
||||
|
||||
# Find currently running model-day
|
||||
current = None
|
||||
for row in rows:
|
||||
if row[2] == "running":
|
||||
current = {"date": row[0], "model": row[1]}
|
||||
break
|
||||
|
||||
# Build details list
|
||||
details = []
|
||||
for row in rows:
|
||||
details.append({
|
||||
"date": row[0],
|
||||
"model": row[1],
|
||||
"status": row[2],
|
||||
"started_at": row[3],
|
||||
"completed_at": row[4],
|
||||
"duration_seconds": row[5],
|
||||
"error": row[6]
|
||||
})
|
||||
|
||||
return {
|
||||
"total_model_days": total,
|
||||
"completed": completed,
|
||||
"failed": failed,
|
||||
"current": current,
|
||||
"details": details
|
||||
}
|
||||
|
||||
# ========== Concurrency Control ==========
|
||||
|
||||
def can_start_new_job(self) -> bool:
|
||||
"""Check if a new job can be started (max 1 concurrent job)"""
|
||||
running_jobs = self.get_running_jobs()
|
||||
return len(running_jobs) == 0
|
||||
|
||||
def find_job_by_date_range(self, date_range: List[str]) -> Optional[Dict]:
|
||||
"""Find job with exact matching date range (for idempotency check)"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Query recent jobs (last 24 hours)
|
||||
cursor.execute("""
|
||||
SELECT * FROM jobs
|
||||
WHERE created_at > datetime('now', '-1 day')
|
||||
ORDER BY created_at DESC
|
||||
""")
|
||||
|
||||
rows = cursor.fetchall()
|
||||
conn.close()
|
||||
|
||||
# Check each job's date_range
|
||||
target_range = set(date_range)
|
||||
for row in rows:
|
||||
job_range = set(json.loads(row[3])) # date_range column
|
||||
if job_range == target_range:
|
||||
return self._row_to_job_dict(row)
|
||||
|
||||
return None
|
||||
|
||||
# ========== Utility Methods ==========
|
||||
|
||||
def _row_to_job_dict(self, row: tuple) -> Dict:
|
||||
"""Convert DB row to job dictionary"""
|
||||
return {
|
||||
"job_id": row[0],
|
||||
"config_path": row[1],
|
||||
"status": row[2],
|
||||
"date_range": json.loads(row[3]),
|
||||
"models": json.loads(row[4]),
|
||||
"created_at": row[5],
|
||||
"started_at": row[6],
|
||||
"completed_at": row[7],
|
||||
"total_duration_seconds": row[8],
|
||||
"error": row[9]
|
||||
}
|
||||
|
||||
def cleanup_old_jobs(self, days: int = 30) -> int:
|
||||
"""
|
||||
Delete jobs older than specified days (cleanup maintenance).
|
||||
|
||||
Returns:
|
||||
Number of jobs deleted
|
||||
"""
|
||||
conn = get_db_connection(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
DELETE FROM jobs
|
||||
WHERE created_at < datetime('now', '-' || ? || ' days')
|
||||
""", (days,))
|
||||
|
||||
deleted_count = cursor.rowcount
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
return deleted_count
|
||||
```
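
The rollup rules in `_update_job_status_from_details` can also be factored into a pure function, which makes the status logic easy to unit-test without a database. This is a sketch; `derive_job_status` is a hypothetical helper, not part of the module above:

```python
def derive_job_status(status_counts: dict, current_status: str = "pending") -> str:
    """Map job_details status counts to an overall job status."""
    if status_counts.get("running", 0) > 0:
        return "running"
    if status_counts.get("pending", 0) > 0:
        # Some model-days have not started yet; keep the job's current status
        return current_status
    if status_counts.get("failed", 0) > 0 and status_counts.get("completed", 0) > 0:
        return "partial"
    if status_counts.get("failed", 0) > 0:
        return "failed"
    return "completed"
```

Keeping this decision table separate from the SQL makes the partial/failed boundary cases trivial to cover in tests.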

---

## 4. Database Utility Module

```python
# api/database.py

import sqlite3
from typing import Optional
import os

def get_db_connection(db_path: str = "data/jobs.db") -> sqlite3.Connection:
    """
    Get SQLite database connection.

    Ensures:
    - Database directory exists
    - Foreign keys are enabled
    - Row factory returns dict-like objects
    """
    # Ensure the data directory exists
    os.makedirs(os.path.dirname(db_path), exist_ok=True)

    conn = sqlite3.connect(db_path, check_same_thread=False)
    conn.execute("PRAGMA foreign_keys = ON")  # Enable FK constraints
    conn.row_factory = sqlite3.Row  # Return rows as dict-like objects

    return conn

def initialize_database(db_path: str = "data/jobs.db") -> None:
    """Create database tables if they don't exist"""
    conn = get_db_connection(db_path)
    cursor = conn.cursor()

    # Create jobs table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            job_id TEXT PRIMARY KEY,
            config_path TEXT NOT NULL,
            status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'partial', 'failed')),
            date_range TEXT NOT NULL,
            models TEXT NOT NULL,
            created_at TEXT NOT NULL,
            started_at TEXT,
            completed_at TEXT,
            total_duration_seconds REAL,
            error TEXT
        )
    """)

    # Create indexes
    cursor.execute("""
        CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs(status)
    """)
    cursor.execute("""
        CREATE INDEX IF NOT EXISTS idx_jobs_created_at ON jobs(created_at DESC)
    """)

    # Create job_details table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS job_details (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            job_id TEXT NOT NULL,
            date TEXT NOT NULL,
            model TEXT NOT NULL,
            status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed')),
            started_at TEXT,
            completed_at TEXT,
            duration_seconds REAL,
            error TEXT,
            FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
        )
    """)

    # Create indexes
    cursor.execute("""
        CREATE INDEX IF NOT EXISTS idx_job_details_job_id ON job_details(job_id)
    """)
    cursor.execute("""
        CREATE INDEX IF NOT EXISTS idx_job_details_status ON job_details(status)
    """)
    cursor.execute("""
        CREATE UNIQUE INDEX IF NOT EXISTS idx_job_details_unique
        ON job_details(job_id, date, model)
    """)

    conn.commit()
    conn.close()
```

---

## 5. State Transitions

### 5.1 Job Status State Machine

```
pending ──────> running ──────> completed
                   │
                   ├──────────> partial
                   │
                   └──────────> failed
```

**Transition Logic:**
- `pending → running`: When the first model-day starts executing
- `running → completed`: When all model-days complete successfully
- `running → partial`: When some model-days succeed and some fail
- `running → failed`: When all model-days fail (rare)

### 5.2 Job Detail Status State Machine

```
pending ──────> running ──────> completed
                   │
                   └──────────> failed
```

**Transition Logic:**
- `pending → running`: When the worker starts executing that model-day
- `running → completed`: When `agent.run_trading_session()` succeeds
- `running → failed`: When `agent.run_trading_session()` raises an exception after retries
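
These transition rules can be enforced with a small guard table so that illegal updates (e.g. `completed → running`) are rejected early. This is an illustrative sketch; `VALID_TRANSITIONS` and `assert_transition` are hypothetical names, not part of the codebase:

```python
# Legal status transitions for a job_details row, per the state machine above
VALID_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}

def assert_transition(current: str, new: str) -> None:
    """Raise ValueError if moving from `current` to `new` is not allowed."""
    if new not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} -> {new}")
```

A guard like this could be called at the top of `update_job_detail_status` to catch out-of-order worker updates.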

---

## 6. Concurrency Scenarios

### 6.1 Scenario: Duplicate Trigger Requests

**Timeline:**
1. Request A: POST /simulate/trigger → Job created with date_range=[2025-01-16, 2025-01-17]
2. Request B (5 seconds later): POST /simulate/trigger → Same date range

**Expected Behavior:**
- Request A: Returns `{"job_id": "abc123", "status": "accepted"}`
- Request B: `find_job_by_date_range()` finds Job abc123
- Request B: Returns `{"job_id": "abc123", "status": "running", ...}` (same job)

**Code:**
```python
# In the /simulate/trigger endpoint
existing_job = job_manager.find_job_by_date_range(date_range)
if existing_job:
    # Return the existing job instead of creating a duplicate
    return existing_job
```

### 6.2 Scenario: Concurrent Jobs with Different Dates

**Timeline:**
1. Job A running: date_range=[2025-01-01 to 2025-01-10] (started 5 min ago)
2. Request: POST /simulate/trigger with date_range=[2025-01-11 to 2025-01-15]

**Expected Behavior:**
- `can_start_new_job()` returns False (Job A is still running)
- Request returns 409 Conflict with details of Job A
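
The 409 decision can be isolated from the web framework as a pure function that returns the HTTP status and body the endpoint should send. This is a sketch with assumed response shapes; the real endpoint payloads may differ:

```python
def trigger_response(running_jobs: list) -> tuple:
    """Return (http_status, body) for a trigger request, given currently running jobs."""
    if running_jobs:
        blocking = running_jobs[0]
        return 409, {
            "error": "Another simulation job is already running",
            "job_id": blocking["job_id"],
            "status": blocking["status"],
        }
    return 202, {"status": "accepted"}
```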

### 6.3 Scenario: Job Cleanup on API Restart

**Problem:** The API crashes while a job is running. On restart, the job is stuck in the "running" state.

**Solution:** On API startup, detect stale jobs and mark them as failed:
```python
# In api/main.py startup event
@app.on_event("startup")
async def startup_event():
    job_manager = JobManager()

    # Find jobs stuck in 'running' or 'pending' state
    stale_jobs = job_manager.get_running_jobs()

    for job in stale_jobs:
        # Mark as failed with an explanation
        job_manager.update_job_status(
            job["job_id"],
            "failed",
            error="API restarted while job was running"
        )
```

---

## 7. Testing Strategy

### 7.1 Unit Tests

```python
# tests/test_job_manager.py

import pytest
from api.job_manager import JobManager
import tempfile
import os

@pytest.fixture
def job_manager():
    # Use a temporary database for tests
    temp_db = tempfile.NamedTemporaryFile(delete=False, suffix=".db")
    temp_db.close()

    jm = JobManager(db_path=temp_db.name)
    yield jm

    # Cleanup
    os.unlink(temp_db.name)

def test_create_job(job_manager):
    job_id = job_manager.create_job(
        config_path="configs/test.json",
        date_range=["2025-01-16", "2025-01-17"],
        models=["gpt-5", "claude-3.7-sonnet"]
    )

    assert job_id is not None
    job = job_manager.get_job(job_id)
    assert job["status"] == "pending"
    assert job["date_range"] == ["2025-01-16", "2025-01-17"]

    # Check job_details created
    progress = job_manager.get_job_progress(job_id)
    assert progress["total_model_days"] == 4  # 2 dates × 2 models

def test_concurrent_job_blocked(job_manager):
    # Create first job
    job1_id = job_manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])

    # Try to create a second job while the first is pending
    with pytest.raises(ValueError, match="Another simulation job is already running"):
        job_manager.create_job("configs/test.json", ["2025-01-17"], ["gpt-5"])

    # Mark the first job as completed
    job_manager.update_job_status(job1_id, "completed")

    # Now a second job should be allowed
    job2_id = job_manager.create_job("configs/test.json", ["2025-01-17"], ["gpt-5"])
    assert job2_id is not None

def test_job_status_transitions(job_manager):
    job_id = job_manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])

    # Update job detail to running
    job_manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")

    # Job should now be 'running'
    job = job_manager.get_job(job_id)
    assert job["status"] == "running"
    assert job["started_at"] is not None

    # Complete the detail
    job_manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "completed")

    # Job should now be 'completed'
    job = job_manager.get_job(job_id)
    assert job["status"] == "completed"
    assert job["completed_at"] is not None

def test_partial_job_status(job_manager):
    job_id = job_manager.create_job(
        "configs/test.json",
        ["2025-01-16"],
        ["gpt-5", "claude-3.7-sonnet"]
    )

    # One model succeeds
    job_manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
    job_manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "completed")

    # One model fails
    job_manager.update_job_detail_status(job_id, "2025-01-16", "claude-3.7-sonnet", "running")
    job_manager.update_job_detail_status(
        job_id, "2025-01-16", "claude-3.7-sonnet", "failed",
        error="API timeout"
    )

    # Job should be 'partial'
    job = job_manager.get_job(job_id)
    assert job["status"] == "partial"

    progress = job_manager.get_job_progress(job_id)
    assert progress["completed"] == 1
    assert progress["failed"] == 1
```

---

## 8. Performance Considerations

### 8.1 Database Indexing

- `idx_jobs_status`: Fast filtering for running jobs
- `idx_jobs_created_at DESC`: Fast retrieval of the most recent job
- `idx_job_details_unique`: Prevents duplicate model-day entries

### 8.2 Connection Pooling

For the MVP, opening a connection with `sqlite3.connect()` per operation is acceptable (low concurrency).

For higher concurrency (future), consider:
- SQLAlchemy ORM with connection pooling
- PostgreSQL for production deployments

### 8.3 Query Optimization

**Avoid N+1 queries:**
```python
# BAD: Separate query for each job's progress
for job in jobs:
    progress = job_manager.get_job_progress(job["job_id"])
```

```sql
-- GOOD: Join jobs and job_details in a single query
SELECT
    jobs.*,
    COUNT(job_details.id) AS total,
    SUM(CASE WHEN job_details.status = 'completed' THEN 1 ELSE 0 END) AS completed
FROM jobs
LEFT JOIN job_details ON jobs.job_id = job_details.job_id
GROUP BY jobs.job_id
```
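
The aggregate join can be verified against an in-memory SQLite database. This is a self-contained sketch with simplified schemas, not the production DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE job_details (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        job_id TEXT, status TEXT
    );
    INSERT INTO jobs VALUES ('j1', 'running');
    INSERT INTO job_details (job_id, status) VALUES
        ('j1', 'completed'), ('j1', 'completed'), ('j1', 'failed');
""")

# One round trip yields the per-job totals that would otherwise need N+1 queries
row = conn.execute("""
    SELECT jobs.job_id,
           COUNT(job_details.id) AS total,
           SUM(CASE WHEN job_details.status = 'completed' THEN 1 ELSE 0 END) AS completed
    FROM jobs
    LEFT JOIN job_details ON jobs.job_id = job_details.job_id
    GROUP BY jobs.job_id
""").fetchone()
print(row)  # ('j1', 3, 2)
```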

---

## 9. Error Handling

### 9.1 Database Errors

**Scenario:** The SQLite database is locked or corrupted.

**Handling:**
```python
try:
    job_id = job_manager.create_job(...)
except sqlite3.OperationalError as e:
    # Database locked - retry with exponential backoff
    logger.error(f"Database error: {e}")
    raise HTTPException(status_code=503, detail="Database temporarily unavailable")
except sqlite3.IntegrityError as e:
    # Constraint violation (e.g., duplicate job_id)
    logger.error(f"Integrity error: {e}")
    raise HTTPException(status_code=400, detail="Invalid job data")
```

### 9.2 Foreign Key Violations

**Scenario:** An attempt to create a job_detail for a non-existent job.

**Prevention:**
- Always create the job record before the job_details records
- Use transactions to ensure atomicity

```python
def create_job(self, ...):
    conn = get_db_connection(self.db_path)
    try:
        cursor = conn.cursor()

        # Insert job
        cursor.execute("INSERT INTO jobs ...")

        # Insert job_details
        for date in date_range:
            for model in models:
                cursor.execute("INSERT INTO job_details ...")

        conn.commit()  # Atomic commit
    except Exception:
        conn.rollback()  # Roll back on any error
        raise
    finally:
        conn.close()
```

---

## 10. Migration Strategy

### 10.1 Schema Versioning

For future schema changes, use migration scripts:

```
data/
└── migrations/
    ├── 001_initial_schema.sql
    ├── 002_add_priority_column.sql
    └── ...
```

Track applied migrations in the database:
```sql
CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    applied_at TEXT NOT NULL
);
```
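
A minimal runner that applies pending migrations in version order could look like the following. This is a sketch; `apply_migrations` and the inline SQL list are illustrative, not part of the codebase:

```python
import sqlite3
from datetime import datetime, timezone

def apply_migrations(conn: sqlite3.Connection, migrations: dict) -> list:
    """Apply migrations whose version is not yet recorded; return applied versions."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            version INTEGER PRIMARY KEY,
            applied_at TEXT NOT NULL
        )
    """)
    done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version in sorted(migrations):
        if version in done:
            continue
        conn.executescript(migrations[version])
        conn.execute(
            "INSERT INTO schema_migrations VALUES (?, ?)",
            (version, datetime.now(timezone.utc).isoformat()),
        )
        applied.append(version)
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
migrations = {
    1: "CREATE TABLE jobs (job_id TEXT PRIMARY KEY);",
    2: "ALTER TABLE jobs ADD COLUMN priority INTEGER DEFAULT 0;",
}
print(apply_migrations(conn, migrations))  # [1, 2]
print(apply_migrations(conn, migrations))  # [] (idempotent)
```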

### 10.2 Backward Compatibility

When adding columns:
- Use `ALTER TABLE ADD COLUMN ... DEFAULT ...` for backward compatibility
- Never remove columns (deprecate them instead)
- Version API responses to handle schema changes
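
The additive pattern can be demonstrated directly: existing rows pick up the new column's default, so older readers and writers keep working. A self-contained sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO jobs VALUES ('old-job')")

# Additive change: rows inserted before the migration get the default value
conn.execute("ALTER TABLE jobs ADD COLUMN priority INTEGER DEFAULT 0")

row = conn.execute("SELECT job_id, priority FROM jobs").fetchone()
print(row)  # ('old-job', 0)
```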

---

## Summary

The Job Manager provides:
1. **Robust job tracking** with SQLite persistence
2. **Concurrency control** ensuring single-job execution
3. **Granular progress monitoring** at the model-day level
4. **Flexible status handling** (completed/partial/failed)
5. **Idempotency** for duplicate trigger requests

Next specification: **Background Worker Architecture**
@@ -1,197 +0,0 @@
# Data Cache Reuse Design

**Date:** 2025-10-30
**Status:** Approved

## Problem Statement

Docker containers currently fetch all 103 NASDAQ 100 tickers from Alpha Vantage on every startup, even when price data is volume-mounted and already cached in `./data`. This causes:
- Slow startup times (103 API calls)
- Unnecessary API quota consumption
- Rate limit risks during frequent development iterations

## Solution Overview

Implement staleness-based data refresh with a configurable age threshold. The container checks all `daily_prices_*.json` files and only refetches if any file is missing or older than `MAX_DATA_AGE_DAYS`.

## Design Decisions

### Architecture Choice
**Selected:** Check all `daily_prices_*.json` files individually
**Rationale:** Ensures data integrity by detecting partial/missing files, not just stale merged data

### Implementation Location
**Selected:** Bash wrapper logic in `entrypoint.sh`
**Rationale:** Keeps the data fetching scripts unchanged; adds orchestration at the container startup layer

### Staleness Threshold
**Selected:** Configurable via the `MAX_DATA_AGE_DAYS` environment variable (default: 7 days)
**Rationale:** Balances freshness with API usage; flexible for different use cases (development vs. production)

## Technical Design

### Components

#### 1. Staleness Check Function
Location: `entrypoint.sh` (after environment validation, before data fetch)

```bash
should_refresh_data() {
    MAX_AGE=${MAX_DATA_AGE_DAYS:-7}

    # Check if at least one price file exists
    if ! ls /app/data/daily_prices_*.json >/dev/null 2>&1; then
        echo "📭 No price data found"
        return 0  # Need refresh
    fi

    # Find any files older than MAX_AGE days
    STALE_COUNT=$(find /app/data -name "daily_prices_*.json" -mtime +"$MAX_AGE" | wc -l)
    TOTAL_COUNT=$(ls /app/data/daily_prices_*.json 2>/dev/null | wc -l)

    if [ "$STALE_COUNT" -gt 0 ]; then
        echo "📅 Found $STALE_COUNT stale files (>$MAX_AGE days old)"
        return 0  # Need refresh
    fi

    echo "✅ All $TOTAL_COUNT price files are fresh (<$MAX_AGE days old)"
    return 1  # Skip refresh
}
```

**Logic:**
- Uses `find -mtime +N` to detect files modified more than N days ago
- Returns shell exit codes: 0 (refresh needed), 1 (skip refresh)
- Logs informative messages for debugging
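
The same staleness rule can be expressed in Python, which is convenient for unit-testing the threshold logic outside a container. A sketch; the file-name pattern mirrors the bash function above and the function name is illustrative:

```python
import time
from pathlib import Path

def should_refresh_data(data_dir: str, max_age_days: int = 7) -> bool:
    """True if price data is absent or any daily_prices_*.json is older than max_age_days."""
    files = list(Path(data_dir).glob("daily_prices_*.json"))
    if not files:
        return True  # No price data at all -> fetch
    cutoff = time.time() - max_age_days * 86400
    return any(f.stat().st_mtime < cutoff for f in files)
```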

#### 2. Conditional Data Fetch
Location: `entrypoint.sh` lines 40-46 (replaces the existing unconditional fetch)

```bash
# Step 1: Data preparation (conditional fetch, unconditional merge)
echo "📊 Checking price data freshness..."

cd /app/data
if should_refresh_data; then
    echo "🔄 Fetching price data..."
    python /app/scripts/get_daily_price.py
else
    echo "⏭️ Skipping data fetch (using cached data)"
fi

# Always re-run the merge so a missing/corrupt merged.jsonl is rebuilt
python /app/scripts/merge_jsonl.py
cd /app
```

#### 3. Environment Configuration
**docker-compose.yml:**
```yaml
environment:
  - MAX_DATA_AGE_DAYS=${MAX_DATA_AGE_DAYS:-7}
```

**.env.example:**
```bash
# Data Refresh Configuration
MAX_DATA_AGE_DAYS=7  # Refresh price data older than N days (0=always refresh)
```

### Data Flow

1. **Container Startup** → entrypoint.sh begins execution
2. **Environment Validation** → Check required API keys (existing logic)
3. **Staleness Check** → `should_refresh_data()` scans `/app/data/daily_prices_*.json`
   - No files found → Return 0 (refresh)
   - Any file older than `MAX_DATA_AGE_DAYS` → Return 0 (refresh)
   - All files fresh → Return 1 (skip)
4. **Conditional Fetch** → Run get_daily_price.py only if a refresh is needed
5. **Merge Data** → Always run merge_jsonl.py (handles a missing merged.jsonl)
6. **MCP Services** → Start services (existing logic)
7. **Trading Agent** → Begin trading (existing logic)

### Edge Cases

| Scenario | Behavior |
|----------|----------|
| **First run (no data)** | Detects no files → triggers full fetch |
| **Restart within 7 days** | All files fresh → skips fetch (fast startup) |
| **Restart after 7 days** | Files stale → refreshes all data |
| **Partial data (some files missing)** | Missing files treated as infinitely old → triggers refresh |
| **Corrupt merged.jsonl but fresh price files** | Skips fetch, re-runs merge to rebuild merged.jsonl |
| **MAX_DATA_AGE_DAYS=0** | Always refresh (useful for testing/production) |
| **MAX_DATA_AGE_DAYS unset** | Defaults to 7 days |
| **Alpha Vantage rate limit** | get_daily_price.py handles with a warning (existing behavior) |

## Configuration Options

| Variable | Default | Purpose |
|----------|---------|---------|
| `MAX_DATA_AGE_DAYS` | 7 | Days before price data is considered stale |

**Special Values:**
- `0` → Always refresh (force fresh data)
- `999` → Never refresh (use cached data indefinitely)

## User Experience

### Scenario 1: Fresh Container
```
🚀 Starting AI-Trader...
🔍 Validating environment variables...
✅ Environment variables validated
📊 Checking price data freshness...
📭 No price data found
🔄 Fetching and merging price data...
  ✓ Fetched NVDA
  ✓ Fetched MSFT
  ...
```

### Scenario 2: Restart Within 7 Days
```
🚀 Starting AI-Trader...
🔍 Validating environment variables...
✅ Environment variables validated
📊 Checking price data freshness...
✅ All 103 price files are fresh (<7 days old)
⏭️ Skipping data fetch (using cached data)
🔧 Starting MCP services...
```

### Scenario 3: Restart After 7 Days
```
🚀 Starting AI-Trader...
🔍 Validating environment variables...
✅ Environment variables validated
📊 Checking price data freshness...
📅 Found 103 stale files (>7 days old)
🔄 Fetching and merging price data...
  ✓ Fetched NVDA
  ✓ Fetched MSFT
  ...
```

## Testing Plan

1. **Test fresh container:** Delete `./data/daily_prices_*.json`, start the container → should fetch all
2. **Test cached data:** Restart immediately → should skip the fetch
3. **Test staleness:** `touch -d "8 days ago" ./data/daily_prices_AAPL.json`, restart → should refresh
4. **Test partial data:** Delete 10 random price files → should refresh all
5. **Test MAX_DATA_AGE_DAYS=0:** Restart with the env var set → should always fetch
6. **Test MAX_DATA_AGE_DAYS=30:** Restart with 8-day-old data → should skip

## Documentation Updates

Files requiring updates:
- `entrypoint.sh` → Add the function and conditional logic
- `docker-compose.yml` → Add the MAX_DATA_AGE_DAYS environment variable
- `.env.example` → Document MAX_DATA_AGE_DAYS with its default value
- `CLAUDE.md` → Update the "Docker Deployment" section with the new env var
- `docs/DOCKER.md` (if it exists) → Explain the data caching behavior

## Benefits

- **Development:** Instant container restarts during iteration
- **API Quota:** ~103 fewer API calls per restart
- **Reliability:** No rate limit risks during frequent testing
- **Flexibility:** Configurable threshold for different use cases
- **Consistency:** Checks all files to ensure complete data
@@ -1,491 +0,0 @@
# Docker Deployment and CI/CD Design

**Date:** 2025-10-30
**Status:** Approved
**Target:** Development/local testing environment

## Overview

Package AI-Trader as a Docker container with docker-compose orchestration and automated image builds via GitHub Actions on release tags. Focus on simplicity and ease of use for researchers and developers.

## Requirements

- **Primary Use Case:** Development and local testing
- **Deployment Target:** Single monolithic container (all MCP services + trading agent)
- **Secrets Management:** Environment variables (no mounted .env file)
- **Data Strategy:** Fetch price data on container startup
- **Container Registry:** GitHub Container Registry (ghcr.io)
- **Trigger:** Build images automatically on release tag push (`v*` pattern)

## Architecture

### Components

1. **Dockerfile** - Builds Python 3.10 image with all dependencies
2. **docker-compose.yml** - Orchestrates container with volume mounts and environment config
3. **entrypoint.sh** - Sequential startup script (data fetch → MCP services → trading agent)
4. **GitHub Actions Workflow** - Automated image build and push on release tags
5. **.dockerignore** - Excludes unnecessary files from image
6. **Documentation** - Docker usage guide and examples

### Execution Flow

```
Container Start
    ↓
entrypoint.sh
    ↓
1. Fetch/merge price data (get_daily_price.py → merge_jsonl.py)
    ↓
2. Start MCP services in background (start_mcp_services.py)
    ↓
3. Wait 3 seconds for service stabilization
    ↓
4. Run trading agent (main.py with config)
    ↓
Container Exit → Cleanup MCP services
```

## Detailed Design

### 1. Dockerfile

**Multi-stage build:**

```dockerfile
# Base stage
FROM python:3.10-slim AS base

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application stage
FROM base

WORKDIR /app

# Copy application code
COPY . .

# Create necessary directories
RUN mkdir -p data logs data/agent_data

# Make entrypoint executable
RUN chmod +x entrypoint.sh

# Expose MCP service ports
EXPOSE 8000 8001 8002 8003

# Set Python to run unbuffered
ENV PYTHONUNBUFFERED=1

# Use entrypoint script
ENTRYPOINT ["./entrypoint.sh"]
CMD ["configs/default_config.json"]
```

**Key Features:**
- `python:3.10-slim` base for smaller image size
- Multi-stage for dependency caching
- Non-root user NOT included (dev/testing focus, can add later)
- Unbuffered Python output for real-time logs
- Default config path with override support

### 2. docker-compose.yml

```yaml
version: '3.8'

services:
  ai-trader:
    build: .
    container_name: ai-trader-app
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs
    environment:
      - OPENAI_API_BASE=${OPENAI_API_BASE}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ALPHAADVANTAGE_API_KEY=${ALPHAADVANTAGE_API_KEY}
      - JINA_API_KEY=${JINA_API_KEY}
      - RUNTIME_ENV_PATH=/app/data/runtime_env.json
      - MATH_HTTP_PORT=${MATH_HTTP_PORT:-8000}
      - SEARCH_HTTP_PORT=${SEARCH_HTTP_PORT:-8001}
      - TRADE_HTTP_PORT=${TRADE_HTTP_PORT:-8002}
      - GETPRICE_HTTP_PORT=${GETPRICE_HTTP_PORT:-8003}
      - AGENT_MAX_STEP=${AGENT_MAX_STEP:-30}
    ports:
      - "8000:8000"
      - "8001:8001"
      - "8002:8002"
      - "8003:8003"
      - "8888:8888"  # Optional: web dashboard
    restart: unless-stopped
```

**Key Features:**
- Volume mounts for data/logs persistence
- Environment variables interpolated from `.env` file (Docker Compose reads it automatically)
- No `.env` file mounted into container (cleaner separation)
- Default port values with override support
- Restart policy for recovery

### 3. entrypoint.sh

```bash
#!/bin/bash
set -e  # Exit on any error

echo "🚀 Starting AI-Trader..."

# Step 1: Data preparation
echo "📊 Fetching and merging price data..."
cd /app/data
python get_daily_price.py
python merge_jsonl.py
cd /app

# Step 2: Start MCP services in background
echo "🔧 Starting MCP services..."
cd /app/agent_tools
python start_mcp_services.py &
MCP_PID=$!
cd /app

# Register cleanup before running the agent, so MCP services are stopped
# even if main.py fails (set -e would otherwise exit before the trap is set)
trap "echo '🛑 Stopping MCP services...'; kill $MCP_PID 2>/dev/null" EXIT

# Step 3: Wait for services to initialize
echo "⏳ Waiting for MCP services to start..."
sleep 3

# Step 4: Run trading agent with config file
echo "🤖 Starting trading agent..."
CONFIG_FILE="${1:-configs/default_config.json}"
python main.py "$CONFIG_FILE"
```

**Key Features:**
- Sequential execution with clear logging
- MCP services run in background with PID capture
- Trap ensures cleanup on container exit
- Config file path as argument (defaults to `configs/default_config.json`)
- Fail-fast with `set -e`

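The fixed `sleep 3` is the weakest part of this flow; a hedged alternative is to poll the MCP ports until they accept TCP connections. The function name, timeout, and port list below are illustrative, and the sketch assumes bash is available for `/dev/tcp`:

```shell
wait_for_ports() {
  # wait_for_ports TIMEOUT_SECONDS PORT [PORT...]
  # Polls each port on localhost once per second until it accepts a
  # connection or the timeout elapses.
  local timeout="$1"; shift
  local port waited
  for port in "$@"; do
    waited=0
    until bash -c "exec 3<>/dev/tcp/127.0.0.1/$port" 2>/dev/null; do
      sleep 1
      waited=$((waited + 1))
      if [ "$waited" -ge "$timeout" ]; then
        echo "Port $port not ready after ${timeout}s" >&2
        return 1
      fi
    done
  done
}
```

In `entrypoint.sh` this would replace the `sleep 3` with something like `wait_for_ports 30 8000 8001 8002 8003`, failing fast (via `set -e`) if a service never comes up.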
### 4. GitHub Actions Workflow

**File:** `.github/workflows/docker-release.yml`

```yaml
name: Build and Push Docker Image

on:
  push:
    tags:
      - 'v*'  # Triggers on v1.0.0, v2.1.3, etc.
  workflow_dispatch:  # Manual trigger option

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract version from tag
        id: meta
        run: |
          VERSION=${GITHUB_REF#refs/tags/v}
          echo "version=$VERSION" >> $GITHUB_OUTPUT

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository_owner }}/ai-trader:${{ steps.meta.outputs.version }}
            ghcr.io/${{ github.repository_owner }}/ai-trader:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

**Key Features:**
- Triggers on `v*` tags (e.g., `git tag v1.0.0 && git push origin v1.0.0`)
- Manual dispatch option for testing
- Uses `GITHUB_TOKEN` (automatically provided, no secrets needed)
- Builds with caching for faster builds
- Tags both version and `latest`
- Multi-platform support possible by adding `platforms: linux/amd64,linux/arm64`

### 5. .dockerignore

```
# Version control
.git/
.gitignore

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Environment and secrets
.env
.env.*
!.env.example

# Data files (fetched at runtime)
data/*.json
data/agent_data/
data/merged.jsonl

# Logs
logs/
*.log

# Runtime state
runtime_env.json

# Documentation (not needed in image)
*.md
docs/
!README.md

# CI/CD
.github/
```

**Purpose:**
- Reduces image size
- Keeps secrets out of image
- Excludes generated files
- Keeps only necessary source code and scripts

## Documentation Updates

### New File: docs/DOCKER.md

Create comprehensive Docker usage guide including:

1. **Quick Start**
   ```bash
   cp .env.example .env
   # Edit .env with your API keys
   docker-compose up
   ```

2. **Configuration**
   - Required environment variables
   - Optional configuration overrides
   - Custom config file usage

3. **Usage Examples**
   ```bash
   # Run with default config
   docker-compose up

   # Run with custom config
   docker-compose run ai-trader configs/my_config.json

   # View logs
   docker-compose logs -f

   # Stop and clean up
   docker-compose down
   ```

4. **Data Persistence**
   - How volume mounts work
   - Where data is stored
   - How to backup/restore

5. **Troubleshooting**
   - MCP services not starting → Check logs, verify ports available
   - Missing API keys → Check .env file
   - Data fetch failures → API rate limits or invalid keys
   - Permission issues → Volume mount permissions

6. **Using Pre-built Images**
   ```bash
   docker pull ghcr.io/hkuds/ai-trader:latest
   docker run --env-file .env -v $(pwd)/data:/app/data ghcr.io/hkuds/ai-trader:latest
   ```

### Update .env.example

Add/clarify Docker-specific variables:

```bash
# AI Model API Configuration
OPENAI_API_BASE=https://your-openai-proxy.com/v1
OPENAI_API_KEY=your_openai_key

# Data Source Configuration
ALPHAADVANTAGE_API_KEY=your_alpha_vantage_key
JINA_API_KEY=your_jina_api_key

# System Configuration (Docker defaults)
RUNTIME_ENV_PATH=/app/data/runtime_env.json

# MCP Service Ports
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003

# Agent Configuration
AGENT_MAX_STEP=30
```

### Update Main README.md

Add Docker section after "Quick Start":

````markdown
## Docker Deployment

### Using Docker Compose (Recommended)

```bash
# Setup environment
cp .env.example .env
# Edit .env with your API keys

# Run with docker-compose
docker-compose up
```

### Using Pre-built Images

```bash
# Pull latest image
docker pull ghcr.io/hkuds/ai-trader:latest

# Run container
docker run --env-file .env \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/logs:/app/logs \
  ghcr.io/hkuds/ai-trader:latest
```

See [docs/DOCKER.md](docs/DOCKER.md) for detailed Docker usage guide.
````

## Release Process

### For Maintainers

1. **Prepare release:**
   ```bash
   # Ensure main branch is ready
   git checkout main
   git pull origin main
   ```

2. **Create and push tag:**
   ```bash
   git tag v1.0.0
   git push origin v1.0.0
   ```

3. **GitHub Actions automatically:**
   - Builds Docker image
   - Tags with version and `latest`
   - Pushes to `ghcr.io/hkuds/ai-trader`

4. **Verify build:**
   - Check Actions tab for build status
   - Test pull: `docker pull ghcr.io/hkuds/ai-trader:v1.0.0`

5. **Optional: Create GitHub Release**
   - Add release notes
   - Include Docker pull command

### For Users

```bash
# Pull specific version
docker pull ghcr.io/hkuds/ai-trader:v1.0.0

# Or always get latest
docker pull ghcr.io/hkuds/ai-trader:latest
```

## Implementation Checklist

- [ ] Create Dockerfile with multi-stage build
- [ ] Create docker-compose.yml with volume mounts and environment config
- [ ] Create entrypoint.sh with sequential startup logic
- [ ] Create .dockerignore to exclude unnecessary files
- [ ] Create .github/workflows/docker-release.yml for CI/CD
- [ ] Create docs/DOCKER.md with comprehensive usage guide
- [ ] Update .env.example with Docker-specific variables
- [ ] Update main README.md with Docker deployment section
- [ ] Test local build: `docker-compose build`
- [ ] Test local run: `docker-compose up`
- [ ] Test with custom config
- [ ] Verify data persistence across container restarts
- [ ] Test GitHub Actions workflow (create test tag)
- [ ] Verify image pushed to ghcr.io
- [ ] Test pulling and running pre-built image
- [ ] Update CLAUDE.md with Docker commands

## Future Enhancements

Possible improvements for production use:

1. **Multi-container Architecture**
   - Separate containers for each MCP service
   - Better isolation and independent scaling
   - More complex orchestration

2. **Security Hardening**
   - Non-root user in container
   - Docker secrets for production
   - Read-only filesystem where possible

3. **Monitoring**
   - Health checks for MCP services
   - Prometheus metrics export
   - Logging aggregation

4. **Optimization**
   - Multi-platform builds (ARM64 support)
   - Smaller base image (alpine)
   - Layer caching optimization

5. **Development Tools**
   - docker-compose.dev.yml with hot reload
   - Debug container with additional tools
   - Integration test container

These are deferred to keep the initial implementation simple and focused on development/testing use cases.

File diff suppressed because it is too large
@@ -1,102 +0,0 @@

Docker Build Test Results
==========================
Date: 2025-10-30
Branch: docker-deployment
Working Directory: /home/bballou/AI-Trader/.worktrees/docker-deployment

Test 1: Docker Image Build
---------------------------
Command: docker-compose build
Status: SUCCESS
Result: Successfully built image 7b36b8f4c0e9

Build Output Summary:
- Base image: python:3.10-slim
- Build stages: Multi-stage build (base + application)
- Dependencies installed successfully from requirements.txt
- Application code copied
- Directories created: data, logs, data/agent_data
- Entrypoint script made executable
- Ports exposed: 8000, 8001, 8002, 8003, 8888
- Environment: PYTHONUNBUFFERED=1 set
- Image size: 266MB
- Build time: ~2 minutes (including dependency installation)

Key packages installed:
- langchain==1.0.2
- langchain-openai==1.0.1
- langchain-mcp-adapters>=0.1.0
- fastmcp==2.12.5
- langgraph<1.1.0,>=1.0.0
- pydantic<3.0.0,>=2.7.4
- openai<3.0.0,>=1.109.1
- All dependencies resolved without conflicts

Test 2: Image Verification
---------------------------
Command: docker images | grep ai-trader
Status: SUCCESS
Result: docker-deployment_ai-trader latest 7b36b8f4c0e9 9 seconds ago 266MB

Image Details:
- Repository: docker-deployment_ai-trader
- Tag: latest
- Image ID: 7b36b8f4c0e9
- Created: Just now
- Size: 266MB (reasonable for Python 3.10 + ML dependencies)

Test 3: Configuration Parsing (Dry-Run)
----------------------------------------
Command: docker-compose --env-file .env.test config
Status: SUCCESS
Result: Configuration parsed correctly without errors

Test .env.test contents:
OPENAI_API_KEY=test
ALPHAADVANTAGE_API_KEY=test
JINA_API_KEY=test
RUNTIME_ENV_PATH=/app/data/runtime_env.json

Parsed Configuration:
- Service name: ai-trader
- Container name: ai-trader-app
- Build context: /home/bballou/AI-Trader/.worktrees/docker-deployment
- Environment variables correctly injected:
  * AGENT_MAX_STEP: '30' (default)
  * ALPHAADVANTAGE_API_KEY: test
  * GETPRICE_HTTP_PORT: '8003' (default)
  * JINA_API_KEY: test
  * MATH_HTTP_PORT: '8000' (default)
  * OPENAI_API_BASE: '' (not set, defaulted to blank)
  * OPENAI_API_KEY: test
  * RUNTIME_ENV_PATH: /app/data/runtime_env.json
  * SEARCH_HTTP_PORT: '8001' (default)
  * TRADE_HTTP_PORT: '8002' (default)
- Ports correctly mapped: 8000, 8001, 8002, 8003, 8888
- Volumes correctly configured:
  * ./data:/app/data:rw
  * ./logs:/app/logs:rw
- Restart policy: unless-stopped
- Docker Compose version: 3.8

Summary
-------
All Docker build tests PASSED successfully:
✓ Docker image builds without errors
✓ Image created with reasonable size (266MB)
✓ Multi-stage build optimizes layer caching
✓ All Python dependencies install correctly
✓ Configuration parsing works with test environment
✓ Environment variables properly injected
✓ Volume mounts configured correctly
✓ Port mappings set up correctly
✓ Restart policy configured

No issues encountered during local Docker build testing.
The Docker deployment is ready for use.

Next Steps:
1. Test actual container startup with valid API keys
2. Verify MCP services start correctly in container
3. Test trading agent execution
4. Consider creating test tag for GitHub Actions CI/CD verification

docs/reference/data-formats.md (new file, 30 lines)
@@ -0,0 +1,30 @@

# Data Formats

File formats and schemas used by AI-Trader.

---

## Position File (`position.jsonl`)

```jsonl
{"date": "2025-01-16", "id": 1, "this_action": {"action": "buy", "symbol": "AAPL", "amount": 10}, "positions": {"AAPL": 10, "CASH": 9500.0}}
{"date": "2025-01-17", "id": 2, "this_action": {"action": "sell", "symbol": "AAPL", "amount": 5}, "positions": {"AAPL": 5, "CASH": 10750.0}}
```

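A minimal sketch of replaying this format — the field names come from the sample records above; the helper name is illustrative, not part of the codebase:

```python
import json


def latest_positions(path):
    """Return the `positions` dict from the last record of a position.jsonl file.

    Each line is a standalone JSON object, so the file can be scanned
    line by line; the final record holds the current holdings.
    """
    last = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                last = json.loads(line)
    return last["positions"] if last else {}
```

Applied to the sample file above, this would return the `2025-01-17` holdings (`AAPL: 5`, `CASH: 10750.0`).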
---

## Price Data (`merged.jsonl`)

```jsonl
{"Meta Data": {"2. Symbol": "AAPL", "3. Last Refreshed": "2025-01-16"}, "Time Series (Daily)": {"2025-01-16": {"1. buy price": "250.50", "2. high": "252.00", "3. low": "249.00", "4. sell price": "251.50", "5. volume": "50000000"}}}
```

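A sketch of pulling one symbol's series out of this format — key names follow the sample record above, and the assumption that `"4. sell price"` is the daily close is inferred from the Alpha Vantage-style layout, not confirmed by the codebase:

```python
import json


def closing_prices(path, symbol):
    """Return {date: close} for one symbol from merged.jsonl.

    Each line is one symbol's full record; the first matching line wins.
    """
    with open(path) as fh:
        for line in fh:
            rec = json.loads(line)
            if rec["Meta Data"]["2. Symbol"] == symbol:
                series = rec["Time Series (Daily)"]
                return {d: float(v["4. sell price"]) for d, v in series.items()}
    return {}
```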
---

## Log Files (`log.jsonl`)

Contains the complete AI reasoning and tool usage for each trading session.

---

See the database schema in [docs/developer/database-schema.md](../developer/database-schema.md) for SQLite formats.

docs/reference/environment-variables.md (new file, 32 lines)
@@ -0,0 +1,32 @@

# Environment Variables Reference

Complete list of configuration variables.

---

See [docs/user-guide/configuration.md](../user-guide/configuration.md#environment-variables) for detailed descriptions.

---

## Required

- `OPENAI_API_KEY`
- `ALPHAADVANTAGE_API_KEY`
- `JINA_API_KEY`

---

## Optional

- `API_PORT` (default: 8080)
- `API_HOST` (default: 0.0.0.0)
- `OPENAI_API_BASE`
- `MAX_CONCURRENT_JOBS` (default: 1)
- `MAX_SIMULATION_DAYS` (default: 30)
- `AUTO_DOWNLOAD_PRICE_DATA` (default: true)
- `AGENT_MAX_STEP` (default: 30)
- `VOLUME_PATH` (default: .)
- `MATH_HTTP_PORT` (default: 8000)
- `SEARCH_HTTP_PORT` (default: 8001)
- `TRADE_HTTP_PORT` (default: 8002)
- `GETPRICE_HTTP_PORT` (default: 8003)

docs/reference/mcp-tools.md (new file, 39 lines)
@@ -0,0 +1,39 @@

# MCP Tools Reference

Model Context Protocol tools available to AI agents.

---

## Available Tools

### Math Tool (Port 8000)
Mathematical calculations and analysis.

### Search Tool (Port 8001)
Market intelligence via Jina AI search.
- News articles
- Analyst reports
- Financial data

### Trade Tool (Port 8002)
Buy/sell execution.
- Place orders
- Check balances
- View positions

### Price Tool (Port 8003)
Historical and current price data.
- OHLCV data
- Multiple symbols
- Date filtering

---

## Usage

AI agents access tools automatically through the MCP protocol.
Tools are localhost-only and not exposed to the external network.

---

See the `agent_tools/` directory for implementations.

File diff suppressed because it is too large

docs/user-guide/configuration.md (new file, 327 lines)
@@ -0,0 +1,327 @@

# Configuration Guide

Complete guide to configuring AI-Trader.

---

## Environment Variables

Set in the `.env` file in the project root.

### Required Variables

```bash
# OpenAI API (or compatible endpoint)
OPENAI_API_KEY=sk-your-key-here

# Alpha Vantage (price data)
ALPHAADVANTAGE_API_KEY=your-key-here

# Jina AI (market intelligence search)
JINA_API_KEY=your-key-here
```

### Optional Variables

```bash
# API Server Configuration
API_PORT=8080                  # Host port mapping (default: 8080)
API_HOST=0.0.0.0               # Bind address (default: 0.0.0.0)

# OpenAI Configuration
OPENAI_API_BASE=https://api.openai.com/v1  # Custom endpoint

# Simulation Limits
MAX_CONCURRENT_JOBS=1          # Max simultaneous jobs (default: 1)
MAX_SIMULATION_DAYS=30         # Max date range per job (default: 30)

# Price Data Management
AUTO_DOWNLOAD_PRICE_DATA=true  # Auto-fetch missing data (default: true)

# Agent Configuration
AGENT_MAX_STEP=30              # Max reasoning steps per day (default: 30)

# Volume Paths
VOLUME_PATH=.                  # Base directory for data (default: .)

# MCP Service Ports (usually don't need to change)
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003
```

---

## Model Configuration

Edit `configs/default_config.json` to define available AI models.

### Configuration Structure

```json
{
  "agent_type": "BaseAgent",
  "date_range": {
    "init_date": "2025-01-01",
    "end_date": "2025-01-31"
  },
  "models": [
    {
      "name": "GPT-4",
      "basemodel": "openai/gpt-4",
      "signature": "gpt-4",
      "enabled": true
    }
  ],
  "agent_config": {
    "max_steps": 30,
    "max_retries": 3,
    "initial_cash": 10000.0
  },
  "log_config": {
    "log_path": "./data/agent_data"
  }
}
```

### Model Configuration Fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Display name for the model |
| `basemodel` | Yes | Model identifier (e.g., `openai/gpt-4`, `anthropic/claude-3.7-sonnet`) |
| `signature` | Yes | Unique identifier used in API requests and the database |
| `enabled` | Yes | Whether this model runs when no models are specified in the API request |
| `openai_base_url` | No | Custom API endpoint for this model |
| `openai_api_key` | No | Model-specific API key (overrides the `OPENAI_API_KEY` env var) |

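The `enabled` semantics above can be illustrated with a small selection helper. This is a sketch of the documented behavior, not the service's actual code — the function name is hypothetical:

```python
def models_to_run(config, requested=None):
    """Select models per the documented rule: an explicit request wins;
    otherwise every model with "enabled": true runs."""
    models = config["models"]
    if requested:
        wanted = set(requested)
        return [m for m in models if m["signature"] in wanted]
    return [m for m in models if m["enabled"]]
```

So a request naming `["gpt-3.5-turbo"]` runs that model even if its config says `"enabled": false`, while a request with no models runs only the enabled ones.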
### Adding Custom Models

**Example: Add Claude 3.7 Sonnet**

```json
{
  "models": [
    {
      "name": "Claude 3.7 Sonnet",
      "basemodel": "anthropic/claude-3.7-sonnet",
      "signature": "claude-3.7-sonnet",
      "enabled": true,
      "openai_base_url": "https://api.anthropic.com/v1",
      "openai_api_key": "your-anthropic-key"
    }
  ]
}
```

**Example: Add DeepSeek via OpenRouter**

```json
{
  "models": [
    {
      "name": "DeepSeek",
      "basemodel": "deepseek/deepseek-chat",
      "signature": "deepseek",
      "enabled": true,
      "openai_base_url": "https://openrouter.ai/api/v1",
      "openai_api_key": "your-openrouter-key"
    }
  ]
}
```

### Agent Configuration

| Field | Description | Default |
|-------|-------------|---------|
| `max_steps` | Maximum reasoning iterations per trading day | 30 |
| `max_retries` | Retry attempts on API failures | 3 |
| `initial_cash` | Starting capital per model | 10000.0 |

---

## Port Configuration

### Default Ports

| Service | Internal Port | Host Port (configurable) |
|---------|---------------|--------------------------|
| API Server | 8080 | `API_PORT` (default: 8080) |
| MCP Math | 8000 | Not exposed to host |
| MCP Search | 8001 | Not exposed to host |
| MCP Trade | 8002 | Not exposed to host |
| MCP Price | 8003 | Not exposed to host |

### Changing API Port

If port 8080 is already in use:

```bash
# Add to .env
echo "API_PORT=8889" >> .env

# Restart
docker-compose down
docker-compose up -d

# Access on new port
curl http://localhost:8889/health
```

---

## Volume Configuration

Docker volumes persist data across container restarts:

```yaml
volumes:
  - ./data:/app/data        # Database, price data, agent data
  - ./configs:/app/configs  # Configuration files
  - ./logs:/app/logs        # Application logs
```

### Data Directory Structure

```
data/
├── jobs.db                  # SQLite database
├── merged.jsonl             # Cached price data
├── daily_prices_*.json      # Individual stock data
├── price_coverage.json      # Data availability tracking
└── agent_data/              # Agent execution data
    └── {signature}/
        ├── position/
        │   └── position.jsonl   # Trading positions
        └── log/
            └── {date}/
                └── log.jsonl    # Trading logs
```

---

## API Key Setup

### OpenAI API Key

1. Visit [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
2. Create a new key
3. Add to `.env`:
   ```bash
   OPENAI_API_KEY=sk-...
   ```

### Alpha Vantage API Key

1. Visit [alphavantage.co/support/#api-key](https://www.alphavantage.co/support/#api-key)
2. Get a free key (5 req/min) or premium (75 req/min)
3. Add to `.env`:
   ```bash
   ALPHAADVANTAGE_API_KEY=...
   ```

### Jina AI API Key

1. Visit [jina.ai](https://jina.ai/)
2. Sign up for the free tier
3. Add to `.env`:
   ```bash
   JINA_API_KEY=...
   ```

---

## Configuration Examples

### Development Setup

```bash
# .env
API_PORT=8080
MAX_CONCURRENT_JOBS=1
MAX_SIMULATION_DAYS=5         # Limit for faster testing
AUTO_DOWNLOAD_PRICE_DATA=true
AGENT_MAX_STEP=10             # Fewer steps for faster iteration
```

### Production Setup

```bash
# .env
API_PORT=8080
MAX_CONCURRENT_JOBS=1
MAX_SIMULATION_DAYS=30
AUTO_DOWNLOAD_PRICE_DATA=true
AGENT_MAX_STEP=30
```

### Multi-Model Competition

```json
// configs/default_config.json
{
  "models": [
    {
      "name": "GPT-4",
      "basemodel": "openai/gpt-4",
      "signature": "gpt-4",
      "enabled": true
    },
    {
      "name": "Claude 3.7",
      "basemodel": "anthropic/claude-3.7-sonnet",
      "signature": "claude-3.7",
      "enabled": true,
      "openai_base_url": "https://api.anthropic.com/v1",
      "openai_api_key": "anthropic-key"
    },
    {
      "name": "GPT-3.5 Turbo",
      "basemodel": "openai/gpt-3.5-turbo",
      "signature": "gpt-3.5-turbo",
      "enabled": false  // Not run by default
    }
  ]
}
```

---

## Environment Variable Priority

When the same configuration exists in multiple places:

1. **API request parameters** (highest priority)
2. **Model-specific config** (`openai_base_url`, `openai_api_key` in model config)
3. **Environment variables** (`.env` file)
4. **Default values** (lowest priority)

Example:
```json
// If model config has:
{
  "openai_api_key": "model-specific-key"
}

// This overrides OPENAI_API_KEY from .env
```

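The priority chain can be sketched as a first-non-empty lookup. This is an illustrative helper, not the service's actual resolver — the function and parameter names are assumptions:

```python
import os


def resolve_api_key(request_params, model_config, default=None):
    """Resolve per the documented order:
    request parameter > model-specific config > environment variable > default."""
    return (
        request_params.get("openai_api_key")
        or model_config.get("openai_api_key")
        or os.environ.get("OPENAI_API_KEY")
        or default
    )
```

The same pattern would apply to `openai_base_url` and any other value that exists at more than one level.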
---

## Validation

After configuration changes:

```bash
# Restart service
docker-compose down
docker-compose up -d

# Verify health
curl http://localhost:8080/health

# Check logs for errors
docker logs ai-trader | grep -i error
```

docs/user-guide/integration-examples.md (new file, 197 lines)
@@ -0,0 +1,197 @@

# Integration Examples

Examples for integrating AI-Trader with external systems.

---

## Python

See the complete Python client in [API_REFERENCE.md](../../API_REFERENCE.md#client-libraries).

### Async Client

```python
import aiohttp
import asyncio


class AsyncAITraderClient:
    def __init__(self, base_url="http://localhost:8080"):
        self.base_url = base_url

    async def trigger_simulation(self, start_date, end_date=None, models=None):
        payload = {"start_date": start_date}
        if end_date:
            payload["end_date"] = end_date
        if models:
            payload["models"] = models

        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.base_url}/simulate/trigger",
                json=payload
            ) as response:
                response.raise_for_status()
                return await response.json()

    async def wait_for_completion(self, job_id, poll_interval=10):
        async with aiohttp.ClientSession() as session:
            while True:
                async with session.get(
                    f"{self.base_url}/simulate/status/{job_id}"
                ) as response:
                    status = await response.json()

                if status["status"] in ["completed", "partial", "failed"]:
                    return status

                await asyncio.sleep(poll_interval)


# Usage
async def main():
    client = AsyncAITraderClient()
    job = await client.trigger_simulation("2025-01-16", models=["gpt-4"])
    result = await client.wait_for_completion(job["job_id"])
    print(f"Simulation completed: {result['status']}")

asyncio.run(main())
```

---

## TypeScript/JavaScript

See the complete TypeScript client in [API_REFERENCE.md](../../API_REFERENCE.md#client-libraries).

---

## Bash/Shell Scripts

### Daily Automation

```bash
#!/bin/bash
# daily_simulation.sh

API_URL="http://localhost:8080"
DATE=$(date -d "yesterday" +%Y-%m-%d)

echo "Triggering simulation for $DATE"

# Trigger
RESPONSE=$(curl -s -X POST $API_URL/simulate/trigger \
  -H "Content-Type: application/json" \
  -d "{\"start_date\": \"$DATE\", \"models\": [\"gpt-4\"]}")

JOB_ID=$(echo $RESPONSE | jq -r '.job_id')
echo "Job ID: $JOB_ID"

# Poll
while true; do
  STATUS=$(curl -s $API_URL/simulate/status/$JOB_ID | jq -r '.status')
  echo "Status: $STATUS"

  if [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "partial" ]] || [[ "$STATUS" == "failed" ]]; then
    break
  fi

  sleep 30
done

# Get results
curl -s "$API_URL/results?job_id=$JOB_ID" | jq '.' > results_$DATE.json
echo "Results saved to results_$DATE.json"
```

Add to crontab:
```bash
0 6 * * * /path/to/daily_simulation.sh >> /var/log/ai-trader.log 2>&1
```

|
||||
---
|
||||
|
||||
## Apache Airflow
|
||||
|
||||
```python
|
||||
from airflow import DAG
|
||||
from airflow.operators.python import PythonOperator
|
||||
from datetime import datetime, timedelta
|
||||
import requests
|
||||
import time
|
||||
|
||||
def trigger_simulation(**context):
|
||||
response = requests.post(
|
||||
"http://ai-trader:8080/simulate/trigger",
|
||||
json={"start_date": "{{ ds }}", "models": ["gpt-4"]}
|
||||
)
|
||||
response.raise_for_status()
|
||||
return response.json()["job_id"]
|
||||
|
||||
def wait_for_completion(**context):
|
||||
job_id = context["task_instance"].xcom_pull(task_ids="trigger")
|
||||
|
||||
while True:
|
||||
response = requests.get(f"http://ai-trader:8080/simulate/status/{job_id}")
|
||||
status = response.json()
|
||||
|
||||
if status["status"] in ["completed", "partial", "failed"]:
|
||||
return status
|
||||
|
||||
time.sleep(30)
|
||||
|
||||
def fetch_results(**context):
|
||||
job_id = context["task_instance"].xcom_pull(task_ids="trigger")
|
||||
response = requests.get(f"http://ai-trader:8080/results?job_id={job_id}")
|
||||
return response.json()
|
||||
|
||||
default_args = {
|
||||
"owner": "airflow",
|
||||
"depends_on_past": False,
|
||||
"start_date": datetime(2025, 1, 1),
|
||||
"retries": 1,
|
||||
"retry_delay": timedelta(minutes=5),
|
||||
}
|
||||
|
||||
dag = DAG(
|
||||
"ai_trader_simulation",
|
||||
default_args=default_args,
|
||||
schedule_interval="0 6 * * *", # Daily at 6 AM
|
||||
catchup=False
|
||||
)
|
||||
|
||||
trigger_task = PythonOperator(
|
||||
task_id="trigger",
|
||||
python_callable=trigger_simulation,
|
||||
dag=dag
|
||||
)
|
||||
|
||||
wait_task = PythonOperator(
|
||||
task_id="wait",
|
||||
python_callable=wait_for_completion,
|
||||
dag=dag
|
||||
)
|
||||
|
||||
fetch_task = PythonOperator(
|
||||
task_id="fetch_results",
|
||||
python_callable=fetch_results,
|
||||
dag=dag
|
||||
)
|
||||
|
||||
trigger_task >> wait_task >> fetch_task
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Generic Workflow Automation
|
||||
|
||||
Any HTTP-capable automation service can integrate with AI-Trader:
|
||||
|
||||
1. **Trigger:** POST to `/simulate/trigger`
|
||||
2. **Poll:** GET `/simulate/status/{job_id}` every 10-30 seconds
|
||||
3. **Retrieve:** GET `/results?job_id={job_id}` when complete
|
||||
4. **Store:** Save results to your database/warehouse
|
||||
|
||||
**Key considerations:**
|
||||
- Handle 400 errors (concurrent jobs) gracefully
|
||||
- Implement exponential backoff for retries
|
||||
- Monitor health endpoint before triggering
|
||||
- Store job_id for tracking and debugging
|
||||
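The four steps and considerations above can be sketched as a single Python function. This is illustrative only: the endpoint paths and terminal job states come from this page, while `BASE_URL`, the backoff schedule, and the output filename are assumptions.

```python
import json
import time

BASE_URL = "http://localhost:8080"
TERMINAL_STATES = {"completed", "partial", "failed"}

def is_terminal(status):
    """A job is done once it reaches a terminal state."""
    return status in TERMINAL_STATES

def backoff_delays(retries, base=1.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]

def run_workflow(start_date, models=None, poll_interval=30):
    import requests  # imported here so the helpers above stay dependency-free

    # 1. Health gate before triggering
    requests.get(f"{BASE_URL}/health", timeout=10).raise_for_status()

    # 2. Trigger, backing off if another job holds the slot (HTTP 400)
    payload = {"start_date": start_date}
    if models:
        payload["models"] = models
    for delay in backoff_delays(4):
        resp = requests.post(f"{BASE_URL}/simulate/trigger", json=payload)
        if resp.status_code != 400:
            break
        time.sleep(delay)
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # 3. Poll until a terminal state
    while True:
        status = requests.get(f"{BASE_URL}/simulate/status/{job_id}").json()["status"]
        if is_terminal(status):
            break
        time.sleep(poll_interval)

    # 4. Retrieve and store results (here: a local JSON file)
    results = requests.get(f"{BASE_URL}/results", params={"job_id": job_id}).json()
    with open(f"results_{start_date}.json", "w") as fh:
        json.dump(results, fh, indent=2)
    return job_id, status
```

`run_workflow("2025-01-16", models=["gpt-4"])` blocks until the job reaches a terminal state and writes the results to disk; the saved `job_id` can be reused for later debugging.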
488
docs/user-guide/troubleshooting.md
Normal file
@@ -0,0 +1,488 @@
# Troubleshooting Guide

Common issues and solutions for AI-Trader.

---

## Container Issues

### Container Won't Start

**Symptoms:**
- `docker ps` shows no ai-trader container
- Container exits immediately after starting

**Debug:**
```bash
# Check logs
docker logs ai-trader

# Check if the container exists (stopped)
docker ps -a | grep ai-trader
```

**Common Causes & Solutions:**

**1. Missing API Keys**
```bash
# Verify the .env file
grep -E "OPENAI_API_KEY|ALPHAADVANTAGE_API_KEY|JINA_API_KEY" .env

# Should show all three keys with values
```

**Solution:** Add the missing keys to `.env`

**2. Port Already in Use**
```bash
# Check what's using port 8080
sudo lsof -i :8080             # Linux/Mac
netstat -ano | findstr :8080   # Windows
```

**Solution:** Change the port in `.env`:
```bash
echo "API_PORT=8889" >> .env
docker-compose down
docker-compose up -d
```

**3. Volume Permission Issues**
```bash
# Fix permissions
chmod -R 755 data logs configs
```

---

### Health Check Fails

**Symptoms:**
- `curl http://localhost:8080/health` returns an error or an HTML page
- Container is running but the API is not responding

**Debug:**
```bash
# Check if the API process is running
docker exec ai-trader ps aux | grep uvicorn

# Test internal health (always port 8080 inside the container)
docker exec ai-trader curl http://localhost:8080/health

# Check the configured port
grep API_PORT .env
```

**Solutions:**

**If you get an HTML 404 page:**
Another service is using your configured port.

```bash
# Find the conflicting service
sudo lsof -i :8080

# Change AI-Trader's port
echo "API_PORT=8889" >> .env
docker-compose down
docker-compose up -d

# Now use the new port
curl http://localhost:8889/health
```

**If MCP services didn't start:**
```bash
# Check MCP processes
docker exec ai-trader ps aux | grep python

# Should see 4 MCP services on ports 8000-8003
```

**If the database has issues:**
```bash
# Check the database file
docker exec ai-trader ls -l /app/data/jobs.db

# If missing, restart to recreate
docker-compose restart
```

---

## Simulation Issues

### Job Stays in "Pending" Status

**Symptoms:**
- Job triggered but never progresses to "running"
- Status remains "pending" indefinitely

**Debug:**
```bash
# Check worker logs
docker logs ai-trader | grep -i "worker\|simulation"

# Check the database
docker exec ai-trader sqlite3 /app/data/jobs.db "SELECT * FROM job_details;"

# Check MCP service accessibility
docker exec ai-trader curl http://localhost:8000/health
```

**Solutions:**

```bash
# Restart the container (jobs resume automatically)
docker-compose restart

# Check a specific job's status with details
curl http://localhost:8080/simulate/status/$JOB_ID | jq '.details'
```

---

### Job Takes Too Long / Timeouts

**Symptoms:**
- Jobs taking longer than expected
- Test scripts timing out

**Expected Execution Times:**
- Single model-day: 2-5 minutes (with cached price data)
- First run with data download: 10-15 minutes
- 2-date, 2-model job: 10-20 minutes

**Solutions:**

**Increase the poll timeout in monitoring:**
```bash
# Instead of a fixed number of polls, loop until a terminal state
while true; do
  STATUS=$(curl -s http://localhost:8080/simulate/status/$JOB_ID | jq -r '.status')
  echo "$(date): Status = $STATUS"

  if [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "partial" ]] || [[ "$STATUS" == "failed" ]]; then
    break
  fi

  sleep 30
done
```
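The same loop in Python, with an explicit overall deadline instead of polling forever. This is a sketch: the terminal states come from this guide, while the interval and timeout values are assumptions, and `fetch_status` is a caller-supplied function (for example, one that GETs `/simulate/status/{job_id}` and returns the `status` field).

```python
import time

TERMINAL_STATES = {"completed", "partial", "failed"}

def poll_until_done(fetch_status, poll_interval=30, timeout=3600,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"job did not reach a terminal state within {timeout}s")
        sleep(poll_interval)
```

Injecting `clock` and `sleep` keeps the helper testable without waiting on real time.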
**Check if the agent is stuck:**
```bash
# View real-time logs
docker logs -f ai-trader

# Look for repeated errors or infinite loops
```

---

### "No trading dates with complete price data"

**Error Message:**
```
No trading dates with complete price data in range 2025-01-16 to 2025-01-17.
All symbols must have data for a date to be tradeable.
```

**Cause:** Missing price data for the requested dates.

**Solutions:**

**Option 1: Try Recent Dates**

Use more recent dates where data is more likely available:
```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{"start_date": "2024-12-15", "models": ["gpt-4"]}'
```

**Option 2: Manually Download Data**

```bash
docker exec -it ai-trader bash
cd data
python get_daily_price.py   # Downloads latest data
python merge_jsonl.py       # Merges into database
exit

# Retry simulation
```

**Option 3: Check the Auto-Download Setting**

```bash
# Ensure auto-download is enabled
grep AUTO_DOWNLOAD_PRICE_DATA .env

# Should be: AUTO_DOWNLOAD_PRICE_DATA=true
```

---

### Rate Limit Errors

**Symptoms:**
- Logs show "rate limit" messages
- Partial data downloaded

**Cause:** Alpha Vantage API rate limits (5 req/min free tier, 75 req/min premium)

**Solutions:**

**For the free tier:**
- Simulations automatically continue with available data
- The next simulation resumes downloads
- Consider upgrading to a premium API key

**Workaround:**
```bash
# Pre-download data in batches
docker exec -it ai-trader bash
cd data

# Download in stages (wait 1 min between runs)
python get_daily_price.py
sleep 60
python get_daily_price.py
sleep 60
python get_daily_price.py

python merge_jsonl.py
exit
```

---

## API Issues

### 400 Bad Request: Another Job Running

**Error:**
```json
{
  "detail": "Another simulation job is already running or pending. Please wait for it to complete."
}
```

**Cause:** AI-Trader allows only 1 concurrent job by default.

**Solutions:**

**Check current jobs:**
```bash
# Verify the API is up
curl http://localhost:8080/health

# Query recent jobs (requires checking the database)
docker exec ai-trader sqlite3 /app/data/jobs.db \
  "SELECT job_id, status FROM jobs ORDER BY created_at DESC LIMIT 5;"
```

**Wait for completion:**
```bash
# Get the blocking job's status
curl http://localhost:8080/simulate/status/{job_id}
```

**Force-stop a stuck job (last resort):**
```bash
# Update the job status in the database
docker exec ai-trader sqlite3 /app/data/jobs.db \
  "UPDATE jobs SET status='failed' WHERE status IN ('pending', 'running');"

# Restart the service
docker-compose restart
```

---

### Invalid Date Format Errors

**Error:**
```json
{
  "detail": "Invalid date format: 2025-1-16. Expected YYYY-MM-DD"
}
```

**Solution:** Use zero-padded dates:

```bash
# Wrong
{"start_date": "2025-1-16"}

# Correct
{"start_date": "2025-01-16"}
```

---

### Date Range Too Large

**Error:**
```json
{
  "detail": "Date range too large: 45 days. Maximum allowed: 30 days"
}
```

**Solution:** Split into smaller batches:

```bash
# Instead of 2025-01-01 to 2025-02-15 (45 days),
# run as two jobs:

# Job 1: Jan 1-30
curl -X POST http://localhost:8080/simulate/trigger \
  -d '{"start_date": "2025-01-01", "end_date": "2025-01-30"}'

# Job 2: Jan 31 - Feb 15
curl -X POST http://localhost:8080/simulate/trigger \
  -d '{"start_date": "2025-01-31", "end_date": "2025-02-15"}'
```

---

## Data Issues

### Database Corruption

**Symptoms:**
- "database disk image is malformed"
- Unexpected SQL errors

**Solutions:**

**Backup and rebuild:**
```bash
# Stop the service
docker-compose down

# Backup the current database
cp data/jobs.db data/jobs.db.backup

# Try recovery (the plain alpine image has no sqlite3 binary, so install it first)
docker run --rm -v $(pwd)/data:/data alpine \
  sh -c "apk add --no-cache sqlite && sqlite3 /data/jobs.db 'PRAGMA integrity_check;'"

# If corrupted, delete and restart (loses job history)
rm data/jobs.db
docker-compose up -d
```
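The same integrity check can be run from Python's standard library, which is convenient for scripting. A minimal sketch; the database path is an assumption, and the check should be run against a stopped service or a copy of the file.

```python
import sqlite3

def integrity_ok(db_path: str) -> bool:
    """Run SQLite's built-in integrity check; True when the report is 'ok'."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA integrity_check returns one row per problem, or a single 'ok' row.
        (report,) = conn.execute("PRAGMA integrity_check;").fetchone()
    finally:
        conn.close()
    return report == "ok"

# Example: integrity_ok("data/jobs.db.backup")
```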
---

### Missing Price Data Files

**Symptoms:**
- Errors about missing `merged.jsonl`
- Price query failures

**Solution:**

```bash
# Re-download price data
docker exec -it ai-trader bash
cd data
python get_daily_price.py
python merge_jsonl.py
ls -lh merged.jsonl   # Should exist
exit
```

---

## Performance Issues

### Slow Simulation Execution

**Typical speeds:**
- Single model-day: 2-5 minutes
- With cold start (first time): +3-5 minutes

**Causes & Solutions:**

**1. AI model API is slow**
- Check the AI provider's status page
- Try a different model
- Increase the timeout in config

**2. Network latency**
- Check the internet connection
- The Jina Search API might be slow

**3. MCP services overloaded**
```bash
# Check CPU usage
docker stats ai-trader
```

---

### High Memory Usage

**Normal:** 500MB - 1GB during simulation

**If higher:**
```bash
# Check memory
docker stats ai-trader

# Restart if needed
docker-compose restart
```

---

## Diagnostic Commands

```bash
# Container status
docker ps | grep ai-trader

# Real-time logs
docker logs -f ai-trader

# Check errors only
docker logs ai-trader 2>&1 | grep -i error

# Container resource usage
docker stats ai-trader

# Access container shell
docker exec -it ai-trader bash

# Database inspection
docker exec -it ai-trader sqlite3 /app/data/jobs.db
sqlite> SELECT * FROM jobs ORDER BY created_at DESC LIMIT 5;
sqlite> SELECT status, COUNT(*) FROM jobs GROUP BY status;
sqlite> .quit

# Check file permissions
docker exec ai-trader ls -la /app/data

# Test API connectivity
curl -v http://localhost:8080/health

# View all environment variables
docker exec ai-trader env | sort
```

---

## Getting More Help

If your issue isn't covered here:

1. **Check logs** for specific error messages
2. **Review** [API_REFERENCE.md](../../API_REFERENCE.md) for correct usage
3. **Search** [GitHub Issues](https://github.com/Xe138/AI-Trader/issues)
4. **Open a new issue** with:
   - Error messages from logs
   - Steps to reproduce
   - Environment details (OS, Docker version)
   - Relevant config files (redact API keys)
182
docs/user-guide/using-the-api.md
Normal file
@@ -0,0 +1,182 @@
# Using the API

Common workflows and best practices for the AI-Trader API.

---

## Basic Workflow

### 1. Trigger Simulation

```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "start_date": "2025-01-16",
    "end_date": "2025-01-17",
    "models": ["gpt-4"]
  }'
```

Save the `job_id` from the response.

### 2. Poll for Completion

```bash
JOB_ID="your-job-id-here"

while true; do
  STATUS=$(curl -s http://localhost:8080/simulate/status/$JOB_ID | jq -r '.status')
  echo "Status: $STATUS"

  if [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "partial" ]] || [[ "$STATUS" == "failed" ]]; then
    break
  fi

  sleep 10
done
```

### 3. Retrieve Results

```bash
curl "http://localhost:8080/results?job_id=$JOB_ID" | jq '.'
```

---

## Common Patterns

### Single-Day Simulation

Omit `end_date` to simulate just one day:

```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -d '{"start_date": "2025-01-16", "models": ["gpt-4"]}'
```

### All Enabled Models

Omit `models` to run all enabled models from the config:

```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -d '{"start_date": "2025-01-16", "end_date": "2025-01-20"}'
```

### Filter Results

```bash
# By date
curl "http://localhost:8080/results?date=2025-01-16"

# By model
curl "http://localhost:8080/results?model=gpt-4"

# Combined
curl "http://localhost:8080/results?job_id=$JOB_ID&date=2025-01-16&model=gpt-4"
```

---

## Best Practices

### 1. Check Health Before Triggering

```bash
curl http://localhost:8080/health

# Only proceed if status is "healthy"
```

### 2. Use Exponential Backoff for Retries

```python
import time
import requests

def trigger_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "http://localhost:8080/simulate/trigger",
                json={"start_date": "2025-01-16"}
            )
            response.raise_for_status()
            return response.json()
        except requests.HTTPError as e:
            if e.response.status_code == 400:
                # Don't retry on validation errors
                raise
            wait = 2 ** attempt  # 1s, 2s, 4s
            time.sleep(wait)

    raise Exception("Max retries exceeded")
```

### 3. Handle Concurrent Job Conflicts

```python
import requests

response = requests.post(
    "http://localhost:8080/simulate/trigger",
    json={"start_date": "2025-01-16"}
)

if response.status_code == 400 and "already running" in response.json()["detail"]:
    print("Another job is running. Waiting...")
    # Wait and retry, or query the existing job's status
```
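The conflict check above can be wrapped in a helper that waits for the single job slot to free up. A sketch, not part of the API itself: `post` is a caller-supplied function returning `(status_code, body_dict)`, and the retry timing is an assumption.

```python
import time

def trigger_when_free(post, payload, retry_interval=60, max_wait=3600, sleep=time.sleep):
    """Retry the trigger while another job holds the slot; raise on other errors."""
    waited = 0
    while True:
        code, body = post(payload)
        if code == 400 and "already running" in body.get("detail", ""):
            # The single job slot is occupied; wait and try again.
            if waited >= max_wait:
                raise TimeoutError("gave up waiting for the job slot")
            sleep(retry_interval)
            waited += retry_interval
            continue
        if code >= 400:
            raise RuntimeError(f"trigger failed: {code} {body}")
        return body["job_id"]
```

In production, `post` could wrap `requests.post(...)` and return `(resp.status_code, resp.json())`; keeping it injectable makes the retry logic easy to test.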
### 4. Monitor Progress with Details

```python
import requests

def get_detailed_progress(job_id):
    response = requests.get(f"http://localhost:8080/simulate/status/{job_id}")
    status = response.json()

    print(f"Overall: {status['status']}")
    print(f"Progress: {status['progress']['completed']}/{status['progress']['total_model_days']}")

    # Show per-model-day status
    for detail in status['details']:
        print(f"  {detail['trading_date']} {detail['model_signature']}: {detail['status']}")
```

---

## Error Handling

### Validation Errors (400)

```python
import requests

try:
    response = requests.post(
        "http://localhost:8080/simulate/trigger",
        json={"start_date": "2025-1-16"}  # Wrong format
    )
    response.raise_for_status()
except requests.HTTPError as e:
    if e.response.status_code == 400:
        print(f"Validation error: {e.response.json()['detail']}")
        # Fix the input and retry
```

### Service Unavailable (503)

```python
import requests

try:
    response = requests.post(
        "http://localhost:8080/simulate/trigger",
        json={"start_date": "2025-01-16"}
    )
    response.raise_for_status()
except requests.HTTPError as e:
    if e.response.status_code == 503:
        print("Service unavailable (likely the price data download failed)")
        # Retry later or check ALPHAADVANTAGE_API_KEY
```

---

See [API_REFERENCE.md](../../API_REFERENCE.md) for complete endpoint documentation.
@@ -1,900 +0,0 @@
|
||||
# Background Worker Architecture Specification
|
||||
|
||||
## 1. Overview
|
||||
|
||||
The Background Worker executes simulation jobs asynchronously, allowing the API to return immediately (202 Accepted) while simulations run in the background.
|
||||
|
||||
**Key Responsibilities:**
|
||||
1. Execute simulation jobs queued by `/simulate/trigger` endpoint
|
||||
2. Manage per-model-day execution with status updates
|
||||
3. Handle errors gracefully (model failures don't block other models)
|
||||
4. Coordinate runtime configuration for concurrent model execution
|
||||
5. Update job status in database throughout execution
|
||||
|
||||
---
|
||||
|
||||
## 2. Worker Architecture
|
||||
|
||||
### 2.1 Execution Model
|
||||
|
||||
**Pattern:** Date-sequential, Model-parallel execution
|
||||
|
||||
```
|
||||
Job: Simulate 2025-01-16 to 2025-01-18 for models [gpt-5, claude-3.7-sonnet]
|
||||
|
||||
Execution flow:
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Date: 2025-01-16 │
|
||||
│ ├─ gpt-5 (running) ┐ │
|
||||
│ └─ claude-3.7-sonnet (running) ┘ Parallel │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼ (both complete)
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Date: 2025-01-17 │
|
||||
│ ├─ gpt-5 (running) ┐ │
|
||||
│ └─ claude-3.7-sonnet (running) ┘ Parallel │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Date: 2025-01-18 │
|
||||
│ ├─ gpt-5 (running) ┐ │
|
||||
│ └─ claude-3.7-sonnet (running) ┘ Parallel │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- **Models run in parallel** → Faster total execution (30-60s per model-day, 3 models = ~30-60s per date instead of ~90-180s)
|
||||
- **Dates run sequentially** → Ensures position.jsonl integrity (no concurrent writes to same file)
|
||||
- **Independent failure handling** → One model's failure doesn't block other models
|
||||
|
||||
---
|
||||
|
||||
### 2.2 File Structure
|
||||
|
||||
```
|
||||
api/
|
||||
├── worker.py # SimulationWorker class
|
||||
├── executor.py # Single model-day execution logic
|
||||
└── runtime_manager.py # Runtime config isolation
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Worker Implementation
|
||||
|
||||
### 3.1 SimulationWorker Class
|
||||
|
||||
```python
|
||||
# api/worker.py
|
||||
|
||||
import asyncio
|
||||
from typing import List, Dict
|
||||
from datetime import datetime
|
||||
import logging
|
||||
from api.job_manager import JobManager
|
||||
from api.executor import ModelDayExecutor
|
||||
from main import load_config, get_agent_class
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
class SimulationWorker:
|
||||
"""
|
||||
Executes simulation jobs in the background.
|
||||
|
||||
Manages:
|
||||
- Date-sequential, model-parallel execution
|
||||
- Job status updates throughout execution
|
||||
- Error handling and recovery
|
||||
"""
|
||||
|
||||
def __init__(self, job_manager: JobManager):
|
||||
self.job_manager = job_manager
|
||||
self.executor = ModelDayExecutor(job_manager)
|
||||
|
||||
async def run_job(self, job_id: str) -> None:
|
||||
"""
|
||||
Execute a simulation job.
|
||||
|
||||
Args:
|
||||
job_id: UUID of job to execute
|
||||
|
||||
Flow:
|
||||
1. Load job from database
|
||||
2. Load configuration file
|
||||
3. Initialize agents for each model
|
||||
4. For each date sequentially:
|
||||
- Run all models in parallel
|
||||
- Update status after each model-day
|
||||
5. Mark job as completed/partial/failed
|
||||
"""
|
||||
logger.info(f"Starting simulation job {job_id}")
|
||||
|
||||
try:
|
||||
# 1. Load job metadata
|
||||
job = self.job_manager.get_job(job_id)
|
||||
if not job:
|
||||
logger.error(f"Job {job_id} not found")
|
||||
return
|
||||
|
||||
# 2. Update job status to 'running'
|
||||
self.job_manager.update_job_status(job_id, "running")
|
||||
|
||||
# 3. Load configuration
|
||||
config = load_config(job["config_path"])
|
||||
|
||||
# 4. Get enabled models from config
|
||||
enabled_models = [
|
||||
m for m in config["models"]
|
||||
if m.get("signature") in job["models"] and m.get("enabled", True)
|
||||
]
|
||||
|
||||
if not enabled_models:
|
||||
raise ValueError("No enabled models found in configuration")
|
||||
|
||||
# 5. Get agent class
|
||||
agent_type = config.get("agent_type", "BaseAgent")
|
||||
AgentClass = get_agent_class(agent_type)
|
||||
|
||||
# 6. Execute each date sequentially
|
||||
for date in job["date_range"]:
|
||||
logger.info(f"[Job {job_id}] Processing date: {date}")
|
||||
|
||||
# Run all models for this date in parallel
|
||||
tasks = []
|
||||
for model_config in enabled_models:
|
||||
task = self.executor.run_model_day(
|
||||
job_id=job_id,
|
||||
date=date,
|
||||
model_config=model_config,
|
||||
agent_class=AgentClass,
|
||||
config=config
|
||||
)
|
||||
tasks.append(task)
|
||||
|
||||
# Wait for all models to complete this date
|
||||
results = await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
# Log any exceptions (already handled by executor, just for visibility)
|
||||
for i, result in enumerate(results):
|
||||
if isinstance(result, Exception):
|
||||
model_sig = enabled_models[i]["signature"]
|
||||
logger.error(f"[Job {job_id}] Model {model_sig} failed on {date}: {result}")
|
||||
|
||||
logger.info(f"[Job {job_id}] Date {date} completed")
|
||||
|
||||
# 7. Job execution finished - final status will be set by job_manager
|
||||
# based on job_details statuses
|
||||
logger.info(f"[Job {job_id}] All dates processed")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"[Job {job_id}] Fatal error: {e}", exc_info=True)
|
||||
self.job_manager.update_job_status(job_id, "failed", error=str(e))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3.2 ModelDayExecutor
|
||||
|
||||
```python
|
||||
# api/executor.py
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import logging
|
||||
from typing import Dict, Any
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from api.job_manager import JobManager
|
||||
from api.runtime_manager import RuntimeConfigManager
|
||||
from tools.general_tools import write_config_value
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
class ModelDayExecutor:
|
||||
"""
|
||||
Executes a single model-day simulation.
|
||||
|
||||
Responsibilities:
|
||||
- Initialize agent for specific model
|
||||
- Set up isolated runtime configuration
|
||||
- Execute trading session
|
||||
- Update job_detail status
|
||||
- Handle errors without blocking other models
|
||||
"""
|
||||
|
||||
def __init__(self, job_manager: JobManager):
|
||||
self.job_manager = job_manager
|
||||
self.runtime_manager = RuntimeConfigManager()
|
||||
|
||||
async def run_model_day(
|
||||
self,
|
||||
job_id: str,
|
||||
date: str,
|
||||
model_config: Dict[str, Any],
|
||||
agent_class: type,
|
||||
config: Dict[str, Any]
|
||||
) -> None:
|
||||
"""
|
||||
Execute simulation for one model on one date.
|
||||
|
||||
Args:
|
||||
job_id: Job UUID
|
||||
date: Trading date (YYYY-MM-DD)
|
||||
model_config: Model configuration dict from config file
|
||||
agent_class: Agent class (e.g., BaseAgent)
|
||||
config: Full configuration dict
|
||||
|
||||
Updates:
|
||||
- job_details status: pending → running → completed/failed
|
||||
- Writes to position.jsonl and log.jsonl
|
||||
"""
|
||||
model_sig = model_config["signature"]
|
||||
logger.info(f"[Job {job_id}] Starting {model_sig} on {date}")
|
||||
|
||||
# Update status to 'running'
|
||||
self.job_manager.update_job_detail_status(
|
||||
job_id, date, model_sig, "running"
|
||||
)
|
||||
|
||||
# Create isolated runtime config for this execution
|
||||
runtime_config_path = self.runtime_manager.create_runtime_config(
|
||||
job_id=job_id,
|
||||
model_sig=model_sig,
|
||||
date=date
|
||||
)
|
||||
|
||||
try:
|
||||
# 1. Extract model parameters
|
||||
basemodel = model_config.get("basemodel")
|
||||
openai_base_url = model_config.get("openai_base_url")
|
||||
openai_api_key = model_config.get("openai_api_key")
|
||||
|
||||
if not basemodel:
|
||||
raise ValueError(f"Model {model_sig} missing basemodel field")
|
||||
|
||||
# 2. Get agent configuration
|
        agent_config = config.get("agent_config", {})
        log_config = config.get("log_config", {})

        max_steps = agent_config.get("max_steps", 10)
        max_retries = agent_config.get("max_retries", 3)
        base_delay = agent_config.get("base_delay", 0.5)
        initial_cash = agent_config.get("initial_cash", 10000.0)
        log_path = log_config.get("log_path", "./data/agent_data")

        # 3. Get stock symbols from prompts
        from prompts.agent_prompt import all_nasdaq_100_symbols

        # 4. Create agent instance
        agent = agent_class(
            signature=model_sig,
            basemodel=basemodel,
            stock_symbols=all_nasdaq_100_symbols,
            log_path=log_path,
            openai_base_url=openai_base_url,
            openai_api_key=openai_api_key,
            max_steps=max_steps,
            max_retries=max_retries,
            base_delay=base_delay,
            initial_cash=initial_cash,
            init_date=date  # Note: used for initial registration
        )

        # 5. Initialize MCP connection and AI model
        # (Only do this once per job, not per date - optimization for future)
        await agent.initialize()

        # 6. Set runtime configuration for this execution:
        # override RUNTIME_ENV_PATH to use the isolated config
        original_runtime_path = os.environ.get("RUNTIME_ENV_PATH")
        os.environ["RUNTIME_ENV_PATH"] = runtime_config_path

        try:
            # Write runtime config values
            write_config_value("TODAY_DATE", date)
            write_config_value("SIGNATURE", model_sig)
            write_config_value("IF_TRADE", False)

            # 7. Execute trading session
            await agent.run_trading_session(date)

            # 8. Mark as completed
            self.job_manager.update_job_detail_status(
                job_id, date, model_sig, "completed"
            )

            logger.info(f"[Job {job_id}] Completed {model_sig} on {date}")

        finally:
            # Restore original runtime path
            if original_runtime_path:
                os.environ["RUNTIME_ENV_PATH"] = original_runtime_path
            else:
                os.environ.pop("RUNTIME_ENV_PATH", None)

    except Exception as e:
        # Log the error and update status to 'failed'
        error_msg = f"{type(e).__name__}: {str(e)}"
        logger.error(
            f"[Job {job_id}] Failed {model_sig} on {date}: {error_msg}",
            exc_info=True
        )

        self.job_manager.update_job_detail_status(
            job_id, date, model_sig, "failed", error=error_msg
        )

    finally:
        # Cleanup runtime config file
        self.runtime_manager.cleanup_runtime_config(runtime_config_path)
```

---

### 3.3 RuntimeConfigManager

```python
# api/runtime_manager.py

import json
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


class RuntimeConfigManager:
    """
    Manages isolated runtime configuration files for concurrent model execution.

    Problem:
        Multiple models running concurrently need separate runtime_env.json files
        to avoid race conditions on the TODAY_DATE, SIGNATURE, and IF_TRADE values.

    Solution:
        Create a temporary runtime config file per model-day execution:
        /app/data/runtime_env_{job_id}_{model}_{date}.json

    Lifecycle:
        1. create_runtime_config() → creates the temp file
        2. Executor sets the RUNTIME_ENV_PATH env var
        3. Agent uses the isolated config via get_config_value/write_config_value
        4. cleanup_runtime_config() → deletes the temp file
    """

    def __init__(self, data_dir: str = "data"):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

    def create_runtime_config(
        self,
        job_id: str,
        model_sig: str,
        date: str
    ) -> str:
        """
        Create an isolated runtime config file for this execution.

        Args:
            job_id: Job UUID
            model_sig: Model signature
            date: Trading date

        Returns:
            Path to the created runtime config file
        """
        # Generate unique filename
        filename = f"runtime_env_{job_id[:8]}_{model_sig}_{date}.json"
        config_path = self.data_dir / filename

        # Initialize with default values
        initial_config = {
            "TODAY_DATE": date,
            "SIGNATURE": model_sig,
            "IF_TRADE": False,
            "JOB_ID": job_id
        }

        with open(config_path, "w", encoding="utf-8") as f:
            json.dump(initial_config, f, indent=4)

        logger.debug(f"Created runtime config: {config_path}")
        return str(config_path)

    def cleanup_runtime_config(self, config_path: str) -> None:
        """
        Delete the runtime config file after execution.

        Args:
            config_path: Path to the runtime config file
        """
        try:
            if os.path.exists(config_path):
                os.unlink(config_path)
                logger.debug(f"Cleaned up runtime config: {config_path}")
        except Exception as e:
            logger.warning(f"Failed to cleanup runtime config {config_path}: {e}")

    def cleanup_all_runtime_configs(self) -> int:
        """
        Cleanup all runtime config files (for maintenance/startup).

        Returns:
            Number of files deleted
        """
        count = 0
        for config_file in self.data_dir.glob("runtime_env_*.json"):
            try:
                config_file.unlink()
                count += 1
            except Exception as e:
                logger.warning(f"Failed to delete {config_file}: {e}")

        if count > 0:
            logger.info(f"Cleaned up {count} stale runtime config files")

        return count
```
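
The lifecycle from the docstring can be sketched end to end. This is illustrative only: the class below is a trimmed stand-in with the same create/cleanup surface as `RuntimeConfigManager`, so the snippet runs on its own; the job ID, model signature, and date are made-up values.

```python
import json
import os
import tempfile
from pathlib import Path

# Trimmed stand-in for RuntimeConfigManager, just enough for the lifecycle demo.
class Manager:
    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)

    def create_runtime_config(self, job_id, model_sig, date):
        path = self.data_dir / f"runtime_env_{job_id[:8]}_{model_sig}_{date}.json"
        path.write_text(json.dumps(
            {"TODAY_DATE": date, "SIGNATURE": model_sig,
             "IF_TRADE": False, "JOB_ID": job_id}, indent=4))
        return str(path)

    def cleanup_runtime_config(self, path):
        if os.path.exists(path):
            os.unlink(path)

# Lifecycle: create (step 1) → point the agent at it via env var (step 2) →
# cleanup (step 4). Step 3 is the agent reading the isolated config.
manager = Manager(tempfile.mkdtemp())
config_path = manager.create_runtime_config("550e8400-e29b", "gpt-5", "2025-01-16")
os.environ["RUNTIME_ENV_PATH"] = config_path
try:
    pass  # the model-day runs here, reading the isolated config
finally:
    os.environ.pop("RUNTIME_ENV_PATH", None)
    manager.cleanup_runtime_config(config_path)
```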

---

## 4. Integration with FastAPI

### 4.1 Background Task Pattern

```python
# api/main.py

from datetime import datetime

from fastapi import FastAPI, BackgroundTasks, HTTPException

from api.job_manager import JobManager
from api.worker import SimulationWorker
from api.models import TriggerSimulationRequest, TriggerSimulationResponse

# load_config, calculate_date_range, get_last_simulation_date, and
# get_next_trading_day are helpers defined elsewhere in the API package.

app = FastAPI(title="AI-Trader API")

# Global instances
job_manager = JobManager()
worker = SimulationWorker(job_manager)

@app.post("/simulate/trigger", response_model=TriggerSimulationResponse)
async def trigger_simulation(
    request: TriggerSimulationRequest,
    background_tasks: BackgroundTasks
):
    """
    Trigger a catch-up simulation job.

    Returns:
        202 Accepted with job details if a new job was queued
        200 OK with existing job details if one is already running
    """
    # 1. Load configuration
    config = load_config(request.config_path)

    # 2. Determine date range (last position date → most recent trading day)
    date_range = calculate_date_range(config)

    if not date_range:
        return {
            "status": "current",
            "message": "Simulation already up-to-date",
            "last_simulation_date": get_last_simulation_date(config),
            "next_trading_day": get_next_trading_day()
        }

    # 3. Get enabled models
    models = [m["signature"] for m in config["models"] if m.get("enabled", True)]

    # 4. Check for an existing job with the same date range
    existing_job = job_manager.find_job_by_date_range(date_range)
    if existing_job:
        # Return existing job status
        progress = job_manager.get_job_progress(existing_job["job_id"])
        return {
            "job_id": existing_job["job_id"],
            "status": existing_job["status"],
            "date_range": date_range,
            "models": models,
            "created_at": existing_job["created_at"],
            "message": "Simulation already in progress",
            "progress": progress
        }

    # 5. Create new job
    try:
        job_id = job_manager.create_job(
            config_path=request.config_path,
            date_range=date_range,
            models=models
        )
    except ValueError as e:
        # Another job is running (different date range)
        raise HTTPException(status_code=409, detail=str(e))

    # 6. Queue background task
    background_tasks.add_task(worker.run_job, job_id)

    # 7. Return immediately with job details
    return {
        "job_id": job_id,
        "status": "accepted",
        "date_range": date_range,
        "models": models,
        "created_at": datetime.utcnow().isoformat() + "Z",
        "message": "Simulation job queued successfully"
    }
```

---

## 5. Agent Initialization Optimization

### 5.1 Current Issue

**Problem:** Each model-day calls `agent.initialize()`, which:

1. Creates new MCP client connections
2. Creates a new AI model instance

For a 5-day simulation with 3 models, that is 15 `initialize()` calls → slow.

### 5.2 Optimization Strategy (Future Enhancement)

**Option A: Persistent Agent Instances**

Create each agent once per model and reuse it for all dates:

```python
class SimulationWorker:
    async def run_job(self, job_id: str) -> None:
        # ... load config ...

        # Initialize all agents once
        agents = {}
        for model_config in enabled_models:
            agent = await self._create_and_initialize_agent(
                model_config, AgentClass, config
            )
            agents[model_config["signature"]] = agent

        # Execute dates
        for date in job["date_range"]:
            tasks = []
            for model_sig, agent in agents.items():
                task = self.executor.run_model_day_with_agent(
                    job_id, date, agent
                )
                tasks.append(task)

            await asyncio.gather(*tasks, return_exceptions=True)
```

**Benefit:** ~10-15s saved per job (avoids repeated MCP handshakes)

**Tradeoff:** Higher memory usage (agents kept in memory) and more complex error handling

**Recommendation:** Implement in v2 after MVP validation

---

## 6. Error Handling & Recovery

### 6.1 Model-Day Failure Scenarios

**Scenario 1: AI Model API Timeout**

```python
# In executor.run_model_day()
try:
    await agent.run_trading_session(date)
except asyncio.TimeoutError:
    error_msg = "AI model API timeout after 30s"
    self.job_manager.update_job_detail_status(
        job_id, date, model_sig, "failed", error=error_msg
    )
    # Do NOT raise - let other models continue
```

**Scenario 2: MCP Service Down**

```python
# In agent.initialize()
except RuntimeError as e:
    if "Failed to initialize MCP client" in str(e):
        error_msg = "MCP services unavailable - check agent_tools/start_mcp_services.py"
        self.job_manager.update_job_detail_status(
            job_id, date, model_sig, "failed", error=error_msg
        )
        # This likely affects all models - but still don't raise;
        # let job_manager determine the final status
```

**Scenario 3: Out of Cash**

```python
# In trade tool
if position["CASH"] < total_cost:
    # The trade tool returns an error message; the agent receives it and
    # continues reasoning (it might sell other stocks instead).
    # Not a fatal error - the trading session completes normally.
    ...
```

### 6.2 Job-Level Failure

**When does the entire job fail?**

Only if:

1. The configuration file is invalid or missing
2. The agent class import fails
3. A database error occurs during status updates

In these cases, `worker.run_job()` catches the exception and marks the job as `failed`.

All other errors (model-day failures) result in `partial` status.
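
The shape of that top-level guard in `worker.run_job()` is roughly the following. This is a sketch, not the actual implementation: `StubJobManager` stands in for the real `JobManager`, and `load_config` is injected so the two failure modes can be exercised.

```python
import asyncio

class StubJobManager:
    """Stand-in for api/job_manager.py, recording only the final status."""
    def __init__(self):
        self.status = None
    def update_job_status(self, job_id, status):
        self.status = status

async def run_job(job_id, job_manager, load_config):
    try:
        config = load_config()  # invalid/missing config raises here → job fails
        # ... per-date, per-model execution goes here; individual model-day
        # failures only update job_details and never raise out of this block ...
        job_manager.update_job_status(job_id, "completed")
    except Exception:
        # Fatal: bad config, agent import failure, or database errors
        job_manager.update_job_status(job_id, "failed")

def bad_config():
    raise ValueError("missing config file")

jm = StubJobManager()
asyncio.run(run_job("job-1", jm, load_config=lambda: {}))
print(jm.status)  # completed

jm2 = StubJobManager()
asyncio.run(run_job("job-2", jm2, load_config=bad_config))
print(jm2.status)  # failed
```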

---

## 7. Logging Strategy

### 7.1 Log Levels by Component

**Worker (`api/worker.py`):**
- `INFO`: Job start/end, date transitions
- `ERROR`: Fatal job errors

**Executor (`api/executor.py`):**
- `INFO`: Model-day start/completion
- `ERROR`: Model-day failures (with `exc_info=True`)

**Agent (`base_agent.py`):**
- Existing logging (step-by-step execution)
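
These levels can be applied per component through the standard library's logger hierarchy. The logger names below are assumed to mirror the module paths above; adjust to the real hierarchy:

```python
import logging

# Assumed logger names, mirroring the module paths in this spec.
logging.getLogger("api.worker").setLevel(logging.INFO)
logging.getLogger("api.executor").setLevel(logging.INFO)

# Child loggers such as "api.worker.jobs" inherit these levels
# unless explicitly overridden.
```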

### 7.2 Structured Logging Format

```python
import logging
import json

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_record = {
            "timestamp": self.formatTime(record, self.datefmt),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }

        # Add extra fields if present
        if hasattr(record, "job_id"):
            log_record["job_id"] = record.job_id
        if hasattr(record, "model"):
            log_record["model"] = record.model
        if hasattr(record, "date"):
            log_record["date"] = record.date

        return json.dumps(log_record)

# Configure logger
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```
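
The `job_id`/`model`/`date` attributes the formatter checks for arrive via the standard `extra` keyword on logging calls, which attaches the mapping's entries as attributes on the `LogRecord`. A self-contained demonstration (formatter trimmed from the version above; values are illustrative):

```python
import io
import json
import logging

# Minimal repeat of the formatter above, just enough to show the mechanism.
class JSONFormatter(logging.Formatter):
    def format(self, record):
        out = {"level": record.levelname, "message": record.getMessage()}
        for key in ("job_id", "model", "date"):
            if hasattr(record, key):
                out[key] = getattr(record, key)
        return json.dumps(out)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("api.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The `extra` mapping becomes attributes on the emitted LogRecord:
logger.info("Starting gpt-5 on 2025-01-16",
            extra={"job_id": "550e8400", "model": "gpt-5", "date": "2025-01-16"})

print(stream.getvalue().strip())
```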

### 7.3 Log Output Example

```json
{"timestamp": "2025-01-20T14:30:00Z", "level": "INFO", "logger": "api.worker", "message": "Starting simulation job 550e8400-...", "job_id": "550e8400-..."}
{"timestamp": "2025-01-20T14:30:01Z", "level": "INFO", "logger": "api.executor", "message": "Starting gpt-5 on 2025-01-16", "job_id": "550e8400-...", "model": "gpt-5", "date": "2025-01-16"}
{"timestamp": "2025-01-20T14:30:45Z", "level": "INFO", "logger": "api.executor", "message": "Completed gpt-5 on 2025-01-16", "job_id": "550e8400-...", "model": "gpt-5", "date": "2025-01-16"}
```

---

## 8. Testing Strategy

### 8.1 Unit Tests

```python
# tests/test_worker.py

import pytest
from unittest.mock import AsyncMock, MagicMock

from api.worker import SimulationWorker
from api.job_manager import JobManager

@pytest.fixture
def mock_job_manager():
    jm = MagicMock(spec=JobManager)
    jm.get_job.return_value = {
        "job_id": "test-job-123",
        "config_path": "configs/test.json",
        "date_range": ["2025-01-16", "2025-01-17"],
        "models": ["gpt-5"]
    }
    return jm

@pytest.fixture
def worker(mock_job_manager):
    return SimulationWorker(mock_job_manager)

@pytest.mark.asyncio
async def test_run_job_success(worker, mock_job_manager):
    # Mock executor
    worker.executor.run_model_day = AsyncMock(return_value=None)

    await worker.run_job("test-job-123")

    # Verify job status updated to running
    mock_job_manager.update_job_status.assert_any_call("test-job-123", "running")

    # Verify executor called for each model-day
    assert worker.executor.run_model_day.call_count == 2  # 2 dates × 1 model

@pytest.mark.asyncio
async def test_run_job_partial_failure(worker, mock_job_manager):
    # Mock executor - first call succeeds, second fails
    worker.executor.run_model_day = AsyncMock(
        side_effect=[None, Exception("API timeout")]
    )

    await worker.run_job("test-job-123")

    # Job should continue despite one failure
    assert worker.executor.run_model_day.call_count == 2

    # Job status determined by job_manager based on job_details
    # (tested in test_job_manager.py)
```

### 8.2 Integration Tests

```python
# tests/test_integration.py

import time

from fastapi.testclient import TestClient

from api.main import app

client = TestClient(app)

def test_trigger_and_poll_simulation():
    # 1. Trigger simulation
    response = client.post("/simulate/trigger", json={
        "config_path": "configs/test.json"
    })
    assert response.status_code == 202
    job_id = response.json()["job_id"]

    # 2. Poll status (may need to wait for the background task)
    time.sleep(2)  # Wait for execution to start

    response = client.get(f"/simulate/status/{job_id}")
    assert response.status_code == 200
    assert response.json()["status"] in ("running", "completed")

    # 3. Wait for completion (with timeout)
    max_wait = 60  # seconds
    start_time = time.time()
    status = None
    while time.time() - start_time < max_wait:
        response = client.get(f"/simulate/status/{job_id}")
        status = response.json()["status"]
        if status in ("completed", "partial", "failed"):
            break
        time.sleep(5)

    assert status in ("completed", "partial")
```

---

## 9. Performance Monitoring

### 9.1 Metrics to Track

**Job-level metrics:**
- Total duration (from trigger to completion)
- Model-day failure rate
- Average model-day duration

**System-level metrics:**
- Concurrent job count (should be ≤ 1)
- Database query latency
- MCP service response times
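
For example, the model-day failure rate can be derived directly from the per-model-day status rows that `job_manager` maintains. The `job_details` schema below is an assumption for illustration, as are the sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_details (job_id TEXT, date TEXT, model TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO job_details VALUES (?, ?, ?, ?)",
    [
        ("j1", "2025-01-16", "gpt-5", "completed"),
        ("j1", "2025-01-16", "claude", "failed"),
        ("j1", "2025-01-17", "gpt-5", "completed"),
        ("j1", "2025-01-17", "claude", "completed"),
    ],
)

# SQLite evaluates (status = 'failed') as 0/1, so SUM counts the failures.
failed, total = conn.execute(
    "SELECT SUM(status = 'failed'), COUNT(*) FROM job_details WHERE job_id = ?",
    ("j1",),
).fetchone()
print(failed / total)  # 0.25
```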

### 9.2 Instrumentation (Future)

```python
# api/metrics.py

from prometheus_client import Counter, Histogram, Gauge

# Job metrics
job_counter = Counter('simulation_jobs_total', 'Total simulation jobs', ['status'])
job_duration = Histogram('simulation_job_duration_seconds', 'Job execution time')

# Model-day metrics
model_day_counter = Counter('model_days_total', 'Total model-days', ['model', 'status'])
model_day_duration = Histogram('model_day_duration_seconds', 'Model-day execution time', ['model'])

# System metrics
concurrent_jobs = Gauge('concurrent_jobs', 'Number of running jobs')
```

**Usage:**

```python
# In worker.run_job()
with job_duration.time():
    await self._execute_job_logic(job_id)
job_counter.labels(status=final_status).inc()
```

---

## 10. Concurrency Safety

### 10.1 Thread Safety

**FastAPI Background Tasks:**
- Run in a threadpool (sync functions) or as asyncio tasks (async functions)
- For the MVP, async functions are used, so tasks run on the event loop

**SQLite Thread Safety:**
- `check_same_thread=False` allows multi-thread access
- Each operation opens a new connection → safe for low concurrency
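
A plausible shape for the `get_db_connection` helper used in 10.2 below, under exactly these assumptions (fresh short-lived connection per operation, cross-thread access allowed); this body is a sketch, not the actual implementation:

```python
import sqlite3

def get_db_connection(db_path: str) -> sqlite3.Connection:
    # A fresh, short-lived connection per operation; check_same_thread=False
    # lets connections be used from FastAPI worker threads.
    conn = sqlite3.connect(db_path, check_same_thread=False)
    conn.row_factory = sqlite3.Row           # dict-style column access
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    return conn

conn = get_db_connection(":memory:")
print(conn.execute("SELECT 1 AS x").fetchone()["x"])  # 1
```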

**File I/O:**
- `position.jsonl` writes are sequential per model → safe
- Different models write to different files → safe

### 10.2 Race Condition Scenarios

**Scenario: Two trigger requests at the exact same time**

```
Thread A: Check can_start_new_job() → True
Thread B: Check can_start_new_job() → True
Thread A: Create job → Success
Thread B: Create job → Success (PROBLEM: 2 jobs running)
```

**Mitigation: Database-level locking**

```python
def can_start_new_job(self) -> bool:
    conn = get_db_connection(self.db_path)
    cursor = conn.cursor()

    # SELECT ... FOR UPDATE row locking is not supported in SQLite.
    # Instead, use a UNIQUE constraint on (status, created_at) for
    # pending/running jobs.
    cursor.execute("""
        SELECT COUNT(*) FROM jobs
        WHERE status IN ('pending', 'running')
    """)

    count = cursor.fetchone()[0]
    conn.close()

    return count == 0
```

**For MVP:** Accept the risk of the rare double-job scenario (extremely unlikely with Windmill polling)

**For Production:** Use PostgreSQL with row-level locking, or a distributed lock (Redis)
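
If the double-job risk ever matters before a PostgreSQL move, SQLite can make the check-and-insert atomic with `BEGIN IMMEDIATE`, which acquires the write lock up front. The sketch below is not part of the current spec: it assumes a `jobs(job_id, status)` table and a connection opened with `isolation_level=None` so the explicit `BEGIN` is honored.

```python
import sqlite3

def try_create_job(conn: sqlite3.Connection, job_id: str) -> bool:
    """Atomically check for active jobs and insert a new one, or refuse."""
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock now, not at first write
        (active,) = conn.execute(
            "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
        ).fetchone()
        if active:
            conn.rollback()
            return False
        conn.execute(
            "INSERT INTO jobs (job_id, status) VALUES (?, 'pending')", (job_id,)
        )
        conn.commit()
        return True
    except sqlite3.OperationalError:  # lock contention / busy timeout
        conn.rollback()
        return False

# Demo: isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN above is honored instead of fighting implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE jobs (job_id TEXT, status TEXT)")
print(try_create_job(conn, "job-a"))  # True
print(try_create_job(conn, "job-b"))  # False (job-a is still pending)
```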

---

## Summary

The Background Worker provides:

1. **Async job execution** with FastAPI BackgroundTasks
2. **Parallel model execution** for faster completion
3. **Isolated runtime configs** to prevent state collisions
4. **Graceful error handling** where model failures don't block others
5. **Comprehensive logging** for debugging and monitoring

**Next specification:** BaseAgent Refactoring for Single-Day Execution