mirror of
https://github.com/Xe138/AI-Trader.git
synced 2026-04-02 09:37:23 -04:00
Major architecture transformation from batch-only to API service with
database persistence for Windmill integration.
## REST API Implementation
- POST /simulate/trigger - Start simulation jobs
- GET /simulate/status/{job_id} - Monitor job progress
- GET /results - Query results with filters (job_id, date, model)
- GET /health - Service health checks
## Database Layer
- SQLite persistence with 6 tables (jobs, job_details, positions,
holdings, reasoning_logs, tool_usage)
- Foreign key constraints with cascade deletes
- Replaces JSONL file storage
## Backend Components
- JobManager: Job lifecycle management with concurrency control
- RuntimeConfigManager: Thread-safe isolated runtime configs
- ModelDayExecutor: Single model-day execution engine
- SimulationWorker: Date-sequential, model-parallel orchestration
## Testing
- 102 unit and integration tests (85% coverage)
- Database: 98% coverage
- Job manager: 98% coverage
- API endpoints: 81% coverage
- Pydantic models: 100% coverage
- TDD approach throughout
## Docker Deployment
- Dual-mode: API server (persistent) + batch (one-time)
- Health checks with 30s interval
- Volume persistence for database and logs
- Separate entrypoints for each mode
## Validation Tools
- scripts/validate_docker_build.sh - Build validation
- scripts/test_api_endpoints.sh - Complete API testing
- scripts/test_batch_mode.sh - Batch mode validation
- DOCKER_API.md - Deployment guide
- TESTING_GUIDE.md - Testing procedures
## Configuration
- API_PORT environment variable (default: 8080)
- Backwards compatible with existing configs
- FastAPI, uvicorn, pydantic>=2.0 dependencies
Co-Authored-By: AI Assistant <noreply@example.com>
437 lines
12 KiB
Markdown
# AI-Trader API Service - Technical Specifications Summary

## Overview

This directory contains comprehensive technical specifications for transforming the AI-Trader batch simulation system into an API service compatible with Windmill automation.

## Specification Documents

### 1. [API Specification](./api-specification.md)
**Purpose:** Defines all API endpoints, request/response formats, and data models

**Key Contents:**
- **5 REST Endpoints:**
  - `POST /simulate/trigger` - Queue catch-up simulation job
  - `GET /simulate/status/{job_id}` - Poll job progress
  - `GET /simulate/current` - Get latest job
  - `GET /results` - Retrieve simulation results (minimal/full detail)
  - `GET /health` - Service health check
- **Pydantic Models** for type-safe request/response handling
- **Error Handling** strategies and HTTP status codes
- **SQLite Schema** for jobs and job_details tables
- **Configuration Management** via environment variables

**Status Codes:** 200 OK, 202 Accepted, 400 Bad Request, 404 Not Found, 409 Conflict, 503 Service Unavailable

---

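As a rough illustration of how the status codes listed for the API Specification might apply to `POST /simulate/trigger`, here is a minimal sketch. The function name `trigger_status_code` and the condition-to-code mapping are assumptions made for illustration; the API specification document defines the actual behavior.

```python
from http import HTTPStatus


def trigger_status_code(service_healthy: bool, config_valid: bool, job_active: bool) -> HTTPStatus:
    """Pick a response code for a hypothetical trigger request.

    Assumed mapping (not confirmed by the spec):
      unhealthy service              -> 503 Service Unavailable
      invalid request body or config -> 400 Bad Request
      a job is already active        -> 409 Conflict
      otherwise                      -> 202 Accepted (job queued)
    """
    if not service_healthy:
        return HTTPStatus.SERVICE_UNAVAILABLE
    if not config_valid:
        return HTTPStatus.BAD_REQUEST
    if job_active:
        return HTTPStatus.CONFLICT
    return HTTPStatus.ACCEPTED
```

Checking service health first means a degraded service never accepts work it cannot run; that ordering is a design choice of this sketch, not something the summary states.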
### 2. [Job Manager Specification](./job-manager-specification.md)
**Purpose:** Details the job tracking and database layer

**Key Contents:**
- **SQLite Database Schema:**
  - `jobs` table - High-level job metadata
  - `job_details` table - Per model-day execution tracking
- **JobManager Class Interface:**
  - `create_job()` - Create new simulation job
  - `get_job()` - Retrieve job by ID
  - `update_job_status()` - State transitions (pending → running → completed/partial/failed)
  - `get_job_progress()` - Detailed progress metrics
  - `can_start_new_job()` - Concurrency control
- **State Machine:** Job status transitions and business logic
- **Concurrency Control:** Single-job execution enforcement
- **Testing Strategy:** Unit tests with temporary databases

**Key Feature:** Independent model execution - one model's failure doesn't block others (results in "partial" status)

---

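To make the Job Manager's state machine and single-job rule more concrete, the following is a minimal sketch using only the standard-library `sqlite3` module. The column names, the `open_db` helper, and the exact transition table are assumptions; only the table names, the method names, and the `pending → running → completed/partial/failed` flow come from the summary above.

```python
import sqlite3

# Assumed transition table, following "pending -> running -> completed/partial/failed".
ALLOWED_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "partial", "failed"},
}

# Hypothetical columns; the real schema lives in job-manager-specification.md.
SCHEMA = """
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE job_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id TEXT NOT NULL REFERENCES jobs(job_id) ON DELETE CASCADE,
    model TEXT NOT NULL,
    date TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def can_start_new_job(conn: sqlite3.Connection) -> bool:
    """Single-job enforcement: no other job may be pending or running."""
    row = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE status IN ('pending', 'running')"
    ).fetchone()
    return row[0] == 0


def create_job(conn: sqlite3.Connection, job_id: str, models: list, dates: list) -> None:
    if not can_start_new_job(conn):
        raise RuntimeError("another job is already pending or running")
    with conn:  # commit both inserts together, or roll back on error
        conn.execute("INSERT INTO jobs (job_id) VALUES (?)", (job_id,))
        conn.executemany(
            "INSERT INTO job_details (job_id, model, date) VALUES (?, ?, ?)",
            [(job_id, model, date) for date in dates for model in models],
        )


def update_job_status(conn: sqlite3.Connection, job_id: str, new_status: str) -> None:
    row = conn.execute("SELECT status FROM jobs WHERE job_id = ?", (job_id,)).fetchone()
    if row is None:
        raise KeyError(job_id)
    if new_status not in ALLOWED_TRANSITIONS.get(row[0], set()):
        raise ValueError(f"illegal transition {row[0]} -> {new_status}")
    with conn:
        conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?", (new_status, job_id))
```

Terminal states have no entry in `ALLOWED_TRANSITIONS`, so any attempt to move a finished job is rejected.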
### 3. [Background Worker Specification](./worker-specification.md)
**Purpose:** Defines async job execution architecture

**Key Contents:**
- **Execution Pattern:** Date-sequential, Model-parallel
  - All models for Date 1 run in parallel
  - Date 2 starts only after all models finish Date 1
  - Preserves position.jsonl integrity (no concurrent writes to the same model's file)
- **SimulationWorker Class:**
  - Orchestrates job execution
  - Manages date sequencing
  - Handles job-level errors
- **ModelDayExecutor Class:**
  - Executes single model-day simulation
  - Updates job_detail status
  - Isolates runtime configuration
- **RuntimeConfigManager:**
  - Creates temporary `runtime_env_{job_id}_{model}_{date}.json` files
  - Prevents state collisions between concurrent models
  - Cleans up after execution
- **Error Handling:** Graceful failure (models continue despite peer failures)
- **Logging:** Structured JSON logging with job/model/date context

**Performance:** 3 models × 5 days = ~7-15 minutes (vs. ~22-45 minutes sequential)

---

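The date-sequential, model-parallel pattern from the Background Worker Specification can be sketched with a thread pool per date. This is an illustration under assumptions, not the `SimulationWorker` implementation: `run_job` and the `run_model_day` callback are hypothetical names, every model is attempted on every date even after an earlier failure, and a job is treated as "failed" only when every model-day fails.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

ModelDay = Tuple[str, str]  # (model, date)


def run_job(
    dates: List[str],
    models: List[str],
    run_model_day: Callable[[str, str], None],
) -> Tuple[str, Dict[ModelDay, str]]:
    """Run dates one after another; within a date, run all models concurrently."""
    outcomes: Dict[ModelDay, str] = {}
    for date in dates:  # dates are processed strictly in order
        with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
            futures = {model: pool.submit(run_model_day, model, date) for model in models}
            for model, future in futures.items():
                # exception() blocks until this model-day finishes, so one
                # model's failure never cancels its peers.
                failed = future.exception() is not None
                outcomes[(model, date)] = "failed" if failed else "completed"
    statuses = set(outcomes.values())
    if "failed" not in statuses:
        job_status = "completed"
    elif "completed" not in statuses:
        job_status = "failed"  # assumption: total failure maps to "failed"
    else:
        job_status = "partial"
    return job_status, outcomes
```

Because the pool is closed before the loop advances, no model can begin the next date while another is still working on the current one.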
### 4. [Implementation Specification](./implementation-specifications.md)
**Purpose:** Complete implementation guide covering Agent, Docker, and Windmill

**Key Contents:**

#### Part 1: BaseAgent Refactoring
- **Analysis:** Existing `run_trading_session()` already compatible with API mode
- **Required Changes:** ✅ NONE! Existing code works as-is
- **Worker Integration:** Calls `agent.run_trading_session(date)` directly

#### Part 2: Docker Configuration
- **Modified Dockerfile:** Adds FastAPI dependencies, new entrypoint
- **docker-entrypoint-api.sh:** Starts MCP services → launches uvicorn
- **Health Checks:** Verifies MCP services and database connectivity
- **Volume Mounts:** `./data`, `./configs` for persistence

#### Part 3: Windmill Integration
- **Flow 1: trigger_simulation.ts** - Daily cron triggers API
- **Flow 2: poll_simulation_status.ts** - Polls every 5 min until complete
- **Flow 3: store_simulation_results.py** - Stores results in Windmill DB
- **Dashboard:** Charts and tables showing portfolio performance
- **Workflow Orchestration:** Complete YAML workflow definition

#### Part 4: File Structure
- New `api/` directory with 7 modules
- New `windmill/` directory with scripts and dashboard
- New `docs/` directory (this folder)
- `data/jobs.db` for job tracking

#### Part 5: Implementation Checklist
10-day implementation plan broken into 6 phases

---

## Architecture Highlights

### Request Flow

```
1. Windmill → POST /simulate/trigger
2. API creates job in SQLite (status: pending)
3. API queues BackgroundTask
4. API returns 202 Accepted with job_id
   ↓
5. Worker starts (status: running)
6. For each date sequentially:
     For each model in parallel:
       - Create isolated runtime config
       - Execute agent.run_trading_session(date)
       - Update job_detail status
7. Worker finishes (status: completed/partial/failed)
   ↓
8. Windmill polls GET /simulate/status/{job_id}
9. When complete: Windmill calls GET /results?date=X
10. Windmill stores results in internal DB
11. Windmill dashboard displays performance
```

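Steps 8 and 9 of the request flow amount to a polling loop on the client side. The sketch below shows one way to write it with the HTTP call injected as a function, so the loop can be exercised without a running service. The name `wait_for_job`, the `{"status": ...}` response shape, and the `max_polls` cap are assumptions; the 300-second default mirrors the 5-minute polling interval mentioned for the Windmill flow.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "partial", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], Dict],
    job_id: str,
    poll_seconds: float = 300.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job reaches a terminal status, then return the last payload.

    fetch_status is expected to wrap GET /simulate/status/{job_id} and return
    the decoded JSON body (assumed to contain a "status" key).
    """
    for _ in range(max_polls):
        payload = fetch_status(job_id)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In a real script, `fetch_status` would issue the HTTP request; a caller would then request `/results` only when the returned status is `completed` or `partial`.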
### Data Flow

```
Input: configs/default_config.json
  ↓
API: Calculates date_range (last position → today)
  ↓
Worker: Executes simulations
  ↓
Output: data/agent_data/{model}/position/position.jsonl
        data/agent_data/{model}/log/{date}/log.jsonl
        data/jobs.db (job tracking)
  ↓
API: Reads position.jsonl + calculates P&L
  ↓
Windmill: Stores in internal DB → Dashboard visualization
```

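The "last position → today" catch-up range in the data flow could be computed along these lines. This is a sketch under stated assumptions: `catch_up_dates` is a hypothetical helper, dates are ISO strings, every calendar day is included (a real implementation would likely skip non-trading days), and a model with no prior position simulates only today.

```python
from datetime import date, timedelta
from typing import List, Optional


def catch_up_dates(last_position: Optional[date], today: date) -> List[str]:
    """Return the dates still to simulate, oldest first.

    The range starts the day after the last recorded position and ends at
    today, inclusive. If the last position is already today, nothing is due.
    """
    start = today if last_position is None else last_position + timedelta(days=1)
    span = (today - start).days  # negative when already up to date
    return [(start + timedelta(days=offset)).isoformat() for offset in range(span + 1)]


print(catch_up_dates(date(2025, 1, 6), date(2025, 1, 8)))  # ['2025-01-07', '2025-01-08']
```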
---

## Key Design Decisions

### 1. Pattern B: Lazy On-Demand Processing
- **Chosen:** Windmill controls simulation timing via API calls
- **Benefit:** Centralized scheduling in Windmill
- **Tradeoff:** First Windmill call of the day triggers long-running job

### 2. SQLite vs. PostgreSQL
- **Chosen:** SQLite for MVP
- **Rationale:** Low concurrency (1 job at a time), simple deployment
- **Future:** PostgreSQL for production with multiple concurrent jobs

### 3. Date-Sequential, Model-Parallel Execution
- **Chosen:** Dates run sequentially, models run in parallel per date
- **Rationale:** Prevents position.jsonl race conditions while remaining faster than fully sequential execution
- **Performance:** Roughly 3x faster than fully sequential execution with 3 models in parallel (see Performance Expectations)

### 4. Independent Model Failures
- **Chosen:** One model's failure doesn't block others
- **Benefit:** Partial results are better than no results
- **Implementation:** Job status becomes "partial" if any model fails

### 5. Minimal BaseAgent Changes
- **Chosen:** No modifications to agent code
- **Rationale:** Existing `run_trading_session()` is already a suitable API interface
- **Benefit:** Maintains backward compatibility with batch mode

---

## Implementation Prerequisites

### Required Environment Variables
```bash
OPENAI_API_BASE=...
OPENAI_API_KEY=...
ALPHAADVANTAGE_API_KEY=...
JINA_API_KEY=...
RUNTIME_ENV_PATH=/app/data/runtime_env.json
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003
API_HOST=0.0.0.0
API_PORT=8080
```

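A service would typically read these variables once at startup. The sketch below shows one way to do that for the API-related settings using only the standard library. `ApiSettings` and `load_settings` are hypothetical names, and treating the example values above as fallback defaults is an assumption (only the `API_PORT` default of 8080 is stated elsewhere in this change).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ApiSettings:
    host: str
    port: int
    runtime_env_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Build settings from environment variables, validating the port."""
    port = int(env.get("API_PORT", "8080"))
    if not 1 <= port <= 65535:
        raise ValueError(f"API_PORT out of range: {port}")
    return ApiSettings(
        host=env.get("API_HOST", "0.0.0.0"),  # assumed default
        port=port,
        runtime_env_path=env.get("RUNTIME_ENV_PATH", "/app/data/runtime_env.json"),  # assumed default
    )
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({"API_PORT": "9000"})` works without touching the real process environment.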
### Required Python Packages (new)
```
fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3
```

### Docker Requirements
- Docker Engine 20.10+
- Docker Compose 2.0+
- 2GB RAM minimum for container
- 10GB disk space for data

### Windmill Requirements
- Windmill instance (self-hosted or cloud)
- Network access from Windmill to AI-Trader API
- Windmill CLI for deployment (optional)

---

## Testing Strategy

### Unit Tests
- `tests/test_job_manager.py` - Database operations
- `tests/test_worker.py` - Job execution logic
- `tests/test_executor.py` - Model-day execution

### Integration Tests
- `tests/test_api_endpoints.py` - FastAPI endpoint behavior
- `tests/test_end_to_end.py` - Full workflow (trigger → execute → retrieve)

### Manual Testing
- Docker container startup
- Health check endpoint
- Windmill workflow execution
- Dashboard visualization

---

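The unit tests in the Testing Strategy are described elsewhere in this summary as running against temporary databases. A minimal sketch of that fixture pattern follows, using `unittest` and `tempfile` from the standard library. The test class, the one-table schema, and the `run_suite` helper are illustrative assumptions, and the project's actual tests may use a different framework.

```python
import os
import sqlite3
import tempfile
import unittest


class JobsTableTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own throwaway database file.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.conn = sqlite3.connect(os.path.join(self.tmpdir.name, "jobs.db"))
        self.conn.execute(
            "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def tearDown(self):
        self.conn.close()  # close before removing the directory
        self.tmpdir.cleanup()

    def test_new_job_starts_pending(self):
        self.conn.execute("INSERT INTO jobs VALUES (?, ?)", ("job-1", "pending"))
        row = self.conn.execute(
            "SELECT status FROM jobs WHERE job_id = ?", ("job-1",)
        ).fetchone()
        self.assertEqual(row[0], "pending")


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(JobsTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Creating the database in `setUp` and deleting it in `tearDown` keeps tests independent of each other and of any real `data/jobs.db`.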
## Performance Expectations

### Single Model-Day Execution
- **Duration:** 30-60 seconds (varies by AI model latency)
- **Bottlenecks:** AI API calls, MCP tool latency

### Multi-Model Job
- **Example:** 3 models × 5 days = 15 model-days
- **Parallel Execution:** ~7-15 minutes
- **Sequential Execution:** ~22-45 minutes
- **Speedup:** ~3x (number of models)

### API Response Times
- `/simulate/trigger`: < 1 second (just queues job)
- `/simulate/status`: < 100ms (SQLite query)
- `/results?detail=minimal`: < 500ms (file read + JSON parsing)
- `/results?detail=full`: < 2 seconds (parse log files)

---

## Security Considerations

### MVP Security
- **Network Isolation:** Docker network (no public exposure)
- **No Authentication:** Assumes Windmill → API is trusted network

### Future Enhancements
- API key authentication (`X-API-Key` header)
- Rate limiting per client
- HTTPS/TLS encryption
- Input sanitization for path traversal prevention

---

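Of the future enhancements listed under Security Considerations, path traversal prevention is relevant because the trigger request accepts a `config_path`. One common approach is to resolve the requested path and confirm it stays inside an allowed directory. The sketch below assumes a hypothetical `resolve_config_path` helper and that configs live under `configs/` relative to the application root.

```python
from pathlib import Path


def resolve_config_path(user_path: str, app_root: str = ".", allowed_dir: str = "configs") -> Path:
    """Resolve a client-supplied config path, rejecting anything outside allowed_dir.

    resolve() collapses ".." segments and follows symlinks, so the containment
    check runs against the real target rather than the raw string.
    """
    root = Path(app_root).resolve()
    allowed = (root / allowed_dir).resolve()
    candidate = (root / user_path).resolve()
    if allowed not in candidate.parents:
        raise ValueError(f"config_path must stay inside {allowed_dir}/: {user_path!r}")
    return candidate
```

With this check, `configs/default_config.json` is accepted, while `configs/../../etc/passwd` and absolute paths outside the allowed directory raise `ValueError`.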
## Deployment Steps

### 1. Build Docker Image
```bash
docker-compose build
```

### 2. Start API Service
```bash
docker-compose up -d
```

### 3. Verify Health
```bash
curl http://localhost:8080/health
```

### 4. Test Trigger
```bash
curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{"config_path": "configs/default_config.json"}'
```

### 5. Deploy Windmill Scripts
```bash
wmill script push windmill/trigger_simulation.ts
wmill script push windmill/poll_simulation_status.ts
wmill script push windmill/store_simulation_results.py
```

### 6. Create Windmill Workflow
- Import `windmill/daily_simulation_workflow.yaml`
- Configure resource `ai_trader_api` with API URL
- Set cron schedule (daily 6 AM)

### 7. Create Windmill Dashboard
- Import `windmill/dashboard.json`
- Verify data visualization

---

## Troubleshooting Guide

### Issue: Health check fails
**Symptoms:** `curl http://localhost:8080/health` returns 503

**Possible Causes:**
1. MCP services not running
2. Database file permission error
3. API server not started

**Solutions:**
```bash
# Check MCP services
docker-compose exec ai-trader curl http://localhost:8000/health

# Check API logs
docker-compose logs -f ai-trader

# Restart container
docker-compose restart
```

### Issue: Job stuck in "running" status
**Symptoms:** Job never completes, status remains "running"

**Possible Causes:**
1. Agent execution crashed
2. Model API timeout
3. Worker process died

**Solutions:**
```bash
# Check job details for error messages
curl http://localhost:8080/simulate/status/{job_id}

# Check container logs
docker-compose logs -f ai-trader

# If API restarted, stale jobs are marked as failed on startup
docker-compose restart
```

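The startup behavior mentioned in the last comment (stale jobs marked as failed after a restart) could look roughly like the following. This is a sketch, not the service's actual recovery code: `fail_stale_jobs`, the `error` column, and the choice to treat only `running` jobs as stale are assumptions.

```python
import sqlite3


def fail_stale_jobs(conn: sqlite3.Connection) -> int:
    """Mark jobs left in 'running' by a previous process as failed.

    Returns the number of jobs that were updated.
    """
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            "UPDATE jobs SET status = 'failed', error = ? WHERE status = 'running'",
            ("interrupted by API restart",),
        )
    return cursor.rowcount


# Demo against an in-memory database with a hypothetical minimal schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT, error TEXT)")
conn.executemany(
    "INSERT INTO jobs (job_id, status) VALUES (?, ?)",
    [("a", "running"), ("b", "completed")],
)
print(fail_stale_jobs(conn))  # 1
```

Running this once before the API starts accepting requests would let `can_start_new_job()` succeed again after a crash.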
### Issue: Windmill can't reach API
**Symptoms:** Connection refused from Windmill scripts

**Solutions:**
- Verify that Windmill and AI-Trader are on the same Docker network
- Check firewall rules
- Use the container name (`ai-trader`) instead of `localhost` in the Windmill resource
- Verify the `API_PORT` environment variable

---

## Migration from Batch Mode

### For Users Currently Running Batch Mode

**Option 1: Dual Mode (Recommended)**
- Keep existing `main.py` for manual testing
- Add new API mode for production automation
- Use different config files for each mode

**Option 2: API-Only**
- Replace batch execution entirely
- All simulations via API calls
- More consistent with production workflow

### Migration Checklist
- [ ] Backup existing `data/` directory
- [ ] Update `.env` with API configuration
- [ ] Test API mode in separate environment first
- [ ] Gradually migrate Windmill workflows
- [ ] Monitor logs for errors
- [ ] Validate results match batch mode output

---

## Next Steps

1. **Review Specifications**
   - Read all 4 specification documents
   - Ask clarifying questions
   - Approve design before implementation

2. **Implementation Phase 1** (Days 1-2)
   - Set up `api/` directory structure
   - Implement database and job_manager
   - Write unit tests

3. **Implementation Phase 2** (Days 3-4)
   - Implement worker and executor
   - Test with mock agents

4. **Implementation Phase 3** (Days 5-6)
   - Implement FastAPI endpoints
   - Test with Postman/curl

5. **Implementation Phase 4** (Day 7)
   - Docker integration
   - End-to-end testing

6. **Implementation Phase 5** (Days 8-9)
   - Windmill integration
   - Dashboard creation

7. **Implementation Phase 6** (Day 10)
   - Final testing
   - Documentation

---

## Questions or Feedback?

Please review all specifications and provide feedback on:
1. API endpoint design
2. Database schema
3. Execution pattern (date-sequential, model-parallel)
4. Error handling approach
5. Windmill integration workflow
6. Any concerns or suggested improvements

**Ready to proceed with implementation?** Confirm approval of specifications to begin Phase 1.