docs: restructure documentation for improved clarity and navigation

Reorganize documentation into user-focused, developer-focused, and deployment-focused sections.

**New structure:**
- Root: README.md (streamlined), QUICK_START.md, API_REFERENCE.md
- docs/user-guide/: configuration, API usage, integrations, troubleshooting
- docs/developer/: contributing, development setup, testing, architecture
- docs/deployment/: Docker deployment, production checklist, monitoring
- docs/reference/: environment variables, MCP tools, data formats

**Changes:**
- Streamline README.md from 831 to 469 lines
- Create QUICK_START.md for 5-minute onboarding
- Create API_REFERENCE.md as single source of truth for API
- Remove 9 outdated specification docs (v0.2.0 API design)
- Remove DOCKER_API.md (content consolidated into new structure)
- Remove docs/plans/ directory with old design documents
- Update CLAUDE.md with documentation structure guide
- Remove orchestration-specific references

**Benefits:**
- Clear entry points for different audiences
- No content duplication
- Better discoverability through logical hierarchy
- All content reflects current v0.3.0 API

Commit b3debc125f (parent c1ebdd4780), 2025-11-01 10:40:57 -04:00
36 changed files with 3364 additions and 9643 deletions


@@ -1,197 +0,0 @@
# Data Cache Reuse Design
**Date:** 2025-10-30
**Status:** Approved
## Problem Statement
Docker containers currently fetch all 103 NASDAQ 100 tickers from Alpha Vantage on every startup, even when price data is volume-mounted and already cached in `./data`. This causes:
- Slow startup times (103 API calls)
- Unnecessary API quota consumption
- Rate limit risks during frequent development iterations
## Solution Overview
Implement staleness-based data refresh with configurable age threshold. Container checks all `daily_prices_*.json` files and only refetches if any file is missing or older than `MAX_DATA_AGE_DAYS`.
## Design Decisions
### Architecture Choice
**Selected:** Check all `daily_prices_*.json` files individually
**Rationale:** Ensures data integrity by detecting partial/missing files, not just stale merged data
### Implementation Location
**Selected:** Bash wrapper logic in `entrypoint.sh`
**Rationale:** Keeps data fetching scripts unchanged, adds orchestration at container startup layer
### Staleness Threshold
**Selected:** Configurable via `MAX_DATA_AGE_DAYS` environment variable (default: 7 days)
**Rationale:** Balances freshness with API usage; flexible for different use cases (development vs production)
## Technical Design
### Components
#### 1. Staleness Check Function
Location: `entrypoint.sh` (after environment validation, before data fetch)
```bash
should_refresh_data() {
    MAX_AGE="${MAX_DATA_AGE_DAYS:-7}"

    # MAX_DATA_AGE_DAYS=0 means "always refresh"; `find -mtime +0` alone
    # would still treat files under a day old as fresh, so handle it here
    if [ "$MAX_AGE" -eq 0 ]; then
        echo "🔄 MAX_DATA_AGE_DAYS=0, forcing refresh"
        return 0  # Need refresh
    fi

    # Check if at least one price file exists
    if ! ls /app/data/daily_prices_*.json >/dev/null 2>&1; then
        echo "📭 No price data found"
        return 0  # Need refresh
    fi

    # Find any files older than MAX_AGE days
    STALE_COUNT=$(find /app/data -name "daily_prices_*.json" -mtime +"$MAX_AGE" | wc -l)
    TOTAL_COUNT=$(ls /app/data/daily_prices_*.json 2>/dev/null | wc -l)

    if [ "$STALE_COUNT" -gt 0 ]; then
        echo "📅 Found $STALE_COUNT stale files (>$MAX_AGE days old)"
        return 0  # Need refresh
    fi

    echo "✅ All $TOTAL_COUNT price files are fresh (<$MAX_AGE days old)"
    return 1  # Skip refresh
}
```
**Logic:**
- Uses `find -mtime +N` to detect files modified more than N days ago
- Returns shell exit codes: 0 (refresh needed), 1 (skip refresh)
- Logs informative messages for debugging
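The `find -mtime` behavior the function relies on can be checked locally in a throwaway directory — a sketch assuming GNU `touch`/`find`; the file names are illustrative:

```bash
# Sketch: verify the staleness detection locally in a throwaway
# directory (assumes GNU touch/find; file names are illustrative).
DEMO_DIR=$(mktemp -d)

# One fresh file and one file backdated past the 7-day threshold
touch "$DEMO_DIR/daily_prices_AAPL.json"
touch -d "10 days ago" "$DEMO_DIR/daily_prices_NVDA.json"

MAX_AGE=7
STALE_COUNT=$(find "$DEMO_DIR" -name "daily_prices_*.json" -mtime +"$MAX_AGE" | wc -l)
TOTAL_COUNT=$(ls "$DEMO_DIR"/daily_prices_*.json | wc -l)

echo "stale=$STALE_COUNT total=$TOTAL_COUNT"   # stale=1 total=2

rm -rf "$DEMO_DIR"
```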
#### 2. Conditional Data Fetch
Location: `entrypoint.sh` lines 40-46 (replace existing unconditional fetch)
```bash
# Step 1: Data preparation (conditional)
echo "📊 Checking price data freshness..."
if should_refresh_data; then
    echo "🔄 Fetching price data..."
    cd /app/data
    python /app/scripts/get_daily_price.py
    cd /app
else
    echo "⏭️ Skipping data fetch (using cached data)"
fi

# Always re-run the merge so a missing or corrupt merged.jsonl is rebuilt
cd /app/data
python /app/scripts/merge_jsonl.py
cd /app
```
#### 3. Environment Configuration
**docker-compose.yml:**
```yaml
environment:
  - MAX_DATA_AGE_DAYS=${MAX_DATA_AGE_DAYS:-7}
```
**.env.example:**
```bash
# Data Refresh Configuration
MAX_DATA_AGE_DAYS=7 # Refresh price data older than N days (0=always refresh)
```
### Data Flow
1. **Container Startup** → entrypoint.sh begins execution
2. **Environment Validation** → Check required API keys (existing logic)
3. **Staleness Check** → `should_refresh_data()` scans `/app/data/daily_prices_*.json`
- No files found → Return 0 (refresh)
- Any file older than `MAX_DATA_AGE_DAYS` → Return 0 (refresh)
- All files fresh → Return 1 (skip)
4. **Conditional Fetch** → Run get_daily_price.py only if refresh needed
5. **Merge Data** → Always run merge_jsonl.py (handles missing merged.jsonl)
6. **MCP Services** → Start services (existing logic)
7. **Trading Agent** → Begin trading (existing logic)
### Edge Cases
| Scenario | Behavior |
|----------|----------|
| **First run (no data)** | Detects no files → triggers full fetch |
| **Restart within 7 days** | All files fresh → skips fetch (fast startup) |
| **Restart after 7 days** | Files stale → refreshes all data |
| **Partial data (some files missing)** | Missing files treated as infinitely old → triggers refresh |
| **Corrupt merged.jsonl but fresh price files** | Skips fetch, re-runs merge to rebuild merged.jsonl |
| **MAX_DATA_AGE_DAYS=0** | Always refresh (useful for testing/production) |
| **MAX_DATA_AGE_DAYS unset** | Defaults to 7 days |
| **Alpha Vantage rate limit** | get_daily_price.py handles with warning (existing behavior) |
## Configuration Options
| Variable | Default | Purpose |
|----------|---------|---------|
| `MAX_DATA_AGE_DAYS` | 7 | Days before price data considered stale |
**Special Values:**
- `0` → Always refresh (force fresh data)
- A large value such as `999` → effectively never refresh (use cached data indefinitely)
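One subtlety worth encoding explicitly: `find -mtime +0` matches files older than roughly a day, not all files, so the `0` special value needs its own branch. A minimal sketch of the mapping (the helper function is hypothetical, not part of the entrypoint):

```bash
# Sketch: explicit mapping of MAX_DATA_AGE_DAYS special values to a
# refresh policy (hypothetical helper; `find -mtime +0` alone would
# not implement "always refresh").
refresh_policy() {
    local max_age="${1:-7}"
    if [ "$max_age" -eq 0 ]; then
        echo "always-refresh"
    elif [ "$max_age" -ge 999 ]; then
        echo "never-refresh"
    else
        echo "check-mtime"
    fi
}

refresh_policy 0      # always-refresh
refresh_policy 999    # never-refresh
refresh_policy 7      # check-mtime
```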
## User Experience
### Scenario 1: Fresh Container
```
🚀 Starting AI-Trader...
🔍 Validating environment variables...
✅ Environment variables validated
📊 Checking price data freshness...
📭 No price data found
🔄 Fetching and merging price data...
✓ Fetched NVDA
✓ Fetched MSFT
...
```
### Scenario 2: Restart Within 7 Days
```
🚀 Starting AI-Trader...
🔍 Validating environment variables...
✅ Environment variables validated
📊 Checking price data freshness...
✅ All 103 price files are fresh (<7 days old)
⏭️ Skipping data fetch (using cached data)
🔧 Starting MCP services...
```
### Scenario 3: Restart After 7 Days
```
🚀 Starting AI-Trader...
🔍 Validating environment variables...
✅ Environment variables validated
📊 Checking price data freshness...
📅 Found 103 stale files (>7 days old)
🔄 Fetching and merging price data...
✓ Fetched NVDA
✓ Fetched MSFT
...
```
## Testing Plan
1. **Test fresh container:** Delete `./data/daily_prices_*.json`, start container → should fetch all
2. **Test cached data:** Restart immediately → should skip fetch
3. **Test staleness:** `touch -d "8 days ago" ./data/daily_prices_AAPL.json`, restart → should refresh
4. **Test partial data:** Delete 10 random price files → should refresh all
5. **Test MAX_DATA_AGE_DAYS=0:** Restart with env var set → should always fetch
6. **Test MAX_DATA_AGE_DAYS=30:** Restart with 8-day-old data → should skip
## Documentation Updates
Files requiring updates:
- `entrypoint.sh` → Add function and conditional logic
- `docker-compose.yml` → Add MAX_DATA_AGE_DAYS environment variable
- `.env.example` → Document MAX_DATA_AGE_DAYS with default value
- `CLAUDE.md` → Update "Docker Deployment" section with new env var
- `docs/DOCKER.md` (if it exists) → Explain data caching behavior
## Benefits
- **Development:** Instant container restarts during iteration
- **API Quota:** ~103 fewer API calls per restart
- **Reliability:** No rate limit risks during frequent testing
- **Flexibility:** Configurable threshold for different use cases
- **Consistency:** Checks all files to ensure complete data


@@ -1,491 +0,0 @@
# Docker Deployment and CI/CD Design
**Date:** 2025-10-30
**Status:** Approved
**Target:** Development/local testing environment
## Overview
Package AI-Trader as a Docker container with docker-compose orchestration and automated image builds via GitHub Actions on release tags. Focus on simplicity and ease of use for researchers and developers.
## Requirements
- **Primary Use Case:** Development and local testing
- **Deployment Target:** Single monolithic container (all MCP services + trading agent)
- **Secrets Management:** Environment variables (no mounted .env file)
- **Data Strategy:** Fetch price data on container startup
- **Container Registry:** GitHub Container Registry (ghcr.io)
- **Trigger:** Build images automatically on release tag push (`v*` pattern)
## Architecture
### Components
1. **Dockerfile** - Builds Python 3.10 image with all dependencies
2. **docker-compose.yml** - Orchestrates container with volume mounts and environment config
3. **entrypoint.sh** - Sequential startup script (data fetch → MCP services → trading agent)
4. **GitHub Actions Workflow** - Automated image build and push on release tags
5. **.dockerignore** - Excludes unnecessary files from image
6. **Documentation** - Docker usage guide and examples
### Execution Flow
```
Container Start
  → entrypoint.sh
      1. Fetch/merge price data (get_daily_price.py → merge_jsonl.py)
      2. Start MCP services in background (start_mcp_services.py)
      3. Wait 3 seconds for service stabilization
      4. Run trading agent (main.py with config)
  → Container Exit → Cleanup MCP services
```
## Detailed Design
### 1. Dockerfile
**Multi-stage build:**
```dockerfile
# Base stage
FROM python:3.10-slim as base
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Application stage
FROM base
WORKDIR /app
# Copy application code
COPY . .
# Create necessary directories
RUN mkdir -p data logs data/agent_data
# Make entrypoint executable
RUN chmod +x entrypoint.sh
# Expose MCP service ports
EXPOSE 8000 8001 8002 8003
# Set Python to run unbuffered
ENV PYTHONUNBUFFERED=1
# Use entrypoint script
ENTRYPOINT ["./entrypoint.sh"]
CMD ["configs/default_config.json"]
```
**Key Features:**
- `python:3.10-slim` base for smaller image size
- Multi-stage for dependency caching
- Non-root user NOT included (dev/testing focus, can add later)
- Unbuffered Python output for real-time logs
- Default config path with override support
### 2. docker-compose.yml
```yaml
version: '3.8'

services:
  ai-trader:
    build: .
    container_name: ai-trader-app
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs
    environment:
      - OPENAI_API_BASE=${OPENAI_API_BASE}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ALPHAADVANTAGE_API_KEY=${ALPHAADVANTAGE_API_KEY}
      - JINA_API_KEY=${JINA_API_KEY}
      - RUNTIME_ENV_PATH=/app/data/runtime_env.json
      - MATH_HTTP_PORT=${MATH_HTTP_PORT:-8000}
      - SEARCH_HTTP_PORT=${SEARCH_HTTP_PORT:-8001}
      - TRADE_HTTP_PORT=${TRADE_HTTP_PORT:-8002}
      - GETPRICE_HTTP_PORT=${GETPRICE_HTTP_PORT:-8003}
      - AGENT_MAX_STEP=${AGENT_MAX_STEP:-30}
    ports:
      - "8000:8000"
      - "8001:8001"
      - "8002:8002"
      - "8003:8003"
      - "8888:8888"  # Optional: web dashboard
    restart: unless-stopped
```
**Key Features:**
- Volume mounts for data/logs persistence
- Environment variables interpolated from `.env` file (Docker Compose reads automatically)
- No `.env` file mounted into container (cleaner separation)
- Default port values with override support
- Restart policy for recovery
### 3. entrypoint.sh
```bash
#!/bin/bash
set -e  # Exit on any error

echo "🚀 Starting AI-Trader..."

# Step 1: Data preparation
echo "📊 Fetching and merging price data..."
cd /app/data
python get_daily_price.py
python merge_jsonl.py
cd /app

# Step 2: Start MCP services in background
echo "🔧 Starting MCP services..."
cd /app/agent_tools
python start_mcp_services.py &
MCP_PID=$!
cd /app

# Register cleanup before the long-running agent starts, so the MCP
# services are stopped however the container exits
trap "echo '🛑 Stopping MCP services...'; kill $MCP_PID 2>/dev/null" EXIT

# Step 3: Wait for services to initialize
echo "⏳ Waiting for MCP services to start..."
sleep 3

# Step 4: Run trading agent with config file
echo "🤖 Starting trading agent..."
CONFIG_FILE="${1:-configs/default_config.json}"
python main.py "$CONFIG_FILE"
```
**Key Features:**
- Sequential execution with clear logging
- MCP services run in background with PID capture
- Trap ensures cleanup on container exit
- Config file path as argument (defaults to `configs/default_config.json`)
- Fail-fast with `set -e`
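The fixed `sleep 3` could be replaced with a bounded port poll; a sketch using bash's `/dev/tcp` redirection (the function and the usage shown are assumptions, not part of the current entrypoint.sh):

```bash
# Sketch: bounded wait for a TCP port instead of a fixed sleep
# (bash-only /dev/tcp; not part of the current entrypoint.sh).
wait_for_port() {
    host="$1"; port="$2"; retries="${3:-30}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        # Succeeds as soon as something accepts connections on host:port
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "Timed out waiting for $host:$port" >&2
    return 1
}

# Possible usage after launching the MCP services:
# wait_for_port 127.0.0.1 "${MATH_HTTP_PORT:-8000}" 30 || exit 1
```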
### 4. GitHub Actions Workflow
**File:** `.github/workflows/docker-release.yml`
```yaml
name: Build and Push Docker Image

on:
  push:
    tags:
      - 'v*'  # Triggers on v1.0.0, v2.1.3, etc.
  workflow_dispatch:  # Manual trigger option

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract version from tag
        id: meta
        run: |
          VERSION=${GITHUB_REF#refs/tags/v}
          echo "version=$VERSION" >> $GITHUB_OUTPUT

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository_owner }}/ai-trader:${{ steps.meta.outputs.version }}
            ghcr.io/${{ github.repository_owner }}/ai-trader:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
**Key Features:**
- Triggers on `v*` tags (e.g., `git tag v1.0.0 && git push origin v1.0.0`)
- Manual dispatch option for testing
- Uses `GITHUB_TOKEN` (automatically provided, no secrets needed)
- Builds with caching for faster builds
- Tags both version and `latest`
- Multi-platform support possible by adding `platforms: linux/amd64,linux/arm64`
### 5. .dockerignore
```
# Version control
.git/
.gitignore
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
# IDE
.vscode/
.idea/
*.swp
*.swo
# Environment and secrets
.env
.env.*
!.env.example
# Data files (fetched at runtime)
data/*.json
data/agent_data/
data/merged.jsonl
# Logs
logs/
*.log
# Runtime state
runtime_env.json
# Documentation (not needed in image)
*.md
docs/
!README.md
# CI/CD
.github/
```
**Purpose:**
- Reduces image size
- Keeps secrets out of image
- Excludes generated files
- Keeps only necessary source code and scripts
## Documentation Updates
### New File: docs/DOCKER.md
Create comprehensive Docker usage guide including:
1. **Quick Start**
```bash
cp .env.example .env
# Edit .env with your API keys
docker-compose up
```
2. **Configuration**
- Required environment variables
- Optional configuration overrides
- Custom config file usage
3. **Usage Examples**
```bash
# Run with default config
docker-compose up
# Run with custom config
docker-compose run ai-trader configs/my_config.json
# View logs
docker-compose logs -f
# Stop and clean up
docker-compose down
```
4. **Data Persistence**
- How volume mounts work
- Where data is stored
- How to backup/restore
5. **Troubleshooting**
- MCP services not starting → Check logs, verify ports available
- Missing API keys → Check .env file
- Data fetch failures → API rate limits or invalid keys
- Permission issues → Volume mount permissions
6. **Using Pre-built Images**
```bash
docker pull ghcr.io/hkuds/ai-trader:latest
docker run --env-file .env -v $(pwd)/data:/app/data ghcr.io/hkuds/ai-trader:latest
```
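The backup/restore flow for the data-persistence section above can be sketched with plain `tar` over the mounted directory (the directory layout and archive name are illustrative):

```bash
# Sketch: backing up and restoring the volume-mounted data directory
# (directory layout and archive name are illustrative).
mkdir -p data
echo '{}' > data/daily_prices_EXAMPLE.json   # stand-in for cached files

BACKUP="ai-trader-data-$(date +%Y%m%d).tar.gz"

# Backup: archive the host-side ./data directory
tar czf "$BACKUP" -C . data

# Restore: unpack into a scratch directory (would be a fresh checkout)
mkdir -p restore
tar xzf "$BACKUP" -C restore
```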
### Update .env.example
Add/clarify Docker-specific variables:
```bash
# AI Model API Configuration
OPENAI_API_BASE=https://your-openai-proxy.com/v1
OPENAI_API_KEY=your_openai_key
# Data Source Configuration
ALPHAADVANTAGE_API_KEY=your_alpha_vantage_key
JINA_API_KEY=your_jina_api_key
# System Configuration (Docker defaults)
RUNTIME_ENV_PATH=/app/data/runtime_env.json
# MCP Service Ports
MATH_HTTP_PORT=8000
SEARCH_HTTP_PORT=8001
TRADE_HTTP_PORT=8002
GETPRICE_HTTP_PORT=8003
# Agent Configuration
AGENT_MAX_STEP=30
```
### Update Main README.md
Add Docker section after "Quick Start":
````markdown
## Docker Deployment

### Using Docker Compose (Recommended)

```bash
# Setup environment
cp .env.example .env
# Edit .env with your API keys

# Run with docker-compose
docker-compose up
```

### Using Pre-built Images

```bash
# Pull latest image
docker pull ghcr.io/hkuds/ai-trader:latest

# Run container
docker run --env-file .env \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/logs:/app/logs \
  ghcr.io/hkuds/ai-trader:latest
```

See [docs/DOCKER.md](docs/DOCKER.md) for detailed Docker usage guide.
````
## Release Process
### For Maintainers
1. **Prepare release:**
```bash
# Ensure main branch is ready
git checkout main
git pull origin main
```
2. **Create and push tag:**
```bash
git tag v1.0.0
git push origin v1.0.0
```
3. **GitHub Actions automatically:**
- Builds Docker image
- Tags with version and `latest`
- Pushes to `ghcr.io/hkuds/ai-trader`
4. **Verify build:**
- Check Actions tab for build status
- Test pull: `docker pull ghcr.io/hkuds/ai-trader:v1.0.0`
5. **Optional: Create GitHub Release**
- Add release notes
- Include Docker pull command
### For Users
```bash
# Pull specific version
docker pull ghcr.io/hkuds/ai-trader:v1.0.0
# Or always get latest
docker pull ghcr.io/hkuds/ai-trader:latest
```
## Implementation Checklist
- [ ] Create Dockerfile with multi-stage build
- [ ] Create docker-compose.yml with volume mounts and environment config
- [ ] Create entrypoint.sh with sequential startup logic
- [ ] Create .dockerignore to exclude unnecessary files
- [ ] Create .github/workflows/docker-release.yml for CI/CD
- [ ] Create docs/DOCKER.md with comprehensive usage guide
- [ ] Update .env.example with Docker-specific variables
- [ ] Update main README.md with Docker deployment section
- [ ] Test local build: `docker-compose build`
- [ ] Test local run: `docker-compose up`
- [ ] Test with custom config
- [ ] Verify data persistence across container restarts
- [ ] Test GitHub Actions workflow (create test tag)
- [ ] Verify image pushed to ghcr.io
- [ ] Test pulling and running pre-built image
- [ ] Update CLAUDE.md with Docker commands
## Future Enhancements
Possible improvements for production use:
1. **Multi-container Architecture**
- Separate containers for each MCP service
- Better isolation and independent scaling
- More complex orchestration
2. **Security Hardening**
- Non-root user in container
- Docker secrets for production
- Read-only filesystem where possible
3. **Monitoring**
- Health checks for MCP services
- Prometheus metrics export
- Logging aggregation
4. **Optimization**
- Multi-platform builds (ARM64 support)
- Smaller base image (alpine)
- Layer caching optimization
5. **Development Tools**
- docker-compose.dev.yml with hot reload
- Debug container with additional tools
- Integration test container
These are deferred to keep initial implementation simple and focused on development/testing use cases.

File diff suppressed because it is too large.


@@ -1,102 +0,0 @@
Docker Build Test Results
==========================
Date: 2025-10-30
Branch: docker-deployment
Working Directory: /home/bballou/AI-Trader/.worktrees/docker-deployment
Test 1: Docker Image Build
---------------------------
Command: docker-compose build
Status: SUCCESS
Result: Successfully built image 7b36b8f4c0e9
Build Output Summary:
- Base image: python:3.10-slim
- Build stages: Multi-stage build (base + application)
- Dependencies installed successfully from requirements.txt
- Application code copied
- Directories created: data, logs, data/agent_data
- Entrypoint script made executable
- Ports exposed: 8000, 8001, 8002, 8003, 8888
- Environment: PYTHONUNBUFFERED=1 set
- Image size: 266MB
- Build time: ~2 minutes (including dependency installation)
Key packages installed:
- langchain==1.0.2
- langchain-openai==1.0.1
- langchain-mcp-adapters>=0.1.0
- fastmcp==2.12.5
- langgraph<1.1.0,>=1.0.0
- pydantic<3.0.0,>=2.7.4
- openai<3.0.0,>=1.109.1
- All dependencies resolved without conflicts
Test 2: Image Verification
---------------------------
Command: docker images | grep ai-trader
Status: SUCCESS
Result: docker-deployment_ai-trader latest 7b36b8f4c0e9 9 seconds ago 266MB
Image Details:
- Repository: docker-deployment_ai-trader
- Tag: latest
- Image ID: 7b36b8f4c0e9
- Created: Just now
- Size: 266MB (reasonable for Python 3.10 + ML dependencies)
Test 3: Configuration Parsing (Dry-Run)
----------------------------------------
Command: docker-compose --env-file .env.test config
Status: SUCCESS
Result: Configuration parsed correctly without errors
Test .env.test contents:
OPENAI_API_KEY=test
ALPHAADVANTAGE_API_KEY=test
JINA_API_KEY=test
RUNTIME_ENV_PATH=/app/data/runtime_env.json
Parsed Configuration:
- Service name: ai-trader
- Container name: ai-trader-app
- Build context: /home/bballou/AI-Trader/.worktrees/docker-deployment
- Environment variables correctly injected:
* AGENT_MAX_STEP: '30' (default)
* ALPHAADVANTAGE_API_KEY: test
* GETPRICE_HTTP_PORT: '8003' (default)
* JINA_API_KEY: test
* MATH_HTTP_PORT: '8000' (default)
* OPENAI_API_BASE: '' (not set, defaulted to blank)
* OPENAI_API_KEY: test
* RUNTIME_ENV_PATH: /app/data/runtime_env.json
* SEARCH_HTTP_PORT: '8001' (default)
* TRADE_HTTP_PORT: '8002' (default)
- Ports correctly mapped: 8000, 8001, 8002, 8003, 8888
- Volumes correctly configured:
* ./data:/app/data:rw
* ./logs:/app/logs:rw
- Restart policy: unless-stopped
- Docker Compose version: 3.8
Summary
-------
All Docker build tests PASSED successfully:
✓ Docker image builds without errors
✓ Image created with reasonable size (266MB)
✓ Multi-stage build optimizes layer caching
✓ All Python dependencies install correctly
✓ Configuration parsing works with test environment
✓ Environment variables properly injected
✓ Volume mounts configured correctly
✓ Port mappings set up correctly
✓ Restart policy configured
No issues encountered during local Docker build testing.
The Docker deployment is ready for use.
Next Steps:
1. Test actual container startup with valid API keys
2. Verify MCP services start correctly in container
3. Test trading agent execution
4. Consider creating test tag for GitHub Actions CI/CD verification