Files
AI-Trader/docs/user-guide/troubleshooting.md
Bill 573264c49f docs: update user-guide docs for AI-Trader-Server rebrand
Update all user-guide documentation files:
- configuration.md: Update title and container name references
- using-the-api.md: Update title
- integration-examples.md: Update title, class names
  (AsyncAITraderServerClient), container names, DAG names, and log paths
- troubleshooting.md: Update title, container names (ai-trader to
  ai-trader-server), GitHub issues URL

All Docker commands and code examples now reference ai-trader-server
container name.

Part of Phase 3: Developer & Deployment Documentation
2025-11-01 11:56:01 -04:00

9.2 KiB

Troubleshooting Guide

Common issues and solutions for AI-Trader-Server.


Container Issues

Container Won't Start

Symptoms:

  • docker ps shows no ai-trader-server container
  • Container exits immediately after starting

Debug:

# Check logs
docker logs ai-trader-server

# Check if container exists (stopped)
docker ps -a | grep ai-trader-server

Common Causes & Solutions:

1. Missing API Keys

# Verify .env file
cat .env | grep -E "OPENAI_API_KEY|ALPHAADVANTAGE_API_KEY|JINA_API_KEY"

# Should show all three keys with values

Solution: Add missing keys to .env

2. Port Already in Use

# Check what's using port 8080
sudo lsof -i :8080  # Linux/Mac
netstat -ano | findstr :8080  # Windows

Solution: Change port in .env:

echo "API_PORT=8889" >> .env
docker-compose down
docker-compose up -d

3. Volume Permission Issues

# Fix permissions
chmod -R 755 data logs configs

Health Check Fails

Symptoms:

  • curl http://localhost:8080/health returns error or HTML page
  • Container running but API not responding

Debug:

# Check if API process is running
docker exec ai-trader-server ps aux | grep uvicorn

# Test internal health (always port 8080 inside container)
docker exec ai-trader-server curl http://localhost:8080/health

# Check configured port
grep API_PORT .env

Solutions:

If you get HTML 404 page: Another service is using your configured port.

# Find conflicting service
sudo lsof -i :8080

# Change AI-Trader-Server port
echo "API_PORT=8889" >> .env
docker-compose down
docker-compose up -d

# Now use new port
curl http://localhost:8889/health

If MCP services didn't start:

# Check MCP processes
docker exec ai-trader-server ps aux | grep python

# Should see 4 MCP services on ports 8000-8003

If database issues:

# Check database file
docker exec ai-trader-server ls -l /app/data/jobs.db

# If missing, restart to recreate
docker-compose restart

Simulation Issues

Job Stays in "Pending" Status

Symptoms:

  • Job triggered but never progresses to "running"
  • Status remains "pending" indefinitely

Debug:

# Check worker logs
docker logs ai-trader-server | grep -i "worker\|simulation"

# Check database
docker exec ai-trader-server sqlite3 /app/data/jobs.db "SELECT * FROM job_details;"

# Check MCP service accessibility
docker exec ai-trader-server curl http://localhost:8000/health

Solutions:

# Restart container (jobs resume automatically)
docker-compose restart

# Check specific job status with details
curl http://localhost:8080/simulate/status/$JOB_ID | jq '.details'

Job Takes Too Long / Timeouts

Symptoms:

  • Jobs taking longer than expected
  • Test scripts timing out

Expected Execution Times:

  • Single model-day: 2-5 minutes (with cached price data)
  • First run with data download: 10-15 minutes
  • 2-date, 2-model job: 10-20 minutes

Solutions:

Increase poll timeout in monitoring:

# Instead of fixed polling, use this
while true; do
  STATUS=$(curl -s http://localhost:8080/simulate/status/$JOB_ID | jq -r '.status')
  echo "$(date): Status = $STATUS"
  
  if [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "partial" ]] || [[ "$STATUS" == "failed" ]]; then
    break
  fi
  
  sleep 30
done

Check if agent is stuck:

# View real-time logs
docker logs -f ai-trader-server

# Look for repeated errors or infinite loops

"No trading dates with complete price data"

Error Message:

No trading dates with complete price data in range 2025-01-16 to 2025-01-17. 
All symbols must have data for a date to be tradeable.

Cause: Missing price data for requested dates.

Solutions:

Option 1: Try Recent Dates

Use more recent dates where data is more likely available:

curl -X POST http://localhost:8080/simulate/trigger \
  -H "Content-Type: application/json" \
  -d '{"start_date": "2024-12-15", "models": ["gpt-4"]}'

Option 2: Manually Download Data

docker exec -it ai-trader-server bash
cd data
python get_daily_price.py  # Downloads latest data
python merge_jsonl.py       # Merges into database
exit

# Retry simulation

Option 3: Check Auto-Download Setting

# Ensure auto-download is enabled
grep AUTO_DOWNLOAD_PRICE_DATA .env

# Should be: AUTO_DOWNLOAD_PRICE_DATA=true

Rate Limit Errors

Symptoms:

  • Logs show "rate limit" messages
  • Partial data downloaded

Cause: Alpha Vantage API rate limits (5 req/min free tier, 75 req/min premium)

Solutions:

For free tier:

  • Simulations automatically continue with available data
  • Next simulation resumes downloads
  • Consider upgrading to premium API key

Workaround:

# Pre-download data in batches
docker exec -it ai-trader-server bash
cd data

# Download in stages (wait 1 min between runs)
python get_daily_price.py
sleep 60
python get_daily_price.py
sleep 60
python get_daily_price.py

python merge_jsonl.py
exit

API Issues

400 Bad Request: Another Job Running

Error:

{
  "detail": "Another simulation job is already running or pending. Please wait for it to complete."
}

Cause: AI-Trader-Server allows only 1 concurrent job by default.

Solutions:

Check current jobs:

# Find running job
curl http://localhost:8080/health  # Verify API is up

# Query recent jobs (need to check database)
docker exec ai-trader-server sqlite3 /app/data/jobs.db \
  "SELECT job_id, status FROM jobs ORDER BY created_at DESC LIMIT 5;"

Wait for completion:

# Get the blocking job's status
curl http://localhost:8080/simulate/status/{job_id}

Force-stop stuck job (last resort):

# Update job status in database
docker exec ai-trader-server sqlite3 /app/data/jobs.db \
  "UPDATE jobs SET status='failed' WHERE status IN ('pending', 'running');"

# Restart service
docker-compose restart

Invalid Date Format Errors

Error:

{
  "detail": "Invalid date format: 2025-1-16. Expected YYYY-MM-DD"
}

Solution: Use zero-padded dates:

# Wrong
{"start_date": "2025-1-16"}

# Correct
{"start_date": "2025-01-16"}

Date Range Too Large

Error:

{
  "detail": "Date range too large: 45 days. Maximum allowed: 30 days"
}

Solution: Split into smaller batches:

# Instead of 2025-01-01 to 2025-02-15 (45 days)
# Run as two jobs:

# Job 1: Jan 1-30
curl -X POST http://localhost:8080/simulate/trigger \
  -d '{"start_date": "2025-01-01", "end_date": "2025-01-30"}'

# Job 2: Jan 31 - Feb 15
curl -X POST http://localhost:8080/simulate/trigger \
  -d '{"start_date": "2025-01-31", "end_date": "2025-02-15"}'

Data Issues

Database Corruption

Symptoms:

  • "database disk image is malformed"
  • Unexpected SQL errors

Solutions:

Backup and rebuild:

# Stop service
docker-compose down

# Backup current database
cp data/jobs.db data/jobs.db.backup

# Try recovery
docker run --rm -v $(pwd)/data:/data alpine sqlite3 /data/jobs.db "PRAGMA integrity_check;"

# If corrupted, delete and restart (loses job history)
rm data/jobs.db
docker-compose up -d

Missing Price Data Files

Symptoms:

  • Errors about missing merged.jsonl
  • Price query failures

Solution:

# Re-download price data
docker exec -it ai-trader-server bash
cd data
python get_daily_price.py
python merge_jsonl.py
ls -lh merged.jsonl  # Should exist
exit

Performance Issues

Slow Simulation Execution

Typical speeds:

  • Single model-day: 2-5 minutes
  • With cold start (first time): +3-5 minutes

Causes & Solutions:

1. AI Model API is slow

  • Check AI provider status page
  • Try different model
  • Increase timeout in config

2. Network latency

  • Check internet connection
  • Jina Search API might be slow

3. MCP services overloaded

# Check CPU usage
docker stats ai-trader-server

High Memory Usage

Normal: 500MB - 1GB during simulation

If higher:

# Check memory
docker stats ai-trader-server

# Restart if needed
docker-compose restart

Diagnostic Commands

# Container status
docker ps | grep ai-trader-server

# Real-time logs
docker logs -f ai-trader-server

# Check errors only
docker logs ai-trader-server 2>&1 | grep -i error

# Container resource usage
docker stats ai-trader-server

# Access container shell
docker exec -it ai-trader-server bash

# Database inspection
docker exec -it ai-trader-server sqlite3 /app/data/jobs.db
sqlite> SELECT * FROM jobs ORDER BY created_at DESC LIMIT 5;
sqlite> SELECT status, COUNT(*) FROM jobs GROUP BY status;
sqlite> .quit

# Check file permissions
docker exec ai-trader-server ls -la /app/data

# Test API connectivity
curl -v http://localhost:8080/health

# View all environment variables
docker exec ai-trader-server env | sort

Getting More Help

If your issue isn't covered here:

  1. Check logs for specific error messages
  2. Review API_REFERENCE.md for correct usage
  3. Search GitHub Issues
  4. Open new issue with:
    • Error messages from logs
    • Steps to reproduce
    • Environment details (OS, Docker version)
    • Relevant config files (redact API keys)