mirror of
https://github.com/Xe138/AI-Trader.git
synced 2026-04-01 17:17:24 -04:00
docs: add data cache reuse design document
Captures design for staleness-based data refresh to avoid re-fetching all 103 NASDAQ tickers on every container restart. Key features: - Check all daily_prices_*.json files for staleness - Configurable MAX_DATA_AGE_DAYS threshold (default: 7) - Bash wrapper logic in entrypoint.sh - Handles edge cases (partial data, missing files, rate limits)
This commit is contained in:
197
docs/plans/2025-10-30-data-cache-reuse-design.md
Normal file
197
docs/plans/2025-10-30-data-cache-reuse-design.md
Normal file
@@ -0,0 +1,197 @@
|
|||||||
|
# Data Cache Reuse Design
|
||||||
|
|
||||||
|
**Date:** 2025-10-30
|
||||||
|
**Status:** Approved
|
||||||
|
|
||||||
|
## Problem Statement
|
||||||
|
|
||||||
|
Docker containers currently fetch all 103 NASDAQ 100 tickers from Alpha Vantage on every startup, even when price data is volume-mounted and already cached in `./data`. This causes:
|
||||||
|
- Slow startup times (103 API calls)
|
||||||
|
- Unnecessary API quota consumption
|
||||||
|
- Rate limit risks during frequent development iterations
|
||||||
|
|
||||||
|
## Solution Overview
|
||||||
|
|
||||||
|
Implement staleness-based data refresh with configurable age threshold. Container checks all `daily_prices_*.json` files and only refetches if any file is missing or older than `MAX_DATA_AGE_DAYS`.
|
||||||
|
|
||||||
|
## Design Decisions
|
||||||
|
|
||||||
|
### Architecture Choice
|
||||||
|
**Selected:** Check all `daily_prices_*.json` files individually
|
||||||
|
**Rationale:** Ensures data integrity by detecting partial/missing files, not just stale merged data
|
||||||
|
|
||||||
|
### Implementation Location
|
||||||
|
**Selected:** Bash wrapper logic in `entrypoint.sh`
|
||||||
|
**Rationale:** Keeps data fetching scripts unchanged, adds orchestration at container startup layer
|
||||||
|
|
||||||
|
### Staleness Threshold
|
||||||
|
**Selected:** Configurable via `MAX_DATA_AGE_DAYS` environment variable (default: 7 days)
|
||||||
|
**Rationale:** Balances freshness with API usage; flexible for different use cases (development vs production)
|
||||||
|
|
||||||
|
## Technical Design
|
||||||
|
|
||||||
|
### Components
|
||||||
|
|
||||||
|
#### 1. Staleness Check Function
|
||||||
|
Location: `entrypoint.sh` (after environment validation, before data fetch)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
should_refresh_data() {
|
||||||
|
MAX_AGE=${MAX_DATA_AGE_DAYS:-7}
|
||||||
|
|
||||||
|
# Check if at least one price file exists
|
||||||
|
if ! ls /app/data/daily_prices_*.json >/dev/null 2>&1; then
|
||||||
|
echo "📭 No price data found"
|
||||||
|
return 0 # Need refresh
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Find any files older than MAX_AGE days
|
||||||
|
STALE_COUNT=$(find /app/data -name "daily_prices_*.json" -mtime +$MAX_AGE | wc -l)
|
||||||
|
TOTAL_COUNT=$(ls /app/data/daily_prices_*.json 2>/dev/null | wc -l)
|
||||||
|
|
||||||
|
if [ $STALE_COUNT -gt 0 ]; then
|
||||||
|
echo "📅 Found $STALE_COUNT stale files (>$MAX_AGE days old)"
|
||||||
|
return 0 # Need refresh
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "✅ All $TOTAL_COUNT price files are fresh (<$MAX_AGE days old)"
|
||||||
|
return 1 # Skip refresh
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Logic:**
|
||||||
|
- Uses `find -mtime +N` to detect files modified more than N days ago
|
||||||
|
- Returns shell exit codes: 0 (refresh needed), 1 (skip refresh)
|
||||||
|
- Logs informative messages for debugging
|
||||||
|
|
||||||
|
#### 2. Conditional Data Fetch
|
||||||
|
Location: `entrypoint.sh` lines 40-46 (replace existing unconditional fetch)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Step 1: Data preparation (conditional)
|
||||||
|
echo "📊 Checking price data freshness..."
|
||||||
|
|
||||||
|
if should_refresh_data; then
|
||||||
|
echo "🔄 Fetching and merging price data..."
|
||||||
|
cd /app/data
|
||||||
|
python /app/scripts/get_daily_price.py
|
||||||
|
python /app/scripts/merge_jsonl.py
|
||||||
|
cd /app
|
||||||
|
else
|
||||||
|
echo "⏭️ Skipping data fetch (using cached data)"
|
||||||
|
fi
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 3. Environment Configuration
|
||||||
|
**docker-compose.yml:**
|
||||||
|
```yaml
|
||||||
|
environment:
|
||||||
|
- MAX_DATA_AGE_DAYS=${MAX_DATA_AGE_DAYS:-7}
|
||||||
|
```
|
||||||
|
|
||||||
|
**.env.example:**
|
||||||
|
```bash
|
||||||
|
# Data Refresh Configuration
|
||||||
|
MAX_DATA_AGE_DAYS=7 # Refresh price data older than N days (0=always refresh)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Data Flow
|
||||||
|
|
||||||
|
1. **Container Startup** → entrypoint.sh begins execution
|
||||||
|
2. **Environment Validation** → Check required API keys (existing logic)
|
||||||
|
3. **Staleness Check** → `should_refresh_data()` scans `/app/data/daily_prices_*.json`
|
||||||
|
- No files found → Return 0 (refresh)
|
||||||
|
- Any file older than `MAX_DATA_AGE_DAYS` → Return 0 (refresh)
|
||||||
|
- All files fresh → Return 1 (skip)
|
||||||
|
4. **Conditional Fetch** → Run get_daily_price.py only if refresh needed
|
||||||
|
5. **Merge Data** → Always run merge_jsonl.py (handles missing merged.jsonl)
|
||||||
|
6. **MCP Services** → Start services (existing logic)
|
||||||
|
7. **Trading Agent** → Begin trading (existing logic)
|
||||||
|
|
||||||
|
### Edge Cases
|
||||||
|
|
||||||
|
| Scenario | Behavior |
|
||||||
|
|----------|----------|
|
||||||
|
| **First run (no data)** | Detects no files → triggers full fetch |
|
||||||
|
| **Restart within 7 days** | All files fresh → skips fetch (fast startup) |
|
||||||
|
| **Restart after 7 days** | Files stale → refreshes all data |
|
||||||
|
| **Partial data (some files missing)** | Missing files treated as infinitely old → triggers refresh |
|
||||||
|
| **Corrupt merged.jsonl but fresh price files** | Skips fetch, re-runs merge to rebuild merged.jsonl |
|
||||||
|
| **MAX_DATA_AGE_DAYS=0** | Always refresh (useful for testing/production) |
|
||||||
|
| **MAX_DATA_AGE_DAYS unset** | Defaults to 7 days |
|
||||||
|
| **Alpha Vantage rate limit** | get_daily_price.py handles with warning (existing behavior) |
|
||||||
|
|
||||||
|
## Configuration Options
|
||||||
|
|
||||||
|
| Variable | Default | Purpose |
|
||||||
|
|----------|---------|---------|
|
||||||
|
| `MAX_DATA_AGE_DAYS` | 7 | Days before price data considered stale |
|
||||||
|
|
||||||
|
**Special Values:**
|
||||||
|
- `0` → Always refresh (force fresh data)
|
||||||
|
- `999` → Never refresh (use cached data indefinitely)
|
||||||
|
|
||||||
|
## User Experience
|
||||||
|
|
||||||
|
### Scenario 1: Fresh Container
|
||||||
|
```
|
||||||
|
🚀 Starting AI-Trader...
|
||||||
|
🔍 Validating environment variables...
|
||||||
|
✅ Environment variables validated
|
||||||
|
📊 Checking price data freshness...
|
||||||
|
📭 No price data found
|
||||||
|
🔄 Fetching and merging price data...
|
||||||
|
✓ Fetched NVDA
|
||||||
|
✓ Fetched MSFT
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Scenario 2: Restart Within 7 Days
|
||||||
|
```
|
||||||
|
🚀 Starting AI-Trader...
|
||||||
|
🔍 Validating environment variables...
|
||||||
|
✅ Environment variables validated
|
||||||
|
📊 Checking price data freshness...
|
||||||
|
✅ All 103 price files are fresh (<7 days old)
|
||||||
|
⏭️ Skipping data fetch (using cached data)
|
||||||
|
🔧 Starting MCP services...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Scenario 3: Restart After 7 Days
|
||||||
|
```
|
||||||
|
🚀 Starting AI-Trader...
|
||||||
|
🔍 Validating environment variables...
|
||||||
|
✅ Environment variables validated
|
||||||
|
📊 Checking price data freshness...
|
||||||
|
📅 Found 103 stale files (>7 days old)
|
||||||
|
🔄 Fetching and merging price data...
|
||||||
|
✓ Fetched NVDA
|
||||||
|
✓ Fetched MSFT
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
## Testing Plan
|
||||||
|
|
||||||
|
1. **Test fresh container:** Delete `./data/daily_prices_*.json`, start container → should fetch all
|
||||||
|
2. **Test cached data:** Restart immediately → should skip fetch
|
||||||
|
3. **Test staleness:** `touch -d "8 days ago" ./data/daily_prices_AAPL.json`, restart → should refresh
|
||||||
|
4. **Test partial data:** Delete 10 random price files → should refresh all
|
||||||
|
5. **Test MAX_DATA_AGE_DAYS=0:** Restart with env var set → should always fetch
|
||||||
|
6. **Test MAX_DATA_AGE_DAYS=30:** Restart with 8-day-old data → should skip
|
||||||
|
|
||||||
|
## Documentation Updates
|
||||||
|
|
||||||
|
Files requiring updates:
|
||||||
|
- `entrypoint.sh` → Add function and conditional logic
|
||||||
|
- `docker-compose.yml` → Add MAX_DATA_AGE_DAYS environment variable
|
||||||
|
- `.env.example` → Document MAX_DATA_AGE_DAYS with default value
|
||||||
|
- `CLAUDE.md` → Update "Docker Deployment" section with new env var
|
||||||
|
- `docs/DOCKER.md` (if exists) → Explain data caching behavior
|
||||||
|
|
||||||
|
## Benefits
|
||||||
|
|
||||||
|
- **Development:** Instant container restarts during iteration
|
||||||
|
- **API Quota:** ~103 fewer API calls per restart
|
||||||
|
- **Reliability:** No rate limit risks during frequent testing
|
||||||
|
- **Flexibility:** Configurable threshold for different use cases
|
||||||
|
- **Consistency:** Checks all files to ensure complete data
|
||||||
Reference in New Issue
Block a user