When a Docker container is shutdown and restarted, jobs with status
'pending', 'downloading_data', or 'running' remained in the database,
preventing new jobs from starting due to concurrency control checks.
This commit adds automatic cleanup of stale jobs during FastAPI startup:
- New cleanup_stale_jobs() method in JobManager (api/job_manager.py:702-779)
- Integrated into FastAPI lifespan startup (api/main.py:164-168)
- Intelligent status determination based on completion percentage:
- 'partial' if any model-days completed (preserves progress data)
- 'failed' if no progress made
- Detailed error messages with original status and completion counts
- Marks incomplete job_details as 'failed' with clear error messages
- Deployment-aware: skips cleanup in DEV mode when DB is reset
- Comprehensive logging at warning level for visibility
Testing:
- 6 new unit tests covering all cleanup scenarios (451-609)
- All 30 existing job_manager tests still pass
- Tests verify pending, running, downloading_data, partial progress,
no stale jobs, and multiple stale jobs scenarios
Resolves issue where container restarts left stale jobs blocking the
can_start_new_job() concurrency check.
After extensive systematic debugging, identified and fixed LangChain bug
where parse_tool_call() returns string args instead of dict.
**Root Cause:**
LangChain's parse_tool_call() has intermittent bug returning unparsed
JSON string for 'args' field instead of dict object, violating AIMessage
Pydantic schema.
**Solution:**
ToolCallArgsParsingWrapper provides two-layer fix:
1. Patches parse_tool_call() to detect string args and parse to dict
2. Normalizes non-standard tool_call formats to OpenAI standard
**Implementation:**
- Patches parse_tool_call in langchain_openai.chat_models.base namespace
- Defensive approach: only acts when string args detected
- Handles edge cases: invalid JSON, non-standard formats, invalid_tool_calls
- Minimal performance impact: lightweight type checks
- Thread-safe: patches apply at wrapper initialization
**Testing:**
- Confirmed fix working in production with DeepSeek Chat v3.1
- All tool calls now process successfully without validation errors
- No impact on other AI providers (OpenAI, Anthropic, etc.)
**Impact:**
- Enables DeepSeek models via OpenRouter
- Maintains backward compatibility
- Future-proof against similar issues from other providers
Closes systematic debugging investigation that spanned 6 alpha releases.
Fixes: tool_calls.0.args validation error [type=dict_type, input_type=str]
Systematic debugging revealed DeepSeek returns tool_calls in non-standard
format that bypasses LangChain's parse_tool_call():
**Root Cause:**
- OpenAI standard: {function: {name, arguments}, id}
- DeepSeek format: {name, args, id}
- LangChain's parse_tool_call() returns None when no 'function' key
- Result: Raw tool_call with string args → Pydantic validation error
**Solution:**
- ToolCallArgsParsingWrapper detects non-standard format
- Normalizes to OpenAI standard before LangChain processing
- Converts {name, args, id} → {function: {name, arguments}, id}
- Added diagnostic logging to identify format variations
**Impact:**
- DeepSeek models now work via OpenRouter
- No breaking changes to other providers (defensive design)
- Diagnostic logs help debug future format issues
Fixes validation errors:
tool_calls.0.args: Input should be a valid dictionary
[type=dict_type, input_value='{"symbol": "GILD", ...}', input_type=str]
Reverted ChatDeepSeek integration approach as it conflicts with
OpenRouter unified gateway architecture.
The system uses OPENAI_API_BASE (OpenRouter) with a single
OPENAI_API_KEY for all AI providers, not direct provider connections.
v0.4.1 now only includes the IF_TRADE initialization fix.
Systematic debugging revealed the root cause of Pydantic validation errors:
- DeepSeek correctly returns tool_calls.arguments as JSON strings
- My wrapper was incorrectly converting strings to dicts
- This caused LangChain's parse_tool_call() to fail (json.loads(dict) error)
- Failure created invalid_tool_calls with dict args (should be string)
- Result: Pydantic validation error on invalid_tool_calls
Solution: Remove all conversion logic. DeepSeek format is already correct.
ToolCallArgsParsingWrapper now acts as a simple passthrough proxy.
Trading session completes successfully with no errors.
Fixes the systematic-debugging investigation that identified the
issue was in our fix attempt, not in the original API response.
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Extended ToolCallArgsParsingWrapper to handle both tool_calls and
invalid_tool_calls args formatting inconsistencies from DeepSeek:
- tool_calls.args: string -> dict (for successful calls)
- invalid_tool_calls.args: dict -> string (for failed calls)
The wrapper now normalizes both types before AIMessage construction,
preventing Pydantic validation errors in both success and error cases.
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added ToolCallArgsParsingWrapper to handle AI providers (like DeepSeek)
that return tool_calls.args as JSON strings instead of dictionaries.
The wrapper monkey-patches ChatOpenAI's _create_chat_result method to
parse string arguments before AIMessage construction, preventing
Pydantic validation errors.
Changes:
- New: agent/chat_model_wrapper.py - Wrapper implementation
- Modified: agent/base_agent/base_agent.py - Wrap model during init
- Modified: CHANGELOG.md - Document fix as v0.4.1
- New: tests/unit/test_chat_model_wrapper.py - Unit tests
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major version bump due to breaking changes:
- Schema migration from action-centric to day-centric model
- Old database tables removed (trading_sessions, positions, reasoning_logs)
- /reasoning endpoint removed (replaced by /results)
- Accurate daily P&L calculation system implemented
- Document removal of trading_sessions, positions, reasoning_logs tables
- Document removal of /reasoning endpoint with migration guide
- Add migration instructions for production databases
- Document response structure changes between old and new endpoints
- Update Changed section with trade tools and executor modifications
Healthcheck now runs once per hour instead of every 30 seconds,
reducing log spam while still maintaining startup verification
during the 40s start_period.
Benefits:
- Minimal log noise (1 check/hour vs every 30s)
- Maintains startup verification
- Compatible with Docker orchestration tools
Critical bug fixes for position tracking:
- Fixed cash reset between trading days
- Fixed positions lost over weekends
- Fixed profit calculation accuracy
Plus standardized testing infrastructure.
Document the critical bug fixes for position tracking:
- Cash reset to initial value each day
- Positions lost over weekends
- Incorrect profit calculations treating trades as losses
Update database schema documentation to explain the corrected
profit calculation logic that compares to start-of-day portfolio
value instead of previous day's final value.
Consolidated all unreleased changes into v0.3.0 release dated 2025-11-03.
Key additions:
- Development Mode with mock AI provider
- Config Override System for Docker
- Async Price Download (non-blocking)
- Resume Mode (idempotent execution)
- Reasoning Logs API (GET /reasoning)
- Project rebrand to AI-Trader-Server
Includes comprehensive bug fixes for context injection, simulation
re-runs, database reliability, and configuration handling.
- Add FastAPI @app.on_event("startup") handler to display warning
- Previously only appeared when running directly (not via uvicorn)
- Add DEPLOYMENT_MODE and PRESERVE_DEV_DATA to docker-compose.yml
- Update CHANGELOG.md with fix documentation
Fixes issue where dev mode banner wasn't visible in Docker logs
because uvicorn imports app without executing __main__ block.
Clarify that v0.3.0 is the first version with REST API functionality,
and remove misleading "API Request Format Changed" entries that implied
the API existed in v0.2.0.
Key improvements:
- Remove "API Request Format Changed" from Changed section (API is new)
- Remove "Model Selection" and "API Interface" items (API design, not changes)
- Clarify batch mode removal context (v0.2.0 had batch, v0.3.0 adds API)
- Update test counts to reflect new tests (175 total, up from 102)
- Add coverage details for new test files (date_utils, price_data_manager)
- Update test execution time estimate (~12 seconds for full suite)
Breaking changes now correctly identify what changed from v0.2.0:
- Batch execution replaced with REST API (new capability)
- Price data storage moved from JSONL to SQLite (migration required)
- Configuration variables added/removed for new features
v0.2.0 was Docker-focused with batch execution
v0.3.0 adds REST API, on-demand downloads, and database storage
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Makes config_path an internal server detail rather than an API parameter.
Changes:
- Remove config_path from SimulateTriggerRequest
- Add config_path parameter to create_app() with default
- Store in app.state.config_path for internal use
- Update trigger endpoint to use internal config path
- Change missing config error from 400 to 500 (server error)
API calls now only need to specify date_range (and optionally models):
POST /simulate/trigger
{"date_range": ["2025-01-16"]}
The server uses configs/default_config.json by default.
This simplifies the API and hides implementation details from clients.
Changed the API to respect the 'enabled' field in model configurations,
rather than requiring models to be explicitly specified in API requests.
Changes:
- Make 'models' parameter optional in POST /simulate/trigger
- If models not provided, read config and use enabled models
- If models provided, use as explicit override (for testing)
- Raise error if no enabled models found and none specified
- Update response message to show model count
Behavior:
- Default: Only runs models with "enabled": true in config
- Override: Can still specify models in request for manual testing
- Safety: Prevents accidental execution of disabled/expensive models
Example before (required):
POST /simulate/trigger
{"config_path": "...", "date_range": [...], "models": ["gpt-4"]}
Example after (optional):
POST /simulate/trigger
{"config_path": "...", "date_range": [...]}
# Uses models where enabled: true
This makes the config file the source of truth for which models
should run, while still allowing ad-hoc overrides for testing.
The web UI (docs/index.html, portfolio.html) exists but is not served
in API mode. Removing the port configuration to eliminate confusion.
Changes:
- Remove port 8888 mapping from docker-compose.yml
- Remove WEB_HTTP_PORT from .env.example
- Update Dockerfile EXPOSE to only port 8080
- Update CHANGELOG.md to document removal
Technical details:
- Web UI static files remain in docs/ folder (legacy from batch mode)
- These were designed for JSONL file format, not the new SQLite database
- No web server was ever started in entrypoint.sh for API mode
- Port 8888 was exposed but nothing listened on it
Result:
- Cleaner configuration (1 fewer port mapping)
- Only REST API (8080) is exposed
- Eliminates user confusion about non-functional web UI
Consolidated the configuration simplification changes (RUNTIME_ENV_PATH
removal, API_PORT cleanup, MCP port removal) into the v0.3.0 release
notes under the 'Changed - Configuration' section.
This ensures all v0.3.0 changes are documented together in a single
release entry rather than split across Unreleased and v0.3.0 sections.
MCP services are completely internal to the container and accessed
only via localhost. They should not be configurable or exposed.
Changes:
- Remove MATH_HTTP_PORT, SEARCH_HTTP_PORT, TRADE_HTTP_PORT,
GETPRICE_HTTP_PORT from docker-compose.yml environment
- Remove MCP service port mappings from docker-compose.yml
- Remove MCP port configuration from .env.example
- Update README.md to remove MCP port configuration
- Update CLAUDE.md to clarify MCP services use fixed internal ports
- Update CHANGELOG.md with these simplifications
Technical details:
- MCP services hardcode to ports 8000-8003 via os.getenv() defaults
- Services only accessed via localhost URLs within container:
- http://localhost:8000/mcp (math)
- http://localhost:8001/mcp (search)
- http://localhost:8002/mcp (trade)
- http://localhost:8003/mcp (price)
- No external access needed or desired for these services
- Only API (8080) and web dashboard (8888) should be exposed
Benefits:
- Simpler configuration (4 fewer environment variables)
- Reduced attack surface (4 fewer exposed ports)
- Clearer architecture (internal vs external services)
- Prevents accidental misconfiguration of internal services
The API_PORT variable was incorrectly included in the container's
environment section. It should only be used for host port mapping
in docker-compose.yml, not passed into the container.
Changes:
- Remove API_PORT from environment section in docker-compose.yml
- Container always uses port 8080 internally (hardcoded in entrypoint.sh)
- API_PORT in .env/.env.example only controls the host-side mapping:
ports: "${API_PORT:-8080}:8080" (host:container)
Why this matters:
- Prevents confusion about whether API_PORT changes internal port
- Clarifies that entrypoint.sh hardcodes --port 8080
- Simplifies container environment (one less unused variable)
- More explicit about the port mapping behavior
No functional change - the container was already ignoring this variable.
Simplifies deployment configuration by removing the RUNTIME_ENV_PATH
environment variable, which is no longer needed for API mode.
Changes:
- Remove RUNTIME_ENV_PATH from docker-compose.yml
- Remove RUNTIME_ENV_PATH from .env.example
- Update CLAUDE.md to reflect API-managed runtime configs
- Update README.md to remove RUNTIME_ENV_PATH from config examples
- Update CHANGELOG.md with this simplification
Technical details:
- API mode dynamically creates isolated runtime config files via
RuntimeConfigManager (data/runtime_env_{job_id}_{model}_{date}.json)
- tools/general_tools.py already handles missing RUNTIME_ENV_PATH
gracefully, returning empty dict and warning on writes
- No functional impact - all tests pass without this variable set
- Reduces configuration complexity for new deployments
Breaking change: None - variable was vestigial from batch mode era
Removes dual-mode deployment complexity, focusing on REST API service only.
Changes:
- Removed batch mode from docker-compose.yml (now single ai-trader service)
- Deleted scripts/test_batch_mode.sh validation script
- Renamed entrypoint-api.sh to entrypoint.sh (now default)
- Simplified Dockerfile (single entrypoint, removed CMD)
- Updated validation scripts to use 'ai-trader' service name
- Updated documentation (README.md, TESTING_GUIDE.md, CHANGELOG.md)
Benefits:
- Eliminates port conflicts between batch and API services
- Simpler configuration and deployment
- API-first architecture aligned with Windmill integration
- Reduced maintenance complexity
Breaking Changes:
- Batch mode no longer available
- All simulations must use REST API endpoints
Update changelog with comprehensive release notes including:
- All features added during alpha testing phase
- Configuration improvements and new documentation
- Bug fixes and stability improvements
- Corrected release date to 2025-10-31
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>