Compare commits

..

7 Commits

Author SHA1 Message Date
d749247af4 docs: consolidate fixes into v0.4.1 release
Merged v0.4.2 changes into v0.4.1:
- IF_TRADE initialization fix
- Native ChatDeepSeek integration for DeepSeek models
- Model factory pattern implementation
- Removed obsolete wrapper code

Preparing v0.4.1-alpha for deployment testing.
2025-11-06 11:02:47 -05:00
2178bbcdde chore: remove obsolete chat model wrapper
The ToolCallArgsParsingWrapper was replaced by the model factory
approach which uses native ChatDeepSeek for DeepSeek models.

Removed:
- agent/chat_model_wrapper.py (no longer used)
- tests/unit/test_chat_model_wrapper.py (tests for removed functionality)

The model factory provides a cleaner solution by using provider-specific
implementations rather than wrapping ChatOpenAI.
2025-11-06 08:02:33 -05:00
872928a187 docs: update for IF_TRADE and DeepSeek fixes
- Document model factory architecture
- Add troubleshooting for DeepSeek validation errors
- Update changelog with version 0.4.2 fixes
2025-11-06 07:58:43 -05:00
81ec0ec53b test: add DeepSeek tool calls integration tests
Verifies that ChatDeepSeek properly handles tool_calls arguments:
- Returns ChatDeepSeek for deepseek/* models
- tool_calls.args are dicts (not JSON strings)
- No Pydantic validation errors on args

Tests skip gracefully if API keys not available.
2025-11-06 07:56:48 -05:00
ed6647ed66 refactor: use model factory in BaseAgent
Replaces direct ChatOpenAI instantiation with create_model() factory.

Benefits:
- DeepSeek models now use native ChatDeepSeek
- Other models continue using ChatOpenAI
- Provider-specific optimizations in one place
- Easier to add new providers

Logging now shows both model name and provider class for debugging.
2025-11-06 07:52:10 -05:00
e689a78b3f feat: add model factory for provider-specific chat models
Implements factory pattern to create appropriate chat model based on
provider prefix in basemodel string.

Supported providers:
- deepseek/*: Uses ChatDeepSeek (native tool calling)
- openai/*: Uses ChatOpenAI
- others: Fall back to ChatOpenAI (OpenAI-compatible)

This enables native DeepSeek integration while maintaining backward
compatibility with existing OpenAI-compatible providers.
2025-11-06 07:47:56 -05:00
60d89c8d3a deps: add langchain-deepseek for native DeepSeek support
Adds official LangChain DeepSeek integration to replace ChatOpenAI
wrapper approach for DeepSeek models. Native integration provides:
- Better tool_calls argument parsing
- DeepSeek-specific error handling
- No OpenAI compatibility layer issues

Version 0.1.20+ includes tool calling support for deepseek-chat.
2025-11-06 07:44:11 -05:00
30 changed files with 1343 additions and 3001 deletions

View File

@@ -7,49 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
## [0.4.2] - 2025-11-07
### Fixed
- **Critical:** Fixed negative cash position bug where trades calculated from initial capital instead of accumulating
- Root cause: MCP tools return `CallToolResult` objects with position data in `structuredContent` field, but `ContextInjector` was checking `isinstance(result, dict)` which always failed
- Impact: Each trade checked cash against initial $10,000 instead of cumulative position, allowing over-spending and resulting in negative cash balances (e.g., -$8,768.68 after 11 trades totaling $18,768.68)
- Solution: Updated `ContextInjector` to extract position dict from `CallToolResult.structuredContent` before validation
- Fix ensures proper intra-day position tracking with cumulative cash checks preventing over-trading
- Updated unit tests to mock `CallToolResult` objects matching production MCP behavior
- Locations: `agent/context_injector.py:95-109`, `tests/unit/test_context_injector.py:26-53`
- Enabled MCP service logging by redirecting stdout/stderr from `/dev/null` to main process for better debugging
- Previously, all MCP tool debug output was silently discarded
- Now visible in docker logs for diagnosing parameter injection and trade execution issues
- Location: `agent_tools/start_mcp_services.py:81-88`
### Fixed
- **Critical:** Fixed stale jobs blocking new jobs after Docker container restart
- Root cause: Jobs with status 'pending', 'downloading_data', or 'running' remained in database after container shutdown, preventing new job creation
- Solution: Added `cleanup_stale_jobs()` method that runs on FastAPI startup to mark interrupted jobs as 'failed' or 'partial' based on completion percentage
- Intelligent status determination: Uses existing progress tracking (completed/total model-days) to distinguish between failed (0% complete) and partial (>0% complete)
- Detailed error messages include original status and completion counts (e.g., "Job interrupted by container restart (was running, 3/10 model-days completed)")
- Incomplete job_details automatically marked as 'failed' with clear error messages
- Deployment-aware: Skips cleanup in DEV mode when database is reset, always runs in PROD mode
- Comprehensive test coverage: 6 new unit tests covering all cleanup scenarios
- Locations: `api/job_manager.py:702-779`, `api/main.py:164-168`, `tests/unit/test_job_manager.py:451-609`
- Fixed Pydantic validation errors when using DeepSeek models via OpenRouter
- Root cause: LangChain's `parse_tool_call()` has a bug where it sometimes returns `args` as JSON string instead of parsed dict object
- Solution: Added `ToolCallArgsParsingWrapper` that:
1. Patches `parse_tool_call()` to detect and fix string args by parsing them to dict
2. Normalizes non-standard tool_call formats (e.g., `{name, args, id}``{function: {name, arguments}, id}`)
- The wrapper is defensive and only acts when needed, ensuring compatibility with all AI providers
- Fixes validation error: `tool_calls.0.args: Input should be a valid dictionary [type=dict_type, input_value='...', input_type=str]`
## [0.4.1] - 2025-11-06
## [0.4.1] - 2025-11-05
### Fixed
- Fixed "No trading" message always displaying despite trading activity by initializing `IF_TRADE` to `True` (trades expected by default)
- Root cause: `IF_TRADE` was initialized to `False` in runtime config but never updated when trades executed
- Resolved sporadic Pydantic validation errors for DeepSeek tool_calls arguments by switching to native `ChatDeepSeek` integration
- Root cause: OpenAI compatibility layer did not consistently parse DeepSeek's tool_calls arguments from JSON strings to dicts
- Solution: Use native `langchain-deepseek` package which properly handles DeepSeek's response format
### Note
- ChatDeepSeek integration was reverted as it conflicts with OpenRouter unified gateway architecture
- System uses `OPENAI_API_BASE` (OpenRouter) with single `OPENAI_API_KEY` for all providers
- Sporadic DeepSeek validation errors appear to be transient and do not require code changes
### Added
- Added `agent/model_factory.py` for provider-specific model creation
- Added `langchain-deepseek>=0.1.20` dependency for native DeepSeek support
- Added integration tests for DeepSeek tool calls argument parsing
### Changed
- `BaseAgent` now uses model factory instead of direct `ChatOpenAI` instantiation
- DeepSeek models (`deepseek/*`) now use `ChatDeepSeek` for native tool calling support
- Removed obsolete `ToolCallArgsParsingWrapper` and related tests
## [0.4.0] - 2025-11-05

View File

@@ -167,12 +167,18 @@ bash main.sh
### Agent System
**BaseAgent Key Methods:**
- `initialize()`: Connect to MCP services, create AI model
- `initialize()`: Connect to MCP services, create AI model via model factory
- `run_trading_session(date)`: Execute single day's trading with retry logic
- `run_date_range(init_date, end_date)`: Process all weekdays in range
- `get_trading_dates()`: Resume from last date in position.jsonl
- `register_agent()`: Create initial position file with $10,000 cash
**Model Factory:**
The `create_model()` factory automatically selects the appropriate chat model:
- `deepseek/*` models → `ChatDeepSeek` (native tool calling support)
- `openai/*` models → `ChatOpenAI`
- Other providers → `ChatOpenAI` (OpenAI-compatible endpoint)
**Adding Custom Agents:**
1. Create new class inheriting from `BaseAgent`
2. Add to `AGENT_REGISTRY` in `main.py`:
@@ -202,34 +208,6 @@ bash main.sh
- Search results: News filtered by publication date
- All tools enforce temporal boundaries via `TODAY_DATE` from `runtime_env.json`
### Duplicate Simulation Prevention
**Automatic Skip Logic:**
- `JobManager.create_job()` checks database for already-completed model-day pairs
- Skips completed simulations automatically
- Returns warnings list with skipped pairs
- Raises `ValueError` if all requested simulations are already completed
**Example:**
```python
result = job_manager.create_job(
config_path="config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["model-a"],
model_day_filter=[("model-a", "2025-10-15")] # Already completed
)
# result = {
# "job_id": "new-job-uuid",
# "warnings": ["Skipped model-a/2025-10-15 - already completed"]
# }
```
**Cross-Job Portfolio Continuity:**
- `get_current_position_from_db()` queries across ALL jobs for a given model
- Enables portfolio continuity even when new jobs are created with overlapping dates
- Starting position = most recent trading_day.ending_cash + holdings where date < current_date
## Configuration File Format
```json
@@ -480,4 +458,10 @@ The project uses a well-organized documentation structure:
**Agent Doesn't Stop Trading:**
- Agent must output `<FINISH_SIGNAL>` within `max_steps`
- Increase `max_steps` if agent needs more reasoning time
- Check `log.jsonl` for errors preventing completion
- Check `log.jsonl` for errors preventing completion
**DeepSeek Pydantic Validation Errors:**
- Error: "Input should be a valid dictionary [type=dict_type, input_value='...', input_type=str]"
- Cause: Using `ChatOpenAI` for DeepSeek models (OpenAI compatibility layer issue)
- Fix: Ensure `langchain-deepseek` is installed and basemodel uses `deepseek/` prefix
- The model factory automatically uses `ChatDeepSeek` for native support

View File

@@ -4,78 +4,6 @@ This document outlines planned features and improvements for the AI-Trader proje
## Release Planning
### v0.5.0 - Performance Metrics & Status APIs (Planned)
**Focus:** Enhanced observability and performance tracking
#### Performance Metrics API
- **Performance Summary Endpoint** - Query model performance over date ranges
- `GET /metrics/performance` - Aggregated performance metrics
- Query parameters: `model`, `start_date`, `end_date`
- Returns comprehensive performance summary:
- Total return (dollar amount and percentage)
- Number of trades executed (buy + sell)
- Win rate (profitable trading days / total trading days)
- Average daily P&L (profit and loss)
- Best/worst trading day (highest/lowest daily P&L)
- Final portfolio value (cash + holdings at market value)
- Number of trading days in queried range
- Starting vs. ending portfolio comparison
- Use cases:
- Compare model performance across different time periods
- Evaluate strategy effectiveness
- Identify top-performing models
- Example: `GET /metrics/performance?model=gpt-4&start_date=2025-01-01&end_date=2025-01-31`
- Filtering options:
- Single model or all models
- Custom date ranges
- Exclude incomplete trading days
- Response format: JSON with clear metric definitions
#### Status & Coverage Endpoint
- **System Status Summary** - Data availability and simulation progress
- `GET /status` - Comprehensive system status
- Price data coverage section:
- Available symbols (NASDAQ 100 constituents)
- Date range of downloaded price data per symbol
- Total trading days with complete data
- Missing data gaps (symbols without data, date gaps)
- Last data refresh timestamp
- Model simulation status section:
- List of all configured models (enabled/disabled)
- Date ranges simulated per model (first and last trading day)
- Total trading days completed per model
- Most recent simulation date per model
- Completion percentage (simulated days / available data days)
- System health section:
- Database connectivity status
- MCP services status (Math, Search, Trade, LocalPrices)
- API version and deployment mode
- Disk space usage (database size, log size)
- Use cases:
- Verify data availability before triggering simulations
- Identify which models need updates to latest data
- Monitor system health and readiness
- Plan data downloads for missing date ranges
- Example: `GET /status` (no parameters required)
- Benefits:
- Single endpoint for complete system overview
- No need to query multiple endpoints for status
- Clear visibility into data gaps
- Track simulation progress across models
#### Implementation Details
- Database queries for efficient metric calculation
- Caching for frequently accessed metrics (optional)
- Response time target: <500ms for typical queries
- Comprehensive error handling for missing data
#### Benefits
- **Better Observability** - Clear view of system state and model performance
- **Data-Driven Decisions** - Quantitative metrics for model comparison
- **Proactive Monitoring** - Identify data gaps before simulations fail
- **User Experience** - Single endpoint to check "what's available and what's been done"
### v1.0.0 - Production Stability & Validation (Planned)
**Focus:** Comprehensive testing, documentation, and production readiness
@@ -679,13 +607,11 @@ To propose a new feature:
- **v0.1.0** - Initial release with batch execution
- **v0.2.0** - Docker deployment support
- **v0.3.0** - REST API, on-demand downloads, database storage
- **v0.4.0** - Daily P&L calculation, day-centric results API, reasoning summaries (current)
- **v0.5.0** - Performance metrics & status APIs (planned)
- **v0.3.0** - REST API, on-demand downloads, database storage (current)
- **v1.0.0** - Production stability & validation (planned)
- **v1.1.0** - API authentication & security (planned)
- **v1.2.0** - Position history & analytics (planned)
- **v1.3.0** - Advanced performance metrics & analytics (planned)
- **v1.3.0** - Performance metrics & analytics (planned)
- **v1.4.0** - Data management API (planned)
- **v1.5.0** - Web dashboard UI (planned)
- **v1.6.0** - Advanced configuration & customization (planned)
@@ -693,4 +619,4 @@ To propose a new feature:
---
Last updated: 2025-11-06
Last updated: 2025-11-01

View File

@@ -12,7 +12,6 @@ from typing import Dict, List, Optional, Any
from pathlib import Path
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from dotenv import load_dotenv
@@ -33,7 +32,7 @@ from tools.deployment_config import (
from agent.context_injector import ContextInjector
from agent.pnl_calculator import DailyPnLCalculator
from agent.reasoning_summarizer import ReasoningSummarizer
from agent.chat_model_wrapper import ToolCallArgsParsingWrapper
from agent.model_factory import create_model
# Load environment variables
load_dotenv()
@@ -127,7 +126,7 @@ class BaseAgent:
# Initialize components
self.client: Optional[MultiServerMCPClient] = None
self.tools: Optional[List] = None
self.model: Optional[ChatOpenAI] = None
self.model: Optional[Any] = None
self.agent: Optional[Any] = None
# Context injector for MCP tools
@@ -212,16 +211,18 @@ class BaseAgent:
self.model = MockChatModel(date="2025-01-01") # Date will be updated per session
print(f"🤖 Using MockChatModel (DEV mode)")
else:
base_model = ChatOpenAI(
model=self.basemodel,
base_url=self.openai_base_url,
# Use model factory for provider-specific implementations
self.model = create_model(
basemodel=self.basemodel,
api_key=self.openai_api_key,
max_retries=3,
base_url=self.openai_base_url,
temperature=0.7,
timeout=30
)
# Wrap model with diagnostic wrapper
self.model = ToolCallArgsParsingWrapper(model=base_model)
print(f"🤖 Using {self.basemodel} (PROD mode) with diagnostic wrapper")
# Determine model type for logging
model_class = self.model.__class__.__name__
print(f"🤖 Using {self.basemodel} via {model_class} (PROD mode)")
except Exception as e:
raise RuntimeError(f"❌ Failed to initialize AI model: {e}")

View File

@@ -1,121 +0,0 @@
"""
Chat model wrapper to fix tool_calls args parsing issues.
DeepSeek and other providers return tool_calls.args as JSON strings, which need
to be parsed to dicts before AIMessage construction.
"""
import json
from typing import Any, Optional, Dict
from functools import wraps
class ToolCallArgsParsingWrapper:
"""
Wrapper that adds diagnostic logging and fixes tool_calls args if needed.
"""
def __init__(self, model: Any, **kwargs):
"""
Initialize wrapper around a chat model.
Args:
model: The chat model to wrap
**kwargs: Additional parameters (ignored, for compatibility)
"""
self.wrapped_model = model
self._patch_model()
def _patch_model(self):
"""Monkey-patch the model's _create_chat_result to add diagnostics"""
if not hasattr(self.wrapped_model, '_create_chat_result'):
# Model doesn't have this method (e.g., MockChatModel), skip patching
return
# CRITICAL: Patch parse_tool_call in base.py's namespace (not in openai_tools module!)
from langchain_openai.chat_models import base as langchain_base
original_parse_tool_call = langchain_base.parse_tool_call
def patched_parse_tool_call(raw_tool_call, *, partial=False, strict=False, return_id=True):
"""Patched parse_tool_call to fix string args bug"""
result = original_parse_tool_call(raw_tool_call, partial=partial, strict=strict, return_id=return_id)
if result and isinstance(result.get('args'), str):
# FIX: parse_tool_call sometimes returns string args instead of dict
# This is a known LangChain bug - parse the string to dict
try:
result['args'] = json.loads(result['args'])
except (json.JSONDecodeError, TypeError):
# Leave as string if we can't parse it - will fail validation
# but at least we tried
pass
return result
# Replace in base.py's namespace (where _convert_dict_to_message uses it)
langchain_base.parse_tool_call = patched_parse_tool_call
original_create_chat_result = self.wrapped_model._create_chat_result
@wraps(original_create_chat_result)
def patched_create_chat_result(response: Any, generation_info: Optional[Dict] = None):
"""Patched version that normalizes non-standard tool_call formats"""
response_dict = response if isinstance(response, dict) else response.model_dump()
# Normalize tool_calls to OpenAI standard format if needed
if 'choices' in response_dict:
for choice in response_dict['choices']:
if 'message' not in choice:
continue
message = choice['message']
# Fix tool_calls: Convert non-standard {name, args, id} to {function: {name, arguments}, id}
if 'tool_calls' in message and message['tool_calls']:
for tool_call in message['tool_calls']:
# Check if this is non-standard format (has 'args' directly)
if 'args' in tool_call and 'function' not in tool_call:
# Convert to standard OpenAI format
args = tool_call['args']
tool_call['function'] = {
'name': tool_call.get('name', ''),
'arguments': args if isinstance(args, str) else json.dumps(args)
}
# Remove non-standard fields
if 'name' in tool_call:
del tool_call['name']
if 'args' in tool_call:
del tool_call['args']
# Fix invalid_tool_calls: Ensure args is JSON string (not dict)
if 'invalid_tool_calls' in message and message['invalid_tool_calls']:
for invalid_call in message['invalid_tool_calls']:
if 'args' in invalid_call and isinstance(invalid_call['args'], dict):
try:
invalid_call['args'] = json.dumps(invalid_call['args'])
except (TypeError, ValueError):
# Keep as-is if serialization fails
pass
# Call original method with normalized response
return original_create_chat_result(response_dict, generation_info)
# Replace the method
self.wrapped_model._create_chat_result = patched_create_chat_result
@property
def _llm_type(self) -> str:
"""Return identifier for this LLM type"""
if hasattr(self.wrapped_model, '_llm_type'):
return f"wrapped-{self.wrapped_model._llm_type}"
return "wrapped-chat-model"
def __getattr__(self, name: str):
"""Proxy all attributes/methods to the wrapped model"""
return getattr(self.wrapped_model, name)
def bind_tools(self, tools: Any, **kwargs):
"""Bind tools to the wrapped model"""
return self.wrapped_model.bind_tools(tools, **kwargs)
def bind(self, **kwargs):
"""Bind settings to the wrapped model"""
return self.wrapped_model.bind(**kwargs)

View File

@@ -88,17 +88,9 @@ class ContextInjector:
# Update position state after successful trade
if request.name in ["buy", "sell"]:
# Extract position dict from MCP result
# MCP tools return CallToolResult objects with structuredContent field
position_dict = None
if hasattr(result, 'structuredContent') and result.structuredContent:
position_dict = result.structuredContent
elif isinstance(result, dict):
position_dict = result
# Check if position dict is valid (not an error) and update state
if position_dict and "error" not in position_dict and "CASH" in position_dict:
# Check if result is a valid position dict (not an error)
if isinstance(result, dict) and "error" not in result and "CASH" in result:
# Update our tracked position with the new state
self._current_position = position_dict.copy()
self._current_position = result.copy()
return result

68
agent/model_factory.py Normal file
View File

@@ -0,0 +1,68 @@
"""
Model factory for creating provider-specific chat models.
Supports multiple AI providers with native integrations where available:
- DeepSeek: Uses ChatDeepSeek for native tool calling support
- OpenAI: Uses ChatOpenAI
- Others: Fall back to ChatOpenAI (OpenAI-compatible endpoints)
"""
from typing import Any
from langchain_openai import ChatOpenAI
from langchain_deepseek import ChatDeepSeek
def create_model(
basemodel: str,
api_key: str,
base_url: str,
temperature: float,
timeout: int
) -> Any:
"""
Create appropriate chat model based on provider.
Args:
basemodel: Model identifier (e.g., "deepseek/deepseek-chat", "openai/gpt-4")
api_key: API key for the provider
base_url: Base URL for API endpoint
temperature: Sampling temperature (0-1)
timeout: Request timeout in seconds
Returns:
Provider-specific chat model instance
Examples:
>>> model = create_model("deepseek/deepseek-chat", "key", "url", 0.7, 30)
>>> isinstance(model, ChatDeepSeek)
True
>>> model = create_model("openai/gpt-4", "key", "url", 0.7, 30)
>>> isinstance(model, ChatOpenAI)
True
"""
# Extract provider from basemodel (format: "provider/model-name")
provider = basemodel.split("/")[0].lower() if "/" in basemodel else "unknown"
if provider == "deepseek":
# Use native ChatDeepSeek for DeepSeek models
# Extract model name without provider prefix
model_name = basemodel.split("/", 1)[1] if "/" in basemodel else basemodel
return ChatDeepSeek(
model=model_name,
api_key=api_key,
base_url=base_url,
temperature=temperature,
timeout=timeout
)
else:
# Use ChatOpenAI for OpenAI and OpenAI-compatible endpoints
# (Anthropic, Google, Qwen, etc. via compatibility layer)
return ChatOpenAI(
model=basemodel,
api_key=api_key,
base_url=base_url,
temperature=temperature,
timeout=timeout
)

View File

@@ -78,11 +78,10 @@ class MCPServiceManager:
env['PYTHONPATH'] = str(Path.cwd())
# Start service process (output goes to Docker logs)
# Enable stdout/stderr for debugging (previously sent to DEVNULL)
process = subprocess.Popen(
[sys.executable, str(script_path)],
stdout=sys.stdout, # Redirect to main process stdout
stderr=sys.stderr, # Redirect to main process stderr
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
cwd=Path.cwd(), # Use current working directory (/app)
env=env # Pass environment with PYTHONPATH
)

View File

@@ -34,11 +34,8 @@ def get_current_position_from_db(
Returns ending holdings and cash from that previous day, which becomes the
starting position for the current day.
NOTE: Searches across ALL jobs for the given model, enabling portfolio continuity
even when new jobs are created with overlapping date ranges.
Args:
job_id: Job UUID (kept for compatibility but not used in query)
job_id: Job UUID
model: Model signature
date: Current trading date (will query for date < this)
initial_cash: Initial cash if no prior data (first trading day)
@@ -54,14 +51,13 @@ def get_current_position_from_db(
try:
# Query most recent trading_day BEFORE current date (previous day's ending position)
# NOTE: Removed job_id filter to enable cross-job continuity
cursor.execute("""
SELECT id, ending_cash
FROM trading_days
WHERE model = ? AND date < ?
WHERE job_id = ? AND model = ? AND date < ?
ORDER BY date DESC
LIMIT 1
""", (model, date))
""", (job_id, model, date))
row = cursor.fetchone()

View File

@@ -611,10 +611,6 @@ class Database:
Handles weekends/holidays by finding actual previous trading day.
NOTE: Queries across ALL jobs for the given model to enable portfolio
continuity even when new jobs are created with overlapping date ranges.
The job_id parameter is kept for API compatibility but not used in the query.
Returns:
dict with keys: id, date, ending_cash, ending_portfolio_value
or None if no previous day exists
@@ -623,11 +619,11 @@ class Database:
"""
SELECT id, date, ending_cash, ending_portfolio_value
FROM trading_days
WHERE model = ? AND date < ?
WHERE job_id = ? AND model = ? AND date < ?
ORDER BY date DESC
LIMIT 1
""",
(model, current_date)
(job_id, model, current_date)
)
row = cursor.fetchone()

View File

@@ -55,9 +55,8 @@ class JobManager:
config_path: str,
date_range: List[str],
models: List[str],
model_day_filter: Optional[List[tuple]] = None,
skip_completed: bool = True
) -> Dict[str, Any]:
model_day_filter: Optional[List[tuple]] = None
) -> str:
"""
Create new simulation job.
@@ -67,16 +66,12 @@ class JobManager:
models: List of model signatures to execute
model_day_filter: Optional list of (model, date) tuples to limit job_details.
If None, creates job_details for all model-date combinations.
skip_completed: If True (default), skips already-completed simulations.
If False, includes all requested simulations regardless of completion status.
Returns:
Dict with:
- job_id: UUID of created job
- warnings: List of warning messages for skipped simulations
job_id: UUID of created job
Raises:
ValueError: If another job is already running/pending or if all simulations are already completed (when skip_completed=True)
ValueError: If another job is already running/pending
"""
if not self.can_start_new_job():
raise ValueError("Another simulation job is already running or pending")
@@ -88,49 +83,6 @@ class JobManager:
cursor = conn.cursor()
try:
# Determine which model-day pairs to check
if model_day_filter is not None:
pairs_to_check = model_day_filter
else:
pairs_to_check = [(model, date) for date in date_range for model in models]
# Check for already-completed simulations (only if skip_completed=True)
skipped_pairs = []
pending_pairs = []
if skip_completed:
# Perform duplicate checking
for model, date in pairs_to_check:
cursor.execute("""
SELECT COUNT(*)
FROM job_details
WHERE model = ? AND date = ? AND status = 'completed'
""", (model, date))
count = cursor.fetchone()[0]
if count > 0:
skipped_pairs.append((model, date))
logger.info(f"Skipping {model}/{date} - already completed in previous job")
else:
pending_pairs.append((model, date))
# If all simulations are already completed, raise error
if len(pending_pairs) == 0:
warnings = [
f"Skipped {model}/{date} - already completed"
for model, date in skipped_pairs
]
raise ValueError(
f"All requested simulations are already completed. "
f"Skipped {len(skipped_pairs)} model-day pair(s). "
f"Details: {warnings}"
)
else:
# skip_completed=False: include ALL pairs (no duplicate checking)
pending_pairs = pairs_to_check
logger.info(f"Including all {len(pending_pairs)} model-day pairs (skip_completed=False)")
# Insert job
cursor.execute("""
INSERT INTO jobs (
@@ -146,32 +98,34 @@ class JobManager:
created_at
))
# Create job_details only for pending pairs
for model, date in pending_pairs:
cursor.execute("""
INSERT INTO job_details (
job_id, date, model, status
)
VALUES (?, ?, ?, ?)
""", (job_id, date, model, "pending"))
# Create job_details based on filter
if model_day_filter is not None:
# Only create job_details for specified model-day pairs
for model, date in model_day_filter:
cursor.execute("""
INSERT INTO job_details (
job_id, date, model, status
)
VALUES (?, ?, ?, ?)
""", (job_id, date, model, "pending"))
logger.info(f"Created job {job_id} with {len(pending_pairs)} model-day tasks")
logger.info(f"Created job {job_id} with {len(model_day_filter)} model-day tasks (filtered)")
else:
# Create job_details for all model-day combinations
for date in date_range:
for model in models:
cursor.execute("""
INSERT INTO job_details (
job_id, date, model, status
)
VALUES (?, ?, ?, ?)
""", (job_id, date, model, "pending"))
if skipped_pairs:
logger.info(f"Skipped {len(skipped_pairs)} already-completed simulations")
logger.info(f"Created job {job_id} with {len(date_range)} dates and {len(models)} models")
conn.commit()
# Prepare warnings
warnings = [
f"Skipped {model}/{date} - already completed"
for model, date in skipped_pairs
]
return {
"job_id": job_id,
"warnings": warnings
}
return job_id
finally:
conn.close()
@@ -745,85 +699,6 @@ class JobManager:
finally:
conn.close()
def cleanup_stale_jobs(self) -> Dict[str, int]:
"""
Clean up stale jobs from container restarts.
Marks jobs with status 'pending', 'downloading_data', or 'running' as
'failed' or 'partial' based on completion percentage.
Called on application startup to reset interrupted jobs.
Returns:
Dict with jobs_cleaned count and details
"""
conn = get_db_connection(self.db_path)
cursor = conn.cursor()
try:
# Find all stale jobs
cursor.execute("""
SELECT job_id, status
FROM jobs
WHERE status IN ('pending', 'downloading_data', 'running')
""")
stale_jobs = cursor.fetchall()
cleaned_count = 0
for job_id, original_status in stale_jobs:
# Get progress to determine if partially completed
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed
FROM job_details
WHERE job_id = ?
""", (job_id,))
total, completed, failed = cursor.fetchone()
completed = completed or 0
failed = failed or 0
# Determine final status based on completion
if completed > 0:
new_status = "partial"
error_msg = f"Job interrupted by container restart (was {original_status}, {completed}/{total} model-days completed)"
else:
new_status = "failed"
error_msg = f"Job interrupted by container restart (was {original_status}, no progress made)"
# Mark incomplete job_details as failed
cursor.execute("""
UPDATE job_details
SET status = 'failed', error = 'Container restarted before completion'
WHERE job_id = ? AND status IN ('pending', 'running')
""", (job_id,))
# Update job status
updated_at = datetime.utcnow().isoformat() + "Z"
cursor.execute("""
UPDATE jobs
SET status = ?, error = ?, completed_at = ?, updated_at = ?
WHERE job_id = ?
""", (new_status, error_msg, updated_at, updated_at, job_id))
logger.warning(f"Cleaned up stale job {job_id}: {original_status}{new_status} ({completed}/{total} completed)")
cleaned_count += 1
conn.commit()
if cleaned_count > 0:
logger.warning(f"⚠️ Cleaned up {cleaned_count} stale job(s) from previous container session")
else:
logger.info("✅ No stale jobs found")
return {"jobs_cleaned": cleaned_count}
finally:
conn.close()
def cleanup_old_jobs(self, days: int = 30) -> Dict[str, int]:
"""
Delete jobs older than threshold.

View File

@@ -134,39 +134,25 @@ def create_app(
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Initialize database on startup, cleanup on shutdown if needed"""
from tools.deployment_config import is_dev_mode, get_db_path, should_preserve_dev_data
from tools.deployment_config import is_dev_mode, get_db_path
from api.database import initialize_dev_database, initialize_database
# Startup - use closure to access db_path from create_app scope
logger.info("🚀 FastAPI application starting...")
logger.info("📊 Initializing database...")
should_cleanup_stale_jobs = False
if is_dev_mode():
# Initialize dev database (reset unless PRESERVE_DEV_DATA=true)
logger.info(" 🔧 DEV mode detected - initializing dev database")
dev_db_path = get_db_path(db_path)
initialize_dev_database(dev_db_path)
log_dev_mode_startup_warning()
# Only cleanup stale jobs if preserving dev data (otherwise DB is fresh)
if should_preserve_dev_data():
should_cleanup_stale_jobs = True
else:
# Ensure production database schema exists
logger.info(" 🏭 PROD mode - ensuring database schema exists")
initialize_database(db_path)
should_cleanup_stale_jobs = True
logger.info("✅ Database initialized")
# Clean up stale jobs from previous container session
if should_cleanup_stale_jobs:
logger.info("🧹 Checking for stale jobs from previous session...")
job_manager = JobManager(get_db_path(db_path) if is_dev_mode() else db_path)
job_manager.cleanup_stale_jobs()
logger.info("🌐 API server ready to accept requests")
yield
@@ -280,19 +266,12 @@ def create_app(
# Create job immediately with all requested dates
# Worker will handle data download and filtering
result = job_manager.create_job(
job_id = job_manager.create_job(
config_path=config_path,
date_range=all_dates,
models=models_to_run,
model_day_filter=None, # Worker will filter based on available data
skip_completed=(not request.replace_existing) # Skip if replace_existing=False
model_day_filter=None # Worker will filter based on available data
)
job_id = result["job_id"]
warnings = result.get("warnings", [])
# Log warnings if any simulations were skipped
if warnings:
logger.warning(f"Job {job_id} created with {len(warnings)} skipped simulations: {warnings}")
# Start worker in background thread (only if not in test mode)
if not getattr(app.state, "test_mode", False):
@@ -319,7 +298,6 @@ def create_app(
status="pending",
total_model_days=len(all_dates) * len(models_to_run),
message=message,
warnings=warnings if warnings else None,
**deployment_info
)

View File

@@ -66,28 +66,3 @@ See README.md for architecture diagram.
- Search results filtered by publication date
See [CLAUDE.md](../../CLAUDE.md) for implementation details.
---
## Position Tracking Across Jobs
**Design:** Portfolio state is tracked per-model across all jobs, not per-job.
**Query Logic:**
```python
# Get starting position for current trading day
SELECT id, ending_cash FROM trading_days
WHERE model = ? AND date < ? # No job_id filter
ORDER BY date DESC
LIMIT 1
```
**Benefits:**
- Portfolio continuity when creating new jobs with overlapping dates
- Prevents accidental portfolio resets
- Enables flexible job scheduling (resume, rerun, backfill)
**Example:**
- Job 1: Runs 2025-10-13 to 2025-10-15 for model-a
- Job 2: Runs 2025-10-16 to 2025-10-20 for model-a
- Job 2 starts with Job 1's ending position from 2025-10-15

View File

@@ -0,0 +1,891 @@
# Fix IF_TRADE Flag and DeepSeek Tool Calls Validation Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Fix two bugs: (1) "No trading" message always displayed despite trading activity, and (2) sporadic Pydantic validation errors for DeepSeek tool_calls arguments.
**Architecture:**
- Issue #1: Change IF_TRADE initialization from False to True in runtime config manager
- Issue #2: Replace ChatOpenAI with native ChatDeepSeek for DeepSeek models to eliminate OpenAI compatibility layer issues
**Tech Stack:** Python 3.10+, LangChain 1.0.2, langchain-deepseek, FastAPI, SQLite
---
## Task 1: Fix IF_TRADE Initialization Bug
**Files:**
- Modify: `api/runtime_manager.py:80-86`
- Test: `tests/unit/test_runtime_manager.py`
- Verify: `agent/base_agent/base_agent.py:745-752`
**Root Cause:**
`IF_TRADE` is initialized to `False` but never updated when trades execute. The design documents show it should initialize to `True` (trades are expected by default).
**Step 1: Write failing test for IF_TRADE initialization**
Add to `tests/unit/test_runtime_manager.py` after existing tests:
```python
def test_create_runtime_config_if_trade_defaults_true(self):
"""Test that IF_TRADE initializes to True (trades expected by default)"""
manager = RuntimeConfigManager()
config_path = manager.create_runtime_config(
date="2025-01-16",
model_sig="test-model",
job_id="test-job-123",
trading_day_id=1
)
try:
# Read the config file
with open(config_path, 'r') as f:
config = json.load(f)
# Verify IF_TRADE is True by default
assert config["IF_TRADE"] is True, "IF_TRADE should initialize to True"
finally:
# Cleanup
if os.path.exists(config_path):
os.remove(config_path)
```
**Step 2: Run test to verify it fails**
Run: `./venv/bin/python -m pytest tests/unit/test_runtime_manager.py::TestRuntimeConfigManager::test_create_runtime_config_if_trade_defaults_true -v`
Expected: FAIL with assertion error showing `IF_TRADE` is `False`
**Step 3: Fix IF_TRADE initialization**
In `api/runtime_manager.py`, change line 83:
```python
# BEFORE (line 80-86):
initial_config = {
"TODAY_DATE": date,
"SIGNATURE": model_sig,
"IF_TRADE": False, # BUG: Should be True
"JOB_ID": job_id,
"TRADING_DAY_ID": trading_day_id
}
# AFTER:
initial_config = {
"TODAY_DATE": date,
"SIGNATURE": model_sig,
"IF_TRADE": True, # FIX: Trades are expected by default
"JOB_ID": job_id,
"TRADING_DAY_ID": trading_day_id
}
```
**Step 4: Run test to verify it passes**
Run: `./venv/bin/python -m pytest tests/unit/test_runtime_manager.py::TestRuntimeConfigManager::test_create_runtime_config_if_trade_defaults_true -v`
Expected: PASS
**Step 5: Update existing test expectations**
The existing test `test_create_runtime_config_creates_file` at line 66 expects `IF_TRADE` to be `False`. Update it:
```python
# In test_create_runtime_config_creates_file, change line 66:
# BEFORE:
assert config["IF_TRADE"] is False
# AFTER:
assert config["IF_TRADE"] is True
```
**Step 6: Run all runtime_manager tests**
Run: `./venv/bin/python -m pytest tests/unit/test_runtime_manager.py -v`
Expected: All tests PASS
**Step 7: Verify integration test expectations**
Check `tests/integration/test_agent_pnl_integration.py` line 63:
```python
# Current mock (line 62-66):
mock_get_config.side_effect = lambda key: {
"IF_TRADE": False, # This may need updating depending on test scenario
"JOB_ID": "test-job",
"TODAY_DATE": "2025-01-15",
"SIGNATURE": "test-model"
}.get(key)
```
This test mocks a no-trade scenario, so `False` is correct here. No change needed.
**Step 8: Commit IF_TRADE fix**
```bash
git add api/runtime_manager.py tests/unit/test_runtime_manager.py
git commit -m "fix: initialize IF_TRADE to True (trades expected by default)
Root cause: IF_TRADE was initialized to False and never updated when
trades executed, causing 'No trading' message to always display.
Design documents (2025-02-11-complete-schema-migration) specify
IF_TRADE should start as True, with trades setting it to False only
after completion.
Fixes sporadic issue where all trading sessions reported 'No trading'
despite successful buy/sell actions."
```
---
## Task 2: Add langchain-deepseek Dependency
**Files:**
- Modify: `requirements.txt`
- Verify: `venv/bin/pip list`
**Step 1: Add langchain-deepseek to requirements.txt**
Add after line 3 in `requirements.txt`:
```txt
langchain==1.0.2
langchain-openai==1.0.1
langchain-mcp-adapters>=0.1.0
langchain-deepseek>=0.1.20
```
**Step 2: Install new dependency**
Run: `./venv/bin/pip install -r requirements.txt`
Expected: Successfully installs `langchain-deepseek` and its dependencies
**Step 3: Verify installation**
Run: `./venv/bin/pip show langchain-deepseek`
Expected: Shows package info with version >= 0.1.20
**Step 4: Commit dependency addition**
```bash
git add requirements.txt
git commit -m "deps: add langchain-deepseek for native DeepSeek support
Adds official LangChain DeepSeek integration to replace ChatOpenAI
wrapper approach for DeepSeek models. Native integration provides:
- Better tool_calls argument parsing
- DeepSeek-specific error handling
- No OpenAI compatibility layer issues
Version 0.1.20+ includes tool calling support for deepseek-chat."
```
---
## Task 3: Implement Model Provider Factory
**Files:**
- Create: `agent/model_factory.py`
- Test: `tests/unit/test_model_factory.py`
**Rationale:**
Currently `base_agent.py` hardcodes model creation logic. Extract to factory pattern to support multiple providers (OpenAI, DeepSeek, Anthropic, etc.) with provider-specific handling.
**Step 1: Write failing test for model factory**
Create `tests/unit/test_model_factory.py`:
```python
"""Unit tests for model factory - provider-specific model creation"""
import pytest
from unittest.mock import Mock, patch
from agent.model_factory import create_model
class TestModelFactory:
"""Tests for create_model factory function"""
@patch('agent.model_factory.ChatDeepSeek')
def test_create_model_deepseek(self, mock_deepseek_class):
"""Test that DeepSeek models use ChatDeepSeek"""
mock_model = Mock()
mock_deepseek_class.return_value = mock_model
result = create_model(
basemodel="deepseek/deepseek-chat",
api_key="test-key",
base_url="https://api.deepseek.com",
temperature=0.7,
timeout=30
)
# Verify ChatDeepSeek was called with correct params
mock_deepseek_class.assert_called_once_with(
model="deepseek-chat", # Extracted from "deepseek/deepseek-chat"
api_key="test-key",
base_url="https://api.deepseek.com",
temperature=0.7,
timeout=30
)
assert result == mock_model
@patch('agent.model_factory.ChatOpenAI')
def test_create_model_openai(self, mock_openai_class):
"""Test that OpenAI models use ChatOpenAI"""
mock_model = Mock()
mock_openai_class.return_value = mock_model
result = create_model(
basemodel="openai/gpt-4",
api_key="test-key",
base_url="https://api.openai.com/v1",
temperature=0.7,
timeout=30
)
# Verify ChatOpenAI was called with correct params
mock_openai_class.assert_called_once_with(
model="openai/gpt-4",
api_key="test-key",
base_url="https://api.openai.com/v1",
temperature=0.7,
timeout=30
)
assert result == mock_model
@patch('agent.model_factory.ChatOpenAI')
def test_create_model_anthropic(self, mock_openai_class):
"""Test that Anthropic models use ChatOpenAI (via compatibility)"""
mock_model = Mock()
mock_openai_class.return_value = mock_model
result = create_model(
basemodel="anthropic/claude-sonnet-4.5",
api_key="test-key",
base_url="https://api.anthropic.com/v1",
temperature=0.7,
timeout=30
)
# Verify ChatOpenAI was used (Anthropic via OpenAI-compatible endpoint)
mock_openai_class.assert_called_once()
assert result == mock_model
@patch('agent.model_factory.ChatOpenAI')
def test_create_model_generic_provider(self, mock_openai_class):
"""Test that unknown providers default to ChatOpenAI"""
mock_model = Mock()
mock_openai_class.return_value = mock_model
result = create_model(
basemodel="custom/custom-model",
api_key="test-key",
base_url="https://api.custom.com",
temperature=0.7,
timeout=30
)
# Should fall back to ChatOpenAI for unknown providers
mock_openai_class.assert_called_once()
assert result == mock_model
def test_create_model_deepseek_extracts_model_name(self):
"""Test that DeepSeek model name is extracted correctly"""
with patch('agent.model_factory.ChatDeepSeek') as mock_class:
create_model(
basemodel="deepseek/deepseek-chat-v3.1",
api_key="key",
base_url="url",
temperature=0,
timeout=30
)
# Check that model param is just "deepseek-chat-v3.1"
call_kwargs = mock_class.call_args[1]
assert call_kwargs['model'] == "deepseek-chat-v3.1"
```
**Step 2: Run test to verify it fails**
Run: `./venv/bin/python -m pytest tests/unit/test_model_factory.py -v`
Expected: FAIL with "ModuleNotFoundError: No module named 'agent.model_factory'"
**Step 3: Implement model factory**
Create `agent/model_factory.py`:
```python
"""
Model factory for creating provider-specific chat models.
Supports multiple AI providers with native integrations where available:
- DeepSeek: Uses ChatDeepSeek for native tool calling support
- OpenAI: Uses ChatOpenAI
- Others: Fall back to ChatOpenAI (OpenAI-compatible endpoints)
"""
from typing import Any
from langchain_openai import ChatOpenAI
from langchain_deepseek import ChatDeepSeek
def create_model(
basemodel: str,
api_key: str,
base_url: str,
temperature: float,
timeout: int
) -> Any:
"""
Create appropriate chat model based on provider.
Args:
basemodel: Model identifier (e.g., "deepseek/deepseek-chat", "openai/gpt-4")
api_key: API key for the provider
base_url: Base URL for API endpoint
temperature: Sampling temperature (0-1)
timeout: Request timeout in seconds
Returns:
Provider-specific chat model instance
Examples:
>>> model = create_model("deepseek/deepseek-chat", "key", "url", 0.7, 30)
>>> isinstance(model, ChatDeepSeek)
True
>>> model = create_model("openai/gpt-4", "key", "url", 0.7, 30)
>>> isinstance(model, ChatOpenAI)
True
"""
# Extract provider from basemodel (format: "provider/model-name")
provider = basemodel.split("/")[0].lower() if "/" in basemodel else "unknown"
if provider == "deepseek":
# Use native ChatDeepSeek for DeepSeek models
# Extract model name without provider prefix
model_name = basemodel.split("/", 1)[1] if "/" in basemodel else basemodel
return ChatDeepSeek(
model=model_name,
api_key=api_key,
base_url=base_url,
temperature=temperature,
timeout=timeout
)
else:
# Use ChatOpenAI for OpenAI and OpenAI-compatible endpoints
# (Anthropic, Google, Qwen, etc. via compatibility layer)
return ChatOpenAI(
model=basemodel,
api_key=api_key,
base_url=base_url,
temperature=temperature,
timeout=timeout
)
```
**Step 4: Run tests to verify they pass**
Run: `./venv/bin/python -m pytest tests/unit/test_model_factory.py -v`
Expected: All tests PASS
**Step 5: Commit model factory**
```bash
git add agent/model_factory.py tests/unit/test_model_factory.py
git commit -m "feat: add model factory for provider-specific chat models
Implements factory pattern to create appropriate chat model based on
provider prefix in basemodel string.
Supported providers:
- deepseek/*: Uses ChatDeepSeek (native tool calling)
- openai/*: Uses ChatOpenAI
- others: Fall back to ChatOpenAI (OpenAI-compatible)
This enables native DeepSeek integration while maintaining backward
compatibility with existing OpenAI-compatible providers."
```
---
## Task 4: Integrate Model Factory into BaseAgent
**Files:**
- Modify: `agent/base_agent/base_agent.py:146-220`
- Test: `tests/unit/test_base_agent.py` (if exists) or manual verification
**Step 1: Import model factory in base_agent.py**
Add after line 33 in `agent/base_agent/base_agent.py`:
```python
from agent.reasoning_summarizer import ReasoningSummarizer
from agent.model_factory import create_model # ADD THIS
```
**Step 2: Replace model creation logic**
Replace lines 208-220 in `agent/base_agent/base_agent.py`:
```python
# BEFORE (lines 208-220):
if is_dev_mode():
from agent.mock_provider import MockChatModel
self.model = MockChatModel(date="2025-01-01")
print(f"🤖 Using MockChatModel (DEV mode)")
else:
self.model = ChatOpenAI(
model=self.basemodel,
base_url=self.openai_base_url,
api_key=self.openai_api_key,
temperature=0.7,
timeout=30
)
print(f"🤖 Using {self.basemodel} (PROD mode)")
# AFTER:
if is_dev_mode():
from agent.mock_provider import MockChatModel
self.model = MockChatModel(date="2025-01-01")
print(f"🤖 Using MockChatModel (DEV mode)")
else:
# Use model factory for provider-specific implementations
self.model = create_model(
basemodel=self.basemodel,
api_key=self.openai_api_key,
base_url=self.openai_base_url,
temperature=0.7,
timeout=30
)
# Determine model type for logging
model_class = self.model.__class__.__name__
print(f"🤖 Using {self.basemodel} via {model_class} (PROD mode)")
```
**Step 3: Remove ChatOpenAI import if no longer used**
Check if `ChatOpenAI` is imported but no longer used in `base_agent.py`:
```python
# If line ~11 has:
from langchain_openai import ChatOpenAI
# And it's only used in the section we just replaced, remove it
# (Keep if used elsewhere in file)
```
**Step 4: Manual verification test**
Since this changes core agent initialization, test with actual execution:
Run: `DEPLOYMENT_MODE=DEV python main.py configs/default_config.json`
Expected:
- Logs show "Using deepseek-chat-v3.1 via ChatDeepSeek (PROD mode)" for DeepSeek
- Logs show "Using openai/gpt-5 via ChatOpenAI (PROD mode)" for OpenAI
- No import errors or model creation failures
**Step 5: Run existing agent tests**
Run: `./venv/bin/python -m pytest tests/unit/ -k agent -v`
Expected: All agent-related tests still PASS (factory is transparent to existing behavior)
**Step 6: Commit model factory integration**
```bash
git add agent/base_agent/base_agent.py
git commit -m "refactor: use model factory in BaseAgent
Replaces direct ChatOpenAI instantiation with create_model() factory.
Benefits:
- DeepSeek models now use native ChatDeepSeek
- Other models continue using ChatOpenAI
- Provider-specific optimizations in one place
- Easier to add new providers
Logging now shows both model name and provider class for debugging."
```
---
## Task 5: Add Integration Test for DeepSeek Tool Calls
**Files:**
- Create: `tests/integration/test_deepseek_tool_calls.py`
- Reference: `agent_tools/tool_math.py` (math tool for testing)
**Rationale:**
Verify that DeepSeek's tool_calls arguments are properly parsed to dicts without Pydantic validation errors.
**Step 1: Write integration test**
Create `tests/integration/test_deepseek_tool_calls.py`:
```python
"""
Integration test for DeepSeek tool calls argument parsing.
Tests that ChatDeepSeek properly converts tool_calls.arguments (JSON string)
to tool_calls.args (dict) without Pydantic validation errors.
"""
import pytest
import os
from unittest.mock import patch, AsyncMock
from langchain_core.messages import AIMessage
from agent.model_factory import create_model
@pytest.mark.integration
class TestDeepSeekToolCalls:
"""Integration tests for DeepSeek tool calling"""
def test_create_model_returns_chat_deepseek_for_deepseek_models(self):
"""Verify that DeepSeek models use ChatDeepSeek class"""
# Skip if no DeepSeek API key available
if not os.getenv("OPENAI_API_KEY"):
pytest.skip("OPENAI_API_KEY not available")
model = create_model(
basemodel="deepseek/deepseek-chat",
api_key=os.getenv("OPENAI_API_KEY"),
base_url=os.getenv("OPENAI_API_BASE"),
temperature=0,
timeout=30
)
# Verify it's a ChatDeepSeek instance
assert model.__class__.__name__ == "ChatDeepSeek"
@pytest.mark.asyncio
async def test_deepseek_tool_calls_args_are_dicts(self):
"""Test that DeepSeek tool_calls.args are dicts, not strings"""
# Skip if no API key
if not os.getenv("OPENAI_API_KEY"):
pytest.skip("OPENAI_API_KEY not available")
# Create DeepSeek model
model = create_model(
basemodel="deepseek/deepseek-chat",
api_key=os.getenv("OPENAI_API_KEY"),
base_url=os.getenv("OPENAI_API_BASE"),
temperature=0,
timeout=30
)
# Bind a simple math tool
from langchain_core.tools import tool
@tool
def add(a: float, b: float) -> float:
"""Add two numbers"""
return a + b
model_with_tools = model.bind_tools([add])
# Invoke with a query that should trigger tool call
result = await model_with_tools.ainvoke(
"What is 5 plus 3?"
)
# Verify response is AIMessage
assert isinstance(result, AIMessage)
# Verify tool_calls exist
assert len(result.tool_calls) > 0, "Expected at least one tool call"
# Verify args are dicts, not strings
for tool_call in result.tool_calls:
assert isinstance(tool_call['args'], dict), \
f"tool_calls.args should be dict, got {type(tool_call['args'])}"
assert 'a' in tool_call['args'], "Missing expected arg 'a'"
assert 'b' in tool_call['args'], "Missing expected arg 'b'"
@pytest.mark.asyncio
async def test_deepseek_no_pydantic_validation_errors(self):
"""Test that DeepSeek doesn't produce Pydantic validation errors"""
# Skip if no API key
if not os.getenv("OPENAI_API_KEY"):
pytest.skip("OPENAI_API_KEY not available")
model = create_model(
basemodel="deepseek/deepseek-chat",
api_key=os.getenv("OPENAI_API_KEY"),
base_url=os.getenv("OPENAI_API_BASE"),
temperature=0,
timeout=30
)
from langchain_core.tools import tool
@tool
def multiply(a: float, b: float) -> float:
"""Multiply two numbers"""
return a * b
model_with_tools = model.bind_tools([multiply])
# This should NOT raise Pydantic validation errors
try:
result = await model_with_tools.ainvoke(
"Calculate 7 times 8"
)
assert isinstance(result, AIMessage)
except Exception as e:
# Check that it's not a Pydantic validation error
error_msg = str(e).lower()
assert "validation error" not in error_msg, \
f"Pydantic validation error occurred: {e}"
assert "input should be a valid dictionary" not in error_msg, \
f"tool_calls.args validation error occurred: {e}"
# Re-raise if it's a different error
raise
```
**Step 2: Run test (may require API access)**
Run: `./venv/bin/python -m pytest tests/integration/test_deepseek_tool_calls.py -v`
Expected:
- If API key available: Tests PASS
- If no API key: Tests SKIPPED with message "OPENAI_API_KEY not available"
**Step 3: Commit integration test**
```bash
git add tests/integration/test_deepseek_tool_calls.py
git commit -m "test: add DeepSeek tool calls integration tests
Verifies that ChatDeepSeek properly handles tool_calls arguments:
- Returns ChatDeepSeek for deepseek/* models
- tool_calls.args are dicts (not JSON strings)
- No Pydantic validation errors on args
Tests skip gracefully if API keys not available."
```
---
## Task 6: Update Documentation
**Files:**
- Modify: `CLAUDE.md` (sections on model configuration and troubleshooting)
- Modify: `CHANGELOG.md`
**Step 1: Update CLAUDE.md architecture section**
In `CLAUDE.md`, find the "Agent System" section (around line 125) and update:
```markdown
**BaseAgent Key Methods:**
- `initialize()`: Connect to MCP services, create AI model via model factory
- `run_trading_session(date)`: Execute single day's trading with retry logic
- `run_date_range(init_date, end_date)`: Process all weekdays in range
- `get_trading_dates()`: Resume from last date in position.jsonl
- `register_agent()`: Create initial position file with $10,000 cash
**Model Factory:**
The `create_model()` factory automatically selects the appropriate chat model:
- `deepseek/*` models → `ChatDeepSeek` (native tool calling support)
- `openai/*` models → `ChatOpenAI`
- Other providers → `ChatOpenAI` (OpenAI-compatible endpoint)
**Adding Custom Agents:**
[existing content remains the same]
```
**Step 2: Update CLAUDE.md common issues section**
In `CLAUDE.md`, find "Common Issues" (around line 420) and add:
```markdown
**DeepSeek Pydantic Validation Errors:**
- Error: "Input should be a valid dictionary [type=dict_type, input_value='...', input_type=str]"
- Cause: Using `ChatOpenAI` for DeepSeek models (OpenAI compatibility layer issue)
- Fix: Ensure `langchain-deepseek` is installed and basemodel uses `deepseek/` prefix
- The model factory automatically uses `ChatDeepSeek` for native support
```
**Step 3: Update CHANGELOG.md**
Add new version entry at top of `CHANGELOG.md`:
```markdown
## [0.4.2] - 2025-11-05
### Fixed
- Fixed "No trading" message always displaying despite trading activity by initializing `IF_TRADE` to `True` (trades expected by default)
- Resolved sporadic Pydantic validation errors for DeepSeek tool_calls arguments by switching to native `ChatDeepSeek` integration
### Added
- Added `agent/model_factory.py` for provider-specific model creation
- Added `langchain-deepseek` dependency for native DeepSeek support
- Added integration tests for DeepSeek tool calls argument parsing
### Changed
- `BaseAgent` now uses model factory instead of direct `ChatOpenAI` instantiation
- DeepSeek models (`deepseek/*`) now use `ChatDeepSeek` instead of OpenAI compatibility layer
```
**Step 4: Commit documentation updates**
```bash
git add CLAUDE.md CHANGELOG.md
git commit -m "docs: update for IF_TRADE and DeepSeek fixes
- Document model factory architecture
- Add troubleshooting for DeepSeek validation errors
- Update changelog with version 0.4.2 fixes"
```
---
## Task 7: End-to-End Verification
**Files:**
- Test: Full simulation with DeepSeek model
- Verify: Logs show correct messages and no errors
**Step 1: Run simulation with DeepSeek in PROD mode**
Run:
```bash
# Ensure API keys are set
export OPENAI_API_KEY="your-deepseek-key"
export OPENAI_API_BASE="https://api.deepseek.com/v1"
export DEPLOYMENT_MODE=PROD
# Run short simulation (1 day)
python main.py configs/default_config.json
```
**Step 2: Verify expected behaviors**
Check logs for:
1. **Model initialization:**
- ✅ Should show: "🤖 Using deepseek/deepseek-chat-v3.1 via ChatDeepSeek (PROD mode)"
- ❌ Should NOT show: "via ChatOpenAI"
2. **Tool calls execution:**
- ✅ Should show: "[DEBUG] Extracted X tool messages from response"
- ❌ Should NOT show: "⚠️ Attempt 1 failed" with Pydantic validation errors
- Note: If retries occur for other reasons (network, rate limits), that's OK
3. **Trading completion:**
- ✅ Should show: "✅ Trading completed" (if trades occurred)
- ❌ Should NOT show: "📊 No trading, maintaining positions" (if trades occurred)
**Step 3: Check database for trade records**
Run:
```bash
sqlite3 data/jobs.db "SELECT job_id, date, model, status FROM trading_days ORDER BY created_at DESC LIMIT 5;"
```
Expected: Recent records show `status='completed'` for DeepSeek runs
**Step 4: Verify position tracking**
Run:
```bash
sqlite3 data/jobs.db "SELECT trading_day_id, action_type, symbol, quantity FROM actions WHERE trading_day_id IN (SELECT id FROM trading_days ORDER BY created_at DESC LIMIT 1);"
```
Expected: Shows buy/sell actions if AI made trades
**Step 5: Run test suite**
Run full test suite to ensure no regressions:
```bash
./venv/bin/python -m pytest tests/ -v --cov=. --cov-report=term-missing
```
Expected: All tests PASS, coverage >= 85%
**Step 6: Final commit**
```bash
git add -A
git commit -m "chore: verify end-to-end functionality after fixes
Confirmed:
- DeepSeek models use ChatDeepSeek (no validation errors)
- Trading completion shows correct message
- Database tracking works correctly
- All tests pass with good coverage"
```
---
## Summary of Changes
### Files Modified:
1. `api/runtime_manager.py` - IF_TRADE initialization fix
2. `requirements.txt` - Added langchain-deepseek dependency
3. `agent/base_agent/base_agent.py` - Integrated model factory
4. `tests/unit/test_runtime_manager.py` - Updated test expectations
5. `CLAUDE.md` - Architecture and troubleshooting updates
6. `CHANGELOG.md` - Version 0.4.2 release notes
### Files Created:
1. `agent/model_factory.py` - Provider-specific model creation
2. `tests/unit/test_model_factory.py` - Model factory tests
3. `tests/integration/test_deepseek_tool_calls.py` - DeepSeek integration tests
### Testing Strategy:
- Unit tests for IF_TRADE initialization
- Unit tests for model factory provider routing
- Integration tests for DeepSeek tool calls
- End-to-end verification with real simulation
- Full test suite regression check
### Verification Commands:
```bash
# Quick test (unit tests only)
bash scripts/quick_test.sh
# Full test suite
bash scripts/run_tests.sh
# End-to-end simulation
DEPLOYMENT_MODE=PROD python main.py configs/default_config.json
```
---
## Notes for Engineer
**Key Architectural Changes:**
- Factory pattern separates provider-specific logic from agent core
- Native integrations preferred over compatibility layers
- IF_TRADE semantics: True = trades expected, tools set to False after execution
**Why These Fixes Work:**
1. **IF_TRADE**: Design always intended True initialization, False was a typo
2. **DeepSeek**: Native ChatDeepSeek handles tool_calls parsing correctly, eliminating sporadic OpenAI compatibility layer bugs
**Testing Philosophy:**
- @superpowers:test-driven-development - Write test first, watch it fail, implement, verify pass
- @superpowers:verification-before-completion - Never claim "it works" without running verification
- Each commit should leave the codebase in a working state
**If You Get Stuck:**
- Check logs for exact error messages
- Run pytest with `-vv` for verbose output
- Use `git diff` to verify changes match plan
- @superpowers:systematic-debugging - Never guess, always investigate root cause

File diff suppressed because it is too large Load Diff

View File

@@ -1,6 +1,7 @@
langchain==1.0.2
langchain-openai==1.0.1
langchain-mcp-adapters>=0.1.0
langchain-deepseek>=0.1.20
fastmcp==2.12.5
fastapi>=0.120.0
uvicorn[standard]>=0.27.0

View File

@@ -405,12 +405,11 @@ class TestAsyncDownload:
db_path = api_client.db_path
job_manager = JobManager(db_path=db_path)
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Add warnings
warnings = ["Rate limited", "Skipped 1 date"]

View File

@@ -12,12 +12,11 @@ def test_worker_prepares_data_before_execution(tmp_path):
job_manager = JobManager(db_path=db_path)
# Create job
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="configs/default_config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=db_path)
@@ -47,12 +46,11 @@ def test_worker_handles_no_available_dates(tmp_path):
initialize_database(db_path)
job_manager = JobManager(db_path=db_path)
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="configs/default_config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=db_path)
@@ -76,12 +74,11 @@ def test_worker_stores_warnings(tmp_path):
initialize_database(db_path)
job_manager = JobManager(db_path=db_path)
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="configs/default_config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=db_path)

View File

@@ -0,0 +1,118 @@
"""
Integration test for DeepSeek tool calls argument parsing.
Tests that ChatDeepSeek properly converts tool_calls.arguments (JSON string)
to tool_calls.args (dict) without Pydantic validation errors.
"""
import pytest
import os
from unittest.mock import patch, AsyncMock
from langchain_core.messages import AIMessage
from agent.model_factory import create_model
@pytest.mark.integration
class TestDeepSeekToolCalls:
"""Integration tests for DeepSeek tool calling"""
def test_create_model_returns_chat_deepseek_for_deepseek_models(self):
"""Verify that DeepSeek models use ChatDeepSeek class"""
# Skip if no DeepSeek API key available
if not os.getenv("OPENAI_API_KEY"):
pytest.skip("OPENAI_API_KEY not available")
model = create_model(
basemodel="deepseek/deepseek-chat",
api_key=os.getenv("OPENAI_API_KEY"),
base_url=os.getenv("OPENAI_API_BASE"),
temperature=0,
timeout=30
)
# Verify it's a ChatDeepSeek instance
assert model.__class__.__name__ == "ChatDeepSeek"
@pytest.mark.asyncio
async def test_deepseek_tool_calls_args_are_dicts(self):
"""Test that DeepSeek tool_calls.args are dicts, not strings"""
# Skip if no API key
if not os.getenv("OPENAI_API_KEY"):
pytest.skip("OPENAI_API_KEY not available")
# Create DeepSeek model
model = create_model(
basemodel="deepseek/deepseek-chat",
api_key=os.getenv("OPENAI_API_KEY"),
base_url=os.getenv("OPENAI_API_BASE"),
temperature=0,
timeout=30
)
# Bind a simple math tool
from langchain_core.tools import tool
@tool
def add(a: float, b: float) -> float:
"""Add two numbers"""
return a + b
model_with_tools = model.bind_tools([add])
# Invoke with a query that should trigger tool call
result = await model_with_tools.ainvoke(
"What is 5 plus 3?"
)
# Verify response is AIMessage
assert isinstance(result, AIMessage)
# Verify tool_calls exist
assert len(result.tool_calls) > 0, "Expected at least one tool call"
# Verify args are dicts, not strings
for tool_call in result.tool_calls:
assert isinstance(tool_call['args'], dict), \
f"tool_calls.args should be dict, got {type(tool_call['args'])}"
assert 'a' in tool_call['args'], "Missing expected arg 'a'"
assert 'b' in tool_call['args'], "Missing expected arg 'b'"
@pytest.mark.asyncio
async def test_deepseek_no_pydantic_validation_errors(self):
"""Test that DeepSeek doesn't produce Pydantic validation errors"""
# Skip if no API key
if not os.getenv("OPENAI_API_KEY"):
pytest.skip("OPENAI_API_KEY not available")
model = create_model(
basemodel="deepseek/deepseek-chat",
api_key=os.getenv("OPENAI_API_KEY"),
base_url=os.getenv("OPENAI_API_BASE"),
temperature=0,
timeout=30
)
from langchain_core.tools import tool
@tool
def multiply(a: float, b: float) -> float:
"""Multiply two numbers"""
return a * b
model_with_tools = model.bind_tools([multiply])
# This should NOT raise Pydantic validation errors
try:
result = await model_with_tools.ainvoke(
"Calculate 7 times 8"
)
assert isinstance(result, AIMessage)
except Exception as e:
# Check that it's not a Pydantic validation error
error_msg = str(e).lower()
assert "validation error" not in error_msg, \
f"Pydantic validation error occurred: {e}"
assert "input should be a valid dictionary" not in error_msg, \
f"tool_calls.args validation error occurred: {e}"
# Re-raise if it's a different error
raise

View File

@@ -1,278 +0,0 @@
"""Integration test for duplicate simulation prevention."""
import pytest
import tempfile
import os
import json
from pathlib import Path
from api.job_manager import JobManager
from api.model_day_executor import ModelDayExecutor
from api.database import get_db_connection
pytestmark = pytest.mark.integration
@pytest.fixture
def temp_env(tmp_path):
"""Create temporary environment with db and config."""
# Create temp database
db_path = str(tmp_path / "test_jobs.db")
# Initialize database
conn = get_db_connection(db_path)
cursor = conn.cursor()
# Create schema
cursor.execute("""
CREATE TABLE IF NOT EXISTS jobs (
job_id TEXT PRIMARY KEY,
config_path TEXT NOT NULL,
status TEXT NOT NULL,
date_range TEXT NOT NULL,
models TEXT NOT NULL,
created_at TEXT NOT NULL,
started_at TEXT,
updated_at TEXT,
completed_at TEXT,
total_duration_seconds REAL,
error TEXT,
warnings TEXT
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS job_details (
id INTEGER PRIMARY KEY AUTOINCREMENT,
job_id TEXT NOT NULL,
date TEXT NOT NULL,
model TEXT NOT NULL,
status TEXT NOT NULL,
started_at TEXT,
completed_at TEXT,
duration_seconds REAL,
error TEXT,
FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE,
UNIQUE(job_id, date, model)
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS trading_days (
id INTEGER PRIMARY KEY AUTOINCREMENT,
job_id TEXT NOT NULL,
model TEXT NOT NULL,
date TEXT NOT NULL,
starting_cash REAL NOT NULL,
ending_cash REAL NOT NULL,
profit REAL NOT NULL,
return_pct REAL NOT NULL,
portfolio_value REAL NOT NULL,
reasoning_summary TEXT,
reasoning_full TEXT,
completed_at TEXT,
session_duration_seconds REAL,
UNIQUE(job_id, model, date)
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS holdings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trading_day_id INTEGER NOT NULL,
symbol TEXT NOT NULL,
quantity INTEGER NOT NULL,
FOREIGN KEY (trading_day_id) REFERENCES trading_days(id) ON DELETE CASCADE
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS actions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trading_day_id INTEGER NOT NULL,
action_type TEXT NOT NULL,
symbol TEXT NOT NULL,
quantity INTEGER NOT NULL,
price REAL NOT NULL,
created_at TEXT NOT NULL,
FOREIGN KEY (trading_day_id) REFERENCES trading_days(id) ON DELETE CASCADE
)
""")
conn.commit()
conn.close()
# Create mock config
config_path = str(tmp_path / "test_config.json")
config = {
"models": [
{
"signature": "test-model",
"basemodel": "mock/model",
"enabled": True
}
],
"agent_config": {
"max_steps": 10,
"initial_cash": 10000.0
},
"log_config": {
"log_path": str(tmp_path / "logs")
},
"date_range": {
"init_date": "2025-10-13"
}
}
with open(config_path, 'w') as f:
json.dump(config, f)
yield {
"db_path": db_path,
"config_path": config_path,
"data_dir": str(tmp_path)
}
def test_duplicate_simulation_is_skipped(temp_env):
"""Test that overlapping job skips already-completed simulation."""
manager = JobManager(db_path=temp_env["db_path"])
# Create first job
result_1 = manager.create_job(
config_path=temp_env["config_path"],
date_range=["2025-10-15"],
models=["test-model"]
)
job_id_1 = result_1["job_id"]
# Simulate completion by manually inserting trading_day record
conn = get_db_connection(temp_env["db_path"])
cursor = conn.cursor()
cursor.execute("""
INSERT INTO trading_days (
job_id, model, date, starting_cash, ending_cash,
profit, return_pct, portfolio_value, completed_at
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
job_id_1,
"test-model",
"2025-10-15",
10000.0,
9500.0,
-500.0,
-5.0,
9500.0,
"2025-11-07T01:00:00Z"
))
conn.commit()
conn.close()
# Mark job_detail as completed
manager.update_job_detail_status(
job_id_1,
"2025-10-15",
"test-model",
"completed"
)
# Try to create second job with same model-day
result_2 = manager.create_job(
config_path=temp_env["config_path"],
date_range=["2025-10-15", "2025-10-16"],
models=["test-model"]
)
# Should have warnings about skipped simulation
assert len(result_2["warnings"]) == 1
assert "2025-10-15" in result_2["warnings"][0]
# Should only create job_detail for 2025-10-16
details = manager.get_job_details(result_2["job_id"])
assert len(details) == 1
assert details[0]["date"] == "2025-10-16"
def test_portfolio_continues_from_previous_job(temp_env):
"""Test that new job continues portfolio from previous job's last day."""
manager = JobManager(db_path=temp_env["db_path"])
# Create and complete first job
result_1 = manager.create_job(
config_path=temp_env["config_path"],
date_range=["2025-10-13"],
models=["test-model"]
)
job_id_1 = result_1["job_id"]
# Insert completed trading_day with holdings
conn = get_db_connection(temp_env["db_path"])
cursor = conn.cursor()
cursor.execute("""
INSERT INTO trading_days (
job_id, model, date, starting_cash, ending_cash,
profit, return_pct, portfolio_value, completed_at
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
job_id_1,
"test-model",
"2025-10-13",
10000.0,
5000.0,
0.0,
0.0,
15000.0,
"2025-11-07T01:00:00Z"
))
trading_day_id = cursor.lastrowid
cursor.execute("""
INSERT INTO holdings (trading_day_id, symbol, quantity)
VALUES (?, ?, ?)
""", (trading_day_id, "AAPL", 10))
conn.commit()
# Mark as completed
manager.update_job_detail_status(job_id_1, "2025-10-13", "test-model", "completed")
manager.update_job_status(job_id_1, "completed")
# Create second job for next day
result_2 = manager.create_job(
config_path=temp_env["config_path"],
date_range=["2025-10-14"],
models=["test-model"]
)
job_id_2 = result_2["job_id"]
# Get starting position for 2025-10-14
from agent_tools.tool_trade import get_current_position_from_db
import agent_tools.tool_trade as trade_module
original_get_db_connection = trade_module.get_db_connection
def mock_get_db_connection(path):
return get_db_connection(temp_env["db_path"])
trade_module.get_db_connection = mock_get_db_connection
try:
position, _ = get_current_position_from_db(
job_id=job_id_2,
model="test-model",
date="2025-10-14",
initial_cash=10000.0
)
# Should continue from job 1's ending position
assert position["CASH"] == 5000.0
assert position["AAPL"] == 10
finally:
# Restore original function
trade_module.get_db_connection = original_get_db_connection
conn.close()

View File

@@ -1,216 +0,0 @@
"""
Unit tests for ChatModelWrapper - tool_calls args parsing fix
"""
import json
import pytest
from unittest.mock import Mock, AsyncMock
from langchain_core.messages import AIMessage
from langchain_core.outputs import ChatResult, ChatGeneration
from agent.chat_model_wrapper import ToolCallArgsParsingWrapper
class TestToolCallArgsParsingWrapper:
"""Tests for ToolCallArgsParsingWrapper"""
@pytest.fixture
def mock_model(self):
"""Create a mock chat model"""
model = Mock()
model._llm_type = "mock-model"
return model
@pytest.fixture
def wrapper(self, mock_model):
"""Create a wrapper around mock model"""
return ToolCallArgsParsingWrapper(model=mock_model)
def test_fix_tool_calls_with_string_args(self, wrapper):
"""Test that string args are parsed to dict"""
# Create message with tool_calls where args is a JSON string
message = AIMessage(
content="",
tool_calls=[
{
"name": "buy",
"args": '{"symbol": "AAPL", "amount": 10}', # String, not dict
"id": "call_123"
}
]
)
fixed_message = wrapper._fix_tool_calls(message)
# Check that args is now a dict
assert isinstance(fixed_message.tool_calls[0]['args'], dict)
assert fixed_message.tool_calls[0]['args'] == {"symbol": "AAPL", "amount": 10}
def test_fix_tool_calls_with_dict_args(self, wrapper):
"""Test that dict args are left unchanged"""
# Create message with tool_calls where args is already a dict
message = AIMessage(
content="",
tool_calls=[
{
"name": "buy",
"args": {"symbol": "AAPL", "amount": 10}, # Already a dict
"id": "call_123"
}
]
)
fixed_message = wrapper._fix_tool_calls(message)
# Check that args is still a dict
assert isinstance(fixed_message.tool_calls[0]['args'], dict)
assert fixed_message.tool_calls[0]['args'] == {"symbol": "AAPL", "amount": 10}
def test_fix_tool_calls_with_invalid_json(self, wrapper):
"""Test that invalid JSON string is left unchanged"""
# Create message with tool_calls where args is an invalid JSON string
message = AIMessage(
content="",
tool_calls=[
{
"name": "buy",
"args": 'invalid json {', # Invalid JSON
"id": "call_123"
}
]
)
fixed_message = wrapper._fix_tool_calls(message)
# Check that args is still a string (parsing failed)
assert isinstance(fixed_message.tool_calls[0]['args'], str)
assert fixed_message.tool_calls[0]['args'] == 'invalid json {'
def test_fix_tool_calls_no_tool_calls(self, wrapper):
"""Test that messages without tool_calls are left unchanged"""
message = AIMessage(content="Hello, world!")
fixed_message = wrapper._fix_tool_calls(message)
assert fixed_message == message
def test_generate_with_string_args(self, wrapper, mock_model):
"""Test _generate method with string args"""
# Create a response with string args
original_message = AIMessage(
content="",
tool_calls=[
{
"name": "buy",
"args": '{"symbol": "MSFT", "amount": 5}',
"id": "call_456"
}
]
)
mock_result = ChatResult(
generations=[ChatGeneration(message=original_message)]
)
mock_model._generate.return_value = mock_result
# Call wrapper's _generate
result = wrapper._generate(messages=[], stop=None, run_manager=None)
# Check that args is now a dict
fixed_message = result.generations[0].message
assert isinstance(fixed_message.tool_calls[0]['args'], dict)
assert fixed_message.tool_calls[0]['args'] == {"symbol": "MSFT", "amount": 5}
@pytest.mark.asyncio
async def test_agenerate_with_string_args(self, wrapper, mock_model):
"""Test _agenerate method with string args"""
# Create a response with string args
original_message = AIMessage(
content="",
tool_calls=[
{
"name": "sell",
"args": '{"symbol": "GOOGL", "amount": 3}',
"id": "call_789"
}
]
)
mock_result = ChatResult(
generations=[ChatGeneration(message=original_message)]
)
mock_model._agenerate = AsyncMock(return_value=mock_result)
# Call wrapper's _agenerate
result = await wrapper._agenerate(messages=[], stop=None, run_manager=None)
# Check that args is now a dict
fixed_message = result.generations[0].message
assert isinstance(fixed_message.tool_calls[0]['args'], dict)
assert fixed_message.tool_calls[0]['args'] == {"symbol": "GOOGL", "amount": 3}
def test_invoke_with_string_args(self, wrapper, mock_model):
"""Test invoke method with string args"""
original_message = AIMessage(
content="",
tool_calls=[
{
"name": "buy",
"args": '{"symbol": "NVDA", "amount": 20}',
"id": "call_999"
}
]
)
mock_model.invoke.return_value = original_message
# Call wrapper's invoke
result = wrapper.invoke(input=[])
# Check that args is now a dict
assert isinstance(result.tool_calls[0]['args'], dict)
assert result.tool_calls[0]['args'] == {"symbol": "NVDA", "amount": 20}
@pytest.mark.asyncio
async def test_ainvoke_with_string_args(self, wrapper, mock_model):
"""Test ainvoke method with string args"""
original_message = AIMessage(
content="",
tool_calls=[
{
"name": "sell",
"args": '{"symbol": "TSLA", "amount": 15}',
"id": "call_111"
}
]
)
mock_model.ainvoke = AsyncMock(return_value=original_message)
# Call wrapper's ainvoke
result = await wrapper.ainvoke(input=[])
# Check that args is now a dict
assert isinstance(result.tool_calls[0]['args'], dict)
assert result.tool_calls[0]['args'] == {"symbol": "TSLA", "amount": 15}
def test_bind_tools_returns_wrapper(self, wrapper, mock_model):
"""Test that bind_tools returns a new wrapper"""
mock_bound = Mock()
mock_model.bind_tools.return_value = mock_bound
result = wrapper.bind_tools(tools=[], strict=True)
# Check that result is a wrapper around the bound model
assert isinstance(result, ToolCallArgsParsingWrapper)
assert result.wrapped_model == mock_bound
def test_bind_returns_wrapper(self, wrapper, mock_model):
"""Test that bind returns a new wrapper"""
mock_bound = Mock()
mock_model.bind.return_value = mock_bound
result = wrapper.bind(max_tokens=100)
# Check that result is a wrapper around the bound model
assert isinstance(result, ToolCallArgsParsingWrapper)
assert result.wrapped_model == mock_bound

View File

@@ -2,7 +2,6 @@
import pytest
from agent.context_injector import ContextInjector
from unittest.mock import Mock
@pytest.fixture
@@ -23,34 +22,27 @@ class MockRequest:
self.args = args or {}
def create_mcp_result(position_dict):
"""Create a mock MCP CallToolResult object matching production behavior."""
result = Mock()
result.structuredContent = position_dict
return result
async def mock_handler_success(request):
"""Mock handler that returns a successful position update as MCP CallToolResult."""
"""Mock handler that returns a successful position update."""
# Simulate a successful trade returning updated position
if request.name == "sell":
return create_mcp_result({
return {
"CASH": 1100.0,
"AAPL": 7,
"MSFT": 5
})
}
elif request.name == "buy":
return create_mcp_result({
return {
"CASH": 50.0,
"AAPL": 7,
"MSFT": 12
})
return create_mcp_result({})
}
return {}
async def mock_handler_error(request):
"""Mock handler that returns an error as MCP CallToolResult."""
return create_mcp_result({"error": "Insufficient cash"})
"""Mock handler that returns an error."""
return {"error": "Insufficient cash"}
@pytest.mark.asyncio
@@ -76,17 +68,17 @@ async def test_context_injector_injects_parameters(injector):
"""Test that context parameters are injected into buy/sell requests."""
request = MockRequest("buy", {"symbol": "AAPL", "amount": 10})
# Mock handler that returns MCP result containing the request args
# Mock handler that just returns the request args
async def handler(req):
return create_mcp_result(req.args)
return req.args
result = await injector(request, handler)
# Verify context was injected (result is MCP CallToolResult object)
assert result.structuredContent["signature"] == "test-model"
assert result.structuredContent["today_date"] == "2025-01-15"
assert result.structuredContent["job_id"] == "test-job-123"
assert result.structuredContent["trading_day_id"] == 1
# Verify context was injected
assert result["signature"] == "test-model"
assert result["today_date"] == "2025-01-15"
assert result["job_id"] == "test-job-123"
assert result["trading_day_id"] == 1
@pytest.mark.asyncio
@@ -140,7 +132,7 @@ async def test_context_injector_does_not_update_position_on_error(injector):
# Verify position was NOT updated
assert injector._current_position == original_position
assert "error" in result.structuredContent
assert "error" in result
@pytest.mark.asyncio
@@ -154,7 +146,7 @@ async def test_context_injector_does_not_inject_position_for_non_trade_tools(inj
async def verify_no_injection_handler(req):
assert "_current_position" not in req.args
return create_mcp_result({"results": []})
return {"results": []}
await injector(request, verify_no_injection_handler)
@@ -172,7 +164,7 @@ async def test_context_injector_full_trading_session_simulation(injector):
async def handler1(req):
# First trade should NOT have injected position
assert req.args.get("_current_position") is None
return create_mcp_result({"CASH": 1100.0, "AAPL": 7})
return {"CASH": 1100.0, "AAPL": 7}
result1 = await injector(request1, handler1)
assert injector._current_position == {"CASH": 1100.0, "AAPL": 7}
@@ -184,7 +176,7 @@ async def test_context_injector_full_trading_session_simulation(injector):
# Second trade SHOULD have injected position from trade 1
assert req.args["_current_position"]["CASH"] == 1100.0
assert req.args["_current_position"]["AAPL"] == 7
return create_mcp_result({"CASH": 50.0, "AAPL": 7, "MSFT": 7})
return {"CASH": 50.0, "AAPL": 7, "MSFT": 7}
result2 = await injector(request2, handler2)
assert injector._current_position == {"CASH": 50.0, "AAPL": 7, "MSFT": 7}
@@ -193,7 +185,7 @@ async def test_context_injector_full_trading_session_simulation(injector):
request3 = MockRequest("buy", {"symbol": "GOOGL", "amount": 100})
async def handler3(req):
return create_mcp_result({"error": "Insufficient cash", "cash_available": 50.0})
return {"error": "Insufficient cash", "cash_available": 50.0}
result3 = await injector(request3, handler3)
# Position should remain unchanged after failed trade

View File

@@ -1,229 +0,0 @@
"""Test portfolio continuity across multiple jobs."""
import pytest
import tempfile
import os
from agent_tools.tool_trade import get_current_position_from_db
from api.database import get_db_connection
@pytest.fixture
def temp_db():
"""Create temporary database with schema."""
fd, path = tempfile.mkstemp(suffix='.db')
os.close(fd)
conn = get_db_connection(path)
cursor = conn.cursor()
# Create trading_days table
cursor.execute("""
CREATE TABLE IF NOT EXISTS trading_days (
id INTEGER PRIMARY KEY AUTOINCREMENT,
job_id TEXT NOT NULL,
model TEXT NOT NULL,
date TEXT NOT NULL,
starting_cash REAL NOT NULL,
ending_cash REAL NOT NULL,
profit REAL NOT NULL,
return_pct REAL NOT NULL,
portfolio_value REAL NOT NULL,
reasoning_summary TEXT,
reasoning_full TEXT,
completed_at TEXT,
session_duration_seconds REAL,
UNIQUE(job_id, model, date)
)
""")
# Create holdings table
cursor.execute("""
CREATE TABLE IF NOT EXISTS holdings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trading_day_id INTEGER NOT NULL,
symbol TEXT NOT NULL,
quantity INTEGER NOT NULL,
FOREIGN KEY (trading_day_id) REFERENCES trading_days(id) ON DELETE CASCADE
)
""")
conn.commit()
conn.close()
yield path
if os.path.exists(path):
os.remove(path)
def test_position_continuity_across_jobs(temp_db):
"""Test that position queries see history from previous jobs."""
# Insert trading_day from job 1
conn = get_db_connection(temp_db)
cursor = conn.cursor()
cursor.execute("""
INSERT INTO trading_days (
job_id, model, date, starting_cash, ending_cash,
profit, return_pct, portfolio_value, completed_at
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
"job-1-uuid",
"deepseek-chat-v3.1",
"2025-10-14",
10000.0,
5121.52, # Negative cash from buying
0.0,
0.0,
14993.945,
"2025-11-07T01:52:53Z"
))
trading_day_id = cursor.lastrowid
# Insert holdings from job 1
holdings = [
("ADBE", 5),
("AVGO", 5),
("CRWD", 5),
("GOOGL", 20),
("META", 5),
("MSFT", 5),
("NVDA", 10)
]
for symbol, quantity in holdings:
cursor.execute("""
INSERT INTO holdings (trading_day_id, symbol, quantity)
VALUES (?, ?, ?)
""", (trading_day_id, symbol, quantity))
conn.commit()
conn.close()
# Mock get_db_connection to return our test db
import agent_tools.tool_trade as trade_module
original_get_db_connection = trade_module.get_db_connection
def mock_get_db_connection(path):
return get_db_connection(temp_db)
trade_module.get_db_connection = mock_get_db_connection
try:
# Now query position for job 2 on next trading day
position, _ = get_current_position_from_db(
job_id="job-2-uuid", # Different job
model="deepseek-chat-v3.1",
date="2025-10-15",
initial_cash=10000.0
)
# Should see job 1's ending position, NOT initial $10k
assert position["CASH"] == 5121.52
assert position["ADBE"] == 5
assert position["AVGO"] == 5
assert position["CRWD"] == 5
assert position["GOOGL"] == 20
assert position["META"] == 5
assert position["MSFT"] == 5
assert position["NVDA"] == 10
finally:
# Restore original function
trade_module.get_db_connection = original_get_db_connection
def test_position_returns_initial_state_for_first_day(temp_db):
"""Test that first trading day returns initial cash."""
# Mock get_db_connection to return our test db
import agent_tools.tool_trade as trade_module
original_get_db_connection = trade_module.get_db_connection
def mock_get_db_connection(path):
return get_db_connection(temp_db)
trade_module.get_db_connection = mock_get_db_connection
try:
# No previous trading days exist
position, _ = get_current_position_from_db(
job_id="new-job-uuid",
model="new-model",
date="2025-10-13",
initial_cash=10000.0
)
# Should return initial position
assert position == {"CASH": 10000.0}
finally:
# Restore original function
trade_module.get_db_connection = original_get_db_connection
def test_position_uses_most_recent_prior_date(temp_db):
"""Test that position query uses the most recent date before current."""
conn = get_db_connection(temp_db)
cursor = conn.cursor()
# Insert two trading days
cursor.execute("""
INSERT INTO trading_days (
job_id, model, date, starting_cash, ending_cash,
profit, return_pct, portfolio_value, completed_at
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
"job-1",
"model-a",
"2025-10-13",
10000.0,
9500.0,
-500.0,
-5.0,
9500.0,
"2025-11-07T01:00:00Z"
))
cursor.execute("""
INSERT INTO trading_days (
job_id, model, date, starting_cash, ending_cash,
profit, return_pct, portfolio_value, completed_at
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
"job-2",
"model-a",
"2025-10-14",
9500.0,
12000.0,
2500.0,
26.3,
12000.0,
"2025-11-07T02:00:00Z"
))
conn.commit()
conn.close()
# Mock get_db_connection to return our test db
import agent_tools.tool_trade as trade_module
original_get_db_connection = trade_module.get_db_connection
def mock_get_db_connection(path):
return get_db_connection(temp_db)
trade_module.get_db_connection = mock_get_db_connection
try:
# Query for 2025-10-15 should use 2025-10-14's ending position
position, _ = get_current_position_from_db(
job_id="job-3",
model="model-a",
date="2025-10-15",
initial_cash=10000.0
)
assert position["CASH"] == 12000.0 # From 2025-10-14, not 2025-10-13
finally:
# Restore original function
trade_module.get_db_connection = original_get_db_connection

View File

@@ -130,44 +130,6 @@ class TestDatabaseHelpers:
assert previous is not None
assert previous["date"] == "2025-01-17"
def test_get_previous_trading_day_across_jobs(self, db):
"""Test retrieving previous trading day from different job (cross-job continuity)."""
# Setup: Create two jobs
db.connection.execute(
"INSERT INTO jobs (job_id, status, config_path, date_range, models, created_at) VALUES (?, ?, ?, ?, ?, ?)",
("job-1", "completed", "config.json", "2025-10-07,2025-10-07", "deepseek-chat-v3.1", "2025-11-07T00:00:00Z")
)
db.connection.execute(
"INSERT INTO jobs (job_id, status, config_path, date_range, models, created_at) VALUES (?, ?, ?, ?, ?, ?)",
("job-2", "running", "config.json", "2025-10-08,2025-10-08", "deepseek-chat-v3.1", "2025-11-07T01:00:00Z")
)
# Day 1 in job-1
db.create_trading_day(
job_id="job-1",
model="deepseek-chat-v3.1",
date="2025-10-07",
starting_cash=10000.0,
starting_portfolio_value=10000.0,
daily_profit=214.58,
daily_return_pct=2.15,
ending_cash=123.59,
ending_portfolio_value=10214.58
)
# Test: Get previous day from job-2 on next date
# Should find job-1's record (cross-job continuity)
previous = db.get_previous_trading_day(
job_id="job-2",
model="deepseek-chat-v3.1",
current_date="2025-10-08"
)
assert previous is not None
assert previous["date"] == "2025-10-07"
assert previous["ending_cash"] == 123.59
assert previous["ending_portfolio_value"] == 10214.58
def test_get_ending_holdings(self, db):
"""Test retrieving ending holdings for a trading day."""
db.connection.execute(

View File

@@ -26,12 +26,11 @@ class TestJobCreation:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
assert job_id is not None
job = manager.get_job(job_id)
@@ -45,12 +44,11 @@ class TestJobCreation:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
progress = manager.get_job_progress(job_id)
assert progress["total_model_days"] == 2 # 2 dates × 1 model
@@ -62,12 +60,11 @@ class TestJobCreation:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job1_result = manager.create_job(
job1_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5"]
)
job1_id = job1_result["job_id"]
with pytest.raises(ValueError, match="Another simulation job is already running"):
manager.create_job(
@@ -81,22 +78,20 @@ class TestJobCreation:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job1_result = manager.create_job(
job1_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5"]
)
job1_id = job1_result["job_id"]
manager.update_job_status(job1_id, "completed")
# Now second job should be allowed
job2_result = manager.create_job(
job2_id = manager.create_job(
"configs/test.json",
["2025-01-17"],
["gpt-5"]
)
job2_id = job2_result["job_id"]
assert job2_id is not None
@@ -109,12 +104,11 @@ class TestJobStatusTransitions:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5"]
)
job_id = job_result["job_id"]
# Update detail to running
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
@@ -128,12 +122,11 @@ class TestJobStatusTransitions:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5"]
)
job_id = job_result["job_id"]
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "completed")
@@ -148,12 +141,11 @@ class TestJobStatusTransitions:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
# First model succeeds
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
@@ -191,12 +183,10 @@ class TestJobRetrieval:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job1_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
job1_id = job1_result["job_id"]
job1_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
manager.update_job_status(job1_id, "completed")
job2_result = manager.create_job("configs/test.json", ["2025-01-17"], ["gpt-5"])
job2_id = job2_result["job_id"]
job2_id = manager.create_job("configs/test.json", ["2025-01-17"], ["gpt-5"])
current = manager.get_current_job()
assert current["job_id"] == job2_id
@@ -214,12 +204,11 @@ class TestJobRetrieval:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16", "2025-01-17"],
["gpt-5"]
)
job_id = job_result["job_id"]
found = manager.find_job_by_date_range(["2025-01-16", "2025-01-17"])
assert found["job_id"] == job_id
@@ -248,12 +237,11 @@ class TestJobProgress:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16", "2025-01-17"],
["gpt-5"]
)
job_id = job_result["job_id"]
progress = manager.get_job_progress(job_id)
assert progress["total_model_days"] == 2
@@ -266,12 +254,11 @@ class TestJobProgress:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5"]
)
job_id = job_result["job_id"]
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
@@ -283,12 +270,11 @@ class TestJobProgress:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
"configs/test.json",
["2025-01-16"],
["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "completed")
@@ -325,8 +311,7 @@ class TestConcurrencyControl:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
job_id = job_result["job_id"]
job_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
manager.update_job_status(job_id, "running")
assert manager.can_start_new_job() is False
@@ -336,8 +321,7 @@ class TestConcurrencyControl:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
job_id = job_result["job_id"]
job_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
manager.update_job_status(job_id, "completed")
assert manager.can_start_new_job() is True
@@ -347,15 +331,13 @@ class TestConcurrencyControl:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job1_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
job1_id = job1_result["job_id"]
job1_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
# Complete first job
manager.update_job_status(job1_id, "completed")
# Create second job
job2_result = manager.create_job("configs/test.json", ["2025-01-17"], ["gpt-5"])
job2_id = job2_result["job_id"]
job2_id = manager.create_job("configs/test.json", ["2025-01-17"], ["gpt-5"])
running = manager.get_running_jobs()
assert len(running) == 1
@@ -386,13 +368,12 @@ class TestJobCleanup:
conn.close()
# Create recent job
recent_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
recent_id = recent_result["job_id"]
recent_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
# Cleanup jobs older than 30 days
cleanup_result = manager.cleanup_old_jobs(days=30)
result = manager.cleanup_old_jobs(days=30)
assert cleanup_result["jobs_deleted"] == 1
assert result["jobs_deleted"] == 1
assert manager.get_job("old-job") is None
assert manager.get_job(recent_id) is not None
@@ -406,8 +387,7 @@ class TestJobUpdateOperations:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
job_id = job_result["job_id"]
job_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
manager.update_job_status(job_id, "failed", error="MCP service unavailable")
@@ -421,8 +401,7 @@ class TestJobUpdateOperations:
import time
manager = JobManager(db_path=clean_db)
job_result = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
job_id = job_result["job_id"]
job_id = manager.create_job("configs/test.json", ["2025-01-16"], ["gpt-5"])
# Start
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
@@ -453,12 +432,11 @@ class TestJobWarnings:
job_manager = JobManager(db_path=clean_db)
# Create a job
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Add warnings
warnings = ["Rate limit reached", "Skipped 2 dates"]
@@ -470,172 +448,4 @@ class TestJobWarnings:
assert stored_warnings == warnings
@pytest.mark.unit
class TestStaleJobCleanup:
"""Test cleanup of stale jobs from container restarts."""
def test_cleanup_stale_pending_job(self, clean_db):
"""Should mark pending job as failed with no progress."""
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Job is pending - simulate container restart
result = manager.cleanup_stale_jobs()
assert result["jobs_cleaned"] == 1
job = manager.get_job(job_id)
assert job["status"] == "failed"
assert "container restart" in job["error"].lower()
assert "pending" in job["error"]
assert "no progress" in job["error"]
def test_cleanup_stale_running_job_with_partial_progress(self, clean_db):
"""Should mark running job as partial if some model-days completed."""
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mark job as running and complete one model-day
manager.update_job_status(job_id, "running")
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "completed")
# Simulate container restart
result = manager.cleanup_stale_jobs()
assert result["jobs_cleaned"] == 1
job = manager.get_job(job_id)
assert job["status"] == "partial"
assert "container restart" in job["error"].lower()
assert "1/2" in job["error"] # 1 out of 2 model-days completed
def test_cleanup_stale_downloading_data_job(self, clean_db):
"""Should mark downloading_data job as failed."""
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mark as downloading data
manager.update_job_status(job_id, "downloading_data")
# Simulate container restart
result = manager.cleanup_stale_jobs()
assert result["jobs_cleaned"] == 1
job = manager.get_job(job_id)
assert job["status"] == "failed"
assert "downloading_data" in job["error"]
def test_cleanup_marks_incomplete_job_details_as_failed(self, clean_db):
"""Should mark incomplete job_details as failed."""
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mark job as running, one detail running, one pending
manager.update_job_status(job_id, "running")
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "running")
# Simulate container restart
manager.cleanup_stale_jobs()
# Check job_details were marked as failed
progress = manager.get_job_progress(job_id)
assert progress["failed"] == 2 # Both model-days marked failed
assert progress["pending"] == 0
details = manager.get_job_details(job_id)
for detail in details:
assert detail["status"] == "failed"
assert "container restarted" in detail["error"].lower()
def test_cleanup_no_stale_jobs(self, clean_db):
"""Should report 0 cleaned jobs when none are stale."""
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Complete the job
manager.update_job_detail_status(job_id, "2025-01-16", "gpt-5", "completed")
# Simulate container restart
result = manager.cleanup_stale_jobs()
assert result["jobs_cleaned"] == 0
job = manager.get_job(job_id)
assert job["status"] == "completed"
def test_cleanup_multiple_stale_jobs(self, clean_db):
"""Should clean up multiple stale jobs."""
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
# Create first job
job1_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job1_id = job1_result["job_id"]
manager.update_job_status(job1_id, "running")
manager.update_job_status(job1_id, "completed")
# Create second job (pending)
job2_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-17"],
models=["gpt-5"]
)
job2_id = job2_result["job_id"]
# Create third job (running)
manager.update_job_status(job2_id, "completed")
job3_result = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-18"],
models=["gpt-5"]
)
job3_id = job3_result["job_id"]
manager.update_job_status(job3_id, "running")
# Simulate container restart
result = manager.cleanup_stale_jobs()
assert result["jobs_cleaned"] == 1 # Only job3 is running
assert manager.get_job(job1_id)["status"] == "completed"
assert manager.get_job(job2_id)["status"] == "completed"
assert manager.get_job(job3_id)["status"] == "failed"
# Coverage target: 95%+ for api/job_manager.py

View File

@@ -1,256 +0,0 @@
"""Test duplicate detection in job creation."""
import pytest
import tempfile
import os
from pathlib import Path
from api.job_manager import JobManager
@pytest.fixture
def temp_db():
"""Create temporary database for testing."""
fd, path = tempfile.mkstemp(suffix='.db')
os.close(fd)
# Initialize schema
from api.database import get_db_connection
conn = get_db_connection(path)
cursor = conn.cursor()
# Create jobs table
cursor.execute("""
CREATE TABLE IF NOT EXISTS jobs (
job_id TEXT PRIMARY KEY,
config_path TEXT NOT NULL,
status TEXT NOT NULL,
date_range TEXT NOT NULL,
models TEXT NOT NULL,
created_at TEXT NOT NULL,
started_at TEXT,
updated_at TEXT,
completed_at TEXT,
total_duration_seconds REAL,
error TEXT,
warnings TEXT
)
""")
# Create job_details table
cursor.execute("""
CREATE TABLE IF NOT EXISTS job_details (
id INTEGER PRIMARY KEY AUTOINCREMENT,
job_id TEXT NOT NULL,
date TEXT NOT NULL,
model TEXT NOT NULL,
status TEXT NOT NULL,
started_at TEXT,
completed_at TEXT,
duration_seconds REAL,
error TEXT,
FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE,
UNIQUE(job_id, date, model)
)
""")
conn.commit()
conn.close()
yield path
# Cleanup
if os.path.exists(path):
os.remove(path)
def test_create_job_with_filter_skips_completed_simulations(temp_db):
"""Test that job creation with model_day_filter skips already-completed pairs."""
manager = JobManager(db_path=temp_db)
# Create first job and mark model-day as completed
result_1 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["deepseek-chat-v3.1"],
model_day_filter=[("deepseek-chat-v3.1", "2025-10-15")]
)
job_id_1 = result_1["job_id"]
# Mark as completed
manager.update_job_detail_status(
job_id_1,
"2025-10-15",
"deepseek-chat-v3.1",
"completed"
)
# Try to create second job with overlapping date
model_day_filter = [
("deepseek-chat-v3.1", "2025-10-15"), # Already completed
("deepseek-chat-v3.1", "2025-10-16") # Not yet completed
]
result_2 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["deepseek-chat-v3.1"],
model_day_filter=model_day_filter
)
job_id_2 = result_2["job_id"]
# Get job details for second job
details = manager.get_job_details(job_id_2)
# Should only have 2025-10-16 (2025-10-15 was skipped as already completed)
assert len(details) == 1
assert details[0]["date"] == "2025-10-16"
assert details[0]["model"] == "deepseek-chat-v3.1"
def test_create_job_without_filter_skips_all_completed_simulations(temp_db):
"""Test that job creation without filter skips all completed model-day pairs."""
manager = JobManager(db_path=temp_db)
# Create first job and complete some model-days
result_1 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15"],
models=["model-a", "model-b"]
)
job_id_1 = result_1["job_id"]
# Mark model-a/2025-10-15 as completed
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-a", "completed")
# Mark model-b/2025-10-15 as failed to complete the job
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-b", "failed")
# Create second job with same date range and models
result_2 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["model-a", "model-b"]
)
job_id_2 = result_2["job_id"]
# Get job details for second job
details = manager.get_job_details(job_id_2)
# Should have 3 entries (skip only completed model-a/2025-10-15):
# - model-b/2025-10-15 (failed in job 1, so not skipped - retry)
# - model-a/2025-10-16 (new date)
# - model-b/2025-10-16 (new date)
assert len(details) == 3
dates_models = [(d["date"], d["model"]) for d in details]
assert ("2025-10-15", "model-a") not in dates_models # Skipped (completed)
assert ("2025-10-15", "model-b") in dates_models # NOT skipped (failed, not completed)
assert ("2025-10-16", "model-a") in dates_models
assert ("2025-10-16", "model-b") in dates_models
def test_create_job_returns_warnings_for_skipped_simulations(temp_db):
"""Test that skipped simulations are returned as warnings."""
manager = JobManager(db_path=temp_db)
# Create and complete first simulation
result_1 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15"],
models=["model-a"]
)
job_id_1 = result_1["job_id"]
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-a", "completed")
# Try to create job with overlapping date (one completed, one new)
result = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"], # Add new date
models=["model-a"]
)
# Result should be a dict with job_id and warnings
assert isinstance(result, dict)
assert "job_id" in result
assert "warnings" in result
assert len(result["warnings"]) == 1
assert "model-a" in result["warnings"][0]
assert "2025-10-15" in result["warnings"][0]
# Verify job_details only has the new date
details = manager.get_job_details(result["job_id"])
assert len(details) == 1
assert details[0]["date"] == "2025-10-16"
def test_create_job_raises_error_when_all_simulations_completed(temp_db):
"""Test that ValueError is raised when ALL requested simulations are already completed."""
manager = JobManager(db_path=temp_db)
# Create and complete first simulation
result_1 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["model-a", "model-b"]
)
job_id_1 = result_1["job_id"]
# Mark all model-days as completed
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-a", "completed")
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-b", "completed")
manager.update_job_detail_status(job_id_1, "2025-10-16", "model-a", "completed")
manager.update_job_detail_status(job_id_1, "2025-10-16", "model-b", "completed")
# Try to create job with same date range and models (all already completed)
with pytest.raises(ValueError) as exc_info:
manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["model-a", "model-b"]
)
# Verify error message contains expected text
error_message = str(exc_info.value)
assert "All requested simulations are already completed" in error_message
assert "Skipped 4 model-day pair(s)" in error_message
def test_create_job_with_skip_completed_false_includes_all_simulations(temp_db):
"""Test that skip_completed=False includes ALL simulations, even already-completed ones."""
manager = JobManager(db_path=temp_db)
# Create first job and complete some model-days
result_1 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["model-a", "model-b"]
)
job_id_1 = result_1["job_id"]
# Mark all model-days as completed
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-a", "completed")
manager.update_job_detail_status(job_id_1, "2025-10-15", "model-b", "completed")
manager.update_job_detail_status(job_id_1, "2025-10-16", "model-a", "completed")
manager.update_job_detail_status(job_id_1, "2025-10-16", "model-b", "completed")
# Create second job with skip_completed=False
result_2 = manager.create_job(
config_path="test_config.json",
date_range=["2025-10-15", "2025-10-16"],
models=["model-a", "model-b"],
skip_completed=False
)
job_id_2 = result_2["job_id"]
# Get job details for second job
details = manager.get_job_details(job_id_2)
# Should have ALL 4 model-day pairs (no skipping)
assert len(details) == 4
dates_models = [(d["date"], d["model"]) for d in details]
assert ("2025-10-15", "model-a") in dates_models
assert ("2025-10-15", "model-b") in dates_models
assert ("2025-10-16", "model-a") in dates_models
assert ("2025-10-16", "model-b") in dates_models
# Verify no warnings were returned
assert result_2.get("warnings") == []

View File

@@ -41,12 +41,11 @@ class TestSkipStatusDatabase:
def test_skipped_status_allowed_in_job_details(self, job_manager):
"""Test job_details accepts 'skipped' status without constraint violation."""
# Create job
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Mark a detail as skipped - should not raise constraint violation
job_manager.update_job_detail_status(
@@ -71,12 +70,11 @@ class TestJobCompletionWithSkipped:
def test_job_completes_with_all_dates_skipped(self, job_manager):
"""Test job transitions to completed when all dates are skipped."""
# Create job with 3 dates
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02", "2025-10-03"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Mark all as skipped
for date in ["2025-10-01", "2025-10-02", "2025-10-03"]:
@@ -95,12 +93,11 @@ class TestJobCompletionWithSkipped:
def test_job_completes_with_mixed_completed_and_skipped(self, job_manager):
"""Test job completes when some dates completed, some skipped."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02", "2025-10-03"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Mark some completed, some skipped
job_manager.update_job_detail_status(
@@ -122,12 +119,11 @@ class TestJobCompletionWithSkipped:
def test_job_partial_with_mixed_completed_failed_skipped(self, job_manager):
"""Test job status 'partial' when some failed, some completed, some skipped."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02", "2025-10-03"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Mix of statuses
job_manager.update_job_detail_status(
@@ -149,12 +145,11 @@ class TestJobCompletionWithSkipped:
def test_job_remains_running_with_pending_dates(self, job_manager):
"""Test job stays running when some dates are still pending."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02", "2025-10-03"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Only mark some as terminal states
job_manager.update_job_detail_status(
@@ -178,12 +173,11 @@ class TestProgressTrackingWithSkipped:
def test_progress_includes_skipped_count(self, job_manager):
"""Test get_job_progress returns skipped count."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02", "2025-10-03", "2025-10-04"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Set various statuses
job_manager.update_job_detail_status(
@@ -211,12 +205,11 @@ class TestProgressTrackingWithSkipped:
def test_progress_all_skipped(self, job_manager):
"""Test progress when all dates are skipped."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02"],
models=["test-model"]
)
job_id = job_result["job_id"]
# Mark all as skipped
for date in ["2025-10-01", "2025-10-02"]:
@@ -238,12 +231,11 @@ class TestMultiModelSkipHandling:
def test_different_models_different_skip_states(self, job_manager):
"""Test that different models can have different skip states for same date."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02"],
models=["model-a", "model-b"]
)
job_id = job_result["job_id"]
# Model A: 10/1 skipped (already completed), 10/2 completed
job_manager.update_job_detail_status(
@@ -284,12 +276,11 @@ class TestMultiModelSkipHandling:
def test_job_completes_with_per_model_skips(self, job_manager):
"""Test job completes when different models have different skip patterns."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01", "2025-10-02"],
models=["model-a", "model-b"]
)
job_id = job_result["job_id"]
# Model A: one skipped, one completed
job_manager.update_job_detail_status(
@@ -327,12 +318,11 @@ class TestSkipReasons:
def test_skip_reason_already_completed(self, job_manager):
"""Test 'Already completed' skip reason is stored."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-01"],
models=["test-model"]
)
job_id = job_result["job_id"]
job_manager.update_job_detail_status(
job_id=job_id, date="2025-10-01", model="test-model",
@@ -344,12 +334,11 @@ class TestSkipReasons:
def test_skip_reason_incomplete_price_data(self, job_manager):
"""Test 'Incomplete price data' skip reason is stored."""
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="test_config.json",
date_range=["2025-10-04"],
models=["test-model"]
)
job_id = job_result["job_id"]
job_manager.update_job_detail_status(
job_id=job_id, date="2025-10-04", model="test-model",

View File

@@ -112,12 +112,11 @@ class TestModelDayExecutorExecution:
# Create job and job_detail
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path=str(config_path),
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mock agent execution
mock_agent = create_mock_agent(
@@ -157,12 +156,11 @@ class TestModelDayExecutorExecution:
# Create job
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mock agent to raise error
with patch("api.model_day_executor.RuntimeConfigManager") as mock_runtime:
@@ -214,12 +212,11 @@ class TestModelDayExecutorDataPersistence:
# Create job
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path=str(config_path),
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mock successful execution (no trades)
mock_agent = create_mock_agent(
@@ -272,12 +269,11 @@ class TestModelDayExecutorDataPersistence:
# Create job
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
# Mock execution with reasoning
mock_agent = create_mock_agent(
@@ -324,12 +320,11 @@ class TestModelDayExecutorCleanup:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
mock_agent = create_mock_agent(
session_result={"success": True}
@@ -360,12 +355,11 @@ class TestModelDayExecutorCleanup:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
with patch("api.model_day_executor.RuntimeConfigManager") as mock_runtime:
mock_instance = Mock()

View File

@@ -0,0 +1,108 @@
"""Unit tests for model factory - provider-specific model creation"""
import pytest
from unittest.mock import Mock, patch
from agent.model_factory import create_model
class TestModelFactory:
"""Tests for create_model factory function"""
@patch('agent.model_factory.ChatDeepSeek')
def test_create_model_deepseek(self, mock_deepseek_class):
"""Test that DeepSeek models use ChatDeepSeek"""
mock_model = Mock()
mock_deepseek_class.return_value = mock_model
result = create_model(
basemodel="deepseek/deepseek-chat",
api_key="test-key",
base_url="https://api.deepseek.com",
temperature=0.7,
timeout=30
)
# Verify ChatDeepSeek was called with correct params
mock_deepseek_class.assert_called_once_with(
model="deepseek-chat", # Extracted from "deepseek/deepseek-chat"
api_key="test-key",
base_url="https://api.deepseek.com",
temperature=0.7,
timeout=30
)
assert result == mock_model
@patch('agent.model_factory.ChatOpenAI')
def test_create_model_openai(self, mock_openai_class):
"""Test that OpenAI models use ChatOpenAI"""
mock_model = Mock()
mock_openai_class.return_value = mock_model
result = create_model(
basemodel="openai/gpt-4",
api_key="test-key",
base_url="https://api.openai.com/v1",
temperature=0.7,
timeout=30
)
# Verify ChatOpenAI was called with correct params
mock_openai_class.assert_called_once_with(
model="openai/gpt-4",
api_key="test-key",
base_url="https://api.openai.com/v1",
temperature=0.7,
timeout=30
)
assert result == mock_model
@patch('agent.model_factory.ChatOpenAI')
def test_create_model_anthropic(self, mock_openai_class):
"""Test that Anthropic models use ChatOpenAI (via compatibility)"""
mock_model = Mock()
mock_openai_class.return_value = mock_model
result = create_model(
basemodel="anthropic/claude-sonnet-4.5",
api_key="test-key",
base_url="https://api.anthropic.com/v1",
temperature=0.7,
timeout=30
)
# Verify ChatOpenAI was used (Anthropic via OpenAI-compatible endpoint)
mock_openai_class.assert_called_once()
assert result == mock_model
@patch('agent.model_factory.ChatOpenAI')
def test_create_model_generic_provider(self, mock_openai_class):
"""Test that unknown providers default to ChatOpenAI"""
mock_model = Mock()
mock_openai_class.return_value = mock_model
result = create_model(
basemodel="custom/custom-model",
api_key="test-key",
base_url="https://api.custom.com",
temperature=0.7,
timeout=30
)
# Should fall back to ChatOpenAI for unknown providers
mock_openai_class.assert_called_once()
assert result == mock_model
def test_create_model_deepseek_extracts_model_name(self):
"""Test that DeepSeek model name is extracted correctly"""
with patch('agent.model_factory.ChatDeepSeek') as mock_class:
create_model(
basemodel="deepseek/deepseek-chat-v3.1",
api_key="key",
base_url="url",
temperature=0,
timeout=30
)
# Check that model param is just "deepseek-chat-v3.1"
call_kwargs = mock_class.call_args[1]
assert call_kwargs['model'] == "deepseek-chat-v3.1"

View File

@@ -41,12 +41,11 @@ class TestSimulationWorkerExecution:
# Create job with 2 dates and 2 models = 4 model-days
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -74,12 +73,11 @@ class TestSimulationWorkerExecution:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -120,12 +118,11 @@ class TestSimulationWorkerExecution:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -162,12 +159,11 @@ class TestSimulationWorkerExecution:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -218,12 +214,11 @@ class TestSimulationWorkerErrorHandling:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5", "claude-3.7-sonnet", "gemini"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -264,12 +259,11 @@ class TestSimulationWorkerErrorHandling:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -295,12 +289,11 @@ class TestSimulationWorkerConcurrency:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16"],
models=["gpt-5", "claude-3.7-sonnet"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
@@ -342,12 +335,11 @@ class TestSimulationWorkerJobRetrieval:
from api.job_manager import JobManager
manager = JobManager(db_path=clean_db)
job_result = manager.create_job(
job_id = manager.create_job(
config_path="configs/test.json",
date_range=["2025-01-16", "2025-01-17"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=clean_db)
job_info = worker.get_job_info()
@@ -477,12 +469,11 @@ class TestSimulationWorkerHelperMethods:
job_manager = JobManager(db_path=db_path)
# Create job
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=db_path)
@@ -507,12 +498,11 @@ class TestSimulationWorkerHelperMethods:
job_manager = JobManager(db_path=db_path)
# Create job
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=db_path)
@@ -555,12 +545,11 @@ class TestSimulationWorkerHelperMethods:
initialize_database(db_path)
job_manager = JobManager(db_path=db_path)
job_result = job_manager.create_job(
job_id = job_manager.create_job(
config_path="config.json",
date_range=["2025-10-01"],
models=["gpt-5"]
)
job_id = job_result["job_id"]
worker = SimulationWorker(job_id=job_id, db_path=db_path)