feat: implement reasoning logs API with database-only storage

Complete implementation of the reasoning logs retrieval system,
replacing JSONL file-based logging with database-only storage.

Database Changes:
- Add trading_sessions table (one record per model-day)
- Add reasoning_logs table (conversation history with summaries)
- Add session_id column to positions table
- Add indexes for query performance

Agent Changes:
- Add conversation history tracking to BaseAgent
- Add AI-powered summary generation using the same model
- Remove JSONL logging code (_log_message, _setup_logging)
- Preserve in-memory conversation tracking

ModelDayExecutor Changes:
- Create trading session at start of execution
- Store reasoning logs with AI-generated summaries
- Update session summary after completion
- Link positions to sessions via session_id

API Changes:
- Add GET /reasoning endpoint with filters (job_id, date, model)
- Support include_full_conversation parameter
- Return both summaries and full conversation on demand
- Include deployment mode info in responses

Documentation:
- Add complete API reference for GET /reasoning
- Add design document with architecture details
- Add implementation guide with step-by-step tasks
- Update Python and TypeScript client examples

Testing:
- Add 6 tests for conversation history tracking
- Add 4 tests for summary generation
- Add 5 tests for model_day_executor integration
- Add 8 tests for GET /reasoning endpoint
- Add 9 integration tests for E2E flow
- Update existing tests for schema changes

All 32 new feature tests passing. Total: 285 tests passing.
# Job Skip Status Tracking Design
**Date:** 2025-11-02
**Status:** Approved for implementation
## Problem Statement
The job orchestration system has three related issues when handling date filtering:
1. **Incorrect status reporting** - Dates that are skipped (already completed or missing price data) remain in "pending" status instead of showing their actual state
2. **Jobs hang indefinitely** - Jobs never complete because the completion check only counts "completed" and "failed" statuses, ignoring dates that were intentionally skipped
3. **Unclear skip reasons** - Warning messages don't distinguish between different types of skips (weekends vs already-completed vs rate limits)
### Example of Broken Behavior
Job request: dates [2025-10-01 to 2025-10-05], model [gpt-5]
Current (broken) response:
```json
{
"status": "running", // STUCK - never completes
"progress": {
"pending": 3, // WRONG - these will never be executed
"completed": 2,
"failed": 0
},
"details": [
{"date": "2025-10-01", "status": "pending"}, // Already completed
{"date": "2025-10-02", "status": "completed"},
{"date": "2025-10-03", "status": "completed"},
{"date": "2025-10-04", "status": "pending"}, // Weekend (no data)
{"date": "2025-10-05", "status": "pending"} // Weekend (no data)
]
}
```
## Solution Overview
Add "skipped" status to track dates that were intentionally not executed. Update job completion logic to count skipped dates as "done" since they don't require execution.
### Core Principles
1. **Status accuracy** - Every job_details entry reflects what actually happened
2. **Proper completion** - Jobs complete when all dates are in terminal states (completed/failed/skipped)
3. **Clear attribution** - Skip reasons stored in error field explain why each date was skipped
4. **Per-model granularity** - Multi-model jobs correctly handle different completion states per model
## Design Details
### 1. Database Schema Changes
**Current constraint:**
```sql
status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed'))
```
**New constraint:**
```sql
status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed', 'skipped'))
```
**Migration strategy:**
- Dev mode: Table recreated on startup (already happens with `PRESERVE_DEV_DATA=false`)
- Production: Provide manual migration SQL script
**No new columns needed:**
- Skip reasons stored in existing `error` field
- Field semantics: "error message for failures, skip reason for skips"
### 2. Skip Reason Categories
Three skip reasons stored in the `error` field:
| Reason | Description | When Applied |
|--------|-------------|--------------|
| "Already completed" | Position data exists from previous job | Per-model, based on job_details history |
| "Incomplete price data" | Missing stock prices for date | All models, for weekends/holidays/future dates |
| "Rate limited during download" | API rate limit hit during download | All models (optional, may merge with incomplete data) |
### 3. SimulationWorker Changes
#### Modified `_prepare_data()` Flow
**Current:**
```python
available_dates = price_manager.get_available_trading_dates(start, end)
available_dates = self._filter_completed_dates(available_dates, models)
# Skipped dates just disappear with no status update
```
**New:**
```python
# Step 1: Filter price data and track skips
# (requested_dates = the full date range from the job request)
available_dates = price_manager.get_available_trading_dates(start, end)
price_skips = set(requested_dates) - set(available_dates)
# Step 2: Filter completed dates per-model and track skips
dates_to_process, completion_skips = self._filter_completed_dates_with_tracking(
available_dates, models
)
# Step 3: Update job_details status for all skipped dates
self._mark_skipped_dates(price_skips, completion_skips, models)
# Step 4: Execute only dates_to_process
return dates_to_process, warnings
```
#### New Helper: `_filter_completed_dates_with_tracking()`
```python
def _filter_completed_dates_with_tracking(
self,
available_dates: List[str],
models: List[str]
) -> Tuple[List[str], Dict[str, Set[str]]]:
"""
Filter already-completed dates per model.
Args:
available_dates: Dates with complete price data
models: Model signatures
Returns:
- dates_to_process: Union of all dates needed by any model
- completion_skips: {model: {dates_to_skip_for_this_model}}
"""
if not available_dates:
return [], {}
# Get completed dates from job_details history
start_date = available_dates[0]
end_date = available_dates[-1]
completed_dates = self.job_manager.get_completed_model_dates(
models, start_date, end_date
)
completion_skips = {}
dates_needed_by_any_model = set()
for model in models:
model_completed = set(completed_dates.get(model, []))
model_skips = set(available_dates) & model_completed
completion_skips[model] = model_skips
# Track dates this model still needs
dates_needed_by_any_model.update(
set(available_dates) - model_skips
)
return sorted(list(dates_needed_by_any_model)), completion_skips
```
#### New Helper: `_mark_skipped_dates()`
```python
def _mark_skipped_dates(
self,
price_skips: Set[str],
completion_skips: Dict[str, Set[str]],
models: List[str]
) -> None:
"""
Update job_details status for all skipped dates.
Args:
price_skips: Dates without complete price data (affects all models)
completion_skips: {model: {dates}} already completed per model
models: All model signatures in job
"""
# Price skips affect ALL models equally
for date in price_skips:
for model in models:
self.job_manager.update_job_detail_status(
self.job_id, date, model,
"skipped",
error="Incomplete price data"
)
# Completion skips are per-model
for model, skipped_dates in completion_skips.items():
for date in skipped_dates:
self.job_manager.update_job_detail_status(
self.job_id, date, model,
"skipped",
error="Already completed"
)
```
### 4. JobManager Changes
#### Updated Completion Logic in `update_job_detail_status()`
**Current (around line 419-437):**
```python
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed
FROM job_details
WHERE job_id = ?
""", (job_id,))
total, completed, failed = cursor.fetchone()
if completed + failed == total: # Never true with skipped entries!
# Determine final status
```
**New:**
```python
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed,
SUM(CASE WHEN status = 'skipped' THEN 1 ELSE 0 END) as skipped
FROM job_details
WHERE job_id = ?
""", (job_id,))
total, completed, failed, skipped = cursor.fetchone()
# Job is done when all details are in terminal states
if completed + failed + skipped == total:
# Determine final status based only on executed dates
# (skipped dates don't affect job success/failure)
if failed == 0:
final_status = "completed"
elif completed > 0:
final_status = "partial"
else:
final_status = "failed"
# Update job to final status...
```
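The branch above reduces to a small pure function; a sketch to make the mapping explicit (the function name is illustrative):
```python
def resolve_final_status(completed: int, failed: int) -> str:
    """Final job status from executed dates only; skipped dates are ignored."""
    if failed == 0:
        return "completed"  # includes the all-skipped case (0 completed, 0 failed)
    if completed > 0:
        return "partial"
    return "failed"

assert resolve_final_status(completed=2, failed=0) == "completed"
assert resolve_final_status(completed=0, failed=0) == "completed"  # all skipped
assert resolve_final_status(completed=1, failed=2) == "partial"
assert resolve_final_status(completed=0, failed=3) == "failed"
```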
#### Updated Progress Tracking in `get_job_progress()`
**Current:**
```python
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed
FROM job_details
WHERE job_id = ?
""", (job_id,))
total, completed, failed = cursor.fetchone()
return {
"total_model_days": total,
"completed": completed or 0,
"failed": failed or 0,
# ...
}
```
**New:**
```python
cursor.execute("""
SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed,
SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) as pending,
SUM(CASE WHEN status = 'skipped' THEN 1 ELSE 0 END) as skipped
FROM job_details
WHERE job_id = ?
""", (job_id,))
total, completed, failed, pending, skipped = cursor.fetchone()
return {
"total_model_days": total,
"completed": completed or 0,
"failed": failed or 0,
"pending": pending or 0,
"skipped": skipped or 0, # NEW
# ...
}
```
### 5. Warning Message Updates
**Current:**
```python
warnings.append(f"Skipped {len(skipped)} dates due to incomplete price data: {sorted(list(skipped))}")
```
**New (distinguish skip types):**
```python
if price_skips:
warnings.append(
f"Skipped {len(price_skips)} dates due to incomplete price data: "
f"{sorted(list(price_skips))}"
)
# Count total completion skips across all models
total_completion_skips = sum(len(dates) for dates in completion_skips.values())
if total_completion_skips > 0:
warnings.append(
f"Skipped {total_completion_skips} model-days already completed"
)
```
### 6. Expected API Response
Using the earlier example: dates [2025-10-01 to 2025-10-05], model [gpt-5]
- 10/1: Already completed
- 10/2, 10/3: Executed successfully
- 10/4, 10/5: Weekends (no price data)
**After fix:**
```json
{
"job_id": "c2b68f6a-8beb-4bd2-bd98-749cdd98dda6",
"status": "completed", // ✓ Job completes correctly
"progress": {
"total_model_days": 5,
"completed": 2,
"failed": 0,
"pending": 0, // ✓ No longer stuck
"skipped": 3 // ✓ Clear accounting
},
"details": [
{
"date": "2025-10-01",
"model": "gpt-5",
"status": "skipped",
"error": "Already completed", // ✓ Clear reason
"started_at": null,
"completed_at": null
},
{
"date": "2025-10-02",
"model": "gpt-5",
"status": "completed",
"error": null,
"started_at": "2025-11-02T14:05:45.592208Z",
"completed_at": "2025-11-02T14:05:45.625924Z"
},
{
"date": "2025-10-03",
"model": "gpt-5",
"status": "completed",
"error": null,
"started_at": "2025-11-02T14:05:45.636893Z",
"completed_at": "2025-11-02T14:05:45.663431Z"
},
{
"date": "2025-10-04",
"model": "gpt-5",
"status": "skipped",
"error": "Incomplete price data", // ✓ Clear reason
"started_at": null,
"completed_at": null
},
{
"date": "2025-10-05",
"model": "gpt-5",
"status": "skipped",
"error": "Incomplete price data", // ✓ Clear reason
"started_at": null,
"completed_at": null
}
],
"warnings": [
"Skipped 2 dates due to incomplete price data: ['2025-10-04', '2025-10-05']",
"Skipped 1 model-days already completed"
]
}
```
## Multi-Model Handling
The design correctly handles multiple models with different completion states.
**Example scenario:**
- Job: dates [10/1, 10/2, 10/3], models [gpt-5, claude-opus]
- gpt-5: Already completed 10/1
- claude-opus: Needs all dates
**Correct behavior:**
```json
{
"details": [
{
"date": "2025-10-01",
"model": "gpt-5",
"status": "skipped",
"error": "Already completed"
},
{
"date": "2025-10-01",
"model": "claude-opus",
"status": "completed", // ✓ Executed for this model
"error": null
},
// ... other dates
]
}
```
**Implementation detail:**
- `completion_skips` tracks per-model: `{"gpt-5": {"2025-10-01"}, "claude-opus": set()}`
- Only gpt-5's 10/1 entry gets marked skipped
- 10/1 still gets executed because claude-opus needs it
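To make the set arithmetic concrete, here is a minimal runnable sketch replicating the helper's logic on the scenario above:
```python
available_dates = ["2025-10-01", "2025-10-02", "2025-10-03"]
completed_dates = {"gpt-5": ["2025-10-01"], "claude-opus": []}

completion_skips = {}
dates_needed_by_any_model = set()
for model in ["gpt-5", "claude-opus"]:
    model_skips = set(available_dates) & set(completed_dates[model])
    completion_skips[model] = model_skips
    dates_needed_by_any_model |= set(available_dates) - model_skips

assert completion_skips == {"gpt-5": {"2025-10-01"}, "claude-opus": set()}
# 2025-10-01 still executes because claude-opus needs it
assert sorted(dates_needed_by_any_model) == [
    "2025-10-01", "2025-10-02", "2025-10-03",
]
```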
## Implementation Checklist
### 1. Database Migration
- [ ] Update database.py schema with 'skipped' status
- [ ] Test dev mode table recreation
- [ ] Create migration SQL for production users
### 2. JobManager Updates (api/job_manager.py)
- [ ] Update `update_job_detail_status()` completion logic (line ~419)
- [ ] Update `get_job_progress()` to include skipped count (line ~504)
- [ ] Test job completion with mixed statuses
### 3. SimulationWorker Updates (api/simulation_worker.py)
- [ ] Implement `_filter_completed_dates_with_tracking()` helper
- [ ] Implement `_mark_skipped_dates()` helper
- [ ] Update `_prepare_data()` to track and mark skips (line ~303)
- [ ] Update warning messages to distinguish skip types (line ~355)
### 4. Testing
- [ ] Unit test: Skip dates with incomplete price data
- [ ] Unit test: Skip dates already completed (single model)
- [ ] Unit test: Multi-model with different completion states
- [ ] Unit test: Job completes with all dates skipped
- [ ] Unit test: Mixed completed/failed/skipped determines correct final status
- [ ] Integration test: Full workflow with mixed scenarios
- [ ] Update existing tests expecting old behavior
### 5. Documentation
- [ ] Update API_REFERENCE.md with skipped status
- [ ] Update database-schema.md with new constraint
- [ ] Add migration notes to CHANGELOG.md
## Testing Strategy
### Unit Tests
**Test: Skip incomplete price data**
```python
def test_skip_incomplete_price_data():
# Setup: Job with weekend dates
# Mock: price_manager returns only weekdays
# Assert: Weekend dates marked as skipped with "Incomplete price data"
```
**Test: Skip already completed**
```python
def test_skip_already_completed():
# Setup: Job with dates already in job_details as completed
# Assert: Those dates marked as skipped with "Already completed"
# Assert: Job still completes successfully
```
**Test: Multi-model different states**
```python
def test_multi_model_skip_handling():
# Setup: Two models, one has completed 10/1, other hasn't
# Assert: Only first model's 10/1 is skipped
# Assert: Second model's 10/1 executes normally
```
**Test: Job completion with skips**
```python
def test_job_completes_with_skipped():
# Setup: Job where all dates are skipped
# Assert: Job status becomes "completed"
# Assert: Progress shows pending=0, skipped=N
```
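The counting query at the heart of the completion check can also be exercised directly against an in-memory SQLite database. A runnable sketch, assuming the `job_details` schema from the migration section (trimmed to the relevant columns):
```python
import sqlite3

def test_completion_query_counts_skipped():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE job_details (
            job_id TEXT NOT NULL,
            date TEXT NOT NULL,
            model TEXT NOT NULL,
            status TEXT NOT NULL CHECK(status IN
                ('pending', 'running', 'completed', 'failed', 'skipped')),
            error TEXT
        )
    """)
    conn.executemany("INSERT INTO job_details VALUES (?, ?, ?, ?, ?)", [
        ("job-1", "2025-10-01", "gpt-5", "skipped", "Already completed"),
        ("job-1", "2025-10-02", "gpt-5", "completed", None),
        ("job-1", "2025-10-03", "gpt-5", "completed", None),
        ("job-1", "2025-10-04", "gpt-5", "skipped", "Incomplete price data"),
        ("job-1", "2025-10-05", "gpt-5", "skipped", "Incomplete price data"),
    ])
    total, completed, failed, skipped = conn.execute("""
        SELECT
            COUNT(*),
            SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END),
            SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END),
            SUM(CASE WHEN status = 'skipped' THEN 1 ELSE 0 END)
        FROM job_details WHERE job_id = ?
    """, ("job-1",)).fetchone()
    # All rows are terminal, so the job should finalize as "completed"
    assert completed + failed + skipped == total == 5
    assert failed == 0
```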
### Integration Test
**Test: Mixed execution scenario**
```python
def test_mixed_completed_skipped_failed():
# Setup: Date range with:
# - Some dates already completed
# - Some dates missing price data
# - Some dates to execute (mix success/failure)
# Assert: Final status reflects executed dates only
# Assert: All skip reasons correct
# Assert: Job completes when all terminal
```
## Migration Notes
### For Development
No action needed - dev database recreates on startup.
### For Production Users
Run this SQL before deploying the updated code:
```sql
-- Backup existing data
CREATE TABLE job_details_backup AS SELECT * FROM job_details;
-- Drop old constraint and add new one
-- SQLite doesn't support ALTER CONSTRAINT, so recreate the table
-- (run inside a transaction so a failure leaves the original intact)
BEGIN TRANSACTION;
CREATE TABLE job_details_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
job_id TEXT NOT NULL,
date TEXT NOT NULL,
model TEXT NOT NULL,
status TEXT NOT NULL CHECK(status IN ('pending', 'running', 'completed', 'failed', 'skipped')),
started_at TEXT,
completed_at TEXT,
duration_seconds REAL,
error TEXT,
FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE
);
-- Copy data
INSERT INTO job_details_new SELECT * FROM job_details;
-- Swap tables
DROP TABLE job_details;
ALTER TABLE job_details_new RENAME TO job_details;
COMMIT;
-- Clean up backup (optional)
-- DROP TABLE job_details_backup;
```
## Rollback Plan
If issues arise:
1. Revert code changes
2. Restore database from backup (job_details_backup table)
3. Pending entries will remain pending (original behavior)
## Success Metrics
1. **No stuck jobs** - All jobs reach terminal status (completed/partial/failed)
2. **Clear status accounting** - API responses show exact counts for each status
3. **Accurate skip reasons** - Users can distinguish between skip types
4. **Multi-model correctness** - Different models can have different skip states for same date
## References
- Database schema: `api/database.py`
- Job manager: `api/job_manager.py`
- Simulation worker: `api/simulation_worker.py`
- Migration strategy: `docs/developer/database-schema.md`

# Reasoning Logs API Design
**Date:** 2025-11-02
**Status:** Approved for Implementation
## Overview
Add API endpoint to retrieve AI reasoning logs for simulation days, replacing JSONL file-based logging with database-only storage. The system will store both full conversation history and AI-generated summaries, with clear associations to trading positions.
## Goals
1. **Database-only storage** - Eliminate JSONL files (`data/agent_data/[model]/log/[date]/log.jsonl`)
2. **Dual storage** - Store both full conversation and AI-generated summaries in same table
3. **Trading event association** - Easy to review reasoning alongside positions taken
4. **Query flexibility** - Filter by job_id, date, and/or model
## Database Schema Changes
### New Table: trading_sessions
One record per model-day trading session.
```sql
CREATE TABLE IF NOT EXISTS trading_sessions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
job_id TEXT NOT NULL,
date TEXT NOT NULL,
model TEXT NOT NULL,
session_summary TEXT, -- AI-generated summary of entire session
started_at TEXT NOT NULL,
completed_at TEXT,
total_messages INTEGER,
FOREIGN KEY (job_id) REFERENCES jobs(job_id) ON DELETE CASCADE,
UNIQUE(job_id, date, model)
)
```
### Modified Table: reasoning_logs
Store individual messages linked to trading session.
```sql
CREATE TABLE IF NOT EXISTS reasoning_logs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id INTEGER NOT NULL,
message_index INTEGER NOT NULL, -- Order in conversation (0, 1, 2...)
role TEXT NOT NULL CHECK(role IN ('user', 'assistant', 'tool')),
content TEXT NOT NULL, -- Full message content
summary TEXT, -- AI-generated summary (for assistant messages)
tool_name TEXT, -- Tool name (for tool role)
tool_input TEXT, -- Tool input args (for tool role)
timestamp TEXT NOT NULL,
FOREIGN KEY (session_id) REFERENCES trading_sessions(id) ON DELETE CASCADE,
UNIQUE(session_id, message_index)
)
```
**Key changes from current schema:**
- Added `session_id` foreign key instead of `(job_id, date, model)` tuple
- Added `message_index` to preserve conversation order
- Added `summary` column for AI-generated summaries of assistant responses
- Added `tool_input` to capture tool call arguments
- Changed `content` to NOT NULL
- Removed `step_number` (replaced by `message_index`)
- Added UNIQUE constraint to enforce ordering
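With `message_index` and the UNIQUE constraint in place, a session's conversation can be rebuilt in order with a single query. A sketch (connection handling elided; the helper name is illustrative):
```python
def fetch_conversation(conn, session_id: int):
    """Return a session's messages in their original order."""
    return conn.execute(
        """
        SELECT message_index, role, content, summary, tool_name, tool_input
        FROM reasoning_logs
        WHERE session_id = ?
        ORDER BY message_index
        """,
        (session_id,),
    ).fetchall()
```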
### Modified Table: positions
Add link to trading session.
```sql
ALTER TABLE positions ADD COLUMN session_id INTEGER REFERENCES trading_sessions(id)
```
**Migration:** Column addition is non-breaking. Existing rows will have NULL `session_id`.
## Data Flow
### 1. Trading Session Lifecycle
**Start of simulation day:**
```python
session_id = create_trading_session(
job_id=job_id,
date=date,
model=model_sig,
started_at=datetime.utcnow().isoformat() + "Z"
)
```
**During agent execution:**
- BaseAgent captures all messages in memory via `get_conversation_history()`
- No file I/O during execution
**After agent completes:**
```python
conversation = agent.get_conversation_history()
# Store all messages
for idx, message in enumerate(conversation):
summary = None
if message["role"] == "assistant":
# Use same AI model to generate summary
summary = await agent.generate_summary(message["content"])
insert_reasoning_log(
session_id=session_id,
message_index=idx,
role=message["role"],
content=message["content"],
summary=summary,
tool_name=message.get("tool_name"),
tool_input=message.get("tool_input"),
timestamp=message.get("timestamp")
)
# Generate and store session summary
session_summary = await agent.generate_summary(
"\n\n".join([m["content"] for m in conversation if m["role"] == "assistant"])
)
update_trading_session(session_id, session_summary=session_summary)
```
### 2. Position Linking
When inserting positions, include `session_id`:
```python
cursor.execute("""
INSERT INTO positions (
job_id, date, model, action_id, action_type, symbol,
amount, price, cash, portfolio_value, daily_profit,
daily_return_pct, session_id, created_at
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (..., session_id, created_at))
```
## Summary Generation
### Strategy: Use Same Model
For each assistant message, generate a concise summary using the same AI model:
```python
async def generate_summary(self, content: str) -> str:
    """
    Generate a 1-2 sentence summary of reasoning.

    Uses the same model that generated the content to ensure
    consistency and accuracy.
    """
    truncated = content[:2000]  # Truncate to avoid token limits
    prompt = (
        "Summarize the following trading decision in 1-2 sentences, "
        "focusing on the key reasoning and actions taken:\n\n"
        f"{truncated}\n\nSummary:"
    )
    response = await self.model.ainvoke(prompt)
    return response.content.strip()
```
**Cost consideration:** Summaries add minimal token cost (50-100 tokens per message) compared to full reasoning.
**Session summary:** Concatenate all assistant messages and summarize the entire trading day's reasoning.
## API Endpoint
### GET /reasoning
Retrieve reasoning logs with optional filters.
**Query Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `job_id` | string | No | Filter by job UUID |
| `date` | string | No | Filter by date (YYYY-MM-DD) |
| `model` | string | No | Filter by model signature |
| `include_full_conversation` | boolean | No | Include all messages (default: false, only returns summaries) |
**Response (200 OK):**
```json
{
"sessions": [
{
"session_id": 123,
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"date": "2025-10-02",
"model": "gpt-5",
"session_summary": "Analyzed AI infrastructure market conditions. Decided to establish positions in NVDA, GOOGL, AMD, and CRWD based on secular AI demand trends and strong Q2 results. Maintained 51% cash reserve for volatility management.",
"started_at": "2025-10-02T10:00:00Z",
"completed_at": "2025-10-02T10:05:23Z",
"total_messages": 4,
"positions": [
{
"action_id": 1,
"action_type": "buy",
"symbol": "NVDA",
"amount": 10,
"price": 189.60,
"cash_after": 8104.00,
"portfolio_value": 10000.00
},
{
"action_id": 2,
"action_type": "buy",
"symbol": "GOOGL",
"amount": 6,
"price": 245.15,
"cash_after": 6633.10,
"portfolio_value": 10104.00
}
],
"conversation": [ // Only if include_full_conversation=true
{
"message_index": 0,
"role": "user",
"content": "Please analyze and update today's (2025-10-02) positions.",
"timestamp": "2025-10-02T10:00:00Z"
},
{
"message_index": 1,
"role": "assistant",
"content": "Key intermediate steps\n\n- Read yesterday's positions...",
"summary": "Analyzed market conditions and decided to buy NVDA (10 shares), GOOGL (6 shares), AMD (6 shares), and CRWD (1 share) based on AI infrastructure trends.",
"timestamp": "2025-10-02T10:05:20Z"
}
]
}
],
"count": 1
}
```
**Error Responses:**
- **400 Bad Request** - Invalid date format
- **404 Not Found** - No sessions found matching filters
**Examples:**
```bash
# Get summaries for all sessions in a job
curl "http://localhost:8080/reasoning?job_id=550e8400-..."
# Get full conversation for specific model-day
curl "http://localhost:8080/reasoning?date=2025-10-02&model=gpt-5&include_full_conversation=true"
# Get all reasoning for a specific date
curl "http://localhost:8080/reasoning?date=2025-10-02"
```
## Implementation Plan
### Phase 1: Database Schema (Step 1)
**Files to modify:**
- `api/database.py`
- Add `trading_sessions` table to `initialize_database()`
- Modify `reasoning_logs` table schema
- Add migration logic for `positions.session_id` column
**Tasks:**
1. Update `initialize_database()` with new schema
2. Create `initialize_dev_database()` variant for testing
3. Write unit tests for schema creation
### Phase 2: Data Capture (Steps 2-3)
**Files to modify:**
- `agent/base_agent/base_agent.py`
- Add `conversation_history` instance variable
- Add `get_conversation_history()` method
- Add `generate_summary()` method
- Capture messages during execution
- Remove JSONL file logging
- `api/model_day_executor.py`
- Add `_create_trading_session()` method
- Add `_store_reasoning_logs()` method
- Add `_update_session_summary()` method
- Modify position insertion to include `session_id`
- Remove old `get_reasoning_steps()` logic
**Tasks:**
1. Implement conversation history capture in BaseAgent
2. Implement summary generation in BaseAgent
3. Update model_day_executor to create sessions and store logs
4. Write unit tests for conversation capture
5. Write unit tests for summary generation
### Phase 3: API Endpoint (Step 4)
**Files to modify:**
- `api/main.py`
- Add `/reasoning` endpoint
- Add request/response models
- Add query logic with filters
**Tasks:**
1. Create Pydantic models for request/response
2. Implement endpoint handler
3. Write unit tests for endpoint
4. Write integration tests
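A minimal sketch of the handler, assuming the FastAPI app in `api/main.py`; `query_sessions()` and `query_messages()` are hypothetical data-access helpers, not existing code:
```python
import re
from typing import Optional
from fastapi import FastAPI, HTTPException, Query

app = FastAPI()

@app.get("/reasoning")
def get_reasoning(
    job_id: Optional[str] = Query(None, description="Filter by job UUID"),
    date: Optional[str] = Query(None, description="Filter by date (YYYY-MM-DD)"),
    model: Optional[str] = Query(None, description="Filter by model signature"),
    include_full_conversation: bool = Query(False),
):
    if date is not None and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        raise HTTPException(status_code=400, detail="Invalid date format")

    # Build the WHERE clause from whichever filters were supplied
    filters, params = [], []
    for column, value in (("job_id", job_id), ("date", date), ("model", model)):
        if value is not None:
            filters.append(f"{column} = ?")
            params.append(value)

    sessions = query_sessions(filters, params)  # hypothetical helper
    if not sessions:
        raise HTTPException(status_code=404, detail="No sessions found")

    if include_full_conversation:
        for session in sessions:
            # hypothetical helper returning ordered messages
            session["conversation"] = query_messages(session["session_id"])

    return {"sessions": sessions, "count": len(sessions)}
```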
### Phase 4: Documentation & Cleanup (Step 5)
**Files to modify:**
- `API_REFERENCE.md` - Document new endpoint
- `CLAUDE.md` - Update architecture docs
- `docs/developer/database-schema.md` - Document new tables
**Tasks:**
1. Update API documentation
2. Update architecture documentation
3. Create cleanup script for old JSONL files
4. Remove JSONL-related code from BaseAgent
### Phase 5: Testing (Step 6)
**Test scenarios:**
1. Run simulation and verify reasoning logs stored
2. Query reasoning endpoint with various filters
3. Verify positions linked to sessions
4. Test with/without `include_full_conversation`
5. Verify summaries are meaningful
6. Test dev mode behavior
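Scenarios 2 and 4 can be covered with FastAPI's TestClient. A sketch, assuming `api/main.py` exposes the app (per Phase 3) and the database is seeded with the 2025-10-02 session from the example response:
```python
from fastapi.testclient import TestClient
from api.main import app  # module path taken from the plan above

client = TestClient(app)

def test_reasoning_summaries_only():
    resp = client.get("/reasoning", params={"date": "2025-10-02", "model": "gpt-5"})
    assert resp.status_code == 200
    body = resp.json()
    assert body["count"] >= 1
    # Summaries only by default: conversation omitted unless requested
    assert "conversation" not in body["sessions"][0]

def test_reasoning_full_conversation():
    resp = client.get(
        "/reasoning",
        params={"date": "2025-10-02", "include_full_conversation": "true"},
    )
    assert resp.status_code == 200
    assert all("conversation" in s for s in resp.json()["sessions"])

def test_reasoning_rejects_bad_date():
    assert client.get("/reasoning", params={"date": "10/02/2025"}).status_code == 400
```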
## Migration Strategy
### Database Migration
**Production:**
```sql
-- Run on existing production database
ALTER TABLE positions ADD COLUMN session_id INTEGER REFERENCES trading_sessions(id);
```
**Note:** Existing positions will have NULL `session_id`. This is acceptable as they predate the new system.
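Phase 1's migration logic can be made idempotent by checking `PRAGMA table_info` before altering. A sketch (the function name is illustrative; the database path matches the cleanup script below):
```python
import sqlite3

def ensure_session_id_column(db_path: str = "data/jobs.db") -> None:
    """Add positions.session_id if it is missing; safe to run repeatedly."""
    conn = sqlite3.connect(db_path)
    try:
        columns = {row[1] for row in conn.execute("PRAGMA table_info(positions)")}
        if "session_id" not in columns:
            conn.execute(
                "ALTER TABLE positions ADD COLUMN session_id INTEGER "
                "REFERENCES trading_sessions(id)"
            )
            conn.commit()
    finally:
        conn.close()
```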
### JSONL File Cleanup
**After verifying new system works:**
```bash
#!/bin/bash
# cleanup_old_logs.sh - production cleanup script
# Verify the database has reasoning_logs data before deleting anything
echo "Checking database for reasoning logs..."
REASONING_COUNT=$(sqlite3 data/jobs.db "SELECT COUNT(*) FROM reasoning_logs")
if [ "$REASONING_COUNT" -gt 0 ]; then
    echo "Found $REASONING_COUNT reasoning log entries in database"
    echo "Removing old JSONL files..."
    # Backup first (optional)
    tar -czf "data/agent_data_logs_backup_$(date +%Y%m%d).tar.gz" data/agent_data/*/log/
    # Remove log directories
    find data/agent_data/*/log -type f -name "*.jsonl" -delete
    find data/agent_data/*/log -type d -empty -delete
    echo "Cleanup complete"
else
    echo "WARNING: No reasoning logs found in database. Keeping JSONL files."
fi
```
## Rollback Plan
If issues arise:
1. **Keep JSONL logging temporarily** - Don't remove `_log_message()` calls until database storage is proven
2. **Database rollback** - Drop new tables if needed:
```sql
DROP TABLE IF EXISTS reasoning_logs;
DROP TABLE IF EXISTS trading_sessions;
-- DROP COLUMN requires SQLite 3.35+
ALTER TABLE positions DROP COLUMN session_id;
```
3. **API rollback** - Remove `/reasoning` endpoint
## Success Criteria
1. ✅ Trading sessions created for each model-day execution
2. ✅ Full conversation history stored in `reasoning_logs` table
3. ✅ Summaries generated for assistant messages
4. ✅ Positions linked to trading sessions via `session_id`
5. ✅ `/reasoning` endpoint returns sessions with filters
6. ✅ API documentation updated
7. ✅ All tests passing
8. ✅ JSONL files eliminated
