xer-mcp/specs/002-direct-db-access/tasks.md

# Implementation Tasks: Direct Database Access for Scripts

**Branch**: `002-direct-db-access` | **Spec**: [spec.md](./spec.md) | **Plan**: [plan.md](./plan.md)

## Quick Reference

- **Feature**: Direct Database Access for Scripts
- **Version**: 0.2.0
- **Test Command**: `pytest`
- **Lint Command**: `ruff check .`

---

## Phase 1: Setup

### Task 1.1: Create Feature Branch and Documentation Structure [X]

**Type**: Setup
**Why**: Establish isolated workspace for feature development

**Steps**:
1. Verify branch `002-direct-db-access` exists (already created during spec/plan)
2. Verify all design artifacts exist:
   - `specs/002-direct-db-access/spec.md`
   - `specs/002-direct-db-access/plan.md`
   - `specs/002-direct-db-access/research.md`
   - `specs/002-direct-db-access/data-model.md`
   - `specs/002-direct-db-access/contracts/mcp-tools.json`
   - `specs/002-direct-db-access/quickstart.md`

**Verification**: `git branch --show-current` shows `002-direct-db-access`

**Acceptance**: All design artifacts present and branch is active

---

## Phase 2: Foundational - DatabaseManager Extension

### Task 2.1: Add File-Based Database Support to DatabaseManager [X]

**Type**: Implementation
**Why**: Foundation for all persistent database features
**Dependencies**: None

**Contract Reference**: `contracts/mcp-tools.json` - DatabaseInfo schema

**Test First** (TDD):
```python
# tests/unit/test_db_manager.py
import os

from xer_mcp.db import DatabaseManager

def test_initialize_with_memory_by_default():
    """Default initialization uses in-memory database."""
    dm = DatabaseManager()
    dm.initialize()
    assert dm.db_path == ":memory:"
    assert dm.is_persistent is False

def test_initialize_with_file_path(tmp_path):
    """Can initialize with explicit file path."""
    db_file = str(tmp_path / "test.db")
    dm = DatabaseManager()
    dm.initialize(db_path=db_file)
    assert dm.db_path == db_file
    assert dm.is_persistent is True
    dm.close()

def test_initialize_with_empty_string_auto_generates_path(tmp_path):
    """Empty string db_path with source_file auto-generates path."""
    # Use a writable directory so the derived database file can be created
    source = tmp_path / "schedule.xer"
    dm = DatabaseManager()
    dm.initialize(db_path="", source_file=str(source))
    assert dm.db_path == str(tmp_path / "schedule.sqlite")
    assert dm.is_persistent is True
    dm.close()

def test_file_database_persists_after_close(tmp_path):
    """File-based database persists after connection close."""
    db_file = tmp_path / "persist_test.db"
    dm = DatabaseManager()
    dm.initialize(db_path=str(db_file))
    # Test data would be inserted here
    dm.close()
    assert db_file.exists()
```

**Implementation**:
- Modify `src/xer_mcp/db/__init__.py`:
  - Add `db_path` property to track current database path
  - Add `is_persistent` property (True if file-based, False if in-memory)
  - Add `source_file` property to track loaded XER file
  - Add `loaded_at` property (datetime when data was loaded)
  - Modify `initialize()` to accept optional `db_path` and `source_file` parameters
  - If `db_path` is an empty string and `source_file` is provided, derive the path by replacing the source file's extension with `.sqlite` (e.g. `schedule.xer` → `schedule.sqlite`)
  - Use WAL mode for file-based databases: `PRAGMA journal_mode=WAL`
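
The path rules above can be sketched as two small helpers. This is only an illustration: `resolve_db_path` and `open_connection` are hypothetical names, not part of the existing `DatabaseManager`, and raising `ValueError` for an empty `db_path` without a `source_file` is an assumption.

```python
import sqlite3
from pathlib import Path
from typing import Optional

MEMORY_DB = ":memory:"


def resolve_db_path(db_path: Optional[str], source_file: Optional[str] = None) -> str:
    """Map the db_path argument to the path that should actually be opened."""
    if db_path is None:
        return MEMORY_DB  # default: in-memory database
    if db_path == "":
        if not source_file:
            # Assumption: an empty db_path without a source file is an error
            raise ValueError("db_path is empty and no source_file was given")
        # e.g. /path/to/schedule.xer -> /path/to/schedule.sqlite
        return str(Path(source_file).with_suffix(".sqlite"))
    return db_path


def open_connection(path: str) -> sqlite3.Connection:
    """Open the database, enabling WAL only for file-based databases."""
    conn = sqlite3.connect(path)
    if path != MEMORY_DB:
        conn.execute("PRAGMA journal_mode=WAL")
    return conn
```

With this split, `is_persistent` reduces to checking whether the resolved path differs from `:memory:`.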

**Files Changed**:
- `src/xer_mcp/db/__init__.py`
- `tests/unit/test_db_manager.py` (new)

**Verification**: `pytest tests/unit/test_db_manager.py -v`

**Acceptance**: All unit tests pass; DatabaseManager supports both in-memory and file-based modes

---

### Task 2.2: Implement Atomic Write Pattern [X]

**Type**: Implementation
**Why**: Prevents corrupted database files if the process is interrupted during load
**Dependencies**: Task 2.1

**Contract Reference**: `research.md` - Atomic Write Strategy

**Test First** (TDD):
```python
# tests/unit/test_db_manager.py
import pytest

def test_atomic_write_creates_temp_file_first(tmp_path):
    """Database is created at .tmp path first, then renamed."""
    dm = DatabaseManager()
    target = str(tmp_path / "atomic_test.db")
    # After initialization completes, only the target should exist
    dm.initialize(db_path=target)
    assert os.path.exists(target)
    assert not os.path.exists(target + ".tmp")
    dm.close()

def test_atomic_write_removes_temp_on_failure(tmp_path):
    """Temp file is cleaned up if initialization fails."""
    # Force the final rename to fail: a file cannot replace a directory.
    # Assumes initialize() propagates the failure as an exception.
    target = tmp_path / "atomic_fail.db"
    target.mkdir()
    dm = DatabaseManager()
    with pytest.raises(Exception):
        dm.initialize(db_path=str(target))
    assert not os.path.exists(str(target) + ".tmp")
```

**Implementation**:
- In `initialize()` method when `db_path` is not `:memory:`:
  1. Create connection to `{db_path}.tmp`
  2. Execute schema creation
  3. Close connection
  4. Rename `{db_path}.tmp` to `{db_path}` with `os.replace()` (atomic on POSIX; both paths share a directory, so the rename stays on one filesystem)
  5. Reopen connection to final path
- Handle cleanup of `.tmp` file on failure
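
The five steps above might look roughly like the following. It is a minimal sketch, not the actual `initialize()` code: `create_database_atomically` and its `schema_sql` argument are hypothetical, and removing a stale `.tmp` file before starting is an added assumption.

```python
import os
import sqlite3


def create_database_atomically(db_path: str, schema_sql: str) -> sqlite3.Connection:
    """Build the database at `<db_path>.tmp`, then rename it into place."""
    tmp_path = db_path + ".tmp"
    # Assumption: a leftover temp file from an earlier crash is discarded
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    try:
        conn = sqlite3.connect(tmp_path)  # step 1: connect to the temp path
        try:
            conn.executescript(schema_sql)  # step 2: schema creation
            conn.commit()
        finally:
            conn.close()  # step 3: close before renaming
        os.replace(tmp_path, db_path)  # step 4: atomic rename on POSIX
    except Exception:
        # Remove the partial file so a failed load leaves nothing behind
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return sqlite3.connect(db_path)  # step 5: reopen at the final path
```

Because the rename is the last step, a reader of `db_path` sees either no file or a complete one, never a half-written schema.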

**Files Changed**:
- `src/xer_mcp/db/__init__.py`
- `tests/unit/test_db_manager.py`

**Verification**: `pytest tests/unit/test_db_manager.py -v`

**Acceptance**: Atomic write tests pass; no partial database files created on failure

---

### Task 2.3: Add Schema Introspection Query [X]

**Type**: Implementation
**Why**: Required for returning schema information in responses
**Dependencies**: Task 2.1

**Contract Reference**: `contracts/mcp-tools.json` - SchemaInfo, TableInfo, ColumnInfo schemas

**Test First** (TDD):
```python
# tests/unit/test_db_manager.py

def test_get_schema_info_returns_all_tables():
    """Schema info includes all database tables."""
    dm = DatabaseManager()
    dm.initialize()
    schema = dm.get_schema_info()
    assert schema["version"] == "0.2.0"
    table_names = [t["name"] for t in schema["tables"]]
    assert "projects" in table_names
    assert "activities" in table_names
    assert "relationships" in table_names
    assert "wbs" in table_names
    assert "calendars" in table_names

def test_get_schema_info_includes_column_details():
    """Schema info includes column names, types, and nullable."""
    dm = DatabaseManager()
    dm.initialize()
    schema = dm.get_schema_info()
    activities_table = next(t for t in schema["tables"] if t["name"] == "activities")
    column_names = [c["name"] for c in activities_table["columns"]]
    assert "task_id" in column_names
    assert "task_name" in column_names
    # Check column details
    task_id_col = next(c for c in activities_table["columns"] if c["name"] == "task_id")
    assert task_id_col["type"] == "TEXT"
    assert task_id_col["nullable"] is False

def test_get_schema_info_includes_row_counts():
    """Schema info includes row counts for each table."""
    dm = DatabaseManager()
    dm.initialize()
    schema = dm.get_schema_info()
    for table in schema["tables"]:
        assert "row_count" in table
        assert isinstance(table["row_count"], int)
```

**Implementation**:
- Add `get_schema_info()` method to DatabaseManager:
  - Query `sqlite_master` for table names
  - Use `PRAGMA table_info(table_name)` for column details
  - Use `PRAGMA foreign_key_list(table_name)` for foreign keys
  - Query `SELECT COUNT(*) FROM <table_name>` for row counts (table names cannot be bound as SQL parameters, so quote the identifier)
  - Return SchemaInfo structure matching contract
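
As an illustration of the queries listed above, here is a standalone sketch covering only table names, column details, and row counts. It is written as a free function taking a connection rather than as the real `DatabaseManager` method, and `quote_identifier` is a hypothetical helper. The output keys follow the tests in this task.

```python
import sqlite3

SCHEMA_VERSION = "0.2.0"


def quote_identifier(name: str) -> str:
    """Quote a table name; identifiers cannot be bound as SQL parameters."""
    return '"' + name.replace('"', '""') + '"'


def get_schema_info(conn: sqlite3.Connection) -> dict:
    """Collect table names, column details, and row counts."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    tables = []
    for name in names:
        quoted = quote_identifier(name)
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": col[1], "type": col[2], "nullable": col[3] == 0}
            for col in conn.execute(f"PRAGMA table_info({quoted})")
        ]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        tables.append({"name": name, "columns": columns, "row_count": row_count})
    return {"version": SCHEMA_VERSION, "tables": tables}
```

Note that `nullable` here reflects only an explicit `NOT NULL` constraint, since that is what the `notnull` field of `PRAGMA table_info` reports.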

**Files Changed**:
- `src/xer_mcp/db/__init__.py`
- `tests/unit/test_db_manager.py`

**Verification**: `pytest tests/unit/test_db_manager.py -v`

**Acceptance**: Schema introspection returns accurate table/column information

---

## Phase 3: User Story 1 - Load XER to Persistent Database (P1)

### Task 3.1: Extend load_xer Tool with db_path Parameter [X]

**Type**: Implementation
**Why**: Core feature - enables persistent database creation
**Dependencies**: Task 2.1, Task 2.2

**Contract Reference**: `contracts/mcp-tools.json` - load_xer inputSchema

**Test First** (TDD):
```python
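# tests/contract/test_load_xer.py
import os

import pytest

# Import path assumed from src/xer_mcp/tools/load_xer.py
from xer_mcp.tools.load_xer import load_xer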

@pytest.mark.asyncio
async def test_load_xer_with_db_path_creates_file(tmp_path, sample_xer_file):
    """load_xer with db_path creates persistent database file."""
    db_file = tmp_path / "schedule.db"
    result = await load_xer(sample_xer_file, db_path=str(db_file))
    assert result["success"] is True
    assert db_file.exists()
    assert result["database"]["db_path"] == str(db_file)
    assert result["database"]["is_persistent"] is True

@pytest.mark.asyncio
async def test_load_xer_with_empty_db_path_auto_generates(sample_xer_file):
    """load_xer with empty db_path generates path from XER filename."""
    result = await load_xer(sample_xer_file, db_path="")
    assert result["success"] is True
    # Swap only the file extension (str.replace would touch every ".xer" in the path)
    expected_db = os.path.splitext(sample_xer_file)[0] + ".sqlite"
    assert result["database"]["db_path"] == expected_db
    assert result["database"]["is_persistent"] is True
    # Cleanup
    os.unlink(expected_db)

@pytest.mark.asyncio
async def test_load_xer_without_db_path_uses_memory(sample_xer_file):
    """load_xer without db_path uses in-memory database (backward compatible)."""
    result = await load_xer(sample_xer_file)
    assert result["success"] is True
    assert result["database"]["db_path"] == ":memory:"
    assert result["database"]["is_persistent"] is False

@pytest.mark.asyncio
async def test_load_xer_database_contains_all_data(tmp_path, sample_xer_file):
    """Persistent database contains all parsed data."""
    db_file = tmp_path / "schedule.db"
    result = await load_xer(sample_xer_file, db_path=str(db_file))

    # Verify data via direct SQL
    import sqlite3
    conn = sqlite3.connect(str(db_file))
    cursor = conn.execute("SELECT COUNT(*) FROM activities")
    count = cursor.fetchone()[0]
    conn.close()

    assert count == result["activity_count"]
```

**Implementation**:
- Modify `src/xer_mcp/tools/load_xer.py`:
  - Add `db_path: str | None = None` parameter
  - Pass `db_path` and `file_path` (as source_file) to `db.initialize()`
  - Include `database` field in response with DatabaseInfo

**Files Changed**:
- `src/xer_mcp/tools/load_xer.py`
- `tests/contract/test_load_xer.py`

**Verification**: `pytest tests/contract/test_load_xer.py -v`

**Acceptance**: load_xer creates persistent database when db_path provided

---

### Task 3.2: Add Database Info to load_xer Response [X]

**Type**: Implementation
**Why**: Response must include all info needed to connect to database
**Dependencies**: Task 3.1, Task 2.3

**Contract Reference**: `contracts/mcp-tools.json` - load_xer outputSchema.database

**Test First** (TDD):
```python
# tests/contract/test_load_xer.py

@pytest.mark.asyncio
async def test_load_xer_response_includes_database_info(tmp_path, sample_xer_file):
    """load_xer response includes complete database info."""
    db_file = tmp_path / "schedule.db"
    result = await load_xer(sample_xer_file, db_path=str(db_file))

    assert "database" in result
    db_info = result["database"]
    assert "db_path" in db_info
    assert "is_persistent" in db_info
    assert "source_file" in db_info
    assert "loaded_at" in db_info
    assert "schema" in db_info

@pytest.mark.asyncio
async def test_load_xer_response_schema_includes_tables(tmp_path, sample_xer_file):
    """load_xer response schema includes table information."""
    db_file = tmp_path / "schedule.db"
    result = await load_xer(sample_xer_file, db_path=str(db_file))

    schema = result["database"]["schema"]
    assert "version" in schema
    assert "tables" in schema
    table_names = [t["name"] for t in schema["tables"]]
    assert "activities" in table_names
    assert "relationships" in table_names
```

**Implementation**:
- Modify `src/xer_mcp/tools/load_xer.py`:
  - After successful load, call `db.get_schema_info()`
  - Build DatabaseInfo response structure
  - Include in return dictionary
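
One possible shape for the DatabaseInfo structure is sketched below. The field names come from the contract tests above. `build_database_info` is a hypothetical helper, and serializing `loaded_at` as an ISO 8601 string is an assumption (the contract file is the authority on the format).

```python
from datetime import datetime, timezone
from typing import Optional


def build_database_info(
    db_path: str,
    source_file: Optional[str],
    loaded_at: datetime,
    schema: dict,
) -> dict:
    """Assemble the `database` field of the load_xer response."""
    return {
        "db_path": db_path,
        "is_persistent": db_path != ":memory:",
        "source_file": source_file,
        # Assumption: timestamps are serialized as ISO 8601 strings
        "loaded_at": loaded_at.isoformat(),
        "schema": schema,
    }


# Example: the response fragment for an in-memory load
example = build_database_info(
    ":memory:",
    "schedule.xer",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    {"version": "0.2.0", "tables": []},
)
```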

**Files Changed**:
- `src/xer_mcp/tools/load_xer.py`
- `tests/contract/test_load_xer.py`

**Verification**: `pytest tests/contract/test_load_xer.py -v`

**Acceptance**: load_xer response includes complete DatabaseInfo with schema

---

### Task 3.3: Register db_path Parameter with MCP Server [X]

**Type**: Implementation
**Why**: MCP server must expose the new parameter to clients
**Dependencies**: Task 3.1

**Contract Reference**: `contracts/mcp-tools.json` - load_xer inputSchema

**Test First** (TDD):
```python
# tests/contract/test_load_xer.py

def test_load_xer_tool_schema_includes_db_path():
    """MCP tool schema includes db_path parameter."""
    from xer_mcp.server import server
    tools = server.list_tools()
    load_xer_tool = next(t for t in tools if t.name == "load_xer")
    props = load_xer_tool.inputSchema["properties"]
    assert "db_path" in props
    assert props["db_path"]["type"] == "string"
```

**Implementation**:
- Modify MCP tool registration in `src/xer_mcp/server.py`:
  - Add `db_path` to load_xer tool inputSchema
  - Update tool handler to pass db_path to load_xer function
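
For orientation, the extended input schema could look like the plain JSON Schema dictionary below. This is a hedged sketch: the authoritative definition lives in `contracts/mcp-tools.json`, the description strings are paraphrased from the tests in Task 3.1, and `handle_load_xer_arguments` is a hypothetical helper that does not use the MCP SDK.

```python
LOAD_XER_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "file_path": {
            "type": "string",
            "description": "Path to the XER file to load",
        },
        "db_path": {
            "type": "string",
            "description": (
                "Optional database path. Omit for an in-memory database; "
                "pass an empty string to derive the path from the XER file name."
            ),
        },
    },
    # db_path stays optional so existing clients keep working
    "required": ["file_path"],
}


def handle_load_xer_arguments(arguments: dict) -> dict:
    """Extract keyword arguments for load_xer; db_path is None when omitted."""
    return {
        "file_path": arguments["file_path"],
        "db_path": arguments.get("db_path"),
    }
```

Using `arguments.get("db_path")` keeps the distinction between an omitted parameter (`None`, in-memory) and an empty string (auto-generated path).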

**Files Changed**:
- `src/xer_mcp/server.py`
- `tests/contract/test_load_xer.py`

**Verification**: `pytest tests/contract/test_load_xer.py -v`

**Acceptance**: MCP server exposes db_path parameter for load_xer tool

---

### Task 3.4: Handle Database Write Errors [X]

**Type**: Implementation
**Why**: Clear error messages for write failures (FR-008)
**Dependencies**: Task 3.1

**Contract Reference**: `contracts/mcp-tools.json` - Error schema

**Test First** (TDD):
```python
# tests/contract/test_load_xer.py

@pytest.mark.asyncio
async def test_load_xer_error_on_unwritable_path(sample_xer_file):
    """load_xer returns error for unwritable path."""
    # Assumes the tests do not run as root, which can write to /root
    result = await load_xer(sample_xer_file, db_path="/root/forbidden.db")
    assert result["success"] is False
    assert result["error"]["code"] == "FILE_NOT_WRITABLE"

@pytest.mark.asyncio
async def test_load_xer_error_on_invalid_path(sample_xer_file):
    """load_xer returns error for invalid path."""
    result = await load_xer(sample_xer_file, db_path="/nonexistent/dir/file.db")
    assert result["success"] is False
    assert result["error"]["code"] == "FILE_NOT_WRITABLE"
```

**Implementation**:
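- Add error handling in load_xer for database creation failures:
  - Parent directory missing or not writable → FILE_NOT_WRITABLE (check before connecting: `sqlite3.connect()` reports both cases as `sqlite3.OperationalError`, not `PermissionError`)
  - Catch PermissionError → FILE_NOT_WRITABLE
  - Catch OSError with ENOSPC → DISK_FULL (SQLite itself reports a full disk as `sqlite3.OperationalError`: "database or disk is full")
  - Catch other sqlite3 errors → DATABASE_ERROR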
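
A minimal sketch of the error mapping follows. The helper names `check_writable` and `classify_write_error` are hypothetical. The directory pre-check is an assumption added so that a missing parent directory maps to `FILE_NOT_WRITABLE`, as the second test above expects.

```python
import errno
import os
import sqlite3
from pathlib import Path
from typing import Optional


def check_writable(db_path: str) -> Optional[str]:
    """Return an error code if the target directory cannot be written to."""
    if db_path == ":memory:":
        return None
    parent = Path(db_path).resolve().parent
    if not parent.is_dir() or not os.access(parent, os.W_OK):
        return "FILE_NOT_WRITABLE"
    return None


def classify_write_error(exc: Exception) -> str:
    """Map an exception raised during database creation to an error code."""
    if isinstance(exc, PermissionError):
        return "FILE_NOT_WRITABLE"
    if isinstance(exc, OSError) and exc.errno == errno.ENOSPC:
        return "DISK_FULL"
    if isinstance(exc, sqlite3.Error):
        return "DATABASE_ERROR"
    raise exc  # not a write failure this mapping knows about
```

The order of the checks matters: `PermissionError` is a subclass of `OSError`, so it has to be tested first.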

**Files Changed**:
- `src/xer_mcp/tools/load_xer.py`
- `src/xer_mcp/errors.py` (add new error classes if needed)
- `tests/contract/test_load_xer.py`

**Verification**: `pytest tests/contract/test_load_xer.py -v`

**Acceptance**: Clear error messages returned for all database write failures

---

## Phase 4: User Story 2 - Retrieve Database Connection Information (P1)

### Task 4.1: Create get_database_info Tool [X]

**Type**: Implementation
**Why**: Allows retrieval of database info without reloading
**Dependencies**: Task 2.3

**Contract Reference**: `contracts/mcp-tools.json` - get_database_info

**Test First** (TDD):
```python
# tests/contract/test_get_database_info.py

@pytest.mark.asyncio
async def test_get_database_info_returns_current_database(tmp_path, sample_xer_file):
    """get_database_info returns info about currently loaded database."""
    db_file = tmp_path / "schedule.db"
    await load_xer(sample_xer_file, db_path=str(db_file))

    result = await get_database_info()
    assert "database" in result
    assert result["database"]["db_path"] == str(db_file)
    assert result["database"]["is_persistent"] is True

@pytest.mark.asyncio
async def test_get_database_info_error_when_no_database():
    """get_database_info returns error when no database loaded."""
    # Reset database state
    from xer_mcp.db import db
    db.close()

    result = await get_database_info()
    assert "error" in result
    assert result["error"]["code"] == "NO_FILE_LOADED"

@pytest.mark.asyncio
async def test_get_database_info_includes_schema(tmp_path, sample_xer_file):
    """get_database_info includes schema information."""
    db_file = tmp_path / "schedule.db"
    await load_xer(sample_xer_file, db_path=str(db_file))

    result = await get_database_info()
    assert "schema" in result["database"]
    assert "tables" in result["database"]["schema"]
```

**Implementation**:
- Create `src/xer_mcp/tools/get_database_info.py`:
  - Check if database is initialized
  - Return NO_FILE_LOADED error if not
  - Return DatabaseInfo structure with schema
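
The control flow described above is small enough to sketch in full. The module-level `_current_database` variable and `set_current_database` are hypothetical stand-ins for the shared `DatabaseManager` state, and the `message` text is an assumption. Only the `error.code` value is taken from the tests.

```python
import asyncio
from typing import Optional

# Hypothetical stand-in for the state held by the shared DatabaseManager
_current_database: Optional[dict] = None


def set_current_database(info: Optional[dict]) -> None:
    """Record (or clear) the DatabaseInfo of the currently loaded database."""
    global _current_database
    _current_database = info


async def get_database_info() -> dict:
    """Return DatabaseInfo for the loaded database, or a NO_FILE_LOADED error."""
    if _current_database is None:
        return {
            "error": {
                "code": "NO_FILE_LOADED",
                # Assumption: the wording of the message is illustrative
                "message": "No XER file is loaded; call load_xer first.",
            }
        }
    return {"database": _current_database}


# Example: querying before anything has been loaded
print(asyncio.run(get_database_info()))
```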

**Files Changed**:
- `src/xer_mcp/tools/get_database_info.py` (new)
- `tests/contract/test_get_database_info.py` (new)

**Verification**: `pytest tests/contract/test_get_database_info.py -v`

**Acceptance**: get_database_info returns complete DatabaseInfo or appropriate error

---

### Task 4.2: Register get_database_info with MCP Server [X]

**Type**: Implementation
**Why**: MCP server must expose the new tool to clients
**Dependencies**: Task 4.1

**Contract Reference**: `contracts/mcp-tools.json` - get_database_info

**Test First** (TDD):
```python
# tests/contract/test_get_database_info.py

def test_get_database_info_tool_registered():
    """get_database_info tool is registered with MCP server."""
    from xer_mcp.server import server
    tools = server.list_tools()
    tool_names = [t.name for t in tools]
    assert "get_database_info" in tool_names
```

**Implementation**:
- Modify `src/xer_mcp/server.py`:
  - Import get_database_info function
  - Add tool definition with empty inputSchema
  - Add handler for get_database_info calls

**Files Changed**:
- `src/xer_mcp/server.py`
- `tests/contract/test_get_database_info.py`

**Verification**: `pytest tests/contract/test_get_database_info.py -v`

**Acceptance**: get_database_info tool accessible via MCP

---

## Phase 5: User Story 3 - Query Database Schema Information (P2)

### Task 5.1: Add Primary Key Information to Schema [X]

**Type**: Implementation
**Why**: Helps developers understand table structure for queries
**Dependencies**: Task 2.3

**Contract Reference**: `contracts/mcp-tools.json` - TableInfo.primary_key

**Test First** (TDD):
```python
# tests/unit/test_db_manager.py

def test_schema_info_includes_primary_keys():
    """Schema info includes primary key for each table."""
    dm = DatabaseManager()
    dm.initialize()
    schema = dm.get_schema_info()

    activities_table = next(t for t in schema["tables"] if t["name"] == "activities")
    assert "primary_key" in activities_table
    assert "task_id" in activities_table["primary_key"]
```

**Implementation**:
- Enhance `get_schema_info()` in DatabaseManager:
  - Use `PRAGMA table_info()` to identify PRIMARY KEY columns (rows whose `pk` field is non-zero; the value is the column's 1-based position within the key)
  - Add to TableInfo structure
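
A short sketch of the primary key lookup, assuming a standalone helper (`get_primary_key` is a hypothetical name) rather than the actual `get_schema_info()` code:

```python
import sqlite3


def get_primary_key(conn: sqlite3.Connection, table_name: str) -> list:
    """Return the primary key column names of a table, in key order."""
    quoted = '"' + table_name.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Row layout: (cid, name, type, notnull, dflt_value, pk); pk is 0 for
    # non-key columns, otherwise the 1-based position within the key.
    key_columns = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in key_columns]
```

Sorting by the `pk` value keeps composite keys in their declared order, which can differ from the column order of the table.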

**Files Changed**:
- `src/xer_mcp/db/__init__.py`
- `tests/unit/test_db_manager.py`

**Verification**: `pytest tests/unit/test_db_manager.py -v`

**Acceptance**: Primary key information included in schema response

---

### Task 5.2: Add Foreign Key Information to Schema [X]

**Type**: Implementation
**Why**: Documents table relationships for complex queries
**Dependencies**: Task 5.1

**Contract Reference**: `contracts/mcp-tools.json` - ForeignKeyInfo

**Test First** (TDD):
```python
# tests/unit/test_db_manager.py

def test_schema_info_includes_foreign_keys():
    """Schema info includes foreign key relationships."""
    dm = DatabaseManager()
    dm.initialize()
    schema = dm.get_schema_info()

    activities_table = next(t for t in schema["tables"] if t["name"] == "activities")
    assert "foreign_keys" in activities_table
    # activities.proj_id -> projects.proj_id
    fk = next((fk for fk in activities_table["foreign_keys"]
               if fk["column"] == "proj_id"), None)
    assert fk is not None
    assert fk["references_table"] == "projects"
    assert fk["references_column"] == "proj_id"
```

**Implementation**:
- Enhance `get_schema_info()` in DatabaseManager:
  - Use `PRAGMA foreign_key_list(table_name)` to get FK relationships
  - Add to TableInfo structure
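
The foreign key lookup can be sketched the same way. `get_foreign_keys` is a hypothetical helper. The output keys (`column`, `references_table`, `references_column`) follow the test above.

```python
import sqlite3


def get_foreign_keys(conn: sqlite3.Connection, table_name: str) -> list:
    """Return the foreign key relationships declared on a table."""
    quoted = '"' + table_name.replace('"', '""') + '"'
    # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
    return [
        {
            "column": row[3],
            "references_table": row[2],
            "references_column": row[4],
        }
        for row in conn.execute(f"PRAGMA foreign_key_list({quoted})")
    ]
```

`PRAGMA foreign_key_list` reports declared constraints whether or not enforcement is switched on with `PRAGMA foreign_keys=ON`, so the schema information does not depend on that setting.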

**Files Changed**:
- `src/xer_mcp/db/__init__.py`
- `tests/unit/test_db_manager.py`

**Verification**: `pytest tests/unit/test_db_manager.py -v`

**Acceptance**: Foreign key information included in schema response

---

## Phase 6: Polish

### Task 6.1: Integration Test - External Script Access [X]

**Type**: Testing
**Why**: Validates end-to-end workflow matches quickstart documentation
**Dependencies**: All previous tasks

**Contract Reference**: `quickstart.md` - Python Example

**Test**:
```python
# tests/integration/test_direct_db_access.py

@pytest.mark.asyncio
async def test_external_script_can_query_database(tmp_path, sample_xer_file):
    """External script can query database using returned path."""
    db_file = tmp_path / "schedule.db"
    result = await load_xer(sample_xer_file, db_path=str(db_file))

    # Simulate external script access (as shown in quickstart.md)
    import sqlite3
    db_path = result["database"]["db_path"]

    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row

    # Query milestones
    cursor = conn.execute("""
        SELECT task_code, task_name, target_start_date, milestone_type
        FROM activities
        WHERE task_type IN ('TT_Mile', 'TT_FinMile')
        ORDER BY target_start_date
    """)

    milestones = cursor.fetchall()
    conn.close()

    assert len(milestones) > 0
    assert all(row["task_code"] for row in milestones)
```

**Files Changed**:
- `tests/integration/test_direct_db_access.py` (new)

**Verification**: `pytest tests/integration/test_direct_db_access.py -v`

**Acceptance**: External script workflow matches quickstart documentation

---

### Task 6.2: Update Version to 0.2.0 [X]

**Type**: Configuration
**Why**: Semantic versioning for new feature release
**Dependencies**: All previous tasks

**Steps**:
1. Update version in `pyproject.toml` to `0.2.0`
2. Update version in schema introspection response to `0.2.0`
3. Update any version references in documentation

**Files Changed**:
- `pyproject.toml`
- `src/xer_mcp/db/__init__.py` (SCHEMA_VERSION constant)

**Verification**: `grep -rF "0.2.0" pyproject.toml src/` (`-F` matches the version literally, so the dots are not treated as wildcards)

**Acceptance**: Version consistently shows 0.2.0 across project

---

### Task 6.3: Run Full Test Suite and Linting [X]

**Type**: Verification
**Why**: Ensure all tests pass and code meets standards
**Dependencies**: All previous tasks

**Steps**:
1. Run `pytest` - all tests must pass
2. Run `ruff check .` - no linting errors
3. Run `ruff format --check .` - code properly formatted

**Verification**:
```bash
pytest
ruff check .
ruff format --check .
```

**Acceptance**: All tests pass, no linting errors, code properly formatted

---

### Task 6.4: Commit and Prepare for Merge [X]

**Type**: Git Operations
**Why**: Prepare feature for merge to main branch
**Dependencies**: Task 6.3

**Steps**:
1. Review all changes with `git diff main`
2. Commit any uncommitted changes with descriptive messages
3. Verify branch is ready for PR/merge

**Verification**: `git status` shows clean working directory

**Acceptance**: All changes committed, branch ready for merge

---

## Summary

| Phase | Tasks | Focus |
|-------|-------|-------|
| Phase 1 | 1 | Setup and verification |
| Phase 2 | 3 | DatabaseManager foundation |
| Phase 3 | 4 | US1 - Load to persistent DB (P1) |
| Phase 4 | 2 | US2 - Retrieve DB info (P1) |
| Phase 5 | 2 | US3 - Schema information (P2) |
| Phase 6 | 4 | Integration, versioning, polish |

**Total Tasks**: 16
**Estimated Test Count**: ~25 new tests