xer-mcp/specs/002-direct-db-access/spec.md

# Feature Specification: Direct Database Access for Scripts

**Feature Branch**: `002-direct-db-access`
**Created**: 2026-01-08
**Status**: Draft
**Input**: User description: "Create a new feature that allows for scripts to directly query the schedule loaded into the database. The mcp endpoint should be used to load the xer file to a database. The response should provide the necessary information for a script to then access that database directly to perform queries. The intent is to minimize the costly and time-consuming workload on the LLM for large data processing."

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Load XER to Persistent Database (Priority: P1)

As a developer building schedule analysis scripts, I want to load an XER file into a database that persists beyond the MCP session so that my scripts can query the data directly without going through the LLM.

**Why this priority**: This is the foundation of the feature - without a persistent database, scripts cannot access the data directly. This enables the primary use case of offloading large data processing from the LLM.

**Independent Test**: Can be tested by calling the load endpoint and verifying the database file is created at the returned path with the expected schema and data.

**Acceptance Scenarios**:

1. **Given** a valid XER file path, **When** I call the load-to-database endpoint, **Then** the system creates a SQLite database file at a predictable location and returns the database file path
2. **Given** an XER file is loaded to database, **When** I examine the database file, **Then** it contains all activities, relationships, WBS elements, and project data from the XER file
3. **Given** a database was previously created for an XER file, **When** I load the same XER file again, **Then** the existing database is replaced with fresh data
4. **Given** an invalid or non-existent XER file path, **When** I call the load-to-database endpoint, **Then** I receive a clear error message and no database file is created

---

### User Story 2 - Retrieve Database Connection Information (Priority: P1)

As a developer, I want the load response to include all information needed to connect to and query the database so that I can immediately start writing queries in my scripts.

**Why this priority**: Without connection information, developers cannot use the database even if it exists. This is equally critical to the first story.

**Independent Test**: Can be tested by using the returned connection info to successfully open and query the database from an external script.

**Acceptance Scenarios**:

1. **Given** a successful XER load to database, **When** I receive the response, **Then** it includes the absolute path to the database file
2. **Given** a successful XER load to database, **When** I receive the response, **Then** it includes the database schema description (table names and key columns)
3. **Given** the returned database path, **When** I connect to it from a Python/SQL script, **Then** I can successfully query activities, relationships, and other schedule data
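
As an illustration of scenario 3, the following is a minimal sketch of how an external Python script might open the returned path and run a query. The `activities` table and its `task_id` / `task_name` columns are hypothetical placeholders, since this spec does not fix the schema, and `build_demo_db` only stands in for the file the load tool would create. Opening the file with `mode=ro` is an assumption about how a script might protect itself from accidental writes, not a requirement stated here.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_demo_db(db_path: str) -> None:
    """Create a tiny stand-in for the database the load tool would produce.

    Table and column names are hypothetical placeholders.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE activities (task_id TEXT PRIMARY KEY, task_name TEXT)"
        )
        conn.executemany(
            "INSERT INTO activities VALUES (?, ?)",
            [("A100", "Mobilize"), ("A200", "Excavate")],
        )
        conn.commit()
    finally:
        conn.close()


def count_activities(db_path: str) -> int:
    """Open the database read-only via a SQLite URI and count activities."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM activities").fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    # In real use, db_path would come from the load response.
    demo_path = os.path.join(tempfile.mkdtemp(), "schedule.db")
    build_demo_db(demo_path)
    print(count_activities(demo_path))
```

A read-only connection rejects any write statement, which keeps analysis scripts from modifying the data the MCP server manages.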

---

### User Story 3 - Query Database Schema Information (Priority: P2)

As a developer unfamiliar with the database structure, I want to retrieve the database schema so that I can write correct SQL queries without guessing table and column names.

**Why this priority**: While developers can explore the database manually, having schema information readily available improves developer experience and reduces errors.

**Independent Test**: Can be tested by calling the schema endpoint and verifying the returned schema matches the actual database structure.

**Acceptance Scenarios**:

1. **Given** a database has been created, **When** I request the schema, **Then** I receive a list of all tables with their columns and data types
2. **Given** a database has been created, **When** I request the schema, **Then** I receive information about relationships between tables (foreign keys)
3. **Given** no database has been created yet, **When** I request the schema, **Then** I receive an informative error indicating no database is available
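
One way the schema in scenarios 1 and 2 could be gathered is through SQLite's own catalog and pragmas. The sketch below is an assumption about approach, not the specified implementation: `describe_schema` and the shape of its return value are hypothetical, while `sqlite_master`, `PRAGMA table_info`, and `PRAGMA foreign_key_list` are standard SQLite features.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Return tables with their columns, declared types, and foreign keys."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names do not break the pragma.
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": row[1], "type": row[2]}
            for row in conn.execute(f"PRAGMA table_info({quoted})")
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, ...)
        foreign_keys = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema
```

Because the result is read from the database itself, it cannot drift from the actual structure, which is what the independent test for this story checks.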

---

### Edge Cases

- What happens when the disk is full and the database cannot be created? Return a clear error message indicating a storage issue.
- What happens when the database file path is not writable? Return a clear error message indicating a permission issue.
- What happens when a script is querying the database while a new XER file is being loaded? The load operation should complete atomically: it either fully succeeds or fully fails, so a script never reads partial or corrupted data.
- What happens when multiple XER files are loaded in sequence? Each load replaces the previous database content; only one project's data is available at a time.
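
The atomic-load behavior described above could be approached by building the new database in a temporary file and swapping it into place only once it is complete. This is a sketch under assumptions: `replace_database_atomically` and the `populate` callback are hypothetical names, and the spec does not mandate this technique. `os.replace` is atomic on POSIX when both paths are on the same filesystem; on Windows the swap may fail while another process holds the target open.

```python
import os
import sqlite3
import tempfile
from typing import Callable


def replace_database_atomically(
    target_path: str,
    populate: Callable[[sqlite3.Connection], None],
) -> str:
    """Build a new database beside target_path, then swap it into place.

    If populate raises, the temporary file is removed and any existing
    database at target_path is left untouched.
    """
    target_path = os.path.abspath(target_path)
    # Same directory as the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        suffix=".db.tmp", dir=os.path.dirname(target_path)
    )
    os.close(fd)  # SQLite treats the empty file as a new database
    try:
        conn = sqlite3.connect(tmp_path)
        try:
            populate(conn)
            conn.commit()
        finally:
            conn.close()
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target_path
```

With this design a failed load, whether from a parse error or a full disk, surfaces as an exception while the previously loaded data remains queryable.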

## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: System MUST provide an MCP tool to load an XER file into a persistent SQLite database file (not just in-memory)
- **FR-002**: System MUST return the absolute file path to the created database in the load response
- **FR-003**: System MUST return a summary of the database schema (tables and key columns) in the load response
- **FR-004**: Database file MUST be stored in a predictable, accessible location that scripts can reach
- **FR-005**: System MUST preserve all data currently stored by the in-memory database (activities, relationships, WBS, calendars, projects)
- **FR-006**: System MUST provide an MCP tool to retrieve the current database path and schema without reloading data
- **FR-007**: System MUST handle concurrent access safely; the database remains queryable by scripts while MCP tools are in use
- **FR-008**: System MUST return clear errors when database operations fail (file not writable, disk full, etc.)
- **FR-009**: Database MUST use standard SQLite format readable by any SQLite client (Python sqlite3, DBeaver, etc.)

### Key Entities

- **Database File**: A persistent SQLite database file containing parsed XER data; has a file path, creation timestamp, and source XER file reference
- **Schema Information**: Metadata describing database structure; includes table names, column names, data types, and foreign key relationships
- **Connection Info**: All information needed to connect to and query the database; includes file path, schema summary, and access instructions
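
To make the entities above concrete, here is one possible shape for the connection info returned by the load tool, sketched as Python dataclasses. Every field name (`db_path`, `source_xer`, `created_at`, `tables`, `access_instructions`) is an illustrative assumption; the spec only lists what the entities contain, not how they are named or serialized.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TableSummary:
    """Schema summary for one table (hypothetical shape)."""
    name: str
    key_columns: list[str]


@dataclass
class ConnectionInfo:
    """Everything a script needs to open and query the database."""
    db_path: str      # absolute path to the SQLite file
    source_xer: str   # XER file the database was built from
    created_at: str   # creation timestamp, e.g. ISO 8601
    tables: list[TableSummary] = field(default_factory=list)
    access_instructions: str = (
        "Open with any SQLite client, e.g. sqlite3.connect(db_path)"
    )

    def to_json(self) -> str:
        """Serialize to JSON for inclusion in a tool response."""
        return json.dumps(asdict(self), indent=2)
```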

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Scripts can query loaded schedule data directly via SQL without MCP tool calls after initial load
- **SC-002**: Database file is accessible and queryable by standard SQLite clients within 1 second of load completion
- **SC-003**: Large schedules (10,000+ activities) can be queried directly by scripts in under 100ms per query
- **SC-004**: Developers can write working SQL queries using only the schema information returned by the system
- **SC-005**: 100% of data available through existing MCP tools is also available in the direct database
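
A criterion such as SC-003 could be checked with a small timing harness like the sketch below. The synthetic `activities` table, its columns, and both helper names are hypothetical, and the measured time depends on hardware and on the query, so this illustrates how to measure rather than what result to expect.

```python
import sqlite3
import time


def build_synthetic_schedule(
    db_path: str = ":memory:", n_activities: int = 10_000
) -> sqlite3.Connection:
    """Populate a stand-in activities table with n_activities rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE activities (task_id INTEGER PRIMARY KEY, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?)",
        ((i, "complete" if i % 2 == 0 else "open") for i in range(n_activities)),
    )
    conn.commit()
    return conn


def time_query(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    """Run a query and return (rows, elapsed milliseconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return rows, elapsed_ms
```

Passing a real file path instead of the default `:memory:` would bring the measurement closer to the on-disk case this spec targets.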

## Assumptions

- SQLite is an appropriate database format for this use case (widely supported, file-based, no server needed)
- Scripts will primarily use Python, but any language with SQLite support should work
- The database file will be stored in a project-relative or user-accessible directory
- Operation is single-user: concurrent writes from multiple sources do not need to be supported
- Database persistence is session-based: the file may be cleaned up when the MCP server stops, or retained if configured to persist