# Feature Specification: Direct Database Access for Scripts

**Feature Branch**: `002-direct-db-access`

**Created**: 2026-01-08

**Status**: Draft

**Input**: User description: "Create a new feature that allows for scripts to directly query the schedule loaded into the database. The mcp endpoint should be used to load the xer file to a database. The response should provide the necessary information for a script to then access that database directly to perform queries. The intent is to minimize the costly and time-consuming workload on the LLM for large data processing."

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Load XER to Persistent Database (Priority: P1)

As a developer building schedule analysis scripts, I want to load an XER file into a database that persists beyond the MCP session so that my scripts can query the data directly without going through the LLM.

**Why this priority**: This is the foundation of the feature - without a persistent database, scripts cannot access the data directly. This enables the primary use case of offloading large data processing from the LLM.

**Independent Test**: Can be tested by calling the load endpoint and verifying the database file is created at the returned path with the expected schema and data.

**Acceptance Scenarios**:

1. **Given** a valid XER file path, **When** I call the load-to-database endpoint, **Then** the system creates a SQLite database file at a predictable location and returns the database file path
2. **Given** an XER file is loaded to database, **When** I examine the database file, **Then** it contains all activities, relationships, WBS elements, and project data from the XER file
3. **Given** a database was previously created for an XER file, **When** I load the same XER file again, **Then** the existing database is replaced with fresh data
4. **Given** an invalid or non-existent XER file path, **When** I call the load-to-database endpoint, **Then** I receive a clear error message and no database file is created

---

### User Story 2 - Retrieve Database Connection Information (Priority: P1)

As a developer, I want the load response to include all information needed to connect to and query the database so that I can immediately start writing queries in my scripts.

**Why this priority**: Without connection information, developers cannot use the database even if it exists. This is as critical as the first story.

**Independent Test**: Can be tested by using the returned connection info to successfully open and query the database from an external script.

**Acceptance Scenarios**:

1. **Given** a successful XER load to database, **When** I receive the response, **Then** it includes the absolute path to the database file
2. **Given** a successful XER load to database, **When** I receive the response, **Then** it includes the database schema description (table names and key columns)
3. **Given** the returned database path, **When** I connect to it from a Python/SQL script, **Then** I can successfully query activities, relationships, and other schedule data

---

### User Story 3 - Query Database Schema Information (Priority: P2)

As a developer unfamiliar with the database structure, I want to retrieve the database schema so that I can write correct SQL queries without guessing table and column names.

**Why this priority**: While developers can explore the database manually, having schema information readily available improves developer experience and reduces errors.

**Independent Test**: Can be tested by calling the schema endpoint and verifying the returned schema matches the actual database structure.

**Acceptance Scenarios**:

1. **Given** a database has been created, **When** I request the schema, **Then** I receive a list of all tables with their columns and data types
2. **Given** a database has been created, **When** I request the schema, **Then** I receive information about relationships between tables (foreign keys)
3. **Given** no database has been created yet, **When** I request the schema, **Then** I receive an informative error indicating no database is available

---

### Edge Cases

- What happens when the disk is full and the database cannot be created? Return a clear error message indicating a storage issue.
- What happens when the database file path is not writable? Return a clear error message indicating a permission issue.
- What happens when a script is querying the database while a new XER file is being loaded? The load operation should complete atomically - either fully succeed or fully fail, preventing partial/corrupted reads.
- What happens when multiple XER files are loaded in sequence? Each load replaces the previous database content; only one project's data is available at a time.

## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: System MUST provide an MCP tool to load an XER file into a persistent SQLite database file (not just in-memory)
- **FR-002**: System MUST return the absolute file path to the created database in the load response
- **FR-003**: System MUST return a summary of the database schema (tables and key columns) in the load response
- **FR-004**: Database file MUST be stored in a predictable, accessible location that scripts can reach
- **FR-005**: System MUST preserve all data currently stored by the in-memory database (activities, relationships, WBS, calendars, projects)
- **FR-006**: System MUST provide an MCP tool to retrieve the current database path and schema without reloading data
- **FR-007**: System MUST handle concurrent access safely - database remains queryable while MCP tools are used
- **FR-008**: System MUST return clear errors when database operations fail (file not writable, disk full, etc.)
- **FR-009**: Database MUST use standard SQLite format readable by any SQLite client (Python sqlite3, DBeaver, etc.)

### Key Entities

- **Database File**: A persistent SQLite database file containing parsed XER data; has a file path, creation timestamp, and source XER file reference
- **Schema Information**: Metadata describing database structure; includes table names, column names, data types, and foreign key relationships
- **Connection Info**: All information needed to connect to and query the database; includes file path, schema summary, and access instructions

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Scripts can query loaded schedule data directly via SQL without MCP tool calls after initial load
- **SC-002**: Database file is accessible and queryable by standard SQLite clients within 1 second of load completion
- **SC-003**: Large schedules (10,000+ activities) can be queried directly by scripts in under 100ms per query
- **SC-004**: Developers can write working SQL queries using only the schema information returned by the system
- **SC-005**: 100% of data available through existing MCP tools is also available in the direct database

## Assumptions

- SQLite is an appropriate database format for this use case (widely supported, file-based, no server needed)
- Scripts will primarily use Python, but any language with SQLite support should work
- The database file will be stored in a project-relative or user-accessible directory
- Single-user operation - concurrent writes from multiple sources are not required
- Database persistence is session-based; the file may be cleaned up when the MCP server stops (or may persist based on configuration)
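
The script-side half of User Stories 2 and 3 (open the returned path, then discover tables, columns, data types, and foreign keys) could look roughly like the sketch below. It is illustrative only and not the feature's implementation: the helper names `describe_schema` and `build_demo_db`, and the table and column names (`activities`, `relationships`, `task_id`, and so on), are assumptions, since this specification does not define the schema. The demo database stands in for the file the load tool would create. Only standard `sqlite3` calls and SQLite's documented `sqlite_master`, `PRAGMA table_info`, and `PRAGMA foreign_key_list` interfaces are used.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_demo_db(db_path):
    """Create a stand-in database. Table and column names are hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE activities (task_id INTEGER PRIMARY KEY, task_name TEXT);
        CREATE TABLE relationships (
            pred_task_id INTEGER REFERENCES activities(task_id),
            succ_task_id INTEGER REFERENCES activities(task_id)
        );
        INSERT INTO activities VALUES (1, 'Excavate'), (2, 'Pour footing');
        INSERT INTO relationships VALUES (1, 2);
    """)
    conn.commit()
    conn.close()


def describe_schema(db_path):
    """Return {table: {"columns": [(name, type)], "foreign_keys": [(col, ref_table, ref_col)]}}."""
    # Open read-only via a SQLite URI so the script cannot modify the loaded data.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        schema = {}
        for table in tables:
            # table_info rows: (cid, name, type, notnull, dflt_value, pk)
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
            fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
            schema[table] = {
                "columns": [(c[1], c[2]) for c in cols],
                "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
            }
        return schema
    finally:
        conn.close()


if __name__ == "__main__":
    # In real use, db_path would be the absolute path returned by the load tool.
    db_path = os.path.join(tempfile.mkdtemp(), "schedule.db")
    build_demo_db(db_path)
    for table, info in describe_schema(db_path).items():
        print(table, info)
```

Opening the file with `mode=ro` is a design choice for this sketch: it keeps analysis scripts from altering the data that the MCP tools also read, which fits the single-writer assumption above.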