This feature (002-direct-db-access) enables scripts to query schedule data directly via SQL after loading XER files through MCP. Key additions:

- `spec.md`: Feature specification with 3 user stories
- `plan.md`: Implementation plan with constitution check
- `research.md`: Technology decisions (SQLite file, WAL mode, atomic writes)
- `data-model.md`: DatabaseInfo, SchemaInfo, TableInfo entities
- `contracts/mcp-tools.json`: Extended `load_xer` schema, new `get_database_info` tool
- `quickstart.md`: Usage examples for direct database access
- `tasks.md`: 16 implementation tasks across 6 phases
# Feature Specification: Direct Database Access for Scripts

**Feature Branch**: `002-direct-db-access`

**Created**: 2026-01-08

**Status**: Draft

**Input**: User description: "Create a new feature that allows for scripts to directly query the schedule loaded into the database. The mcp endpoint should be used to load the xer file to a database. The response should provide the necessary information for a script to then access that database directly to perform queries. The intent is to minimize the costly and time-consuming workload on the LLM for large data processing."

## User Scenarios & Testing *(mandatory)*
### User Story 1 - Load XER to Persistent Database (Priority: P1)

As a developer building schedule analysis scripts, I want to load an XER file into a database that persists beyond the MCP session so that my scripts can query the data directly without going through the LLM.

**Why this priority**: This is the foundation of the feature: without a persistent database, scripts cannot access the data directly. It enables the primary use case of offloading large data processing from the LLM.

**Independent Test**: Can be tested by calling the load endpoint and verifying that the database file is created at the returned path with the expected schema and data.

**Acceptance Scenarios**:

1. **Given** a valid XER file path, **When** I call the load-to-database endpoint, **Then** the system creates a SQLite database file at a predictable location and returns the database file path
2. **Given** an XER file has been loaded into the database, **When** I examine the database file, **Then** it contains all activities, relationships, WBS elements, and project data from the XER file
3. **Given** a database was previously created for an XER file, **When** I load the same XER file again, **Then** the existing database is replaced with fresh data
4. **Given** an invalid or non-existent XER file path, **When** I call the load-to-database endpoint, **Then** I receive a clear error message and no database file is created
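The acceptance scenarios above imply three behaviours for the load step: validate the input before touching the target path, replace any previous database, and never expose a half-written file. A minimal Python sketch of one way to get all three is to build the database in a temporary file next to the target and then swap it in with `os.replace`. The names `parse_xer` and `load_xer_to_sqlite` are hypothetical, and the parser is deliberately minimal: it reads only the tab-delimited `%T` (table), `%F` (fields) and `%R` (row) records of an XER file and stores every value as `TEXT`, whereas a real implementation would map types and keys.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical intermediate shape: {table_name: (field_names, rows)}
XerTables = dict[str, tuple[list[str], list[list[str]]]]


def parse_xer(xer_path: Path) -> XerTables:
    """Minimal XER reader: handles %T, %F and %R records and ignores the rest."""
    tables: XerTables = {}
    current = None
    # XER files are often not UTF-8; this sketch replaces undecodable bytes.
    with open(xer_path, encoding="utf-8", errors="replace", newline="") as fh:
        for line in fh:
            parts = line.rstrip("\r\n").split("\t")
            if parts[0] == "%T":
                current = parts[1]
                tables[current] = ([], [])
            elif parts[0] == "%F" and current is not None:
                tables[current][0].extend(parts[1:])
            elif parts[0] == "%R" and current is not None:
                tables[current][1].append(parts[1:])
    return tables


def load_xer_to_sqlite(xer_path: str, db_path: str) -> str:
    """Load an XER file into a SQLite file and return its absolute path."""
    src = Path(xer_path)
    if not src.is_file():
        # Scenario 4: fail before anything is written at the target path.
        raise FileNotFoundError(f"XER file not found: {src}")
    tables = parse_xer(src)

    target = Path(db_path).resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    # Build in a temp file in the same directory so the final rename is atomic.
    fd, tmp = tempfile.mkstemp(suffix=".tmp", dir=str(target.parent))
    os.close(fd)
    try:
        conn = sqlite3.connect(tmp)
        try:
            with conn:  # one transaction, committed only if every insert succeeds
                for name, (fields, rows) in tables.items():
                    if not fields:
                        continue
                    # Identifiers are quoted; assumes XER names contain no '"'.
                    cols = ", ".join(f'"{f}" TEXT' for f in fields)
                    conn.execute(f'CREATE TABLE "{name}" ({cols})')
                    width = len(fields)
                    marks = ", ".join("?" * width)
                    conn.executemany(
                        f'INSERT INTO "{name}" VALUES ({marks})',
                        # Pad or trim each row to the declared field count.
                        [(row + [""] * width)[:width] for row in rows],
                    )
        finally:
            conn.close()
        os.replace(tmp, target)  # Scenario 3: swap out any previous database
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise
    return str(target)
```

Because `os.replace` is an atomic rename when source and destination are on the same file system, a reader sees either the old file or the new one, never a mix. On POSIX systems a script that already has the old file open keeps reading the old data until it reconnects; on Windows the rename may fail while another process holds the file open. The sketch keeps SQLite's default rollback journal; if WAL mode is used, as `research.md` lists, swapping the main file while `-wal`/`-shm` side files of the old database still exist needs extra care.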
---

### User Story 2 - Retrieve Database Connection Information (Priority: P1)

As a developer, I want the load response to include all the information needed to connect to and query the database so that I can start writing queries in my scripts immediately.

**Why this priority**: Without connection information, developers cannot use the database even if it exists. This story is as critical as User Story 1.

**Independent Test**: Can be tested by using the returned connection information to open and query the database from an external script.

**Acceptance Scenarios**:

1. **Given** a successful XER load to database, **When** I receive the response, **Then** it includes the absolute path to the database file
2. **Given** a successful XER load to database, **When** I receive the response, **Then** it includes a description of the database schema (table names and key columns)
3. **Given** the returned database path, **When** I connect to it from a Python or SQL script, **Then** I can query activities, relationships, and other schedule data
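To make acceptance scenario 3 concrete, here is a minimal sketch of the script side. It assumes the load response is a JSON object with a `db_path` key and a `schema.tables` mapping; those key names, the helper names, and the `TASK`/`TASKPRED` table names (borrowed from the raw XER format) are hypothetical and may not match the real contract. Opening the file through a SQLite `mode=ro` URI means a script cannot accidentally modify the loaded schedule.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open the database read-only via a SQLite URI (mode=ro)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    # timeout: wait up to 5 s if the file is briefly locked by a writer.
    return sqlite3.connect(uri, uri=True, timeout=5.0)


def count_rows(load_response: dict) -> dict[str, int]:
    """Count rows in each table named in a (hypothetical) load response."""
    conn = open_readonly(load_response["db_path"])
    try:
        return {
            table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            for table in load_response["schema"]["tables"]
        }
    finally:
        conn.close()
```

A script would call `count_rows(response)` with the parsed JSON returned by the load tool and get a mapping of table names to row counts; any further query can reuse `open_readonly` the same way.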
---

### User Story 3 - Query Database Schema Information (Priority: P2)

As a developer unfamiliar with the database structure, I want to retrieve the database schema so that I can write correct SQL queries without guessing table and column names.

**Why this priority**: While developers can explore the database manually, having schema information readily available improves developer experience and reduces errors.

**Independent Test**: Can be tested by calling the schema endpoint and verifying the returned schema matches the actual database structure.

**Acceptance Scenarios**:

1. **Given** a database has been created, **When** I request the schema, **Then** I receive a list of all tables with their columns and data types
2. **Given** a database has been created, **When** I request the schema, **Then** I receive information about relationships between tables (foreign keys)
3. **Given** no database has been created yet, **When** I request the schema, **Then** I receive an informative error indicating no database is available
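SQLite already stores everything User Story 3 asks for, so a schema tool can read it back rather than maintain a separate description. The sketch below (function names hypothetical) lists tables from `sqlite_master` and reads columns and foreign keys with `PRAGMA table_info` and `PRAGMA foreign_key_list`. It also covers scenario 3 by checking for the file first, because a plain `sqlite3.connect` call would otherwise create an empty database at a missing path.

```python
import sqlite3
from pathlib import Path


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Tables, columns, types and foreign keys, read from SQLite's own catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\' ORDER BY name"  # skip internal tables
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": r[1], "type": r[2], "primary_key": bool(r[5])}
            for r in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": r[3], "references": f"{r[2]}.{r[4]}" if r[4] else r[2]}
            for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


def describe_schema_at(db_path: str) -> dict:
    """Schema for the file at db_path, or a clear error if nothing has been loaded."""
    path = Path(db_path)
    if not path.is_file():
        # Scenario 3: report the problem instead of creating an empty file here.
        raise FileNotFoundError(f"No database is available at {path}; load an XER file first.")
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        return describe_schema(conn)
    finally:
        conn.close()
```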
---

### Edge Cases

- What happens when the disk is full and the database cannot be created? The system returns a clear error message indicating a storage problem.
- What happens when the database file path is not writable? The system returns a clear error message indicating a permission problem.
- What happens when a script is querying the database while a new XER file is being loaded? The load operation should complete atomically, either fully succeeding or fully failing, so that scripts never read partial or corrupted data.
- What happens when multiple XER files are loaded in sequence? Each load replaces the previous database content; only one project's data is available at a time.
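The first two edge cases amount to translating low-level failures into actionable messages. One hedged way to do that in Python, using a hypothetical `describe_db_error` helper, is to inspect the `errno` of an `OSError` and the message of a `sqlite3.OperationalError`; SQLite's text for `SQLITE_FULL` is "database or disk is full", and Python raises `OperationalError` for it. The message wording is illustrative only.

```python
import errno
import sqlite3


def describe_db_error(exc: BaseException) -> str:
    """Map a low-level failure to a storage, access or generic message."""
    if isinstance(exc, OSError):
        if exc.errno == errno.ENOSPC:
            return f"Storage error: no space left on device while writing {exc.filename}."
        if isinstance(exc, PermissionError):
            return f"Permission error: cannot write the database at {exc.filename}."
        return f"File system error: {exc}"
    if isinstance(exc, sqlite3.OperationalError):
        message = str(exc)
        # SQLite's text for SQLITE_FULL.
        if "disk is full" in message:
            return "Storage error: SQLite ran out of space while writing the database."
        # SQLITE_READONLY / SQLITE_CANTOPEN usually mean a permission or path problem.
        if "readonly database" in message or "unable to open database file" in message:
            return ("Access error: SQLite cannot open or write the database file; "
                    "check that the directory exists and is writable.")
        return f"Database error: {message}"
    return f"Unexpected error: {exc}"
```

A load tool would wrap its work in `try`/`except Exception as exc` and return `describe_db_error(exc)` in the error response. Matching on message text is brittle; on Python 3.11+ the `sqlite_errorname` attribute of SQLite exceptions is a sturdier key.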
## Requirements *(mandatory)*

### Functional Requirements

- **FR-001**: System MUST provide an MCP tool to load an XER file into a persistent SQLite database file (not just in-memory)
- **FR-002**: System MUST return the absolute file path to the created database in the load response
- **FR-003**: System MUST return a summary of the database schema (tables and key columns) in the load response
- **FR-004**: Database file MUST be stored in a predictable, accessible location that scripts can reach
- **FR-005**: System MUST preserve all data currently stored by the in-memory database (activities, relationships, WBS, calendars, projects)
- **FR-006**: System MUST provide an MCP tool to retrieve the current database path and schema without reloading data
- **FR-007**: System MUST handle concurrent access safely: the database remains queryable while MCP tools are in use
- **FR-008**: System MUST return clear errors when database operations fail (file not writable, disk full, etc.)
- **FR-009**: Database MUST use standard SQLite format readable by any SQLite client (Python sqlite3, DBeaver, etc.)
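FR-001, FR-005 and FR-009 can be met together if the server already keeps the parsed data in an in-memory SQLite connection, which the spec implies but does not state. Under that assumption, Python's `sqlite3.Connection.backup` copies every page of the in-memory database into a file, so no table, index or row is lost in translation, and the result is an ordinary SQLite 3 file whose first 16 bytes are the `SQLite format 3` header string followed by a NUL byte. The helper names below are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# First 16 bytes of every SQLite 3 database file.
SQLITE_HEADER = b"SQLite format 3\x00"


def persist_in_memory_db(mem_conn: sqlite3.Connection, db_path: str) -> str:
    """Copy an in-memory database into a standard SQLite file, atomically."""
    target = Path(db_path).resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(suffix=".tmp", dir=str(target.parent))
    os.close(fd)
    try:
        file_conn = sqlite3.connect(tmp)
        try:
            mem_conn.backup(file_conn)  # page-by-page copy of the whole "main" database
        finally:
            file_conn.close()
        os.replace(tmp, target)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise
    return str(target)


def is_standard_sqlite(path: str) -> bool:
    """FR-009 check: does the file carry the header every SQLite client expects?"""
    with open(path, "rb") as fh:
        return fh.read(16) == SQLITE_HEADER
```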
### Key Entities

- **Database File**: A persistent SQLite database file containing parsed XER data; has a file path, creation timestamp, and source XER file reference
- **Schema Information**: Metadata describing database structure; includes table names, column names, data types, and foreign key relationships
- **Connection Info**: All information needed to connect to and query the database; includes file path, schema summary, and access instructions
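The entities above could be represented in Python as plain dataclasses that serialise directly into the load response. This is only a sketch: the class names `DatabaseInfo`, `SchemaInfo` and `TableInfo` are taken from the data model listed in the change summary, while `ColumnInfo`, every field name, and the choice to fold Database File and Connection Info into one `DatabaseInfo` class are assumptions.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ColumnInfo:
    name: str
    type: str  # declared SQLite type, e.g. "TEXT" or "INTEGER"


@dataclass
class TableInfo:
    name: str
    columns: list[ColumnInfo]
    # Foreign keys as "column -> table.column" strings (format is an assumption).
    foreign_keys: list[str] = field(default_factory=list)


@dataclass
class SchemaInfo:
    tables: list[TableInfo]


@dataclass
class DatabaseInfo:
    db_path: str        # absolute path to the SQLite file (FR-002)
    source_xer: str     # XER file the data was loaded from
    schema: SchemaInfo  # schema summary (FR-003)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    access_instructions: str = "Open db_path with any SQLite client."

    def to_response(self) -> dict:
        """Plain nested dict, ready for json.dumps in an MCP tool response."""
        return asdict(self)
```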
## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Scripts can query loaded schedule data directly via SQL without MCP tool calls after initial load
- **SC-002**: Database file is accessible and queryable by standard SQLite clients within 1 second of load completion
- **SC-003**: Large schedules (10,000+ activities) can be queried directly by scripts in under 100 ms per query
- **SC-004**: Developers can write working SQL queries using only the schema information returned by the system
- **SC-005**: 100% of data available through existing MCP tools is also available in the direct database
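SC-003 needs an agreed way to measure "under 100 ms per query". A simple harness, sketched below under the assumption that the median of repeated runs is the statistic of interest, times a query with `time.perf_counter` against a synthetic 10,000-activity file database. The table layout and the `build_demo_db`/`median_query_ms` names are hypothetical, and repeated runs measure warm-cache performance, so real schedules and real script queries should be benchmarked as well.

```python
import sqlite3
import statistics
import time


def build_demo_db(db_path: str, n_activities: int = 10_000) -> None:
    """Create a synthetic activity table (hypothetical columns) for benchmarking."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE TASK (task_id INTEGER PRIMARY KEY, wbs_id INTEGER, total_float_hr REAL)"
        )
        conn.executemany(
            "INSERT INTO TASK (wbs_id, total_float_hr) VALUES (?, ?)",
            [(i % 50, float(i % 200 - 40)) for i in range(n_activities)],
        )
        conn.execute("CREATE INDEX idx_task_wbs ON TASK (wbs_id)")
    conn.close()


def median_query_ms(conn: sqlite3.Connection, sql: str, params=(), runs: int = 20) -> float:
    """Run a query several times and return the median wall-clock time in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch all rows so the whole query is timed
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, after `build_demo_db("perf.db")`, calling `median_query_ms(sqlite3.connect("perf.db"), "SELECT COUNT(*) FROM TASK WHERE total_float_hr < ?", (0,))` yields a number that can be compared directly with the 100 ms target.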
## Assumptions

- SQLite is an appropriate database format for this use case (widely supported, file-based, no server needed)
- Scripts will primarily use Python, but any language with SQLite support should work
- The database file will be stored in a project-relative or user-accessible directory
- Single-user operation - concurrent writes from multiple sources are not required
- Database persistence is session-based; the file may be cleaned up when the MCP server stops (or may persist based on configuration)