xer-mcp/specs/002-direct-db-access/spec.md
Bill Ballou 3e7ad39eb8 docs: add specification and implementation plan for direct database access feature
This feature (002-direct-db-access) enables scripts to query schedule data
directly via SQL after loading XER files through MCP. Key additions:

- spec.md: Feature specification with 3 user stories
- plan.md: Implementation plan with constitution check
- research.md: Technology decisions (SQLite file, WAL mode, atomic writes)
- data-model.md: DatabaseInfo, SchemaInfo, TableInfo entities
- contracts/mcp-tools.json: Extended load_xer schema, new get_database_info tool
- quickstart.md: Usage examples for direct database access
- tasks.md: 16 implementation tasks across 6 phases
2026-01-08 12:38:42 -05:00


Feature Specification: Direct Database Access for Scripts

Feature Branch: 002-direct-db-access
Created: 2026-01-08
Status: Draft
Input: User description: "Create a new feature that allows for scripts to directly query the schedule loaded into the database. The mcp endpoint should be used to load the xer file to a database. The response should provide the necessary information for a script to then access that database directly to perform queries. The intent is to minimize the costly and time-consuming workload on the LLM for large data processing."

User Scenarios & Testing (mandatory)

User Story 1 - Load XER to Persistent Database (Priority: P1)

As a developer building schedule analysis scripts, I want to load an XER file into a database that persists beyond the MCP session so that my scripts can query the data directly without going through the LLM.

Why this priority: This is the foundation of the feature - without a persistent database, scripts cannot access the data directly. This enables the primary use case of offloading large data processing from the LLM.

Independent Test: Can be tested by calling the load endpoint and verifying the database file is created at the returned path with the expected schema and data.

Acceptance Scenarios:

  1. Given a valid XER file path, When I call the load-to-database endpoint, Then the system creates a SQLite database file at a predictable location and returns the database file path
  2. Given an XER file is loaded to database, When I examine the database file, Then it contains all activities, relationships, WBS elements, and project data from the XER file
  3. Given a database was previously created for an XER file, When I load the same XER file again, Then the existing database is replaced with fresh data
  4. Given an invalid or non-existent XER file path, When I call the load-to-database endpoint, Then I receive a clear error message and no database file is created

User Story 2 - Retrieve Database Connection Information (Priority: P1)

As a developer, I want the load response to include all information needed to connect to and query the database so that I can immediately start writing queries in my scripts.

Why this priority: Without connection information, developers cannot use the database even if it exists. This is equally critical to the first story.

Independent Test: Can be tested by using the returned connection info to successfully open and query the database from an external script.

Acceptance Scenarios:

  1. Given a successful XER load to database, When I receive the response, Then it includes the absolute path to the database file
  2. Given a successful XER load to database, When I receive the response, Then it includes the database schema description (table names and key columns)
  3. Given the returned database path, When I connect to it from a Python/SQL script, Then I can successfully query activities, relationships, and other schedule data
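
As a rough illustration of scenario 3, the sketch below shows how an external script might query the file once it has the returned path. The table and column names (`activities`, `relationships`, `task_id`, `pred_task_id`, `succ_task_id`) are assumptions made for the example, not the schema defined by this feature, and `build_demo_database` only stands in for the MCP load so the snippet runs on its own.

```python
import os
import sqlite3
import tempfile


def build_demo_database(db_path: str) -> None:
    """Create a tiny stand-in database; the real file would come from the MCP load."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical tables and columns, chosen only for illustration.
        conn.executescript(
            """
            CREATE TABLE activities (task_id TEXT PRIMARY KEY, task_name TEXT, duration_days REAL);
            CREATE TABLE relationships (pred_task_id TEXT, succ_task_id TEXT, rel_type TEXT);
            INSERT INTO activities VALUES ('A100', 'Excavate', 5), ('A200', 'Pour footings', 3);
            INSERT INTO relationships VALUES ('A100', 'A200', 'FS');
            """
        )
        conn.commit()
    finally:
        conn.close()


def successors_of(db_path: str, task_id: str) -> list[tuple[str, str]]:
    """Return (task_id, task_name) for every successor of the given activity."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """
            SELECT a.task_id, a.task_name
            FROM relationships AS r
            JOIN activities AS a ON a.task_id = r.succ_task_id
            WHERE r.pred_task_id = ?
            ORDER BY a.task_id
            """,
            (task_id,),
        ).fetchall()
    finally:
        conn.close()


# In real use, demo_path would be the absolute path returned by the load response.
demo_path = os.path.join(tempfile.mkdtemp(), "schedule.db")
build_demo_database(demo_path)
successors = successors_of(demo_path, "A100")
print(successors)
```

The query runs entirely in the script, so the LLM never has to see the row data.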

User Story 3 - Query Database Schema Information (Priority: P2)

As a developer unfamiliar with the database structure, I want to retrieve the database schema so that I can write correct SQL queries without guessing table and column names.

Why this priority: While developers can explore the database manually, having schema information readily available improves developer experience and reduces errors.

Independent Test: Can be tested by calling the schema endpoint and verifying the returned schema matches the actual database structure.

Acceptance Scenarios:

  1. Given a database has been created, When I request the schema, Then I receive a list of all tables with their columns and data types
  2. Given a database has been created, When I request the schema, Then I receive information about relationships between tables (foreign keys)
  3. Given no database has been created yet, When I request the schema, Then I receive an informative error indicating no database is available
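
One possible way to gather the schema described in these scenarios is to read it from SQLite itself, using `sqlite_master` for table names and the `table_info` and `foreign_key_list` pragmas for columns and foreign keys. This is a minimal sketch, not the feature's implementation; the `describe_schema` helper, its return shape, and the demo tables are all hypothetical.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Map each user table to its columns and foreign keys."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier, since pragma arguments cannot be bound as parameters.
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": row[1], "type": row[2], "primary_key": bool(row[5])}
            for row in conn.execute(f"PRAGMA table_info({quoted})")
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "references_table": row[2], "references_column": row[4]}
            for row in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


# Demo tables are illustrative only; the real schema comes from the loaded XER data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wbs (wbs_id TEXT PRIMARY KEY, wbs_name TEXT);
    CREATE TABLE activities (
        task_id TEXT PRIMARY KEY,
        wbs_id TEXT REFERENCES wbs(wbs_id),
        task_name TEXT
    );
    """
)
schema = describe_schema(conn)
conn.close()
print(schema["activities"]["foreign_keys"])
```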

Edge Cases

  • What happens when the disk is full and the database cannot be created? Return a clear error message indicating a storage issue.
  • What happens when the database file path is not writable? Return a clear error message indicating a permission issue.
  • What happens when a script is querying the database while a new XER file is being loaded? The load operation should complete atomically: it either fully succeeds or fully fails, so scripts never read partial or corrupted data.
  • What happens when multiple XER files are loaded in sequence? Each load replaces the previous database content; only one project's data is available at a time.
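
The atomic-load and replace-on-reload behavior above could be approached by building the new database in a temporary file in the same directory and then swapping it into place with `os.replace`. The sketch below assumes that approach; the function name, the single `activities` table, and the file names are hypothetical, and the spec does not state that the implementation works this way.

```python
import os
import sqlite3
import tempfile


def replace_database_atomically(final_path: str, rows: list[tuple[str, str]]) -> None:
    """Build a fresh database beside final_path, then swap it in with one rename."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file must share a filesystem with the target for the rename to be atomic.
    fd, temp_path = tempfile.mkstemp(suffix=".db.tmp", dir=directory)
    os.close(fd)
    try:
        conn = sqlite3.connect(temp_path)
        try:
            conn.execute("CREATE TABLE activities (task_id TEXT PRIMARY KEY, task_name TEXT)")
            conn.executemany("INSERT INTO activities VALUES (?, ?)", rows)
            conn.commit()
        finally:
            conn.close()
        os.replace(temp_path, final_path)  # readers see the old file or the new one
    except Exception:
        # A failed load leaves the previous database untouched.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "schedule.db")
replace_database_atomically(db_path, [("A100", "Excavate")])
replace_database_atomically(db_path, [("B100", "Frame walls"), ("B200", "Roofing")])
try:
    # The duplicate key makes this load fail partway through.
    replace_database_atomically(db_path, [("C100", "Paint"), ("C100", "Paint again")])
except sqlite3.IntegrityError:
    pass

conn = sqlite3.connect(db_path)
task_ids = [row[0] for row in conn.execute("SELECT task_id FROM activities ORDER BY task_id")]
conn.close()
leftover = [name for name in os.listdir(workdir) if name.endswith(".tmp")]
print(task_ids, leftover)
```

On POSIX systems a script that already has the old file open keeps reading the old data until it reconnects. On Windows the rename may fail while another process holds the file open, so a real implementation would likely need a retry or a different strategy there.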

Requirements (mandatory)

Functional Requirements

  • FR-001: System MUST provide an MCP tool to load an XER file into a persistent SQLite database file (not just in-memory)
  • FR-002: System MUST return the absolute file path to the created database in the load response
  • FR-003: System MUST return a summary of the database schema (tables and key columns) in the load response
  • FR-004: Database file MUST be stored in a predictable, accessible location that scripts can reach
  • FR-005: System MUST preserve all data currently stored by the in-memory database (activities, relationships, WBS, calendars, projects)
  • FR-006: System MUST provide an MCP tool to retrieve the current database path and schema without reloading data
  • FR-007: System MUST handle concurrent access safely: the database remains queryable by scripts while MCP tools are in use
  • FR-008: System MUST return clear errors when database operations fail (file not writable, disk full, etc.)
  • FR-009: Database MUST use standard SQLite format readable by any SQLite client (Python sqlite3, DBeaver, etc.)
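
On the script side, FR-007 through FR-009 might look like the following sketch: the script opens the file read-only through a SQLite URI so it cannot interfere with the MCP server's writes, and it treats `sqlite3.OperationalError` as the signal that the file is missing or unusable. The `open_read_only` helper and the `projects` table are assumptions made for the example.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing database file in read-only mode via a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Stand-in for the file the MCP load would create.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "schedule.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE projects (proj_id TEXT PRIMARY KEY, proj_name TEXT)")
writer.execute("INSERT INTO projects VALUES ('P1', 'Demo project')")
writer.commit()
writer.close()

reader = open_read_only(db_path)
project_names = [row[0] for row in reader.execute("SELECT proj_name FROM projects")]

# Writes through the read-only connection are rejected by SQLite.
try:
    reader.execute("DELETE FROM projects")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
reader.close()

# With mode=ro, a missing file is an error instead of a silently created empty database.
try:
    open_read_only(os.path.join(workdir, "missing.db"))
    missing_detected = False
except sqlite3.OperationalError:
    missing_detected = True

print(project_names, write_blocked, missing_detected)
```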

Key Entities

  • Database File: A persistent SQLite database file containing parsed XER data; has a file path, creation timestamp, and source XER file reference
  • Schema Information: Metadata describing database structure; includes table names, column names, data types, and foreign key relationships
  • Connection Info: All information needed to connect to and query the database; includes file path, schema summary, and access instructions
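
To make the entities concrete, here is one possible shape for them as Python dataclasses. The class names follow the entities listed for data-model.md (`DatabaseInfo`, `SchemaInfo`, `TableInfo`), but every field name and the example values are assumptions made for illustration; the actual data model may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TableInfo:
    name: str
    columns: dict[str, str]  # column name -> declared SQLite type
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> "table.column"


@dataclass
class SchemaInfo:
    tables: list[TableInfo]


@dataclass
class DatabaseInfo:
    db_path: str      # absolute path to the SQLite file
    created_at: str   # ISO 8601 creation timestamp
    source_xer: str   # path of the XER file the data was parsed from
    schema: SchemaInfo


# Hypothetical values, shown only to illustrate a serialized response.
info = DatabaseInfo(
    db_path="/tmp/example/schedule.db",
    created_at="2026-01-08T12:38:42-05:00",
    source_xer="/tmp/example/project.xer",
    schema=SchemaInfo(
        tables=[
            TableInfo(name="wbs", columns={"wbs_id": "TEXT", "wbs_name": "TEXT"}),
            TableInfo(
                name="activities",
                columns={"task_id": "TEXT", "wbs_id": "TEXT"},
                foreign_keys={"wbs_id": "wbs.wbs_id"},
            ),
        ]
    ),
)
payload = json.loads(json.dumps(asdict(info)))
print(json.dumps(payload, indent=2))
```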

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Scripts can query loaded schedule data directly via SQL without MCP tool calls after initial load
  • SC-002: Database file is accessible and queryable by standard SQLite clients within 1 second of load completion
  • SC-003: Large schedules (10,000+ activities) can be queried directly by scripts in under 100 ms per query
  • SC-004: Developers can write working SQL queries using only the schema information returned by the system
  • SC-005: 100% of data available through existing MCP tools is also available in the direct database
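
A criterion like SC-003 could be checked with a small timing harness along these lines. This is only a sketch: it uses an in-memory database with 10,000 synthetic rows and a hypothetical `activities` table, so the timings it prints say nothing about the real schema or real hardware.

```python
import sqlite3
import time


def timed_query(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    """Run a query and return (rows, elapsed milliseconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return rows, elapsed_ms


conn = sqlite3.connect(":memory:")
# Synthetic stand-in for a large schedule; column names are illustrative.
conn.execute(
    "CREATE TABLE activities (task_id INTEGER PRIMARY KEY, wbs_id INTEGER, duration_days REAL)"
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    ((i, i % 50, float(i % 20)) for i in range(10_000)),
)
conn.execute("CREATE INDEX idx_activities_wbs ON activities (wbs_id)")
conn.commit()

rows, elapsed_ms = timed_query(
    conn,
    "SELECT wbs_id, COUNT(*), SUM(duration_days) FROM activities "
    "GROUP BY wbs_id ORDER BY wbs_id",
)
conn.close()
print(f"{len(rows)} groups in {elapsed_ms:.2f} ms")
```

A real check would run against a database produced by the load tool and compare `elapsed_ms` to the 100 ms budget.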

Assumptions

  • SQLite is an appropriate database format for this use case (widely supported, file-based, no server needed)
  • Scripts will primarily use Python, but any language with SQLite support should work
  • The database file will be stored in a project-relative or user-accessible directory
  • Single-user operation - concurrent writes from multiple sources are not required
  • Database persistence is session-based: the file may be cleaned up when the MCP server stops, or it may persist, depending on configuration