# Extraction Guide

## Extraction Prompt Template

For each document × column pair, use this reasoning structure:

1. Read the document text carefully
2. Locate text relevant to the extraction prompt
3. Extract the value, noting its exact location
4. Assess confidence based on clarity of the source text

## Per-Cell Output Structure

For each extraction, produce a JSON object:

```json
{
  "value": "<extracted value>",
  "confidence": "high | medium | low",
  "supporting_quote": "<exact text from document>",
  "reasoning": "<1-2 sentences explaining why this value was chosen>"
}
```

### Confidence Levels

- **high**: Value is explicitly stated, unambiguous, and directly answers the prompt
- **medium**: Value is implied or requires minor inference, or multiple possible values exist
- **low**: Value is uncertain, requires significant inference, or the document may not contain the answer

### Type Handling

| Column Type | Value Format | Example |
|-------------|-------------|---------|
| text | Plain string | "Acme Corporation" |
| number | Numeric value (no currency symbols) | 500000 |
| date | ISO 8601 format (YYYY-MM-DD) | "2024-01-15" |
| boolean | true or false | true |
| list | JSON array of strings | ["item1", "item2"] |

### When a Value Cannot Be Found

If the document does not contain information for a column:

- Set value to null
- Set confidence to "low"
- Set supporting_quote to ""
- Set reasoning to explain why the value could not be found

## Full Output JSON Schema

```json
{
  "extraction": {
    "created": "ISO 8601 timestamp",
    "source_directory": "/absolute/path/to/docs",
    "documents_processed": 0,
    "documents_skipped": [],
    "columns": [
      {
        "name": "Column Name",
        "type": "text|number|date|boolean|list",
        "prompt": "The extraction prompt used"
      }
    ],
    "results": [
      {
        "document": "filename.pdf",
        "fields": {
          "Column Name": {
            "value": "extracted value",
            "confidence": "high|medium|low",
            "supporting_quote": "exact text from document",
            "reasoning": "explanation"
          }
        }
      }
    ]
  }
}
```

## Markdown Table Format

Display results as a pipe-delimited markdown table. Append `(?)` to low-confidence values. Truncate cell values longer than 60 characters with `...`.

Example:

```
| Document | Party Name | Date | Amount |
|----------|-----------|------|--------|
| contract1.pdf | Acme Corp | 2024-01-15 | 500000 |
| contract2.pdf | Beta LLC(?) | 2024-03-22 | 1200000 |
```
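
The table rules above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: `format_cell`, `render_table`, and `MAX_CELL_CHARS` are hypothetical names, and `render_table` expects the object stored under the top-level `extraction` key of the full output schema. Several behaviors are assumptions the guide does not specify: truncation keeps the first 60 characters and then appends `...`, the `(?)` marker is added after truncation so it is never cut off, a null value renders as an empty cell, list values are joined with commas, and pipe characters are escaped.

```python
from __future__ import annotations

from typing import Any

MAX_CELL_CHARS = 60  # truncation threshold stated in the guide


def format_cell(field: dict[str, Any] | None) -> str:
    """Turn one per-cell extraction object into display text."""
    field = field or {}
    value = field.get("value")
    if value is None:
        text = ""  # assumption: a missing value renders as an empty cell
    elif isinstance(value, bool):
        text = "true" if value else "false"  # JSON-style booleans
    elif isinstance(value, list):
        text = ", ".join(str(item) for item in value)  # assumption: comma-joined
    else:
        text = str(value)
    text = text.replace("\n", " ")  # keep each cell on one line
    if len(text) > MAX_CELL_CHARS:
        # Assumption: keep the first 60 characters, then append "...".
        text = text[:MAX_CELL_CHARS] + "..."
    # Precaution not in the guide: escape pipes so they do not split the cell.
    text = text.replace("|", "\\|")
    if field.get("confidence") == "low":
        text += "(?)"  # mark low-confidence values
    return text


def render_table(extraction: dict[str, Any]) -> str:
    """Render the `columns` and `results` of an extraction as a markdown table."""
    names = [column["name"] for column in extraction["columns"]]
    headers = ["Document"] + names
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("-" * (len(header) + 2) for header in headers) + "|",
    ]
    for result in extraction["results"]:
        cells = [result["document"]]
        cells += [format_cell(result["fields"].get(name)) for name in names]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling `render_table(output["extraction"])` on a parsed output file returns the table as a single string. A cell built from the not-found case (null value, low confidence) comes out as just `(?)` under these assumptions, which may or may not match the intended display.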