Extraction Guide
Extraction Prompt Template
For each (document, column) pair, follow this reasoning structure:
- Read the document text carefully
- Locate text relevant to the extraction prompt
- Extract the value, noting its exact location
- Assess confidence based on clarity of the source text
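The "noting its exact location" step can be checked mechanically. So can the requirement below that supporting_quote be exact text from the document. The following is a minimal Python sketch that assumes the document text is already available as a string. locate_quote is a hypothetical helper. It uses an exact substring match on purpose, which means it will miss quotes whose whitespace differs from the source.

```python
from typing import Dict, Optional

def locate_quote(document_text: str, quote: str) -> Optional[Dict[str, int]]:
    """Find a supporting quote verbatim and report where it occurs.

    Returns character offsets and a 1-based line number, or None when the
    quote is empty or does not appear exactly (case and whitespace included).
    """
    if not quote:
        return None
    start = document_text.find(quote)
    if start == -1:
        return None
    # Count newlines before the match to get the 1-based line number.
    line = document_text.count("\n", 0, start) + 1
    return {"start": start, "end": start + len(quote), "line": line}


text = ("AGREEMENT\n"
        "This Agreement is made by Acme Corporation.\n"
        "Effective Date: January 15, 2024")
print(locate_quote(text, "Effective Date: January 15, 2024"))
print(locate_quote(text, "Beta LLC"))  # None: not in the document
```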
Per-Cell Output Structure
For each extracted cell, produce a JSON object:
{
  "value": "<extracted value, formatted per the column type (see Type Handling); null if not found>",
  "confidence": "high | medium | low",
  "supporting_quote": "<exact text from the document that supports this value>",
  "reasoning": "<1-2 sentences explaining why this value was chosen>"
}
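You can enforce this structure, and the confidence levels below, by validating each cell after parsing the model's JSON output. Here is a minimal Python sketch. validate_cell and the two constant names are hypothetical. The sketch only checks three things: the keys, the allowed confidence strings, and that the text fields are strings. It does not check whether value matches the column type.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = {"value", "confidence", "supporting_quote", "reasoning"}
CONFIDENCE_LEVELS = {"high", "medium", "low"}

def validate_cell(cell: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a per-cell object (empty if it looks valid)."""
    problems = []
    missing = REQUIRED_KEYS - set(cell)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if cell.get("confidence") not in CONFIDENCE_LEVELS:
        problems.append(f"confidence must be one of {sorted(CONFIDENCE_LEVELS)}")
    # supporting_quote and reasoning are always strings ("" when nothing was found).
    for key in ("supporting_quote", "reasoning"):
        if key in cell and not isinstance(cell[key], str):
            problems.append(f"{key} must be a string")
    return problems


raw = ('{"value": "Acme Corporation", "confidence": "high", '
       '"supporting_quote": "made by Acme Corporation", '
       '"reasoning": "The party is named in the opening sentence."}')
print(validate_cell(json.loads(raw)))  # []
```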
Confidence Levels
- high: Value is explicitly stated, unambiguous, and directly answers the prompt
- medium: Value is implied or requires minor inference, or multiple plausible values exist
- low: Value is uncertain, requires significant inference, or the document may not contain the answer
Type Handling
| Column Type | Value Format | Example |
|---|---|---|
| text | Plain string | "Acme Corporation" |
| number | JSON number (no currency symbols or thousands separators) | 500000 |
| date | ISO 8601 string (YYYY-MM-DD) | "2024-01-15" |
| boolean | JSON true or false (unquoted) | true |
| list | JSON array of strings | ["item1", "item2"] |
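Converting raw extracted strings into these formats is mostly mechanical. The Python sketch below shows one possible approach and is not a complete parser. normalize is a hypothetical helper. DATE_FORMATS is an assumed, deliberately short list of input date layouts. Number parsing strips every character except digits, "." and "-", so it will mishandle decimal commas or words like "million". Anything the sketch cannot parse becomes None, so the cell can fall back to the not-found rules.

```python
import re
from datetime import datetime
from typing import Any

# Hypothetical set of input date layouts; extend for the documents at hand.
DATE_FORMATS = ["%Y-%m-%d", "%B %d, %Y", "%d %B %Y", "%m/%d/%Y"]

def normalize(raw: str, column_type: str) -> Any:
    """Convert a raw extracted string to the value format for its column type."""
    text = raw.strip()
    if column_type == "text":
        return text
    if column_type == "number":
        # Drop currency symbols/codes and thousands separators: "$1,200,000" -> 1200000
        cleaned = re.sub(r"[^\d.\-]", "", text)
        try:
            value = float(cleaned)
        except ValueError:
            return None
        return int(value) if value.is_integer() else value
    if column_type == "date":
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date().isoformat()
            except ValueError:
                continue
        return None
    if column_type == "boolean":
        lowered = text.lower()
        if lowered in {"true", "yes", "y"}:
            return True
        if lowered in {"false", "no", "n"}:
            return False
        return None
    if column_type == "list":
        # Split on commas or semicolons, dropping empty items.
        return [item.strip() for item in re.split(r"[,;]", text) if item.strip()]
    raise ValueError(f"unknown column type: {column_type}")


print(normalize("$1,200,000", "number"))
print(normalize("January 15, 2024", "date"))
```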
When a Value Cannot Be Found
If the document does not contain information for a column:
- Set value to null
- Set confidence to "low"
- Set supporting_quote to ""
- Set reasoning to a brief explanation of why the value could not be found
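These four rules fit in a small constructor, which keeps every not-found cell the same shape. Below is a minimal Python sketch. not_found_cell is a hypothetical name. Python's None serializes to JSON null.

```python
import json
from typing import Any, Dict

def not_found_cell(reason: str) -> Dict[str, Any]:
    """Build the per-cell object for a column the document does not answer."""
    return {
        "value": None,           # serialized as JSON null
        "confidence": "low",
        "supporting_quote": "",
        "reasoning": reason,
    }


print(json.dumps(not_found_cell("No governing-law clause appears in the document.")))
```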
Full Output JSON Schema
{
  "extraction": {
    "created": "ISO 8601 timestamp",
    "source_directory": "/absolute/path/to/docs",
    "documents_processed": 0,
    "documents_skipped": [],
    "columns": [
      {
        "name": "Column Name",
        "type": "text|number|date|boolean|list",
        "prompt": "The extraction prompt used"
      }
    ],
    "results": [
      {
        "document": "filename.pdf",
        "fields": {
          "Column Name": {
            "value": "extracted value",
            "confidence": "high|medium|low",
            "supporting_quote": "exact text from document",
            "reasoning": "explanation"
          }
        }
      }
    ]
  }
}
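Here is a Python sketch that wraps per-document results in this top-level object. It assumes the per-cell objects have already been built. It also assumes documents_processed counts the entries in results and documents_skipped holds skipped filenames, since the schema above only shows an empty list. build_extraction is a hypothetical name. The timestamp is generated in UTC.

```python
import json
import os
from datetime import datetime, timezone
from typing import Any, Dict, List

def build_extraction(source_directory: str,
                     columns: List[Dict[str, str]],
                     results: List[Dict[str, Any]],
                     skipped: List[str]) -> Dict[str, Any]:
    """Wrap per-document results in the top-level "extraction" object."""
    return {
        "extraction": {
            # ISO 8601 timestamp in UTC, e.g. "2024-01-15T09:30:00+00:00"
            "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            # The schema asks for an absolute path.
            "source_directory": os.path.abspath(source_directory),
            # Assumption: every entry in results is one processed document.
            "documents_processed": len(results),
            "documents_skipped": skipped,
            "columns": columns,
            "results": results,
        }
    }


columns = [{"name": "Party Name", "type": "text",
            "prompt": "Who is the counterparty to the agreement?"}]
results = [{"document": "contract1.pdf",
            "fields": {"Party Name": {
                "value": "Acme Corp",
                "confidence": "high",
                "supporting_quote": "between Acme Corp and",
                "reasoning": "The preamble names Acme Corp as the counterparty."}}}]
output = build_extraction("./docs", columns, results, skipped=[])
print(json.dumps(output, indent=2))
```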
Markdown Table Format
Display results as a pipe-delimited markdown table with one row per document and one column per extraction column.
Append (?) to low-confidence values.
Truncate cell values longer than 60 characters and end them with "...".
Example:
| Document | Party Name | Date | Amount |
|----------|-----------|------|--------|
| contract1.pdf | Acme Corp | 2024-01-15 | 500000 |
| contract2.pdf | Beta LLC(?) | 2024-03-22 | 1200000 |
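The display rules can be sketched in Python as follows. render_table and format_cell are hypothetical names. The sketch makes a few assumptions. Truncation keeps the first 60 characters and then appends "...". A null value renders as an empty cell. List values are joined with commas. Literal pipes are escaped so they do not break the table.

```python
from typing import Any, Dict, List

MAX_CELL = 60  # character limit from the display rules

def format_cell(cell: Dict[str, Any]) -> str:
    """Render one per-cell object as markdown table text."""
    value = cell.get("value")
    if value is None:
        text = ""  # assumption: null values render as an empty cell
    elif isinstance(value, bool):
        text = "true" if value else "false"
    elif isinstance(value, list):
        text = ", ".join(str(item) for item in value)
    else:
        text = str(value)
    text = text.replace("\n", " ")
    if len(text) > MAX_CELL:
        text = text[:MAX_CELL] + "..."
    text = text.replace("|", "\\|")  # a literal pipe would split the cell
    if cell.get("confidence") == "low":
        text += "(?)"
    return text


def render_table(results: List[Dict[str, Any]], column_names: List[str]) -> str:
    """Build a pipe-delimited table with one row per document."""
    header = ["Document"] + column_names
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in results:
        cells = [row["document"]]
        cells += [format_cell(row["fields"].get(name, {})) for name in column_names]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


results = [
    {"document": "contract2.pdf",
     "fields": {"Party Name": {"value": "Beta LLC", "confidence": "low"},
                "Amount": {"value": 1200000, "confidence": "high"}}},
]
print(render_table(results, ["Party Name", "Amount"]))
```

For this input, the printed data row is "| contract2.pdf | Beta LLC(?) | 1200000 |", which matches the style of the example table.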