feat: add extraction guide reference document
This commit is contained in:
94
skills/tabular-extract/references/extraction-guide.md
Normal file
94
skills/tabular-extract/references/extraction-guide.md
Normal file
@@ -0,0 +1,94 @@
|
||||
# Extraction Guide
|
||||
|
||||
## Extraction Prompt Template
|
||||
|
||||
For each document x column, use this reasoning structure:
|
||||
|
||||
1. Read the document text carefully
|
||||
2. Locate text relevant to the extraction prompt
|
||||
3. Extract the value, noting its exact location
|
||||
4. Assess confidence based on clarity of the source text
|
||||
|
||||
## Per-Cell Output Structure
|
||||
|
||||
For each extraction, produce a JSON object:
|
||||
|
||||
```json
|
||||
{
|
||||
"value": "<extracted value, typed appropriately>",
|
||||
"confidence": "high | medium | low",
|
||||
"supporting_quote": "<exact text from the document that supports this value>",
|
||||
"reasoning": "<1-2 sentences explaining why this value was chosen>"
|
||||
}
|
||||
```
|
||||
|
||||
### Confidence Levels
|
||||
|
||||
- **high**: Value is explicitly stated, unambiguous, directly answers the prompt
|
||||
- **medium**: Value is implied or requires minor inference, or multiple possible values exist
|
||||
- **low**: Value is uncertain, requires significant inference, or the document may not contain the answer
|
||||
|
||||
### Type Handling
|
||||
|
||||
| Column Type | Value Format | Example |
|
||||
|-------------|-------------|---------|
|
||||
| text | Plain string | "Acme Corporation" |
|
||||
| number | Numeric value (no currency symbols) | 500000 |
|
||||
| date | ISO 8601 format (YYYY-MM-DD) | "2024-01-15" |
|
||||
| boolean | true or false | true |
|
||||
| list | JSON array of strings | ["item1", "item2"] |
|
||||
|
||||
### When a Value Cannot Be Found
|
||||
|
||||
If the document does not contain information for a column:
|
||||
- Set value to null
|
||||
- Set confidence to "low"
|
||||
- Set supporting_quote to ""
|
||||
- Set reasoning to explain why the value could not be found
|
||||
|
||||
## Full Output JSON Schema
|
||||
|
||||
```json
|
||||
{
|
||||
"extraction": {
|
||||
"created": "ISO 8601 timestamp",
|
||||
"source_directory": "/absolute/path/to/docs",
|
||||
"documents_processed": 0,
|
||||
"documents_skipped": [],
|
||||
"columns": [
|
||||
{
|
||||
"name": "Column Name",
|
||||
"type": "text|number|date|boolean|list",
|
||||
"prompt": "The extraction prompt used"
|
||||
}
|
||||
],
|
||||
"results": [
|
||||
{
|
||||
"document": "filename.pdf",
|
||||
"fields": {
|
||||
"Column Name": {
|
||||
"value": "extracted value",
|
||||
"confidence": "high|medium|low",
|
||||
"supporting_quote": "exact text from document",
|
||||
"reasoning": "explanation"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Markdown Table Format
|
||||
|
||||
Display results as a pipe-delimited markdown table.
|
||||
Append `(?)` to low-confidence values.
|
||||
Truncate cell values longer than 60 characters with `...`.
|
||||
|
||||
Example:
|
||||
```
|
||||
| Document | Party Name | Date | Amount |
|
||||
|----------|-----------|------|--------|
|
||||
| contract1.pdf | Acme Corp | 2024-01-15 | 500000 |
|
||||
| contract2.pdf | Beta LLC(?) | 2024-03-22 | 1200000 |
|
||||
```
|
||||
Reference in New Issue
Block a user