Claude Code skill that extracts structured data from document collections into tabular format using Claude's native document understanding capabilities.
2.5 KiB
2.5 KiB
Extraction Guide
Extraction Prompt Template
For each document x column, use this reasoning structure:
- Read the document text carefully
- Locate text relevant to the extraction prompt
- Extract the value, noting its exact location
- Assess confidence based on clarity of the source text
Per-Cell Output Structure
For each extraction, produce a JSON object:
{
"value": "<extracted value, typed appropriately>",
"confidence": "high | medium | low",
"supporting_quote": "<exact text from the document that supports this value>",
"reasoning": "<1-2 sentences explaining why this value was chosen>"
}
Confidence Levels
- high: Value is explicitly stated, unambiguous, directly answers the prompt
- medium: Value is implied or requires minor inference, or multiple possible values exist
- low: Value is uncertain, requires significant inference, or the document may not contain the answer
Type Handling
| Column Type | Value Format | Example |
|---|---|---|
| text | Plain string | "Acme Corporation" |
| number | Numeric value (no currency symbols) | 500000 |
| date | ISO 8601 format (YYYY-MM-DD) | "2024-01-15" |
| boolean | true or false | true |
| list | JSON array of strings | ["item1", "item2"] |
When a Value Cannot Be Found
If the document does not contain information for a column:
- Set value to null
- Set confidence to "low"
- Set supporting_quote to ""
- Set reasoning to explain why the value could not be found
Full Output JSON Schema
{
"extraction": {
"created": "ISO 8601 timestamp",
"source_directory": "/absolute/path/to/docs",
"documents_processed": 0,
"documents_skipped": [],
"columns": [
{
"name": "Column Name",
"type": "text|number|date|boolean|list",
"prompt": "The extraction prompt used"
}
],
"results": [
{
"document": "filename.pdf",
"fields": {
"Column Name": {
"value": "extracted value",
"confidence": "high|medium|low",
"supporting_quote": "exact text from document",
"reasoning": "explanation"
}
}
}
]
}
}
Markdown Table Format
Display results as a pipe-delimited markdown table.
Append (?) to low-confidence values.
Truncate cell values longer than 60 characters with ....
Example:
| Document | Party Name | Date | Amount |
|----------|-----------|------|--------|
| contract1.pdf | Acme Corp | 2024-01-15 | 500000 |
| contract2.pdf | Beta LLC(?) | 2024-03-22 | 1200000 |