Files
claude-code-user-config/skills/tabular-extract/references/extraction-guide.md

2.5 KiB

Extraction Guide

Extraction Prompt Template

For each document x column, use this reasoning structure:

  1. Read the document text carefully
  2. Locate text relevant to the extraction prompt
  3. Extract the value, noting its exact location
  4. Assess confidence based on clarity of the source text

Per-Cell Output Structure

For each extraction, produce a JSON object:

{
  "value": "<extracted value, typed appropriately>",
  "confidence": "high | medium | low",
  "supporting_quote": "<exact text from the document that supports this value>",
  "reasoning": "<1-2 sentences explaining why this value was chosen>"
}

Confidence Levels

  • high: Value is explicitly stated, unambiguous, directly answers the prompt
  • medium: Value is implied or requires minor inference, or multiple possible values exist
  • low: Value is uncertain, requires significant inference, or the document may not contain the answer

Type Handling

Column Type Value Format Example
text Plain string "Acme Corporation"
number Numeric value (no currency symbols) 500000
date ISO 8601 format (YYYY-MM-DD) "2024-01-15"
boolean true or false true
list JSON array of strings ["item1", "item2"]

When a Value Cannot Be Found

If the document does not contain information for a column:

  • Set value to null
  • Set confidence to "low"
  • Set supporting_quote to ""
  • Set reasoning to explain why the value could not be found

Full Output JSON Schema

{
  "extraction": {
    "created": "ISO 8601 timestamp",
    "source_directory": "/absolute/path/to/docs",
    "documents_processed": 0,
    "documents_skipped": [],
    "columns": [
      {
        "name": "Column Name",
        "type": "text|number|date|boolean|list",
        "prompt": "The extraction prompt used"
      }
    ],
    "results": [
      {
        "document": "filename.pdf",
        "fields": {
          "Column Name": {
            "value": "extracted value",
            "confidence": "high|medium|low",
            "supporting_quote": "exact text from document",
            "reasoning": "explanation"
          }
        }
      }
    ]
  }
}

Markdown Table Format

Display results as a pipe-delimited markdown table. Append (?) to low-confidence values. Truncate cell values longer than 60 characters with ....

Example:

| Document | Party Name | Date | Amount |
|----------|-----------|------|--------|
| contract1.pdf | Acme Corp | 2024-01-15 | 500000 |
| contract2.pdf | Beta LLC(?) | 2024-03-22 | 1200000 |