We recommend using a virtual environment to avoid installing packages globally. Here’s a quick setup:
# Create a virtual environment
python -m venv venv

# Activate it (macOS/Linux)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate
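If you're unsure whether the environment is active, Python itself can tell you. Inside an environment created by venv, sys.prefix points at the environment directory while sys.base_prefix still points at the base interpreter. A minimal check (the function name is our own):

```python
import sys

def in_virtualenv() -> bool:
    """Return True when running inside a venv-created environment."""
    # venv changes sys.prefix to the environment directory; base_prefix keeps the original install
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```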

Installation

Once activated, install the package:
pip install parsefy
  • parsefy: Parsefy SDK for document extraction (includes Pydantic)
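To confirm the package landed in the active environment, you can query its installed version with the standard library's importlib.metadata. The helper name below is our own, not part of the SDK:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed in the current environment (is the venv activated?)
        return None

print(installed_version("parsefy"))  # a version string, or None if not installed here
```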
To deactivate the virtual environment later, simply run:
deactivate
For more details on virtual environments, see the official Python documentation.

Quick example

Try this simple example with a receipt. Download the sample receipt and save it as receipt.pdf.
from parsefy import Parsefy
from pydantic import BaseModel, Field

client = Parsefy(api_key='pk_your_api_key')

class Receipt(BaseModel):
    vendor: str = Field(description="Merchant or store name")
    total: float = Field(description="Total amount paid")
    purchase_date: str = Field(description="Purchase date in YYYY-MM-DD format")

result = client.extract(file="receipt.pdf", schema=Receipt)

print(result.data)
# Receipt(vendor='Cafe Mason', total=32.59, purchase_date='2023-08-29')
To get your API key, join the waitlist.
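The schema declares purchase_date as a plain str, and the expected YYYY-MM-DD format appears only in the field description, so it is worth validating the value yourself before relying on it. A minimal sketch using the standard library (parse_purchase_date is a hypothetical helper):

```python
from datetime import date
from typing import Optional

def parse_purchase_date(value: Optional[str]) -> Optional[date]:
    """Convert a 'YYYY-MM-DD' string to a date; return None if missing or malformed."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        # Not in YYYY-MM-DD form: flag for review instead of failing the whole record
        return None

print(parse_purchase_date("2023-08-29"))  # 2023-08-29
print(parse_purchase_date("08/29/2023"))  # None
```

With a real result you would call parse_purchase_date(result.data.purchase_date).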

Setup

Set your API key as an environment variable:
export PARSEFY_API_KEY=pk_your_api_key
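When no api_key is passed, the Parsefy client tries PARSEFY_API_KEY from the environment. If you'd rather fail early with a clear message in your own code, the same lookup order can be sketched in a few lines; resolve_api_key is a hypothetical helper, not part of the SDK:

```python
import os
from typing import Optional

def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to PARSEFY_API_KEY."""
    key = api_key or os.environ.get("PARSEFY_API_KEY")
    if not key:
        # Raise before any request is made, with an actionable message
        raise RuntimeError("Set PARSEFY_API_KEY or pass api_key explicitly")
    return key
```

You could then construct the client with Parsefy(api_key=resolve_api_key()).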

Extract your first document

from parsefy import Parsefy
from pydantic import BaseModel, Field


client = Parsefy()  # If no api_key is passed, Parsefy tries to read PARSEFY_API_KEY from the environment

class Invoice(BaseModel):
    # REQUIRED - triggers fallback if below confidence threshold
    invoice_number: str = Field(description="The invoice number")
    total: float = Field(description="Total amount including tax")

    # OPTIONAL - won't trigger fallback if missing or low confidence
    vendor: str | None = Field(default=None, description="Vendor name")
    due_date: str | None = Field(default=None, description="Payment due date")

result = client.extract(
    file="invoice.pdf",
    schema=Invoice,
    enable_verification=True  # Enable math verification
)

if result.error is None:
    print(f"Invoice #{result.data.invoice_number}")
    print(f"Total: ${result.data.total}")
    
    # Access field-level confidence from meta
    if result.meta:
        print(f"Overall confidence: {result.meta.confidence_score}")
        for fc in result.meta.field_confidence:
            print(f"{fc.field}: {fc.score} ({fc.reason}) - '{fc.text}'")
    
    # Check verification results
    if result.verification:
        print(f"Verification: {result.verification.status}")
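The loop in the example only prints each field's score. To route uncertain values to a human reviewer, a small helper can collect the fields scored below a cutoff. This sketch relies only on the fc.field and fc.score attributes used above; the helper name and the 0.85 default (chosen to mirror the documented confidence_threshold default) are our own, and the SimpleNamespace objects are stand-ins with illustrative values:

```python
from types import SimpleNamespace
from typing import Dict, Iterable

def fields_needing_review(field_confidence: Iterable, cutoff: float = 0.85) -> Dict[str, float]:
    """Map field name -> score for every field scored below the cutoff."""
    return {fc.field: fc.score for fc in field_confidence if fc.score < cutoff}

# Stand-ins shaped like result.meta.field_confidence entries (illustrative values only)
sample = [
    SimpleNamespace(field="invoice_number", score=0.98),
    SimpleNamespace(field="total", score=0.72),
]
print(fields_needing_review(sample))  # {'total': 0.72}
```

With a real result: fields_needing_review(result.meta.field_confidence).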

Required vs Optional Fields

All fields are required by default, meaning any field declared without a default value. A required field that comes back null, or whose confidence falls below confidence_threshold, triggers the more expensive fallback model (Tier 2).
Pydantic definition and resulting behavior:
  • name: str = Field(...) → Required: triggers fallback if low confidence
  • name: str | None = Field(default=None, ...) → Optional: won’t trigger fallback
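The fallback rule described above can be modeled as a small pure function, which makes it easy to reason about which fields would escalate. This illustrates the documented behavior only; it is not Parsefy's implementation, and FieldResult and triggers_fallback are hypothetical names:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class FieldResult:
    name: str
    value: Any       # None when the field was not found
    score: float     # field-level confidence, 0.0 to 1.0
    required: bool   # True for fields declared without a default

def triggers_fallback(fields: List[FieldResult], threshold: float = 0.85) -> List[str]:
    """Return the required fields that would escalate to the Tier 2 model."""
    return [
        f.name
        for f in fields
        if f.required and (f.value is None or f.score < threshold)
    ]

fields = [
    FieldResult("invoice_number", "INV-001", 0.97, required=True),
    FieldResult("total", None, 0.0, required=True),      # missing required field
    FieldResult("due_date", None, 0.0, required=False),  # missing optional field: ignored
]
print(triggers_fallback(fields))  # ['total']
```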
Rule of thumb: Mark fields as optional if they might be missing in >20% of your documents.
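To apply this rule of thumb, you can measure how often each field comes back empty across a batch of extracted documents. A minimal sketch over plain dicts (for Pydantic v2 results you could build them with model_dump()); missing_rates and the sample records are illustrative, not real Parsefy output:

```python
from typing import Dict, List, Mapping

def missing_rates(records: List[Mapping[str, object]], fields: List[str]) -> Dict[str, float]:
    """Fraction of records in which each field is absent or None."""
    if not records:
        return {name: 0.0 for name in fields}
    return {
        name: sum(1 for r in records if r.get(name) is None) / len(records)
        for name in fields
    }

records = [
    {"invoice_number": "A1", "due_date": None},
    {"invoice_number": "A2", "due_date": "2024-01-31"},
    {"invoice_number": "A3"},                      # due_date absent entirely
    {"invoice_number": "A4", "due_date": None},
    {"invoice_number": "A5", "due_date": "2024-02-15"},
]
rates = missing_rates(records, ["invoice_number", "due_date"])
optional_candidates = [name for name, rate in rates.items() if rate > 0.20]
print(rates)                # {'invoice_number': 0.0, 'due_date': 0.6}
print(optional_candidates)  # ['due_date']
```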

Confidence Threshold

Control when the fallback model is triggered:
result = client.extract(
    file="invoice.pdf",
    schema=Invoice,
    confidence_threshold=0.85  # default
)
Threshold and resulting behavior:
  • Lower (0.70) → Faster: accepts Tier 1 results more often
  • Higher (0.95) → More accurate: triggers Tier 2 fallback more often
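To get a feel for the trade-off, you can count how many field scores would fall below each candidate threshold. Keep in mind that only required fields count toward the fallback. The scores here are made-up illustration values, not Parsefy output; in practice you might collect them from result.meta.field_confidence across a sample of your own documents. share_below is a hypothetical helper:

```python
from typing import List

def share_below(scores: List[float], threshold: float) -> float:
    """Fraction of field scores below the threshold, i.e. fields treated as low confidence."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < threshold) / len(scores)

scores = [0.99, 0.93, 0.91, 0.88, 0.80, 0.76, 0.65, 0.97]  # illustrative values only
for threshold in (0.70, 0.85, 0.95):
    print(f"threshold {threshold:.2f}: {share_below(scores, threshold):.1%} of fields below")
```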

Async usage

import asyncio
from parsefy import Parsefy
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total: float = Field(description="Total amount")
    vendor: str | None = Field(default=None, description="Vendor name")

async def main():
    async with Parsefy() as client:
        result = await client.extract_async(
            file="invoice.pdf",
            schema=Invoice,
            confidence_threshold=0.85,
            enable_verification=True
        )
        if result.error is None:
            print(result.data)
            if result.meta:
                print(f"Confidence: {result.meta.confidence_score}")
            if result.verification:
                print(f"Verification: {result.verification.status}")

asyncio.run(main())
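To process several files concurrently, you can fan out calls with asyncio.gather and cap how many run at once with a semaphore. The sketch takes any async callable, so it runs without the SDK; extract_many and fake_extract are hypothetical names. With the real client you would pass something like lambda path: client.extract_async(file=path, schema=Invoice) inside the async with block. Whether the API enforces rate limits isn't covered here, so treat limit as a placeholder to tune:

```python
import asyncio
from typing import Any, Awaitable, Callable, List

async def extract_many(
    paths: List[str],
    extract: Callable[[str], Awaitable[Any]],
    limit: int = 4,
) -> List[Any]:
    """Run extract(path) for every path, at most `limit` at a time, preserving input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(path: str) -> Any:
        async with semaphore:  # wait for a free slot before starting this call
            return await extract(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(run_one(p) for p in paths))

async def fake_extract(path: str) -> str:
    """Stand-in for client.extract_async, used only to make the sketch runnable."""
    await asyncio.sleep(0.01)
    return f"extracted {path}"

results = asyncio.run(extract_many(["a.pdf", "b.pdf", "c.pdf"], fake_extract, limit=2))
print(results)  # ['extracted a.pdf', 'extracted b.pdf', 'extracted c.pdf']
```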

Next steps