Virtual environment (recommended)
We recommend using a virtual environment to avoid installing packages globally. Here’s a quick setup:
```bash
# Create a virtual environment
python -m venv venv

# Activate it (macOS/Linux)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate
```
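To confirm that the interpreter you are running is the one inside the virtual environment, you can compare `sys.prefix` with `sys.base_prefix`. The two differ while a venv is active. This is a small standard-library sketch and is not part of Parsefy. The `in_virtualenv` name is purely illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the `python` from your activated environment. If it prints `False`, the activation step did not take effect in that shell.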
Installation
Once activated, install the package:

```bash
pip install parsefy
```

- `parsefy`: Parsefy SDK for document extraction (includes Pydantic)
To deactivate the virtual environment later, simply run:

```bash
deactivate
```
Quick example
Try this simple example with a receipt. Download the sample receipt and save it as `receipt.pdf` in your working directory.
```python
from parsefy import Parsefy
from pydantic import BaseModel, Field

client = Parsefy(api_key='pk_your_api_key')

class Receipt(BaseModel):
    vendor: str = Field(description="Merchant or store name")
    total: float = Field(description="Total amount paid")
    purchase_date: str = Field(description="Purchase date in YYYY-MM-DD format")

result = client.extract(file="receipt.pdf", schema=Receipt)
print(result.data)
# Receipt(vendor='Cafe Mason', total=32.59, purchase_date='2023-08-29')
```
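The `Receipt` schema asks for `purchase_date` as a `YYYY-MM-DD` string, so you may want to convert it to a `datetime.date` before doing any date arithmetic. The sketch below uses only the standard library. `parse_purchase_date` is a hypothetical helper, not part of the SDK. It returns `None` instead of raising when the value is missing or is not a valid ISO date.

```python
from datetime import date
from typing import Optional


def parse_purchase_date(value: Optional[str]) -> Optional[date]:
    """Convert a YYYY-MM-DD string (as requested in the Receipt schema) to a date.

    Hypothetical helper: returns None for missing or malformed values instead of raising.
    """
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_purchase_date("2023-08-29"))  # 2023-08-29
    print(parse_purchase_date("29/08/2023"))  # None
```

With the quick example above, you would call `parse_purchase_date(result.data.purchase_date)`.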
Setup
Set your API key as an environment variable:
```bash
export PARSEFY_API_KEY=pk_your_api_key
```
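Calling `Parsefy()` with no arguments falls back to `PARSEFY_API_KEY`, which makes a missing variable easy to overlook. The minimal standard-library sketch below fails fast with a clear message before you create the client. `require_api_key` is a hypothetical helper and does not call the SDK.

```python
import os


def require_api_key(var_name: str = "PARSEFY_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    Hypothetical helper for illustration; it does not call the Parsefy SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run `export {var_name}=pk_your_api_key` first."
        )
    return key
```

You could call `require_api_key()` once at startup, then construct `Parsefy()` as usual.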
Then create the client without an explicit key and run an extraction:

```python
from parsefy import Parsefy
from pydantic import BaseModel, Field

# With no api_key argument, Parsefy reads PARSEFY_API_KEY from the environment
client = Parsefy()

class Invoice(BaseModel):
    # REQUIRED: triggers fallback if below the confidence threshold
    invoice_number: str = Field(description="The invoice number")
    total: float = Field(description="Total amount including tax")

    # OPTIONAL: won't trigger fallback if missing or low confidence
    vendor: str | None = Field(default=None, description="Vendor name")
    due_date: str | None = Field(default=None, description="Payment due date")

result = client.extract(
    file="invoice.pdf",
    schema=Invoice,
    enable_verification=True,  # Enable math verification
)

if result.error is None:
    print(f"Invoice #{result.data.invoice_number}")
    print(f"Total: ${result.data.total}")

    # Access field-level confidence from meta
    if result.meta:
        print(f"Overall confidence: {result.meta.confidence_score}")
        for fc in result.meta.field_confidence:
            print(f"{fc.field}: {fc.score} ({fc.reason}) - '{fc.text}'")

    # Check verification results
    if result.verification:
        print(f"Verification: {result.verification.status}")
```
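If you only want to review the fields that need attention, you could filter `result.meta.field_confidence` by score. The sketch below assumes, as the loop in the example suggests, that each entry exposes `.field` and `.score`. It uses a stand-in `FieldConfidence` dataclass (hypothetical, not the SDK's type) so it runs without an API key. The 0.85 cutoff simply mirrors the default `confidence_threshold`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class HasScore(Protocol):
    field: str
    score: float


@dataclass
class FieldConfidence:
    """Stand-in for an SDK field-confidence entry (hypothetical, for testing only)."""
    field: str
    score: float
    reason: str = ""
    text: str = ""


def low_confidence_fields(entries: Iterable[HasScore], threshold: float = 0.85) -> List[str]:
    """Return names of fields whose score is below `threshold`, lowest score first."""
    flagged = [e for e in entries if e.score < threshold]
    flagged.sort(key=lambda e: e.score)
    return [e.field for e in flagged]


if __name__ == "__main__":
    sample = [
        FieldConfidence("invoice_number", 0.98),
        FieldConfidence("total", 0.72),
        FieldConfidence("vendor", 0.80),
    ]
    print(low_confidence_fields(sample))  # ['total', 'vendor']
```

With a real result, you would pass the SDK entries directly, for example `low_confidence_fields(result.meta.field_confidence)`.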
Required vs Optional Fields
All fields are required by default, meaning any field declared without a default value. Required fields that return null or score below `confidence_threshold` trigger the expensive fallback model (Tier 2).
| Pydantic definition | Behavior |
|---|---|
| `name: str = Field(...)` | Required: triggers fallback if low confidence |
| `name: str \| None = Field(default=None, ...)` | Optional: won't trigger fallback |
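The rule above can be summarized as a small decision function. This is a hypothetical sketch of the documented behavior, not Parsefy's actual implementation. It assumes per-field values and confidence scores are available as plain dicts, and it reports which required fields would send the document to the Tier 2 fallback.

```python
from typing import Dict, List, Optional, Set


def fields_triggering_fallback(
    values: Dict[str, Optional[object]],
    scores: Dict[str, float],
    required: Set[str],
    confidence_threshold: float = 0.85,
) -> List[str]:
    """Return required fields that are null or scored below the threshold.

    Optional fields are never checked: a missing or low-confidence
    optional field does not escalate to Tier 2.
    """
    triggered = []
    for name in sorted(required):
        is_null = values.get(name) is None
        # Assumption: a required field with no score at all is treated as 0.0 (escalates).
        low_confidence = scores.get(name, 0.0) < confidence_threshold
        if is_null or low_confidence:
            triggered.append(name)
    return triggered


if __name__ == "__main__":
    values = {"invoice_number": "INV-001", "total": None, "vendor": None}
    scores = {"invoice_number": 0.97, "total": 0.90, "vendor": 0.40}
    required = {"invoice_number", "total"}
    # 'vendor' is optional, so its low score is ignored; 'total' is null, so it escalates.
    print(fields_triggering_fallback(values, scores, required))  # ['total']
```

A non-empty result means the fallback would run under this model of the rule.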
Rule of thumb: Mark fields as optional if they might be missing in >20% of your documents.
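To apply the 20% rule of thumb, you could measure how often each field comes back empty across a sample of past extractions. The sketch below is minimal and assumes you have collected each result's data as a plain dict (for example with Pydantic's `model_dump()`). `missing_rates` and `suggest_optional_fields` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Mapping


def missing_rates(records: Iterable[Mapping[str, object]], fields: Iterable[str]) -> Dict[str, float]:
    """Fraction of records in which each field is absent or None."""
    records = list(records)
    if not records:
        return {name: 0.0 for name in fields}
    return {
        name: sum(1 for r in records if r.get(name) is None) / len(records)
        for name in fields
    }


def suggest_optional_fields(
    records: Iterable[Mapping[str, object]],
    fields: Iterable[str],
    max_missing: float = 0.20,
) -> List[str]:
    """Fields missing in more than `max_missing` of records: candidates for `str | None`."""
    rates = missing_rates(records, fields)
    return sorted(name for name, rate in rates.items() if rate > max_missing)


if __name__ == "__main__":
    sample = [
        {"invoice_number": "A1", "total": 10.0, "due_date": None},
        {"invoice_number": "A2", "total": 12.5, "due_date": "2024-01-31"},
        {"invoice_number": "A3", "total": 9.0, "due_date": None},
        {"invoice_number": "A4", "total": 7.25},
        {"invoice_number": "A5", "total": 3.0, "due_date": "2024-02-29"},
    ]
    print(suggest_optional_fields(sample, ["invoice_number", "total", "due_date"]))  # ['due_date']
```

Here `due_date` is missing in 3 of 5 records (60%), so it crosses the 20% line and is a candidate for an optional field.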
Confidence Threshold
Control when the fallback model is triggered:
```python
result = client.extract(
    file="invoice.pdf",
    schema=Invoice,
    confidence_threshold=0.85,  # default
)
```
| Threshold | Behavior |
|---|---|
| Lower (0.70) | Faster: accepts Tier 1 results more often |
| Higher (0.95) | More accurate: triggers Tier 2 fallback more often |
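To get a feel for where to set the threshold, you could replay scores from past runs and count how many documents each candidate threshold would escalate. The sketch below assumes a document escalates when any required field scores below the threshold, following the Required vs Optional Fields section. The scores are made-up illustration data, not Parsefy output, and `escalation_rate` is a hypothetical helper.

```python
from typing import Dict, List


def escalation_rate(docs: List[Dict[str, float]], threshold: float) -> float:
    """Share of documents with at least one required-field score below `threshold`."""
    if not docs:
        return 0.0
    escalated = sum(1 for scores in docs if any(s < threshold for s in scores.values()))
    return escalated / len(docs)


if __name__ == "__main__":
    # Illustrative per-document scores for required fields only (made-up numbers).
    docs = [
        {"invoice_number": 0.99, "total": 0.93},
        {"invoice_number": 0.91, "total": 0.78},
        {"invoice_number": 0.88, "total": 0.96},
        {"invoice_number": 0.65, "total": 0.90},
    ]
    for t in (0.70, 0.85, 0.95):
        print(f"threshold {t:.2f}: {escalation_rate(docs, t):.0%} escalated to Tier 2")
    # threshold 0.70: 25% escalated to Tier 2
    # threshold 0.85: 50% escalated to Tier 2
    # threshold 0.95: 100% escalated to Tier 2
```

Raising the threshold can only increase the share of escalated documents. This is the accuracy-versus-speed trade-off shown in the table.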
Async usage
```python
import asyncio

from parsefy import Parsefy
from pydantic import BaseModel, Field


class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice number")
    total: float = Field(description="Total amount")
    vendor: str | None = Field(default=None, description="Vendor name")


async def main():
    async with Parsefy() as client:
        result = await client.extract_async(
            file="invoice.pdf",
            schema=Invoice,
            confidence_threshold=0.85,
            enable_verification=True,
        )

        if result.error is None:
            print(result.data)
            if result.meta:
                print(f"Confidence: {result.meta.confidence_score}")
            if result.verification:
                print(f"Verification: {result.verification.status}")


asyncio.run(main())
```
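If you need to process several files, the async client lets you overlap requests. The sketch below shows the general pattern: `asyncio.gather` runs the calls concurrently and an `asyncio.Semaphore` caps how many run at once. `run_bounded` and `fake_extract` are hypothetical. `fake_extract` stands in for the SDK so the example runs without an API key. The limit of 3 is arbitrary, so pick one that suits your rate limits.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    files: List[str],
    extract: Callable[[str], Awaitable[T]],
    max_concurrency: int = 3,
) -> List[T]:
    """Run `extract` for every file, at most `max_concurrency` at a time.

    Results come back in the same order as `files` (asyncio.gather preserves order).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> T:
        async with semaphore:
            return await extract(path)

    return await asyncio.gather(*(worker(f) for f in files))


async def fake_extract(path: str) -> str:
    """Hypothetical stand-in for client.extract_async: sleeps briefly and echoes the path."""
    await asyncio.sleep(0.01)
    return f"extracted {path}"


if __name__ == "__main__":
    results = asyncio.run(run_bounded(["a.pdf", "b.pdf", "c.pdf", "d.pdf"], fake_extract))
    print(results)  # ['extracted a.pdf', 'extracted b.pdf', 'extracted c.pdf', 'extracted d.pdf']
```

Inside `main()` above, you could pass `lambda path: client.extract_async(file=path, schema=Invoice)` as `extract` in place of `fake_extract`.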
Next steps