Extract
Extract structured data from financial documents
POST /v1/extract
Overview
The /v1/extract endpoint is the primary way to extract structured data from financial documents (invoices, receipts, bills). It processes PDF and DOCX files according to your JSON Schema definition and returns validated JSON with field-level confidence and evidence.
Our goal is 0% silent errors: you get either validated output with field-level evidence or a clear failure reason, never unreliable data passed off as valid.
Request
The document to extract data from.
- Supported formats: PDF (.pdf), Microsoft Word (.docx)
- Maximum size: 10 MB
A JSON Schema string defining the structure of data to extract. See the Schema Guide for detailed documentation.
Minimum confidence score (0.0 to 1.0) required before accepting Tier 1 results.
- Lower values (e.g., 0.70): Faster and cheaper (accepts Tier 1 results more often)
- Higher values (e.g., 0.95): More accurate but more expensive (triggers Tier 2 fallback more often)
Default for confidence_threshold: 0.85
Enable math verification to ensure that extracted numeric data is mathematically consistent. When enabled, Parsefy automatically:
- Verifies totals match subtotals + tax
- Validates line item sums
- Performs shadow extraction for single-field verification
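The first two checks in this list can be approximated on the client as an extra sanity check. The following is a minimal sketch, not Parsefy's implementation: the field names (`subtotal`, `tax`, `total`, `line_items`, `amount`), the one-cent tolerance, and the assumption that line items should sum to the subtotal are all illustrative.

```python
from decimal import Decimal

# One cent of slack for rounding; an assumption of this sketch.
TOLERANCE = Decimal("0.01")


def _dec(value):
    # Go through str() so floats such as 0.1 do not carry binary noise.
    return Decimal(str(value))


def check_invoice_math(data):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    subtotal, tax, total = (_dec(data[k]) for k in ("subtotal", "tax", "total"))

    # Check 1: total should equal subtotal + tax.
    if abs(subtotal + tax - total) > TOLERANCE:
        problems.append(f"total {total} != subtotal {subtotal} + tax {tax}")

    # Check 2: line item amounts should add up to the subtotal (assumed).
    items = data.get("line_items", [])
    items_sum = sum((_dec(i["amount"]) for i in items), Decimal("0"))
    if items and abs(items_sum - subtotal) > TOLERANCE:
        problems.append(f"line items sum to {items_sum}, subtotal is {subtotal}")

    return problems
```

An empty list means both checks passed. `Decimal` is used instead of floats so that currency amounts compare exactly.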
Default for enable_verification: false
Bearer token authentication. Format: Bearer pk_your_api_key
Response
The extracted data matching your schema.
Processing information.
Math verification results (present only if enable_verification was true).
Error details, present only if extraction failed.
Examples
Basic Invoice Extraction with Confidence Threshold
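A minimal sketch of how such a request might be assembled with the Python standard library. Several details are assumptions rather than facts from this page: the host in `API_URL` is a placeholder, the request is assumed to be multipart/form-data, and the form field names `file` and `schema` are hypothetical (`confidence_threshold` is the name used elsewhere on this page). The schema fields are illustrative.

```python
import json
import uuid
import urllib.request

# Assumptions: placeholder host, multipart encoding, and the form field
# names "file" and "schema". Replace them with the documented values.
API_URL = "https://api.example.com/v1/extract"

# Illustrative schema; the field names are made up for this sketch.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def build_extract_request(pdf_bytes, filename, api_key, threshold=0.85):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    text_fields = {
        "schema": json.dumps(INVOICE_SCHEMA),
        "confidence_threshold": str(threshold),
    }
    parts = []
    for name, value in text_fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part carries raw bytes, so it is concatenated separately.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/pdf\r\n\r\n'.encode("utf-8")
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.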
Response with Field-Level Confidence
Complex Schema with Line Items
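A sketch of what a schema with nested line items could look like, written as a Python dictionary and serialized to the JSON Schema string the endpoint expects. All field names here are illustrative; see the Schema Guide for what the service actually supports.

```python
import json

# Illustrative field names; adapt them to the documents you process.
LINE_ITEM = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "quantity": {"type": "number"},
        "unit_price": {"type": "number"},
        "amount": {"type": "number"},
    },
    "required": ["description", "amount"],
}

INVOICE_WITH_ITEMS = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
        "line_items": {"type": "array", "items": LINE_ITEM},
    },
    "required": ["invoice_number", "total", "line_items"],
}

# The endpoint takes the schema as a JSON string, so serialize it once.
schema_string = json.dumps(INVOICE_WITH_ITEMS)
```

Including `subtotal`, `tax`, and `total` alongside `line_items` gives math verification the fields it needs to compare.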
Response with Line Items and Verification
Fallback Behavior
When a required field returns null or falls below confidence_threshold, the API automatically triggers the fallback model (Tier 2).
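The fallback rule can be sketched as a small predicate. This is an illustration of the rule as stated, not the server's code; the input shape (a mapping from field name to a value and confidence pair) is hypothetical.

```python
def needs_tier2(fields, required, confidence_threshold=0.85):
    """Return True if any required field is null or below the threshold.

    `fields` maps a field name to a (value, confidence) pair; this shape
    is hypothetical and only serves the illustration.
    """
    for name in required:
        # A field missing from the result is treated like a null value.
        value, confidence = fields.get(name, (None, 0.0))
        if value is None or confidence < confidence_threshold:
            return True
    return False
```

With a field extracted at confidence 0.80, the default threshold of 0.85 triggers Tier 2, while a threshold of 0.70 accepts the Tier 1 result.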
Error Responses
Invalid File Type (400)
Invalid Schema (400)
Unauthorized (401)
Rate Limited (429)
Extraction Failed (200 with error)
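Because a failed extraction can come back with HTTP 200, checking the status code alone is not enough. The sketch below shows one way a client might sort responses into outcomes. It assumes the body has already been parsed from JSON and that a failed extraction is signalled by a top-level `error` key; that key name is an assumption.

```python
def classify_response(status, body):
    """Map an HTTP status and parsed JSON body to a coarse outcome."""
    if status == 401:
        return "unauthorized"      # missing or invalid API key
    if status == 429:
        return "rate_limited"      # back off before retrying
    if status == 400:
        return "bad_request"       # invalid file type or invalid schema
    if status == 200:
        # Assumption: a failed extraction carries a top-level "error" key.
        if isinstance(body, dict) and body.get("error") is not None:
            return "extraction_failed"
        return "ok"
    return "unexpected"
```

Only the `ok` outcome should be treated as usable data; every other outcome should be surfaced to the caller rather than silently ignored.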
Rate Limits
- Request Rate: 1 request per second per IP
- File Size: Maximum 10 MB
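Both limits can be guarded on the client before a request is sent. The following is a minimal sketch: the class and function names are hypothetical, and "10 MB" is read conservatively as 10,000,000 bytes because this page does not say whether the limit is decimal or binary.

```python
import time

MAX_FILE_BYTES = 10 * 1000 * 1000  # conservative reading of "10 MB"
MIN_INTERVAL_SECONDS = 1.0         # 1 request per second


class Throttle:
    """Space successive calls at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def check_size(payload):
    """Raise ValueError if the file exceeds the size limit."""
    if len(payload) > MAX_FILE_BYTES:
        raise ValueError(f"file is {len(payload)} bytes; limit is {MAX_FILE_BYTES}")
```

Call `check_size` on the file bytes and `Throttle.wait()` immediately before each request. Since the limit is applied per IP, a throttle inside one process does not account for other clients sharing the same address.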
