Overview
Parsefy extracts structured data from financial PDFs (invoices, receipts, bills) and returns validated JSON with field-level confidence and evidence. You define the data you want with a schema, and Parsefy either returns structured data or fails with clear reasons.
Our goal: 0% silent errors. You get validated output or a clear failure reason, never unreliable data passed along silently.
Basic extraction
Confidence threshold
Control when the fallback model is triggered:

| Threshold | Behavior | Use Case |
|---|---|---|
| Lower (e.g., 0.70) | Faster: Accepts Tier 1 results more often | High-volume, less critical |
| Higher (e.g., 0.95) | More accurate: Triggers Tier 2 fallback more often | Financial reconciliation |
0.85
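
The trade-off in the table above can be sketched as a single comparison. This is an illustration only, not Parsefy's implementation: the function name `needs_fallback` is hypothetical, and treating the comparison as strictly "below the threshold" is an assumption.

```python
def needs_fallback(tier1_confidence: float, threshold: float) -> bool:
    """Return True when a Tier 1 result should be retried with the Tier 2 model.

    Assumption: fallback fires when confidence is strictly below the threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return tier1_confidence < threshold


# The same Tier 1 result is accepted at a low threshold and escalated at a high one.
tier1_confidence = 0.88
print(needs_fallback(tier1_confidence, 0.70))  # False
print(needs_fallback(tier1_confidence, 0.95))  # True
```

Raising the threshold never reduces the number of documents sent to Tier 2, which is why higher values trade speed for accuracy.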
File inputs
Parsefy accepts multiple file input types:

| Input | Description |
|---|---|
| File path | "./document.pdf" - reads from disk |
| Buffer/bytes | In-memory file data |
| File object | Browser File from form input |
| Blob | Raw binary with MIME type |
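
If your own code needs to handle more than one of these input types before calling Parsefy, a small normalizer can help. The sketch below covers only the two inputs that exist in Python (a file path and in-memory bytes); the helper name `read_document_bytes` is hypothetical and is not part of any Parsefy SDK.

```python
from pathlib import Path
from typing import Union

# Hypothetical alias for the inputs this sketch accepts.
DocumentSource = Union[str, Path, bytes, bytearray]


def read_document_bytes(source: DocumentSource) -> bytes:
    """Normalize a file path or an in-memory buffer to raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        # Already in memory: return an immutable copy.
        return bytes(source)
    # Otherwise treat the value as a path and read the file from disk.
    return Path(source).read_bytes()


# In-memory data passes through unchanged.
print(read_document_bytes(b"%PDF-1.7"))
```

A path such as "./document.pdf" is read from disk, so a missing file surfaces as `FileNotFoundError` before any request is made.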
Response format
Every extraction returns structured data with field-level confidence.
The _meta field
Every extraction includes quality metrics with evidence:
| Property | Type | Description |
|---|---|---|
| confidence_score | number | Overall confidence (0.0 to 1.0) |
| field_confidence | array | Per-field confidence with evidence |
| issues | array | Warnings or anomalies detected |
Field confidence object
Each entry in field_confidence contains:
| Property | Type | Description |
|---|---|---|
| field | string | JSON path (e.g., $.invoice_number) |
| score | number | Confidence score (0.0 to 1.0) |
| reason | string | “Exact match”, “Inferred from header”, etc. |
| page | number | Page number where found |
| text | string | Source text evidence |
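
One way to use these entries is to flag fields for manual review. The sketch below assumes `_meta` sits at the top level of the response next to the extracted fields; the sample values, the helper name `fields_needing_review`, and the 0.85 cutoff are all made up for illustration.

```python
from typing import Any

# Hypothetical response shaped after the tables above; all values are invented.
sample_response: dict[str, Any] = {
    "invoice_number": "INV-001",
    "total": 120.50,
    "_meta": {
        "confidence_score": 0.91,
        "field_confidence": [
            {"field": "$.invoice_number", "score": 0.99,
             "reason": "Exact match", "page": 1, "text": "Invoice # INV-001"},
            {"field": "$.total", "score": 0.72,
             "reason": "Inferred from header", "page": 2, "text": "Total 120.50"},
        ],
        "issues": [],
    },
}


def fields_needing_review(response: dict[str, Any],
                          min_score: float = 0.85) -> list[dict[str, Any]]:
    """Return the field_confidence entries whose score is below min_score."""
    entries = response["_meta"]["field_confidence"]
    return [entry for entry in entries if entry["score"] < min_score]


for entry in fields_needing_review(sample_response):
    # Each flagged entry carries its own evidence: page number and source text.
    print(entry["field"], entry["score"], entry["page"], entry["text"])
```

Because every entry includes the page and source text, a reviewer can check a flagged value against the document without re-reading the whole file.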
Metadata
Processing information:

| Property | Type | Description |
|---|---|---|
| processing_time_ms | integer | How long the extraction took |
| credits | integer | Credits consumed (~1 per page) |
| fallback_triggered | boolean | Whether Tier 2 model was used |
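
These three properties are enough to track cost and fallback frequency across a batch. The following is a minimal sketch: the helper `summarize_batch` is hypothetical, the sample numbers are invented, and each metadata object is assumed to be available as a plain dict.

```python
from typing import Any


def summarize_batch(metadata_items: list[dict[str, Any]]) -> dict[str, float]:
    """Aggregate credits, fallback rate, and mean latency over a batch."""
    if not metadata_items:
        raise ValueError("metadata_items must not be empty")
    count = len(metadata_items)
    fallbacks = sum(1 for m in metadata_items if m["fallback_triggered"])
    return {
        "total_credits": sum(m["credits"] for m in metadata_items),
        "fallback_rate": fallbacks / count,
        "mean_processing_time_ms": sum(m["processing_time_ms"] for m in metadata_items) / count,
    }


# Invented metadata for two extractions.
batch = [
    {"processing_time_ms": 1200, "credits": 2, "fallback_triggered": False},
    {"processing_time_ms": 3400, "credits": 5, "fallback_triggered": True},
]
print(summarize_batch(batch))
```

A rising fallback rate is a useful signal when tuning the confidence threshold.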
Verification (optional)
When enable_verification is set to true, the response includes math verification:
| Property | Type | Description |
|---|---|---|
| status | string | PASSED, FAILED, PARTIAL, CANNOT_VERIFY, or NO_RULES |
| checks_passed | integer | Number of checks that passed |
| checks_failed | integer | Number of checks that failed |
| checks_run | array | Details of each verification check |
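
A conservative way to consume the verification result is to accept only a clean pass and send everything else to review. The sketch below assumes the verification object is available as a dict with the properties listed above. The helper name `is_math_verified` and the policy of treating PARTIAL, CANNOT_VERIFY, and NO_RULES as "not verified" are choices made for this example, not documented behavior.

```python
from enum import Enum
from typing import Any


class VerificationStatus(str, Enum):
    """The status values listed in the table above."""
    PASSED = "PASSED"
    FAILED = "FAILED"
    PARTIAL = "PARTIAL"
    CANNOT_VERIFY = "CANNOT_VERIFY"
    NO_RULES = "NO_RULES"


def is_math_verified(verification: dict[str, Any]) -> bool:
    """Return True only for a PASSED status with no failed checks."""
    # An unknown status string raises ValueError instead of passing silently.
    status = VerificationStatus(verification["status"])
    return status is VerificationStatus.PASSED and verification["checks_failed"] == 0


# Invented verification payloads.
print(is_math_verified({"status": "PASSED", "checks_passed": 3,
                        "checks_failed": 0, "checks_run": []}))  # True
print(is_math_verified({"status": "PARTIAL", "checks_passed": 2,
                        "checks_failed": 1, "checks_run": []}))  # False
```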
Supported formats
| Format | Extension | Processing |
|---|---|---|
| PDF | .pdf | Native multimodal AI (can “see” the document) |
| Microsoft Word | .docx | Converted to Markdown |
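
Checking the extension on the client side avoids sending a file that cannot be processed. This sketch uses only the two extensions from the table above; the helper name `detect_format` is hypothetical, and the check looks at the file name only, not the file contents.

```python
from pathlib import Path

# Extensions taken from the table above, mapped to their format names.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "Microsoft Word",
}


def detect_format(filename: str) -> str:
    """Return the format name for a supported file, or raise ValueError."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file extension: {extension or '(none)'}")
    return SUPPORTED_EXTENSIONS[extension]


print(detect_format("invoice.PDF"))   # PDF
print(detect_format("receipt.docx"))  # Microsoft Word
```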
Error handling
Next steps
Schema Basics
Learn how to define extraction schemas with required vs optional fields
Confidence Scores
Understanding field-level confidence and the fallback mechanism
Extraction Rules
Improve accuracy with custom rules
API Reference
Full endpoint documentation
