Overview
Parsefy extracts structured data from PDF and DOCX files. You define the data you want with a schema, and Parsefy returns structured JSON that follows it.
Basic extraction
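As a rough illustration of the schema idea, the sketch below describes the wanted fields as a Python dataclass and loads a JSON string into it. The `Invoice` fields and the `parse_into` helper are hypothetical and use only the standard library; they are not Parsefy's SDK, whose actual schema types and calls may look different.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical schema: the fields we want pulled out of a document."""
    invoice_number: str
    total: float
    currency: str


def parse_into(schema, raw_json: str):
    """Build a schema instance from a JSON string, rejecting missing or mistyped fields."""
    data = json.loads(raw_json)
    kwargs = {}
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # JSON has a single number type, so accept an int where a float is declared.
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}")
        kwargs[f.name] = value
    return schema(**kwargs)


# Example: JSON shaped like the schema becomes a typed object.
invoice = parse_into(
    Invoice, '{"invoice_number": "INV-1", "total": 120, "currency": "EUR"}'
)
```

The point of the sketch is only that the schema fixes the shape of the result up front, so downstream code can rely on field names and types.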
File inputs
Parsefy accepts multiple file input types:
| Input | Description |
|---|---|
| File path | "./document.pdf" - reads from disk |
| Buffer/bytes | In-memory file data |
| File object | Browser File from form input |
| Blob | Raw binary with MIME type |
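To make the input types in the table concrete, here is a minimal sketch of how a path, an in-memory buffer, or an open binary file could be normalized to raw bytes before upload. The `to_bytes` helper and the `DocumentSource` alias are hypothetical names for illustration, not part of Parsefy; the browser `File` and `Blob` types have no direct Python equivalent, so an open binary file object stands in for them here.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union

# Hypothetical alias covering the input kinds listed in the table.
DocumentSource = Union[str, Path, bytes, bytearray, BinaryIO]


def to_bytes(source: DocumentSource) -> bytes:
    """Normalize a file path, in-memory buffer, or open binary file to raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        # Already in memory: return an immutable copy.
        return bytes(source)
    if isinstance(source, (str, Path)):
        # Treat strings as paths and read the file from disk.
        return Path(source).read_bytes()
    if hasattr(source, "read"):
        data = source.read()
        if not isinstance(data, bytes):
            raise TypeError("file object must be opened in binary mode")
        return data
    raise TypeError(f"unsupported input type: {type(source).__name__}")


# Example: an in-memory stream is read to bytes.
payload = to_bytes(io.BytesIO(b"%PDF-1.7"))
```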
Response format
Every extraction returns the extracted data together with a _meta field and processing metadata, both described below.
The _meta field
Every extraction includes quality metrics:
- confidence_score: 0.0 to 1.0 indicating extraction certainty
- issues: Array of any concerns encountered
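One way to use these quality metrics is to flag low-confidence results for manual review. The sketch below assumes the extracted result is a dict that carries `_meta` as a key; the `needs_review` helper and the 0.8 threshold are arbitrary choices for illustration, not values recommended by Parsefy.

```python
def needs_review(extracted: dict, min_confidence: float = 0.8) -> bool:
    """Return True when the result reports issues or a low confidence score."""
    meta = extracted.get("_meta") or {}
    # A missing score is treated as zero confidence so it is always reviewed.
    score = meta.get("confidence_score", 0.0)
    issues = meta.get("issues") or []
    return score < min_confidence or len(issues) > 0


# Example: a confident result with no issues passes through.
clean = {"total": 120.0, "_meta": {"confidence_score": 0.97, "issues": []}}
flagged = needs_review(clean)
```

Treating any reported issue as grounds for review is deliberately conservative; a real pipeline might inspect the issue text before deciding.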
Metadata
Processing information:
- processing_time_ms: How long the extraction took
- credits: Credits consumed (~1 per page)
- fallback_triggered: Whether the fallback model was used
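For logging or cost tracking, the three metadata fields above can be folded into a single line. This sketch assumes the metadata arrives as a dict keyed by those field names; `summarize_metadata` is a hypothetical helper, and "primary model" is just this example's label for the case where the fallback was not used.

```python
def summarize_metadata(metadata: dict) -> str:
    """Format processing time, credits, and model choice as one log line."""
    seconds = metadata["processing_time_ms"] / 1000  # milliseconds to seconds
    model = "fallback model" if metadata["fallback_triggered"] else "primary model"
    return f"{seconds:.2f} s, {metadata['credits']} credits, {model}"


# Example: a three-page document processed without the fallback.
line = summarize_metadata(
    {"processing_time_ms": 2500, "credits": 3, "fallback_triggered": False}
)
```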
Supported formats
| Format | Extension | Processing |
|---|---|---|
| PDF | .pdf | Native multimodal AI (can “see” the document) |
| Microsoft Word | .docx | Converted to Markdown |
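A client can check the extension before uploading to avoid sending an unsupported file. The sketch below mirrors the table as a lookup; `PROCESSING_BY_EXTENSION` and `processing_for` are hypothetical names, and the check looks only at the file name, not the file contents.

```python
from pathlib import Path

# Mirrors the supported-formats table: extension -> how the file is processed.
PROCESSING_BY_EXTENSION = {
    ".pdf": "native multimodal AI",
    ".docx": "converted to Markdown",
}


def processing_for(filename: str) -> str:
    """Return the processing route for a file name, or raise if unsupported."""
    ext = Path(filename).suffix.lower()
    if ext not in PROCESSING_BY_EXTENSION:
        raise ValueError(f"unsupported format: {ext or '(no extension)'}")
    return PROCESSING_BY_EXTENSION[ext]


# Example: extension matching is case-insensitive.
route = processing_for("Contract.DOCX")
```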
