Installation
- parsefy: Parsefy SDK for document extraction
- zod: TypeScript-first schema validation library
Quick example
Try this simple example with a receipt. Download the sample receipt
import { Parsefy } from 'parsefy';
import * as z from 'zod';
const client = new Parsefy('pk_your_api_key');
const schema = z.object({
vendor: z.string().describe('Merchant or store name'),
total: z.number().describe('Total amount paid'),
purchase_date: z.string().describe('Purchase date in YYYY-MM-DD format'),
});
const { object } = await client.extract({
file: './receipt.pdf',
schema,
});
console.log(object);
// { vendor: "Cafe Mason", total: 32.59, purchase_date: "2023-08-29" }
Setup
Set your API key as an environment variable:
export PARSEFY_API_KEY=pk_your_api_key
import { Parsefy } from 'parsefy';
import * as z from 'zod';
const client = new Parsefy(); // If not explicitly provided, Parsefy will try to get PARSEFY_API_KEY from environment variables
const schema = z.object({
// REQUIRED - triggers fallback if below confidence threshold
invoice_number: z.string().describe('The invoice number'),
total: z.number().describe('Total amount including tax'),
// OPTIONAL - won't trigger fallback if missing or low confidence
vendor: z.string().optional().describe('Vendor name'),
due_date: z.string().optional().describe('Payment due date'),
});
const { object, metadata, verification, error } = await client.extract({
file: './invoice.pdf',
schema,
enableVerification: true, // Enable math verification
});
if (!error && object) {
console.log(object.invoice_number); // Fully typed!
// Access field-level confidence from _meta
console.log(`Overall confidence: ${object._meta.confidence_score}`);
object._meta.field_confidence.forEach((fc) => {
console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
});
// Check verification results
if (verification) {
console.log(`Verification: ${verification.status}`);
}
}
Required vs Optional Fields
All fields are required by default. Required fields that return null or fall below confidenceThreshold trigger the expensive fallback model (Tier 2).
| SDK Definition | Behavior |
|---|
z.string() | Required: triggers fallback if low confidence |
z.string().optional() | Optional: won’t trigger fallback |
Rule of thumb: Mark fields as .optional() if they might be missing in >20% of your documents.
Confidence Threshold
Control when the fallback model is triggered:
const { object, metadata } = await client.extract({
file: './invoice.pdf',
schema,
confidenceThreshold: 0.85, // default
});
| Threshold | Behavior |
|---|
| Lower (0.70) | Faster: Accepts Tier 1 results more often |
| Higher (0.95) | More accurate: Triggers Tier 2 fallback more often |
Next steps