Skip to main content

Overview

Parsefy extracts structured data from financial PDFs (invoices, receipts, bills) and returns validated JSON with field-level confidence and evidence. You define what data you want using a schema, and Parsefy returns structured data or fails with clear reasons.
Our goal: 0% silent errors. You get validated output or clear failure reasons; never unreliable data silently.

Basic extraction

import { Parsefy } from 'parsefy';
import * as z from 'zod';

const client = new Parsefy();

const schema = z.object({
  // REQUIRED - triggers fallback if below confidence threshold
  invoice_number: z.string().describe('The invoice number'),
  total: z.number().describe('Total amount including tax'),

  // OPTIONAL - won't trigger fallback if missing
  vendor: z.string().optional().describe('Vendor name'),
  due_date: z.string().optional().describe('Payment due date'),
});

const { object, metadata, verification, error } = await client.extract({
  file: './invoice.pdf',
  schema,
  enableVerification: true, // Enable math verification
});

if (!error && object) {
  console.log(object.invoice_number); // Fully typed!

  // Access field-level confidence from _meta
  console.log(`Overall confidence: ${object._meta.confidence_score}`);
  object._meta.field_confidence.forEach((fc) => {
    console.log(`${fc.field}: ${fc.score} (${fc.reason}) - "${fc.text}"`);
  });

  // Check verification results
  if (verification) {
    console.log(`Verification: ${verification.status}`);
  }
}

Confidence threshold

Control when the fallback model is triggered:
const { object, metadata } = await client.extract({
  file: './invoice.pdf',
  schema,
  confidenceThreshold: 0.85, // default
});
ThresholdBehaviorUse Case
Lower (e.g., 0.70)Faster: Accepts Tier 1 results more oftenHigh-volume, less critical
Higher (e.g., 0.95)More accurate: Triggers Tier 2 fallback more oftenFinancial reconciliation
Default: 0.85

File inputs

Parsefy accepts multiple file input types:
InputDescription
File path"./document.pdf" - reads from disk
Buffer/bytesIn-memory file data
File objectBrowser File from form input
BlobRaw binary with MIME type
// File path
const result = await client.extract({ file: './doc.pdf', schema });

// Buffer (Node.js)
import { readFileSync } from 'fs';
const result = await client.extract({ 
  file: readFileSync('./doc.pdf'), 
  schema 
});

// File object (browser)
const fileInput = document.querySelector('input[type="file"]');
const result = await client.extract({ 
  file: fileInput.files[0], 
  schema 
});

Response format

Every extraction returns structured data with field-level confidence:
{
  "object": {
    "invoice_number": "INV-2024-0042",
    "date": "2024-01-15",
    "subtotal": 1150.00,
    "tax": 100.00,
    "total": 1250.00,
    "vendor": "Acme Corp",
    "_meta": {
      "confidence_score": 0.94,
      "field_confidence": [
        { "field": "$.invoice_number", "score": 0.98, "reason": "Exact match", "page": 1, "text": "Invoice # INV-2024-0042" },
        { "field": "$.date", "score": 0.95, "reason": "Exact match", "page": 1, "text": "Date: 01/15/2024" },
        { "field": "$.subtotal", "score": 0.95, "reason": "Exact match", "page": 1, "text": "Subtotal: $1,150.00" },
        { "field": "$.tax", "score": 0.95, "reason": "Exact match", "page": 1, "text": "Tax: $100.00" },
        { "field": "$.total", "score": 0.92, "reason": "Formatting ambiguous", "page": 1, "text": "Total: $1,250.00" },
        { "field": "$.vendor", "score": 0.90, "reason": "Inferred from header", "page": 1, "text": "Acme Corp" }
      ],
      "issues": []
    }
  },
  "metadata": {
    "processing_time_ms": 2340,
    "credits": 1,
    "fallback_triggered": false
  },
  "verification": {
    "status": "PASSED",
    "checks_passed": 1,
    "checks_failed": 0,
    "cannot_verify_count": 0,
    "checks_run": [
      {
        "type": "HORIZONTAL_SUM",
        "status": "PASSED",
        "fields": ["total", "subtotal", "tax"],
        "passed": true,
        "delta": 0.0,
        "expected": 1250.00,
        "actual": 1250.00
      }
    ]
  },
  "error": null
}

The _meta field

Every extraction includes quality metrics with evidence:
PropertyTypeDescription
confidence_scorenumberOverall confidence (0.0 to 1.0)
field_confidencearrayPer-field confidence with evidence
issuesarrayWarnings or anomalies detected

Field confidence object

Each entry in field_confidence contains:
PropertyTypeDescription
fieldstringJSON path (e.g., $.invoice_number)
scorenumberConfidence score (0.0 to 1.0)
reasonstring”Exact match”, “Inferred from header”, etc.
pagenumberPage number where found
textstringSource text evidence

Metadata

Processing information:
PropertyTypeDescription
processing_time_msintegerHow long the extraction took
creditsintegerCredits consumed (~1 per page)
fallback_triggeredbooleanWhether Tier 2 model was used

Verification (optional)

When enable_verification is set to true, the response includes math verification:
PropertyTypeDescription
statusstringPASSED, FAILED, PARTIAL, CANNOT_VERIFY, or NO_RULES
checks_passedintegerNumber of checks that passed
checks_failedintegerNumber of checks that failed
checks_runarrayDetails of each verification check

Supported formats

FormatExtensionProcessing
PDF.pdfNative multimodal AI (can “see” the document)
Microsoft Word.docxConverted to Markdown
Maximum file size: 10 MB

Error handling

import { Parsefy, APIError, ValidationError, ParsefyError } from 'parsefy';

try {
  const { object, error, metadata } = await client.extract({
    file: './document.pdf',
    schema,
  });

  // Extraction-level error (request succeeded, but extraction failed)
  if (error) {
    console.error(`Extraction failed: [${error.code}] ${error.message}`);
    console.log(`Fallback triggered: ${metadata.fallbackTriggered}`);
    console.log(`Issues: ${metadata.issues.join(', ')}`);
    return;
  }

  // Check for low confidence fields
  const lowConfidence = metadata.fieldConfidence.filter((fc) => fc.score < 0.80);
  if (lowConfidence.length > 0) {
    console.warn('Low confidence fields:', lowConfidence);
  }

  console.log('Success:', object);
} catch (err) {
  // HTTP/Network errors
  if (err instanceof APIError) {
    console.error(`API Error ${err.statusCode}: ${err.message}`);
  } else if (err instanceof ValidationError) {
    console.error(`Validation Error: ${err.message}`);
  } else if (err instanceof ParsefyError) {
    console.error(`Parsefy Error: ${err.message}`);
  }
}

Next steps