The _meta Field
Every Parsefy extraction automatically includes a _meta field with quality metrics:
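For example, a response might look like the following (shown as a TypeScript object; the data wrapper and the invoice fields are illustrative, and only confidence_score and issues are documented _meta fields):

```typescript
// Illustrative extraction result. Only `_meta.confidence_score` and
// `_meta.issues` are documented fields; the rest is example data.
const result = {
  data: {
    invoice_number: "INV-2024-0042",
    total: 1280.5,
  },
  _meta: {
    confidence_score: 0.92, // 0.0 to 1.0
    issues: [
      "Due date format was ambiguous (DD/MM vs MM/DD); assumed MM/DD.",
    ],
  },
};
```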
You don’t need to define _meta in your schema; it’s injected automatically.

Confidence Score
The confidence_score is a floating-point number from 0.0 to 1.0 that represents the AI’s certainty in the extraction quality.
Score Interpretation
| Score | Level | Meaning | Recommended Action |
|---|---|---|---|
| 1.0 | Perfect | All fields found with complete certainty | Use directly |
| 0.95 - 0.99 | Very High | Minor uncertainties, excellent extraction | Use directly |
| 0.90 - 0.94 | High | One or two slightly ambiguous fields | Review if critical |
| 0.85 - 0.89 | Moderate | Some unclear fields | Manual review recommended |
| 0.70 - 0.84 | Low | Multiple issues detected | Requires verification |
| < 0.70 | Very Low | Significant problems | Results may be unreliable |
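If you want to reuse these bands in application code, they translate directly into a lookup like this sketch (not part of the API):

```typescript
// Map a confidence_score to the levels from the table above.
type ConfidenceLevel =
  | "Perfect"
  | "Very High"
  | "High"
  | "Moderate"
  | "Low"
  | "Very Low";

function confidenceLevel(score: number): ConfidenceLevel {
  if (score >= 1.0) return "Perfect";
  if (score >= 0.95) return "Very High";
  if (score >= 0.9) return "High";
  if (score >= 0.85) return "Moderate";
  if (score >= 0.7) return "Low";
  return "Very Low";
}
```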
What Affects Confidence?
Document Quality
Blurry scans, low resolution, or damaged documents reduce confidence.
Field Ambiguity
Multiple possible values for a field (e.g., multiple dates) lower confidence.
Missing Data
Required fields that couldn’t be found reduce the score.
Complex Layouts
Unusual document structures may introduce uncertainty.
The Issues Array
The issues array contains human-readable descriptions of any problems encountered during extraction:
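For example (the exact wording of the messages will vary):

```typescript
// Illustrative issue strings; actual messages returned by Parsefy may differ.
const issues: string[] = [
  "Multiple dates found; used the value labeled 'Invoice Date'.",
  "Field 'purchase_order' could not be found and was returned as null.",
  "Low scan quality in the line-items region; amounts may be misread.",
];
```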
Common Issue Types
- Format Ambiguity
- Multiple Values
- Missing Fields
- Quality Issues
- Inferred Values
Automatic Fallback
Parsefy uses a two-tier model architecture for reliability:

1. Tier 1 Extraction: Your document is first processed by a fast, efficient model.
2. Confidence Check: If confidence_score < 0.85, the extraction is automatically re-run.
3. Tier 2 Fallback: A more powerful model processes the document for improved accuracy.

The metadata.fallback_triggered field tells you if the fallback was used:
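A minimal check, assuming the result object exposes this metadata as described above (the field names are taken from this page; everything else is illustrative):

```typescript
// Log when the Tier 2 fallback model was used for an extraction.
// The result shape here is an assumption based on the fields named on this page.
function logFallback(result: {
  metadata: { fallback_triggered: boolean };
  _meta: { confidence_score: number };
}): void {
  if (result.metadata.fallback_triggered) {
    console.log(
      `Tier 2 fallback used; final confidence: ${result._meta.confidence_score}`
    );
  }
}
```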
Using Confidence in Your Application
Basic Threshold Check
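A minimal sketch, assuming the response shape described above; the data wrapper, the threshold value, and the return labels are illustrative:

```typescript
type ExtractionResult = {
  data: Record<string, unknown>; // assumption: extracted fields live here
  _meta: { confidence_score: number; issues: string[] };
};

// Accept the extraction only when it clears your confidence threshold.
function routeExtraction(result: ExtractionResult, threshold = 0.9): "use" | "review" {
  if (result._meta.confidence_score >= threshold) {
    return "use"; // safe to consume result.data directly
  }
  return "review"; // send to human review or re-run with extraction rules
}
```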
Checking Specific Issues
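A sketch for reacting to particular issues, for example flagging results whose issues mention fields you treat as critical. Matching on substrings of the messages is an assumption; adjust it to the wording you actually see in your logs:

```typescript
// Return true if any reported issue mentions one of the critical field names.
function hasCriticalIssue(issues: string[], criticalFields: string[]): boolean {
  return issues.some((issue) =>
    criticalFields.some((field) => issue.toLowerCase().includes(field.toLowerCase()))
  );
}

// Example: hasCriticalIssue(result._meta.issues, ["total", "invoice_number"]);
```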
TypeScript Example
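A more complete typed sketch that combines the checks above. Only confidence_score and issues are documented _meta fields; the Invoice shape and the data property are assumptions for the example:

```typescript
interface ParsefyMeta {
  confidence_score: number; // 0.0 to 1.0
  issues: string[];         // human-readable problem descriptions
}

interface ParsefyExtraction<T> {
  data: T;
  _meta: ParsefyMeta;
}

interface Invoice {
  invoice_number: string;
  total: number;
}

function handleInvoice(result: ParsefyExtraction<Invoice>): void {
  const { confidence_score, issues } = result._meta;

  // Always log issues for debugging and schema tuning (see Best Practices).
  for (const issue of issues) {
    console.warn(`[parsefy] ${issue}`);
  }

  if (confidence_score >= 0.95) {
    console.log("High confidence: using invoice", result.data.invoice_number);
  } else if (confidence_score >= 0.85) {
    console.log("Moderate confidence: flagging for spot check");
  } else {
    console.log("Low confidence: sending to manual review");
  }
}
```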
Best Practices
Set Appropriate Thresholds
Different use cases need different confidence levels. Financial data may require 0.95+, while general categorization might accept 0.80+.
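One way to encode this is a per-use-case map (the use-case names and values here are illustrative):

```typescript
// Illustrative thresholds per use case; tune these against your own data.
const CONFIDENCE_THRESHOLDS: Record<string, number> = {
  financial_data: 0.95,        // correctness is critical
  identity_documents: 0.9,
  general_categorization: 0.8,
};
```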
Log Issues
Always log the issues array for debugging and improving your schemas over time.

Handle Low Confidence
Build workflows that route low-confidence extractions to human review.
Use Rules
Add extraction rules to improve accuracy for problematic fields.
Confidence vs. Correctness
The confidence score is based on:

- How clearly the AI found each field
- Whether values match expected patterns
- Document quality and readability
- Presence/absence of ambiguities

It does not guarantee:

- Mathematical correctness (e.g., line items summing to the total)
- Business logic validation
- Cross-field consistency
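Because the score does not cover these checks, validate critical values in your own code. A minimal sketch of a cross-field check, with an illustrative invoice shape:

```typescript
interface InvoiceLine { description: string; amount: number; }
interface ExtractedInvoice { line_items: InvoiceLine[]; total: number; }

// Verify that line items sum to the stated total (within a small tolerance).
function totalsMatch(invoice: ExtractedInvoice, tolerance = 0.01): boolean {
  const sum = invoice.line_items.reduce((acc, line) => acc + line.amount, 0);
  return Math.abs(sum - invoice.total) <= tolerance;
}
```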
