The _meta Field

Every Parsefy extraction automatically includes a _meta field with quality metrics:
{
  "invoice_number": "INV-2024-0042",
  "total": 1250.00,
  "vendor": "Acme Corp",
  "_meta": {
    "confidence_score": 0.95,
    "issues": [
      "Date format ambiguous: could be DD/MM/YYYY or MM/DD/YYYY"
    ]
  }
}
You don’t need to define _meta in your schema—it’s injected automatically.
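
For example, a schema like the following (a minimal sketch, assuming a Pydantic-style model as used in the Python examples below) declares only your business fields:
from pydantic import BaseModel

class Invoice(BaseModel):
    invoice_number: str
    total: float
    vendor: str
    # No _meta field here: Parsefy adds it to the extraction result automatically.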

Confidence Score

The confidence_score is a floating-point number from 0.0 to 1.0 that represents the AI’s certainty in the extraction quality.

Score Interpretation

| Score | Level | Meaning | Recommended Action |
|-------|-------|---------|--------------------|
| 1.0 | Perfect | All fields found with complete certainty | Use directly |
| 0.95 - 0.99 | Very High | Minor uncertainties, excellent extraction | Use directly |
| 0.90 - 0.94 | High | One or two slightly ambiguous fields | Review if critical |
| 0.85 - 0.89 | Moderate | Some unclear fields | Manual review recommended |
| 0.70 - 0.84 | Low | Multiple issues detected | Requires verification |
| < 0.70 | Very Low | Significant problems | Results may be unreliable |
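
These bands translate directly into code. A minimal sketch that maps a score to the band labels above (the helper name is our own, not part of the SDK):
def confidence_level(score: float) -> str:
    """Map a confidence_score to the bands in the table above."""
    if score >= 1.0:
        return "Perfect"
    if score >= 0.95:
        return "Very High"
    if score >= 0.90:
        return "High"
    if score >= 0.85:
        return "Moderate"
    if score >= 0.70:
        return "Low"
    return "Very Low"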

What Affects Confidence?

Document Quality

Blurry scans, low resolution, or damaged documents reduce confidence.

Field Ambiguity

Multiple possible values for a field (e.g., multiple dates) lower confidence.

Missing Data

Required fields that couldn’t be found reduce the score.

Complex Layouts

Unusual document structures may introduce uncertainty.

The Issues Array

The issues array contains human-readable descriptions of any problems encountered during extraction:
{
  "_meta": {
    "confidence_score": 0.82,
    "issues": [
      "Date format ambiguous: could be DD/MM/YYYY or MM/DD/YYYY",
      "Total amount unclear - multiple totals found",
      "Vendor name partially obscured"
    ]
  }
}

Common Issue Types

"Date format ambiguous: could be DD/MM/YYYY or MM/DD/YYYY"
The document contains a date like “01/02/2024” that could be read as either January 2 or February 1.
"Total amount unclear - multiple totals found"
The document contains several values that could match the requested field.
"Field 'tax_amount' not found in document"
A field in your schema couldn’t be located in the document.
"Text partially obscured in table section"
Part of the document was difficult to read.
"Currency inferred from context as USD"
A value was derived rather than explicitly found.
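
Because issues are plain strings, simple substring checks are enough to route them in code. A minimal sketch (the grouping keywords and handling are illustrative, not part of the SDK):
result = client.extract(file="document.pdf", schema=Invoice)

if result.error is None:
    meta = result.data._meta
    # Group issues into rough categories via substring matching.
    missing_fields = [i for i in meta.issues if "not found" in i]
    inferred_values = [i for i in meta.issues if "inferred" in i.lower()]
    if missing_fields or inferred_values:
        queue_manual_review(result.data)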

Automatic Fallback

Parsefy uses a two-tier model architecture for reliability:
1. Tier 1 Extraction: Your document is first processed by a fast, efficient model.
2. Confidence Check: If confidence_score < 0.85, the extraction is automatically re-run.
3. Tier 2 Fallback: A more powerful model processes the document for improved accuracy.
The metadata.fallback_triggered field tells you whether the fallback was used:
{
  "object": { ... },
  "metadata": {
    "processing_time_ms": 4500,
    "credits": 1,
    "fallback_triggered": true
  }
}
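
To monitor how often the fallback fires (useful for tracking latency and document quality), check the flag after each call. A minimal sketch, assuming the Python client exposes response metadata as result.metadata:
result = client.extract(file="document.pdf", schema=Invoice)

if result.error is None and result.metadata.fallback_triggered:
    # Tier 2 ran, so expect higher latency; log it to spot problematic document types.
    print(f"Fallback used ({result.metadata.processing_time_ms} ms)")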

Using Confidence in Your Application

Basic Threshold Check

result = client.extract(file="document.pdf", schema=Invoice)

if result.error is None:
    confidence = result.data._meta.confidence_score
    
    if confidence >= 0.95:
        # High confidence - auto-process
        process_invoice(result.data)
    elif confidence >= 0.85:
        # Medium confidence - process with logging
        log_for_review(result.data)
        process_invoice(result.data)
    else:
        # Low confidence - queue for manual review
        queue_manual_review(result.data)

Checking Specific Issues

result = client.extract(file="document.pdf", schema=Invoice)

if result.error is None:
    meta = result.data._meta
    
    # Check for critical issues
    date_issues = [i for i in meta.issues if "date" in i.lower()]
    if date_issues:
        # Flag for date verification
        flag_date_review(result.data, date_issues)

TypeScript Example

const { object, metadata } = await client.extract({
  file: './invoice.pdf',
  schema: invoiceSchema,
});

if (object) {
  const { confidence_score, issues } = object._meta;
  
  if (confidence_score >= 0.95) {
    await processInvoice(object);
  } else {
    await queueForReview(object, issues);
  }
}

Best Practices

Set Appropriate Thresholds

Different use cases need different confidence levels. Financial data may require 0.95+, while general categorization might accept 0.80+.
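
One way to encode this (the thresholds and use-case names here are illustrative, not Parsefy recommendations):
# Hypothetical per-use-case thresholds; tune these against your own data.
CONFIDENCE_THRESHOLDS = {
    "financial": 0.95,       # invoices, payments
    "categorization": 0.80,  # general document tagging
}

def needs_review(meta, use_case: str) -> bool:
    return meta.confidence_score < CONFIDENCE_THRESHOLDS[use_case]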

Log Issues

Always log the issues array for debugging and improving your schemas over time.

Handle Low Confidence

Build workflows that route low-confidence extractions to human review.

Use Rules

Add extraction rules to improve accuracy for problematic fields.

Confidence vs. Correctness

A high confidence score indicates the AI’s certainty, not guaranteed correctness. For critical applications, always implement validation and human review processes.
The confidence score is based on:
  • How clearly the AI found each field
  • Whether values match expected patterns
  • Document quality and readability
  • Presence/absence of ambiguities
It does not verify:
  • Mathematical correctness (e.g., line items summing to total)
  • Business logic validation
  • Cross-field consistency
For production use, combine Parsefy with your own validation:
result = client.extract(file="invoice.pdf", schema=Invoice)

if result.error is None and result.data:
    # AI confidence check
    if result.data._meta.confidence_score < 0.85:
        raise LowConfidenceError()
    
    # Business logic validation
    calculated_total = sum(item.amount for item in result.data.line_items)
    if abs(calculated_total - result.data.total) > 0.01:
        raise ValidationError("Line items don't sum to total")
