
What is a Schema?

A schema defines the structure of data you want to extract from your documents. Parsefy uses JSON Schema to understand exactly what fields to extract, their types, and any validation rules.
If you’re using our SDKs, you can define schemas using Pydantic models (Python) or Zod schemas (TypeScript) instead of raw JSON Schema.

Basic Structure

Every Parsefy schema is a JSON object with these key properties:
{
  "type": "object",
  "properties": {
    "field_name": {
      "type": "string",
      "description": "What this field contains"
    }
  },
  "required": ["field_name"]
}
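The same structure can also be assembled in code before it is sent anywhere. As a minimal sketch using only the Python standard library (the variable names are illustrative and nothing here calls a Parsefy SDK):

```python
import json

# Root schema: an object with a "properties" map and an optional "required" list.
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The invoice or receipt number",
        }
    },
    "required": ["invoice_number"],
}

# Serialize the dictionary to raw JSON Schema text.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Building the schema as a dictionary first makes it easy to reuse field definitions across schemas before serializing.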

Schema Properties

Property      Required  Description
type          Yes       Always "object" for the root schema
properties    Yes       Object containing field definitions
required      No        Array of required field names

Add description to each field to help the AI understand what to extract. Field-level descriptions are much more valuable than top-level schema descriptions.
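The three root properties can be checked locally before a schema is used. The helper below is a hypothetical sketch (`check_root_schema` is not part of any Parsefy SDK); it only encodes the rules listed in this section and makes no claim about how Parsefy itself validates schemas:

```python
def check_root_schema(schema: dict) -> list[str]:
    """Return a list of problems found at the root of a schema (empty if none)."""
    problems = []
    if schema.get("type") != "object":
        problems.append('root "type" must be "object"')

    properties = schema.get("properties")
    if not isinstance(properties, dict):
        problems.append('"properties" must be an object of field definitions')
        properties = {}

    # "required" is optional, so a missing key is treated as an empty list.
    required = schema.get("required", [])
    if not isinstance(required, list):
        problems.append('"required" must be an array of field names')
        required = []

    # Every required name should refer to a defined field.
    for name in required:
        if name not in properties:
            problems.append(f'required field "{name}" is not defined in "properties"')
    return problems


print(check_root_schema({"type": "object", "properties": {}, "required": ["total"]}))
```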

Field Types

Parsefy supports all standard JSON Schema types:

String:
{
  "invoice_number": {
    "type": "string",
    "description": "The invoice or receipt number"
  }
}

Number:
{
  "total_amount": {
    "type": "number",
    "description": "Total amount due in dollars"
  }
}
For integers only:
{
  "quantity": {
    "type": "integer",
    "description": "Number of items"
  }
}

Boolean:
{
  "is_paid": {
    "type": "boolean",
    "description": "Whether the invoice has been paid"
  }
}

Array:
{
  "line_items": {
    "type": "array",
    "description": "List of items on the invoice",
    "items": {
      "type": "object",
      "properties": {
        "description": {"type": "string"},
        "quantity": {"type": "integer"},
        "price": {"type": "number"}
      }
    }
  }
}

Object:
{
  "vendor": {
    "type": "object",
    "description": "Vendor information",
    "properties": {
      "name": {"type": "string"},
      "address": {"type": "string"},
      "phone": {"type": "string"}
    }
  }
}
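When working with extracted values in Python, each JSON Schema type corresponds to a native Python type. The function below is an illustrative sketch (`matches_type` is a hypothetical helper, not a Parsefy API) showing that mapping, including the pitfall that Python treats `bool` as a subclass of `int`:

```python
def matches_type(value, json_type: str) -> bool:
    """Check a Python value against a single JSON Schema type name."""
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        # Simplification: a float such as 3.0 is not accepted as an integer here.
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "array":
        return isinstance(value, list)
    if json_type == "object":
        return isinstance(value, dict)
    raise ValueError(f"unsupported type: {json_type}")


print(matches_type(1500.0, "number"), matches_type(True, "integer"))
```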

Complete Example

Here’s a comprehensive invoice extraction schema:
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The invoice or receipt number"
    },
    "date": {
      "type": "string",
      "description": "Invoice date (preserve original format)"
    },
    "vendor": {
      "type": "object",
      "description": "Vendor/seller information",
      "properties": {
        "name": {
          "type": "string",
          "description": "Company name"
        },
        "address": {
          "type": "string",
          "description": "Full address"
        },
        "phone": {
          "type": "string",
          "description": "Phone number"
        },
        "email": {
          "type": "string",
          "description": "Email address"
        }
      }
    },
    "customer": {
      "type": "object",
      "description": "Customer/buyer information",
      "properties": {
        "name": {"type": "string"},
        "address": {"type": "string"}
      }
    },
    "line_items": {
      "type": "array",
      "description": "List of purchased items",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "integer"},
          "unit_price": {"type": "number"},
          "amount": {"type": "number"}
        }
      }
    },
    "subtotal": {
      "type": "number",
      "description": "Subtotal before tax"
    },
    "tax": {
      "type": "number",
      "description": "Tax amount"
    },
    "total": {
      "type": "number",
      "description": "Total amount due"
    },
    "currency": {
      "type": "string",
      "description": "3-letter currency code (USD, EUR, etc.)"
    }
  },
  "required": ["invoice_number", "total"]
}
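Once a result comes back, the `required` array can drive a simple completeness check on the client side. This is a sketch under assumptions: `missing_required` is a hypothetical helper, the sample result is made up for illustration, and treating a null value the same as an absent field is a choice made here, not documented Parsefy behavior.

```python
def missing_required(schema: dict, result: dict) -> list[str]:
    """List required fields that are absent or null in an extraction result."""
    return [
        name
        for name in schema.get("required", [])
        if result.get(name) is None
    ]


# Trimmed-down version of the invoice schema, keeping only what the check needs.
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}

# Illustrative result in which the total could not be extracted.
result = {"invoice_number": "INV-2024-001", "total": None}

print(missing_required(schema, result))
```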

Best Practices

Use Descriptions

Always add description fields. They help the AI understand what to look for and where.

Be Specific

A description such as “Invoice date in YYYY-MM-DD format” gives the AI far more to work with than just “date”.

Mark Required Fields

Use the required array to indicate must-have fields.

Use Appropriate Types

Use number for amounts, integer for counts, string for text.
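The first of these practices is easy to check automatically. As a rough sketch (the helper `fields_missing_description` and the sample fields are hypothetical), the function below walks a `properties` object, including nested objects and arrays of objects, and reports every field that lacks a description:

```python
def fields_missing_description(properties: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of fields that have no "description"."""
    missing = []
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if not field.get("description"):
            missing.append(path)
        # Recurse into nested objects and into objects inside arrays.
        if field.get("type") == "object":
            missing += fields_missing_description(field.get("properties", {}), f"{path}.")
        elif field.get("type") == "array":
            items = field.get("items", {})
            if items.get("type") == "object":
                missing += fields_missing_description(items.get("properties", {}), f"{path}[].")
    return missing


properties = {
    "total": {"type": "number", "description": "Total amount due"},
    "vendor": {
        "type": "object",
        "description": "Vendor information",
        "properties": {"name": {"type": "string"}},
    },
    "line_items": {
        "type": "array",
        "description": "List of purchased items",
        "items": {"type": "object", "properties": {"quantity": {"type": "integer"}}},
    },
}

print(fields_missing_description(properties))
```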

Do’s and Don’ts

Do:
{
  "total_amount": {
    "type": "number",
    "description": "The final total amount due, including tax"
  },
  "invoice_date": {
    "type": "string",
    "description": "The date the invoice was issued, preserve original format"
  }
}

The _meta Field

Parsefy automatically injects a _meta field into every extraction response:
{
  "invoice_number": "INV-2024-001",
  "total": 1500.00,
  "_meta": {
    "confidence_score": 0.95,
    "issues": []
  }
}
You don’t need to include _meta in your schema; Parsefy adds it to the response for you.
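Because `_meta` sits alongside the extracted fields, client code will usually want to separate the two. The following is a minimal sketch: `split_meta` and the `MIN_CONFIDENCE` threshold of 0.9 are assumptions made for illustration, not values recommended by Parsefy.

```python
def split_meta(response: dict) -> tuple[dict, dict]:
    """Separate the extracted fields from the _meta block."""
    data = {key: value for key, value in response.items() if key != "_meta"}
    meta = response.get("_meta", {})
    return data, meta


response = {
    "invoice_number": "INV-2024-001",
    "total": 1500.00,
    "_meta": {"confidence_score": 0.95, "issues": []},
}

data, meta = split_meta(response)

# Hypothetical review rule: flag low confidence or any reported issue.
MIN_CONFIDENCE = 0.9
needs_review = (
    meta.get("confidence_score", 0.0) < MIN_CONFIDENCE
    or bool(meta.get("issues"))
)

print(data, needs_review)
```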
