What is a Schema?
A schema defines the structure of the data you want to extract from your documents. Parsefy uses JSON Schema to understand exactly what fields to extract, their types, and any validation rules.
If you’re using our SDKs, you can define schemas using Pydantic models (Python) or Zod schemas (TypeScript) instead of raw JSON Schema.
Basic Structure
Every Parsefy schema is a JSON object with these key properties:
Schema Properties
| Property | Required | Description |
|---|---|---|
| type | Yes | Always "object" for the root schema |
| properties | Yes | Object containing field definitions |
| required | No | Array of required field names |
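As a minimal sketch of the three root properties in the table, the schema below is written as a Python dict and serialized to JSON. The field names (invoice_number, total) are hypothetical and chosen only for illustration.

```python
import json

# Minimal root schema: "type" and "properties" are required,
# "required" is optional. Field names are illustrative assumptions.
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}

# Serialize to raw JSON Schema text.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Every name listed in required should also appear under properties; otherwise the schema asks for a field it never defines.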
Field Types
Parsefy supports all standard JSON Schema types:
- String
- Number
- Boolean
- Array
- Nested Object
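The sketch below shows one field definition per type, using standard JSON Schema keywords (type, items, properties, description). All field names and descriptions are hypothetical examples, not fields Parsefy requires.

```python
import json

# One illustrative field definition per JSON Schema type.
# Field names and descriptions are assumptions made for this example.
field_examples = {
    "vendor_name": {"type": "string", "description": "Name of the vendor"},
    "total_amount": {"type": "number", "description": "Total amount due"},
    "is_paid": {"type": "boolean", "description": "Whether the invoice is paid"},
    "tags": {
        "type": "array",
        "items": {"type": "string"},  # each array element is a string
        "description": "Labels attached to the document",
    },
    "billing_address": {
        "type": "object",  # nested object with its own properties
        "properties": {
            "street": {"type": "string"},
            "city": {"type": "string"},
        },
        "description": "Billing address block",
    },
}

# Field definitions always live under "properties" of the root object.
schema = {"type": "object", "properties": field_examples}
print(json.dumps(schema, indent=2))
```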
Complete Example
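As an illustration for this section, the sketch below defines an invoice schema as a Python dict. Every field name in it (invoice_number, invoice_date, vendor_name, line_items, total_amount) is an assumption chosen for the example; adapt the names and descriptions to your own documents.

```python
import json

# Hypothetical invoice schema; the field names are illustrative
# assumptions, not a list of fields Parsefy mandates.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Invoice identifier as printed on the document",
        },
        "invoice_date": {
            "type": "string",
            "description": "Invoice date in YYYY-MM-DD format",
        },
        "vendor_name": {
            "type": "string",
            "description": "Name of the company that issued the invoice",
        },
        "line_items": {
            "type": "array",
            "description": "One entry per billed item",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Item description"},
                    "quantity": {"type": "integer", "description": "Number of units"},
                    "unit_price": {"type": "number", "description": "Price per unit"},
                },
                "required": ["description"],
            },
        },
        "total_amount": {
            "type": "number",
            "description": "Total amount due",
        },
    },
    "required": ["invoice_number", "invoice_date", "total_amount"],
}

# Sanity check: every required name must be defined under "properties".
missing = [
    name for name in invoice_schema["required"]
    if name not in invoice_schema["properties"]
]
assert not missing, f"required fields not defined: {missing}"

print(json.dumps(invoice_schema, indent=2))
```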
A comprehensive invoice extraction schema combines the pieces above: a root object, typed field definitions, and, optionally, a required array.
Best Practices
Use Descriptions
Always add description fields. They help the AI understand what to look for and where.
Be Specific
“Invoice date in YYYY-MM-DD format” is better than just “date”.
Mark Required Fields
Use the required array to indicate must-have fields.
Use Appropriate Types
Use number for amounts, integer for counts, string for text.
Do’s and Don’ts
- ✅ Do
- ❌ Don't
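To make the practices above concrete, this sketch contrasts a well-specified schema with a vague one and includes a small helper that flags fields lacking a description. The field names and the helper fields_missing_description are hypothetical, written for this example only.

```python
# Do: descriptive, specific, correctly typed, with required fields marked.
good_schema = {
    "type": "object",
    "properties": {
        "invoice_date": {
            "type": "string",
            "description": "Invoice date in YYYY-MM-DD format",
        },
        "total_amount": {
            "type": "number",
            "description": "Total amount due, including tax",
        },
        "item_count": {
            "type": "integer",
            "description": "Number of line items on the invoice",
        },
    },
    "required": ["invoice_date", "total_amount"],
}

# Don't: no descriptions, an ambiguous name, and an amount typed as a string.
vague_schema = {
    "type": "object",
    "properties": {
        "date": {"type": "string"},
        "total": {"type": "string"},
    },
}


def fields_missing_description(schema: dict) -> list[str]:
    """Return the names of top-level fields that have no description."""
    return [
        name
        for name, spec in schema["properties"].items()
        if not spec.get("description")
    ]


print(fields_missing_description(good_schema))   # []
print(fields_missing_description(vague_schema))  # ['date', 'total']
```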
The _meta Field
Parsefy automatically injects a _meta field into every extraction response.
You don’t need to include _meta in your schema; it’s added automatically.
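Because _meta arrives alongside the fields you defined, consuming code may want to separate the two. The sketch below assumes the response has already been parsed into a Python dict with _meta as a top-level key; the helper split_meta, the sample field names, and the empty _meta contents are all placeholders, since the contents of _meta are not specified here.

```python
def split_meta(response: dict) -> tuple[dict, dict]:
    """Separate the injected _meta field from the extracted fields."""
    meta = response.get("_meta", {})
    data = {key: value for key, value in response.items() if key != "_meta"}
    return data, meta


# Hypothetical parsed response; "_meta" is left empty as a placeholder.
response = {"invoice_number": "INV-001", "total_amount": 125.5, "_meta": {}}

data, meta = split_meta(response)
print(data)  # {'invoice_number': 'INV-001', 'total_amount': 125.5}
```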