POST /api/v3/extract
Extract structured data from a document
curl --request POST \
  --url https://api.lighton.ai/api/v3/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@invoice.pdf' \
  --form 'schema={
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string"
    }
  }
}'
{
  "id": "ext_0196e4b2a3c14d5e8f7a9b2c1d0e3f4a",
  "status": "completed",
  "created_at": "2026-03-31T10:00:00+00:00",
  "completed_at": "2026-03-31T10:00:04+00:00",
  "processing_time_ms": 3200,
  "document": {
    "filename": "invoice.pdf",
    "page_count": 3,
    "file_size_bytes": 245120,
    "mime_type": "application/pdf"
  },
  "result": {
    "data": [
      {
        "invoice_number": "INV-2026-001",
        "total": null,
        "line_items": null
      },
      {
        "invoice_number": null,
        "total": 1250,
        "line_items": [
          {
            "description": "Widget A",
            "quantity": 10,
            "unit_price": 50
          },
          {
            "description": "Widget B",
            "quantity": 5,
            "unit_price": 150
          }
        ]
      },
      {
        "invoice_number": null,
        "total": null,
        "line_items": null
      }
    ],
    "pagination": {
      "page": 1,
      "page_size": 15,
      "total_items": 3,
      "total_pages": 1,
      "has_next": false,
      "has_prev": false
    }
  },
  "usage": {
    "pages_processed": 3
  }
}
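The example response returns several entries under result.data, each with some fields set and others null. This page does not say what one entry represents (the example has three entries for a three-page document, so one entry per page is a plausible reading, but that is an assumption). The sketch below shows one way a client might fold the entries into a single record by keeping the first non-null value per field. The helper name merge_entries and the merge rule are illustrative choices, not part of the API.

```python
from typing import Any


def merge_entries(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Combine entries, keeping the first non-null value seen for each field."""
    merged: dict[str, Any] = {}
    for entry in entries:
        for key, value in entry.items():
            # Only fill a field that is still missing or null.
            if merged.get(key) is None:
                merged[key] = value
    return merged


# Trimmed copy of the example response shown above.
response = {
    "status": "completed",
    "result": {
        "data": [
            {"invoice_number": "INV-2026-001", "total": None, "line_items": None},
            {
                "invoice_number": None,
                "total": 1250,
                "line_items": [
                    {"description": "Widget A", "quantity": 10, "unit_price": 50},
                    {"description": "Widget B", "quantity": 5, "unit_price": 150},
                ],
            },
            {"invoice_number": None, "total": None, "line_items": None},
        ],
    },
}

invoice = merge_entries(response["result"]["data"])
# Cross-check the extracted total against the sum of the line items.
line_total = sum(i["quantity"] * i["unit_price"] for i in invoice["line_items"])
print(invoice["invoice_number"], invoice["total"], line_total)
# prints: INV-2026-001 1250 1250
```

If two entries can carry different non-null values for the same field, a first-wins rule silently drops the later one, so a real client may prefer to collect conflicts instead.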

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
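As a minimal sketch of the header format described above, the following builds (but does not send) a request object using only the Python standard library. The function name build_request and the placeholder token are hypothetical; the URL is the one shown in the curl example.

```python
import urllib.request

EXTRACT_URL = "https://api.lighton.ai/api/v3/extract"


def build_request(token: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token; nothing is sent here."""
    if not token:
        raise ValueError("an API token is required")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": content_type,
    }
    return urllib.request.Request(EXTRACT_URL, data=body, headers=headers, method="POST")


# "my-token" is a placeholder, not a real credential.
req = build_request("my-token", b"{}", "application/json")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the resulting object to urllib.request.urlopen would perform the call; that step is left out so the sketch can be run without credentials.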

Body

Body for POST /api/v3/extract.

schema is the JSON Schema that drives extraction. On JSON requests it is sent as an object; on multipart requests it is sent as a JSON-encoded string. The server coerces both forms to a dict.

options is a free-form dict; currently supports {"async": bool}.
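The two encodings of schema described above can be sketched as follows. The helper names json_body and multipart_fields are hypothetical. The page does not say how options is encoded on multipart requests, so JSON-encoding it the same way as schema is an assumption, and the format of the document string is not specified here, so it is passed through unchanged.

```python
import json
from typing import Any, Optional


def json_body(
    schema: dict[str, Any],
    document: Optional[str] = None,
    run_async: Optional[bool] = None,
) -> bytes:
    """JSON request: schema travels as a nested object."""
    payload: dict[str, Any] = {"schema": schema}
    if document is not None:
        payload["document"] = document  # format not specified on this page
    if run_async is not None:
        payload["options"] = {"async": run_async}
    return json.dumps(payload).encode("utf-8")


def multipart_fields(
    schema: dict[str, Any],
    run_async: Optional[bool] = None,
) -> dict[str, str]:
    """Multipart request: schema travels as a JSON-encoded string field."""
    fields = {"schema": json.dumps(schema)}
    if run_async is not None:
        # Assumption: options is JSON-encoded like schema on multipart requests.
        fields["options"] = json.dumps({"async": run_async})
    return fields


schema = {
    "type": "object",
    "properties": {"invoice_number": {"type": "string"}},
}

print(json_body(schema, run_async=True).decode("utf-8"))
print(multipart_fields(schema)["schema"])
```

The file part of a multipart request is omitted; a multipart-capable HTTP client would attach it alongside these text fields.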

schema (Schema · object, required)
document (string | null)
options (Options · object)

Response

Extraction completed (sync mode).

id (string, required)
status (string, required)
created_at (string<date-time> | null)
completed_at (string<date-time> | null)
processing_time_ms (integer | null)
document (ExtractDocument · object)
result (ExtractResult · object)
usage (ExtractUsage · object)
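A client may want to load these fields into a typed structure. The sketch below is one possible shape: only id and status are treated as mandatory, matching the required markers above, and everything else defaults to None. The class name ExtractResponse and its from_dict method are illustrative, and the nested objects are kept as plain dicts because their full definitions are not shown on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


def _parse_time(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, passing None through."""
    return datetime.fromisoformat(value) if value else None


@dataclass
class ExtractResponse:
    id: str
    status: str
    created_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    processing_time_ms: Optional[int] = None
    document: Optional[dict[str, Any]] = None
    result: Optional[dict[str, Any]] = None
    usage: Optional[dict[str, Any]] = None

    @classmethod
    def from_dict(cls, payload: dict[str, Any]) -> "ExtractResponse":
        return cls(
            id=payload["id"],          # required: raises KeyError if absent
            status=payload["status"],  # required: raises KeyError if absent
            created_at=_parse_time(payload.get("created_at")),
            completed_at=_parse_time(payload.get("completed_at")),
            processing_time_ms=payload.get("processing_time_ms"),
            document=payload.get("document"),
            result=payload.get("result"),
            usage=payload.get("usage"),
        )


parsed = ExtractResponse.from_dict({
    "id": "ext_0196e4b2a3c14d5e8f7a9b2c1d0e3f4a",
    "status": "completed",
    "created_at": "2026-03-31T10:00:00+00:00",
    "completed_at": "2026-03-31T10:00:04+00:00",
    "processing_time_ms": 3200,
    "usage": {"pages_processed": 3},
})
elapsed = parsed.completed_at - parsed.created_at
print(parsed.status, elapsed.total_seconds())
# prints: completed 4.0
```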