---
name: Lighton
description: Use when building document intelligence workflows: ingesting PDFs and documents into searchable indexes, retrieving relevant passages with semantic search, generating grounded answers with RAG, parsing documents to Markdown, or extracting structured data from forms and invoices.
metadata:
    mintlify-proj: lighton
    version: "1.0"
---

# LightOn API Skill

## Product summary

LightOn is a REST API for document intelligence. It lets you upload PDFs and other documents, search them with hybrid semantic+lexical retrieval, ask questions over them with retrieval-augmented generation (RAG), parse them to clean Markdown, or extract structured fields using JSON schemas. All endpoints live under `https://api.lighton.ai/api/v3/` and require Bearer token authentication.

Key workflows: **Files** (upload and manage documents), **Search** (find relevant passages), **Ask** (generate grounded answers), **Parse** (convert to Markdown), **Extract** (pull structured data). See the [API Reference](https://developers.lighton.ai/api-reference/introduction) for complete endpoint documentation and the [OpenAPI spec](https://api.lighton.ai/docs) for the machine-readable schema.

## When to use

Reach for this skill when:
- A user wants to upload documents and make them searchable (PDFs, Word docs, images, presentations)
- Building a knowledge base or document corpus that needs semantic search
- Implementing retrieval-augmented generation (RAG) to answer questions grounded in documents
- Converting documents to clean Markdown for downstream processing
- Extracting typed fields from forms, invoices, contracts, or structured documents
- Scoping search to specific workspaces, files, or tagged collections
- Streaming answers token-by-token for chat-style UIs
- Integrating with MCP-compatible clients (Claude Desktop, Cursor, etc.)

## Quick reference

### Authentication
All requests require a Bearer token in the `Authorization` header:
```
Authorization: Bearer $LIGHTON_API_KEY
```
Create keys via the [console](https://console.lighton.ai) or programmatically with `POST /api/v3/keys`.
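
As a minimal sketch of the header above, the helper below builds (but does not send) an authenticated request using only the Python standard library. The `build_request` name and the choice of a JSON body are assumptions made for illustration; check the API Reference for the content type each endpoint expects.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.lighton.ai/api/v3"


def build_request(path, payload=None, method="GET", api_key=None):
    """Build an authenticated request object without sending it (hypothetical helper)."""
    key = api_key or os.environ.get("LIGHTON_API_KEY")
    if not key:
        raise RuntimeError("Set LIGHTON_API_KEY or pass api_key explicitly")
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if payload is not None:
        # Assumption: the endpoint accepts a JSON body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# "demo-key" is a placeholder, not a real credential.
req = build_request("search", {"query": "termination clause"}, method="POST", api_key="demo-key")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would perform the actual call.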

### Core endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v3/files` | POST | Upload a document to a workspace |
| `/api/v3/files/{id}` | GET | Check upload/indexing status |
| `/api/v3/search` | POST | Hybrid semantic+lexical search across documents |
| `/api/v3/ask` | POST | RAG: retrieve passages and generate a grounded answer |
| `/api/v3/parse` | POST | Convert document to Markdown (sync or async) |
| `/api/v3/extract` | POST | Pull typed fields using a JSON Schema (sync or async) |
| `/api/v3/files` | GET | List and filter documents with rich query support |
| `/api/v3/files/{id}` | DELETE | Delete a single document |
| `/api/v3/files/bulk-delete` | POST | Delete multiple documents |

### File upload status lifecycle

| Status | Meaning |
|---|---|
| `pending` | Queued for processing |
| `parsing` | Extracting text from document |
| `embedding` | Generating vector embeddings |
| `embedded` | Ready to search |
| `parsing_failed` / `embedding_failed` / `fail` | Ingestion failed; check `status_detail` |
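
The lifecycle above can be folded into a small classifier so calling code only has to branch on three outcomes. This is a sketch; `classify_status` and its return labels are illustrative names, not part of the API.

```python
READY = {"embedded"}
FAILED = {"parsing_failed", "embedding_failed", "fail"}
IN_PROGRESS = {"pending", "parsing", "embedding"}


def classify_status(file_record: dict) -> str:
    """Map a file record's `status` to 'ready', 'failed', 'in_progress', or 'unknown'."""
    status = file_record.get("status")
    if status in READY:
        return "ready"
    if status in FAILED:
        return "failed"
    if status in IN_PROGRESS:
        return "in_progress"
    # Anything not listed in the table is treated as unknown rather than guessed.
    return "unknown"


print(classify_status({"status": "embedding"}))
```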

### Search scoping

- **By workspace**: `"workspace_id": [42, 43]` searches one or more workspaces
- **By file**: `"file_id": [101, 102]` searches only the listed files
- **By tag**: `"tag_id": [7]` searches documents carrying the listed tags
- **Combining**: `workspace_id` and `tag_id` may be used together; `file_id` is mutually exclusive with both
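
A client-side check can catch an invalid scope combination before the request is sent. The sketch below encodes only the rule stated here (`file_id` cannot be combined with `workspace_id` or `tag_id`); `validate_scope` is a hypothetical helper name.

```python
from typing import List, Optional


def validate_scope(
    workspace_id: Optional[List[int]] = None,
    file_id: Optional[List[int]] = None,
    tag_id: Optional[List[int]] = None,
) -> dict:
    """Return the scoping fields to merge into a request body, or raise on a bad mix."""
    if file_id and (workspace_id or tag_id):
        raise ValueError("file_id cannot be combined with workspace_id or tag_id")
    scope = {}
    if workspace_id:
        scope["workspace_id"] = list(workspace_id)
    if file_id:
        scope["file_id"] = list(file_id)
    if tag_id:
        scope["tag_id"] = list(tag_id)
    return scope


print(validate_scope(workspace_id=[42, 43], tag_id=[7]))
```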

### Parse and Extract modes

| Mode | Max size | Max pages | Behavior |
|---|---|---|---|
| Sync (default) | 20 MB | 15 pages | Returns result immediately |
| Async (`options.async: true`) | 100 MB | 1000 pages | Returns job ID; poll for result |
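
The limits in this table lend themselves to a simple pre-flight check. The sketch below assumes 1 MB means 1024 × 1024 bytes and that the caller already knows the page count; both are assumptions, and `choose_mode` is an illustrative name.

```python
MB = 1024 * 1024  # assumption: binary megabytes

SYNC_MAX_BYTES, SYNC_MAX_PAGES = 20 * MB, 15
ASYNC_MAX_BYTES, ASYNC_MAX_PAGES = 100 * MB, 1000


def choose_mode(size_bytes: int, pages: int) -> str:
    """Pick 'sync' or 'async' for Parse/Extract, or raise if the document is too large."""
    if size_bytes <= SYNC_MAX_BYTES and pages <= SYNC_MAX_PAGES:
        return "sync"
    if size_bytes <= ASYNC_MAX_BYTES and pages <= ASYNC_MAX_PAGES:
        return "async"
    raise ValueError("Document exceeds the async limits (100 MB / 1000 pages)")


print(choose_mode(5 * MB, 40))
```

When the result is `"async"`, the request would set `options.async: true` and the caller polls for the job result.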

## Decision guidance

### When to use Search vs Ask

| Use Search | Use Ask |
|---|---|
| You want raw passages to rank, display, or process yourself | You want a direct natural-language answer |
| You're building a multi-step retrieval pipeline or agentic loop | You need a simple single-turn question-answer |
| You want to inspect individual relevance scores | You want the answer + sources in one call |
| You need custom ranking or filtering logic | You're building a chat UI with citations |

### When to use Parse vs Extract

| Use Parse | Use Extract |
|---|---|
| You want the full text content as Markdown | You want specific typed fields from a document |
| You're feeding the document to an LLM or your own pipeline | You're processing forms, invoices, or structured documents |
| You need to preserve headings, lists, tables, code blocks | You have a JSON Schema defining what to pull |
| Nothing is stored; it's a one-time conversion | Nothing is stored; it's a one-time extraction |

### When to use Files (persistent) vs Parse/Extract (ephemeral)

| Use Files | Use Parse/Extract |
|---|---|
| Document should be searchable later | Document is processed once and discarded |
| Building a knowledge base or corpus | Converting or extracting for immediate use |
| Multiple queries over the same document | Single-use document processing |
| Tagging and organizing documents | No need to store or index |

## Workflow

### 1. Set up authentication
- Create an API key from [console.lighton.ai](https://console.lighton.ai) under **API Keys**
- Export as environment variable: `export LIGHTON_API_KEY=your_key_here`
- Include in every request: `Authorization: Bearer $LIGHTON_API_KEY`

### 2. Upload documents (if building a searchable corpus)
- POST to `/api/v3/files` with `workspace_id` and the file as multipart form data
- Optionally include `title`, `tags` (JSON array of tag IDs), or `external_metadata` (JSON string)
- Response includes file `id` and `status: pending`
- Poll `GET /api/v3/files/{id}` every 2 seconds until `status == embedded`
- Check `status_detail` if status is `parsing_failed`, `embedding_failed`, or `fail`
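
The polling step above might look like the following sketch. The `get_file` callable stands in for whatever performs `GET /api/v3/files/{id}` and returns the parsed record; it, the `timeout` parameter, and the function name are assumptions added so the loop can be exercised without network access.

```python
import time
from typing import Callable

FAILED_STATUSES = {"parsing_failed", "embedding_failed", "fail"}


def wait_until_embedded(
    file_id: int,
    get_file: Callable[[int], dict],
    interval: float = 2.0,
    timeout: float = 300.0,  # assumption: the source does not specify a timeout
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a file record until it is embedded; raise on failure or timeout."""
    waited = 0.0
    while True:
        record = get_file(file_id)
        status = record.get("status")
        if status == "embedded":
            return record
        if status in FAILED_STATUSES:
            raise RuntimeError(f"Ingestion failed ({status}): {record.get('status_detail')}")
        if waited >= timeout:
            raise TimeoutError(f"File {file_id} still '{status}' after {waited:.0f} s")
        sleep(interval)
        waited += interval


# Demo with a fake sequence of records instead of real HTTP calls.
_fake = iter([{"status": "pending"}, {"status": "parsing"},
              {"status": "embedding"}, {"status": "embedded", "id": 101}])
record = wait_until_embedded(101, lambda _id: next(_fake), sleep=lambda s: None)
print(record["status"])
```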

### 3. Search or ask (for persistent documents)
- **Search**: POST to `/api/v3/search` with `query` and optional scoping (`workspace_id`, `file_id`, or `tag_id`)
- **Ask**: POST to `/api/v3/ask` with `query` and optional scoping; returns `answer` + `results` (sources)
- Both support `max_results` (1–50, default 10) and `skip_rerank: true` for lower latency
- Ask supports `stream: true` for token-by-token streaming (Server-Sent Events)
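
To illustrate the parameters listed above, the sketch below assembles request bodies for Search and Ask and enforces the `max_results` range locally. The helper names are hypothetical, and sending the body as JSON is an assumption.

```python
from typing import Optional


def build_search_payload(query: str, max_results: int = 10,
                         skip_rerank: bool = False, scope: Optional[dict] = None) -> dict:
    """Assemble a body for POST /api/v3/search."""
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    payload = {"query": query, "max_results": max_results}
    if skip_rerank:
        payload["skip_rerank"] = True  # trades ranking quality for lower latency
    payload.update(scope or {})  # e.g. {"workspace_id": [42]}
    return payload


def build_ask_payload(query: str, stream: bool = False, **kwargs) -> dict:
    """Assemble a body for POST /api/v3/ask; same fields as Search plus `stream`."""
    payload = build_search_payload(query, **kwargs)
    if stream:
        payload["stream"] = True
    return payload


print(build_ask_payload("What is the notice period?", stream=True,
                        max_results=5, scope={"workspace_id": [42]}))
```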

### 4. Parse or extract (for one-time document processing)
- **Parse**: POST to `/api/v3/parse` with `file` (multipart) or `document` (URL); get Markdown back
- **Extract**: POST to `/api/v3/extract` with `file`/`document` and a `schema` (JSON Schema); get structured data back
- For documents larger than 20 MB or longer than 15 pages, set `options.async: true` and poll the job ID
- Polling cadence: every 1 s for the first 10 s, then every 5 s, capped at 30 s
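
One possible reading of the polling cadence is sketched below as a generator of sleep intervals: 1 s polls until 10 s have elapsed, then 5 s, with the interval never exceeding 30 s. How the interval grows from 5 s toward the 30 s cap is not stated in this document; doubling is an assumption made purely for illustration.

```python
from itertools import islice
from typing import Iterator


def poll_delays(fast_window: float = 10.0, fast: float = 1.0,
                slow: float = 5.0, cap: float = 30.0) -> Iterator[float]:
    """Yield sleep intervals (seconds) for polling an async Parse/Extract job."""
    elapsed = 0.0
    while elapsed < fast_window:
        yield fast
        elapsed += fast
    delay = slow
    while True:
        yield min(delay, cap)
        # Assumption: double the interval until it reaches the cap.
        delay = min(delay * 2, cap)


print(list(islice(poll_delays(), 15)))
```

A polling loop would call `time.sleep(delay)` for each yielded value and stop once the job `status` is `completed` or `failed`.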

### 5. Verify and iterate
- Check response status codes: `201` for created, `200` for success, `202` for async accepted, `4xx` for client errors, `5xx` for server errors
- Inspect `score` and `scores` breakdown in search results to debug ranking
- For Ask, review `results` to verify sources are relevant
- For Extract, check that `result.data` matches your schema and handle `null` fields
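
The status codes mentioned in this skill can be mapped to a handful of outcomes so that callers branch on intent rather than raw numbers. This is a sketch; `interpret_status` and its labels are illustrative, and `207` and `429` are included because they appear under Common gotchas.

```python
def interpret_status(code: int) -> str:
    """Translate an HTTP status code into a coarse outcome label."""
    if code == 200:
        return "ok"
    if code == 201:
        return "created"
    if code == 202:
        return "accepted_async"   # poll the returned job ID
    if code == 207:
        return "partial"          # e.g. file created but some tag IDs were invalid
    if code == 429:
        return "rate_limited"     # back off and retry
    if 400 <= code < 500:
        return "client_error"     # fix the request; retrying unchanged will not help
    if 500 <= code < 600:
        return "server_error"
    return "unexpected"


for code in (201, 202, 422, 429, 503):
    print(code, interpret_status(code))
```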

## Common gotchas

- **Forgetting to poll for file indexing**: Upload returns immediately with `status: pending`. You must poll `GET /api/v3/files/{id}` until `status == embedded` before searching.
- **Mixing scoping parameters**: `file_id` is mutually exclusive with `workspace_id` and `tag_id`. Combining them returns a `422` validation error.
- **Async jobs without polling**: Parse and Extract with `options.async: true` return `202` with a job ID, not the result. You must poll `GET /api/v3/parse/{id}` or `GET /api/v3/extract/{id}` until `status` is `completed` or `failed`.
- **Exceeding file size limits**: Sync mode caps at 20 MB / 15 pages. Async mode caps at 100 MB / 1000 pages. Larger files return `413 Payload Too Large`.
- **Extract applies per-page, not per-document**: Extract returns one result object per page. If you need a single consolidated object synthesizing data across pages, use Search to retrieve passages and generate the structured output yourself.
- **Malformed JSON Schema in Extract**: The schema must be valid JSON Schema. Unsupported features (e.g., `$ref`, `allOf`) return `422`. Keep schemas simple: `type`, `properties`, `description`.
- **Rate limits**: 6 requests/second per tenant for Extract. Search and Ask have their own limits. `429` means back off and retry.
- **Vision mode requires vision embeddings**: Search with `mode: vision` only works if documents were indexed with vision embeddings (`status_vision: embedded`). Check the file record.
- **Tags must exist before upload**: Tag IDs in the upload request must already exist. Invalid tag IDs return `207` (multi-status) with a message; the file is still created.
- **External metadata is immutable**: Once set at upload, `external_metadata` cannot be changed. Plan your external ID scheme upfront.
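
For the Extract schema gotcha, a pre-flight scan can flag unsupported keywords before the API rejects the request. The sketch below checks only for `$ref` and `allOf`, the two examples named above; the full list of unsupported features is not given here, and `find_unsupported` is a hypothetical helper.

```python
UNSUPPORTED_KEYWORDS = {"$ref", "allOf"}  # only the examples named in this document


def find_unsupported(schema, path="$"):
    """Return the paths of unsupported keywords found anywhere in a JSON Schema."""
    hits = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            here = f"{path}.{key}"
            if key in UNSUPPORTED_KEYWORDS:
                hits.append(here)
            hits.extend(find_unsupported(value, here))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            hits.extend(find_unsupported(item, f"{path}[{index}]"))
    return hits


invoice_schema = {
    "type": "object",
    "properties": {
        "total": {"type": "number", "description": "Invoice total"},
        "vendor": {"$ref": "#/definitions/vendor"},
    },
}
print(find_unsupported(invoice_schema))
```

Note that this simple walk would also flag a document field that happens to be named `allOf` or `$ref` under `properties`, so treat hits as warnings to review.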

## Verification checklist

Before submitting work with LightOn:

- [ ] API key is set and valid (test with a simple search or ask)
- [ ] File uploads complete successfully and reach `status: embedded` before querying
- [ ] Search queries return results with non-null `score` and `content`
- [ ] Ask queries return both `answer` and `results` (sources)
- [ ] Scoping parameters are correct: no mixing `file_id` with `workspace_id`/`tag_id`
- [ ] Parse/Extract async jobs are polled until `status: completed` or `failed`
- [ ] Extract schema is valid JSON Schema and matches the document structure
- [ ] Error responses are handled: check the HTTP status code and the response body for `message` or `error`
- [ ] Rate limits are respected: back off on `429` responses
- [ ] For streaming Ask, SSE events are parsed correctly (`sources`, `token`, `done`, `error`)
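
For the streaming item in the checklist, a generic Server-Sent Events parser is sketched below. It follows the standard `event:` / `data:` line format; whether LightOn carries the event name in the `event:` field, and what each `data` payload looks like, are assumptions, and the sample stream is invented for illustration.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        # Lenient flush if the stream ends without a trailing blank line.
        yield event, "\n".join(data)


# Hypothetical stream; real payload shapes may differ.
sample = [
    "event: sources\n", 'data: [{"file_id": 101}]\n', "\n",
    "event: token\n", "data: Hello\n", "\n",
    "event: token\n", "data:  world\n", "\n",
    "event: done\n", "data: {}\n", "\n",
]
events = list(iter_sse_events(sample))
answer = "".join(data for name, data in events if name == "token")
print(answer)
```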

## Resources

- **[LightOn llms.txt](https://developers.lighton.ai/llms.txt)** — comprehensive page-by-page navigation for all documentation
- **[API Reference](https://developers.lighton.ai/api-reference/introduction)** — complete endpoint schema, parameters, and error codes
- **[Tutorials](https://developers.lighton.ai/tutorials/index)** — step-by-step guides for Files, Search, Ask, Parse, and Extract
- **[OpenAPI Spec](https://api.lighton.ai/docs)** — machine-readable API specification
