Uploading & managing files - LightOn Developers

Before you can search, your documents need to be in LightOn. Uploading a file triggers an ingestion pipeline that parses the content, splits it into chunks, generates embeddings, and indexes everything. The whole process typically takes a few seconds for a standard PDF. Ingestion is asynchronous — the upload returns immediately with a pending status, and you poll GET /api/v3/files/{id} for completion.

This tutorial covers POST /api/v3/files and GET /api/v3/files. The full schema for every endpoint and parameter lives in the API reference.

Upload a file

Send the file as multipart/form-data with the destination workspace_id:

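import os
import requests

# Read the API key from the environment; "$CONSOLE_API_KEY" inside a Python
# string would be sent literally, not expanded.
headers = {"Authorization": f"Bearer {os.environ['CONSOLE_API_KEY']}"}

# Open the file in a context manager so the handle is closed after the upload
with open("handbook.pdf", "rb") as f:
    response = requests.post(
        "https://api.lighton.ai/api/v3/files",
        headers=headers,
        data={"workspace_id": 42},
        files={"file": f},
    )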

file = response.json()
print(file["id"], file["status"], file["upload_session_uuid"])
# → 12345 pending 550e8400-e29b-41d4-a716-446655440000

The response is a 201 with the new file record, including an upload_session_uuid you can use later to find every file uploaded in the same batch.

Wait for indexing to complete

Poll GET /api/v3/files/{id} until status reaches embedded:

import time

file_id = file["id"]
while True:
    r = requests.get(f"https://api.lighton.ai/api/v3/files/{file_id}", headers=headers)
    body = r.json()
    if body["status"] == "embedded":
        print("Ready to search")
        break
    if body["status"] in ("parsing_failed", "embedding_failed", "fail"):
        print("Ingestion failed:", body.get("status_detail"))
        break
    time.sleep(2)

The status field moves through these stages:

Status	What’s happening
`pending`	Queued for processing
`parsing`	Extracting text from the document
`parsing_failed`	Parsing failed — see `status_detail`
`embedding`	Generating vector embeddings
`embedding_failed`	Embedding failed — see `status_detail`
`embedded`	Indexed and ready to search
`updating`	Re-indexing in progress
`fail`	Generic failure — see `status_detail`

status_vision tracks the same lifecycle for vision/image embeddings: pending, processing, embedded, fail, or - (not available for this file).
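
The polling loop shown earlier never exits if a file stays in an intermediate state. The sketch below is one way to bound the wait, assuming a timeout is acceptable for your use case. The helper name `wait_until_embedded`, its parameters, and the 120-second default are illustrative choices, not part of the LightOn API. The fetch step is passed in as a callable so the helper can be tried without network access.

```python
import time

# Failure statuses taken from the status table above
TERMINAL_FAILURES = {"parsing_failed", "embedding_failed", "fail"}


def wait_until_embedded(fetch_file, timeout=120.0, interval=2.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_file() until the file is embedded, fails, or the timeout passes.

    fetch_file is any zero-argument callable returning the file record as a dict
    (hypothetical helper; adapt to your own HTTP client).
    """
    deadline = clock() + timeout
    while True:
        body = fetch_file()
        status = body["status"]
        if status == "embedded":
            return body
        if status in TERMINAL_FAILURES:
            raise RuntimeError(
                f"Ingestion failed ({status}): {body.get('status_detail')}"
            )
        if clock() >= deadline:
            raise TimeoutError(f"File still '{status}' after {timeout} seconds")
        sleep(interval)
```

With `requests`, the callable can be as small as `lambda: requests.get(f"https://api.lighton.ai/api/v3/files/{file_id}", headers=headers).json()`.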

Organising documents with tags and titles

Add a human-readable title and assign tag IDs at upload time. Tags can be sent as a JSON-encoded array string or as repeated form fields with the same name.

requests.post(
    "https://api.lighton.ai/api/v3/files",
    headers=headers,
    data={
        "workspace_id": 42,
        "title": "Q4 Financial Report",
        "tags": "[1, 2]",        # JSON-encoded list of tag IDs
    },
    files={"file": open("q4-report.pdf", "rb")},
)

If a tag ID is invalid, the file is still created but the response is a 207 (multi-status) with a message explaining which tags were rejected. To replace tags after upload, PATCH /api/v3/files/{id} with a new tags array — it replaces all existing tags, manual and auto-assigned. Send [0] (sentinel) to remove every tag when using multipart format. To add tags without touching existing ones, POST /api/v3/files/{id}/tags.
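
The two tag encodings described in this section can be sketched as a small helper. This is an illustration only: `tag_form_fields` is a hypothetical name, and the repeated-field form assumes the field is called `tags`, the same name used for the JSON-encoded form above.

```python
import json


def tag_form_fields(tag_ids, repeated=False):
    """Return multipart form fields for tags as a list of (name, value) tuples.

    repeated=False -> one field holding a JSON-encoded array string
    repeated=True  -> one field per tag ID, all sharing the same name
    """
    if repeated:
        return [("tags", str(tag_id)) for tag_id in tag_ids]
    return [("tags", json.dumps(list(tag_ids)))]


print(tag_form_fields([1, 2]))
print(tag_form_fields([1, 2], repeated=True))
```

`requests` accepts a list of tuples for `data`, so the result can be combined with other fields, for example `data=[("workspace_id", "42"), *tag_form_fields([1, 2], repeated=True)]`. Passing `[0]` produces the sentinel value that clears every tag on a multipart PATCH.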

Tracking documents from external systems

If you’re ingesting documents from a third-party system (ServiceNow, Confluence, SharePoint, etc.), store the source identifier in external_metadata. This lets you find the LightOn file later given only the external ID, and surface the original URL in your UI.

import json

requests.post(
    "https://api.lighton.ai/api/v3/files",
    headers=headers,
    data={
        "workspace_id": 42,
        "external_metadata": json.dumps({
            "external_id": "SRV-456789",
            "doc_type": "incident",
            "additional_metadata": {
                "external_url": "https://servicenow.example.com/incident/SRV-456789",
            },
        }),
    },
    files={"file": open("srv-456789.pdf", "rb")},
)

external_id is required when creating; doc_type and additional_metadata are optional. When sent via multipart/form-data, the whole external_metadata value must be a JSON string. Retrieve it later by external ID:

GET /api/v3/files?external_metadata__external_id=SRV-456789
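
The rules above (external_id required, the other keys optional, the whole value serialised as a JSON string for multipart uploads) can be captured in a pair of helpers. This is a minimal sketch; `build_external_metadata` and `external_id_params` are hypothetical names introduced for illustration.

```python
import json


def build_external_metadata(external_id, doc_type=None, additional_metadata=None):
    """Return the external_metadata form value as a JSON string."""
    if not external_id:
        raise ValueError("external_id is required when creating a file")
    payload = {"external_id": external_id}
    # Optional keys are only included when a value was supplied
    if doc_type is not None:
        payload["doc_type"] = doc_type
    if additional_metadata is not None:
        payload["additional_metadata"] = additional_metadata
    return json.dumps(payload)


def external_id_params(external_id):
    """Query parameters for finding a file by its external ID."""
    return {"external_metadata__external_id": external_id}
```

At upload time, pass `build_external_metadata("SRV-456789", doc_type="incident")` as the `external_metadata` form field. To look the file up later, pass `external_id_params("SRV-456789")` as `params` to `requests.get` on `/api/v3/files`.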

Listing and filtering your documents

GET /api/v3/files supports rich filtering. A few common patterns:

# All files in a workspace
requests.get("https://api.lighton.ai/api/v3/files", headers=headers,
             params={"workspace_id": "42"})

# Semantic search across filenames and titles, with the top chunk inline
requests.get("https://api.lighton.ai/api/v3/files", headers=headers,
             params={"search": "security policy", "search_details": "true"})

# PDFs tagged 'legal', most recent first
requests.get("https://api.lighton.ai/api/v3/files", headers=headers,
             params={"tag_id": "3", "extension": "pdf", "ordering": "-created_at"})

# Files in a 10–50 page window
requests.get("https://api.lighton.ai/api/v3/files", headers=headers,
             params={"total_pages_min": 10, "total_pages_max": 50})

Set include_details=true to receive the signature (a TLSH hash for duplicate detection) and parser fields on each result. For schema-driven filtering, the attribute and content_type query parameters support facet filters with operators (=, >, <, * prefix, and | or , for OR). See the API reference for the full DSL.

Deleting files

Single delete:

requests.delete(f"https://api.lighton.ai/api/v3/files/{file_id}", headers=headers)

Bulk delete:

requests.post(
    "https://api.lighton.ai/api/v3/files/bulk-delete",
    headers=headers,
    json={"ids": [123, 124, 125]},
)

Both return 204 No Content on success. Files in synced (datasource-managed) workspaces cannot be deleted manually — the API returns 400.

Common errors

Status	Cause
`400`	Validation error, unsupported file type, or synced-workspace constraint
`401`	Missing or invalid API key
`403`	Permission denied (no upload/delete rights)
`404`	File does not exist or is not accessible
`429`	Too many concurrent uploads for this session
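
Of the errors above, 429 is the one most likely to clear on its own once earlier uploads finish. The sketch below retries with exponential backoff, assuming that retrying is appropriate for your workload; the source does not state how long to wait, so the delays are arbitrary. `call_with_backoff` is a hypothetical helper, and `send` is any zero-argument callable returning an object with a `status_code` attribute, such as a `requests` response.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    Delays double after each 429: base_delay, 2*base_delay, 4*base_delay, ...
    The last response is returned even if it is still a 429.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        # Do not sleep after the final attempt
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

When `send` uploads a file, open the file inside the callable (or seek back to the start) so that each retry sends the full content.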
