Filtering documents by metadata - LightOn Developers

This tutorial uses GET /api/v3/files. The full schema for every filter parameter lives in the API reference.

Your files are classified and enriched with metadata. Now your app needs to query them: not by what the text says, but by what they are and what their metadata contains. Imagine you’ve uploaded 200 contracts and classified them with facets. Your app needs to answer: “Show all unsigned French NDAs with Acme.” Here’s how to build that query, step by step.

Filter by content type

The content_type parameter filters files by their classification path.

Exact match + subtree

A plain path matches the exact node and all its descendants:

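import os
import requests

# Shared setup for the snippets in this tutorial
base_url = "https://api.lighton.ai"
headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

# All NDAs
response = requests.get(
    f"{base_url}/api/v3/files",
    headers=headers,
    params={"content_type": "contract:nda"},
)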

# All contracts (NDAs, service agreements, fixed-price, any node in the tree)
response = requests.get(
    f"{base_url}/api/v3/files",
    headers=headers,
    params={"content_type": "contract"},
)

OR across types

Comma-separate multiple paths for an OR query:

# NDAs or service agreements
response = requests.get(
    f"{base_url}/api/v3/files",
    headers=headers,
    params={"content_type": "contract:nda,contract:service-agreement"},
)

Wildcard: contains

Wrap in * to match any path containing the substring:

params={"content_type": "*nda*"}

Wildcard: prefix

Append * for a prefix match:

# contract:service-agreement, contract:service-agreement:fixed-price, etc.
params={"content_type": "contract:service*"}
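
The four matching modes can be mirrored client-side to sanity-check a pattern before sending it. The sketch below only illustrates the rules described in this section; it is not LightOn's server-side implementation. `content_type_matches` is a hypothetical helper, and it assumes path segments are separated by `:`.

```python
def content_type_matches(pattern: str, path: str) -> bool:
    """Return True if `path` satisfies the content_type filter `pattern`.

    Hypothetical client-side mirror of the rules in this section.
    """
    # Comma-separated alternatives are ORed together.
    for alt in pattern.split(","):
        alt = alt.strip()
        if not alt:
            continue
        if len(alt) >= 2 and alt.startswith("*") and alt.endswith("*"):
            # *text* -> substring match
            if alt[1:-1] in path:
                return True
        elif alt.endswith("*"):
            # text* -> prefix match
            if path.startswith(alt[:-1]):
                return True
        elif path == alt or path.startswith(alt + ":"):
            # plain path -> the node itself or any of its descendants
            return True
    return False


print(content_type_matches("contract", "contract:nda"))
print(content_type_matches("contract:service*", "contract:service-agreement:fixed-price"))
```

Note that under this reading a plain `contract` does not match a sibling such as `contractor`, because only the exact node and `contract:`-prefixed descendants qualify.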

Filter by attribute value

The attribute parameter filters by attribute name and optionally by value and operator.

Has any value

# Files that have any counterparty value set
params={"attribute": "counterparty"}

Exact match

params={"attribute": "counterparty:Acme Corp"}

OR across values

Use | (pipe) as the OR delimiter. It’s the recommended separator and avoids ambiguity with commas in multi-key values:

# Files where jurisdiction is FR or DE
params={"attribute": "jurisdiction:FR|DE"}

Comparison operators

Use >, <, >=, <= for numbers and dates:

params={"attribute": "contract_value:>50000"}
params={"attribute": "contract_value:>=100000"}
params={"attribute": "effective_date:>2024-01-01"}
params={"attribute": "effective_date:<=2025-12-31"}

Prefix match

# Matches "Acme Corp", "Acme Ltd", etc.
params={"attribute": "counterparty:Acme*"}

Boolean

params={"attribute": "signed:true"}
params={"attribute": "is_mutual:false"}

Date shortcuts

LightOn expands partial dates into ranges automatically:

params={"attribute": "effective_date:2025"}         # any date in 2025 (Jan 1 – Dec 31)
params={"attribute": "effective_date:2025-03"}       # any date in March 2025 (1st – 31st)
params={"attribute": "effective_date:2025-03-01"}    # exact date

For details on date ambiguity, multi-select containment, and which types are filterable, see Rules & constraints.
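
To make the expansion concrete, here is a minimal sketch of how a partial date maps to an inclusive range. `expand_partial_date` is a hypothetical helper written for illustration; it mirrors the behavior described above and is not LightOn's implementation.

```python
import calendar
from datetime import date
from typing import Tuple


def expand_partial_date(value: str) -> Tuple[date, date]:
    """Expand 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' into an inclusive (start, end) range."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        # monthrange returns (weekday of first day, number of days in month)
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date shortcut: {value!r}")


start, end = expand_partial_date("2025-03")
print(start.isoformat(), end.isoformat())
```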

Combining filters (AND logic)

When you repeat attribute=, each additional filter narrows the results (AND logic). For example, to find files where jurisdiction is FR and signed is true:

params={"attribute": ["jurisdiction:FR", "signed:true"]}
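
Passing a list as the value is what produces the repeated `attribute=` keys on the wire. The standard-library sketch below shows the resulting query string; `requests` encodes list values in `params` the same way, one key per item, although its exact percent-encoding of characters like `:` may differ.

```python
from urllib.parse import parse_qs, urlencode

params = {"attribute": ["jurisdiction:FR", "signed:true"]}

# doseq=True emits one key=value pair per list item instead of encoding the list itself.
query_string = urlencode(params, doseq=True)
print(query_string)

# Decoding shows both filters arrive under the same key.
decoded = parse_qs(query_string)
print(decoded["attribute"])
```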

Scoping an attribute to a specific content type

When a file has multiple classifications, the same attribute name may exist under different paths. When you include a content_type filter alongside attribute filters, attributes are automatically scoped to that content type, so you only need the explicit scope syntax when you’re filtering across multiple content types. In that case, prefix the attribute with content_type(<path>). to be precise:

# counterparty on contract:nda specifically, not on any other classification
params={"attribute": "content_type(contract:nda).counterparty:Acme Corp"}

Putting it all together

Here’s a full query based on our scenario, this time for all signed French NDAs (use signed:false to get the unsigned ones).

filter_files.py

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

response = requests.get(
    "https://api.lighton.ai/api/v3/files",
    headers=headers,
    params={
        "content_type": "contract:nda",
        "attribute": ["jurisdiction:FR", "signed:true"],
    },
)
print(response.json())

The response is a paginated list of file objects. If no files match, you get an empty results array with a 200 status code.

Full operator reference

Syntax	Operator	Works on	Example
`name:value`	Equals	all types	`counterparty:Acme Corp`
`name:a|b`	OR	all types	`jurisdiction:FR|DE`
`name:>value`	Greater than	number, date	`contract_value:>50000`
`name:<value`	Less than	number, date	`contract_value:<1000`
`name:>=value`	Greater than or equal	number, date	`effective_date:>=2024-01-01`
`name:<=value`	Less than or equal	number, date	`effective_date:<=2025-12-31`
`name:prefix*`	Starts with	text	`counterparty:Acme*`
`name`	Has any value	all types	`counterparty`
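
Filter strings are easy to mistype when assembled by hand, so a small builder can help. The sketch below is a hypothetical convenience (`attr` is not part of any LightOn SDK); it only concatenates the syntax shown in the table and in the scoping section, and does no server-side validation.

```python
from typing import Optional, Union

Scalar = Union[str, int, float, bool]


def _fmt(value: Scalar) -> str:
    # Booleans are written lowercase in the examples (signed:true).
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def attr(name: str, *values: Scalar, op: str = "", scope: Optional[str] = None) -> str:
    """Build one `attribute` filter string (hypothetical helper).

    No values     -> "has any value"
    Several values -> OR, joined with the pipe delimiter
    op             -> one of >, <, >=, <= with exactly one value
    scope          -> explicit content_type(<path>). prefix
    """
    if op not in ("", ">", "<", ">=", "<="):
        raise ValueError(f"unsupported operator: {op!r}")
    if op and len(values) != 1:
        raise ValueError("comparison operators take exactly one value")
    prefix = f"content_type({scope})." if scope else ""
    if not values:
        return f"{prefix}{name}"
    joined = "|".join(_fmt(v) for v in values)
    return f"{prefix}{name}:{op}{joined}"


filters = [attr("jurisdiction", "FR", "DE"), attr("contract_value", 50000, op=">")]
print(filters)
```

The resulting list can be passed directly as the `attribute` value, which yields AND logic across its items.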

Repeated ?attribute= → AND. Use | (pipe) for OR within a value. Comma-separated ?content_type=a,b → OR across content types.

Combining facets with semantic search

content_type and attribute are supported directly in the request body of both POST /api/v3/search and POST /api/v3/ask. No need to pre-filter files: just add facet fields to your query and LightOn narrows the corpus for you.

Search by content type

Scope a search to a specific document type. LightOn returns ranked chunks from matching files only.

search_by_content_type.py

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

# Find relevant passages across all NDAs
response = requests.post(
    "https://api.lighton.ai/api/v3/search",
    headers=headers,
    json={
        "query": "termination clause with 30-day notice",
        "content_type": ["contract:nda"],
    },
)
print(response.json())

Search by attributes

Scope a search by attribute values, without specifying a content type.

search_by_attributes.py

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

# Find passages only in unsigned French documents, whatever their content type
response = requests.post(
    "https://api.lighton.ai/api/v3/search",
    headers=headers,
    json={
        "query": "liability cap and indemnification",
        "attribute": ["jurisdiction:FR", "signed:false"],
    },
)
print(response.json())

Search by content type and attributes

Combine both for precise scoping: only unsigned French NDAs.

search_by_content_type_and_attributes.py

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

# Find passages in unsigned French NDAs specifically
response = requests.post(
    "https://api.lighton.ai/api/v3/search",
    headers=headers,
    json={
        "query": "termination clause with 30-day notice",
        "content_type": ["contract:nda"],
        "attribute": ["jurisdiction:FR", "signed:false"],
    },
)
print(response.json())

Ask: get an LLM answer from facet-filtered documents

POST /api/v3/ask works the same way. LightOn retrieves relevant passages from matching documents, then generates a grounded answer.

ask_by_content_type_and_attributes.py

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}"}

# Ask a question scoped to unsigned French NDAs
response = requests.post(
    "https://api.lighton.ai/api/v3/ask",
    headers=headers,
    json={
        "query": "What are the termination conditions across these NDAs?",
        "content_type": ["contract:nda"],
        "attribute": ["jurisdiction:FR", "signed:false"],
    },
)
print(response.json())
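
Since search and ask accept the same facet fields, the request body can be built once and reused. The following is a minimal sketch; `build_scoped_body` is a hypothetical helper, and it assumes filters that are not provided can simply be left out of the body.

```python
from typing import Any, Dict, List, Optional


def build_scoped_body(
    query: str,
    content_types: Optional[List[str]] = None,
    attributes: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a JSON body with optional facet filters (hypothetical helper)."""
    body: Dict[str, Any] = {"query": query}
    # Leave out filters that were not provided.
    if content_types:
        body["content_type"] = list(content_types)
    if attributes:
        body["attribute"] = list(attributes)
    return body


body = build_scoped_body(
    "What are the termination conditions across these NDAs?",
    content_types=["contract:nda"],
    attributes=["jurisdiction:FR", "signed:false"],
)
print(body)
```

The same `body` can then be passed as `json=body` to either endpoint.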

BM25 enrichment

Facets also improve search quality automatically. When you set attribute values, LightOn appends the content type labels and attribute values to the file’s BM25 lexical index. A query like “unsigned NDAs with Acme” benefits from the enriched index even if the document body doesn’t mention “unsigned” or “Acme” explicitly. This reindexing happens after every write.
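
To see why enrichment helps lexical matching, the toy sketch below compares query-term overlap before and after facet text is appended to a document. It is not BM25 and not LightOn's indexing code; the tokenizer, the sample body, and the facet labels are all made up for illustration.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens; a stand-in for a real lexical analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


body = "This agreement governs the exchange of confidential information."
# Content type labels and attribute values (made-up sample)
facet_text = "contract nda Acme Corp FR"

query_terms = tokens("NDA with Acme")

# Terms shared between the query and the document text, before and after enrichment.
plain_overlap = query_terms & tokens(body)
enriched_overlap = query_terms & tokens(body + " " + facet_text)
print(sorted(plain_overlap), sorted(enriched_overlap))
```

With the facet text appended, the query terms that never appear in the body become matchable, which is the effect the enriched index relies on.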