LightOn’s Facets let you organise documents with tree-based content types and custom attributes. Classify files by type, set attribute values, and query by those fields instead of relying on full-text search alone. You build classification trees once at the company level, then classify and enrich documents as they flow in, via the API or automatically with AI.
“Give me all NDAs, signed, valid in France, where the counterparty is Acme.”
That query doesn’t rely on finding those exact words in the document body. It relies on structured metadata you’ve attached to the file: a type (nda), a flag (signed = true), a jurisdiction, a counterparty name. This works for any document type: contracts, invoices, HR policies, technical specs, whatever your app needs.

Workspaces, tags, or facets?

Facets aren’t the only way to organise documents. Workspaces are containers that isolate a team’s or customer’s files, and tags are flat labels that group files into collections (even across workspaces) with zero schema to design. The three compose: a file lives in one workspace, can carry several tags, and can be classified with facets.
| | Workspaces | Tags | Facets |
|---|---|---|---|
| What it is | A container; every file lives in exactly one | Flat, reusable labels | Typed, hierarchical metadata with a schema |
| A file belongs to | exactly one workspace | many tags, across workspaces | many content types, across workspaces |
| Best for | Isolating teams, customers, tenants | Cross-cutting collections (project, topic) | Precise structured queries |
| Setup cost | Create a workspace | Create a tag | Design a content-type tree |
| Scope a query with | workspace_id | tag_id | content_type / attribute |
| Access control | Yes: API keys can be scoped to a workspace with a per-key role | No: not a permission boundary | No: not a permission boundary |
Only workspaces are a permission boundary: API keys can be scoped to specific workspaces, so segment data that needs different permission levels with workspaces rather than facets. Reach for the simplest layer that solves your problem. The rest of this tutorial covers facets.
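The sketch below shows how the layers might combine in a single request to GET /api/v3/files. The table above names workspace_id, tag_id, content_type and attribute as the scoping fields, but it does not say how they are passed. This sketch therefore assumes they are all query parameters of that endpoint, and that several attribute filters are sent as repeated attribute=name:value pairs. The base URL and IDs are hypothetical placeholders. The URL is only built, not sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.invalid"  # hypothetical instance URL


def files_query_url(workspace_id=None, tag_id=None, content_type=None, attributes=()):
    """Build a GET /api/v3/files URL that combines the scoping layers.

    Assumption: every layer is a query parameter, and several attribute
    filters are sent as repeated attribute=name:value pairs.
    """
    params = []
    if workspace_id is not None:
        params.append(("workspace_id", workspace_id))
    if tag_id is not None:
        params.append(("tag_id", tag_id))
    if content_type is not None:
        params.append(("content_type", content_type))
    for name, value in attributes:
        params.append(("attribute", f"{name}:{value}"))
    query = urlencode(params)  # percent-encodes ":" as %3A
    return f"{BASE_URL}/api/v3/files" + (f"?{query}" if query else "")


# The workspace decides where to look; facets decide which files come back.
url = files_query_url(workspace_id=42, content_type="contract:nda",
                      attributes=[("signed", "true")])
```

Only the workspace_id part of such a query relates to access control. Adding tag or facet filters narrows the results but never widens what an API key can see.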

How it works

Facets are built on three layers:
Layer 1: Schema (define once)
  "A contract has: counterparty (text), jurisdiction (multi-select), signed (boolean)"

Layer 2: Classification (per file)
  "This file is a contract:nda"

Layer 3: Values (per file, per classification)
  "This file's counterparty is Acme Corp, jurisdiction is FR, signed is true"
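To make the three layers concrete, here is a minimal in-memory model. The class and field names are illustrative, not LightOn's API types. It only shows how the schema (Layer 1), a classification (Layer 2) and its values (Layer 3) relate to one another.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDef:  # Layer 1: part of the schema, defined once
    name: str
    attribute_type: str


@dataclass
class ContentType:  # Layer 1: a node in the company-level tree
    path: str
    attributes: list[AttributeDef] = field(default_factory=list)


@dataclass
class FileFacets:  # Layers 2 and 3: applied per file
    file_id: int
    classifications: list[str] = field(default_factory=list)
    # values[path][attribute name] -> value, one dict per classification
    values: dict[str, dict[str, object]] = field(default_factory=dict)


contract = ContentType("contract", [
    AttributeDef("counterparty", "text"),
    AttributeDef("jurisdiction", "multi-select"),
    AttributeDef("signed", "boolean"),
])

doc = FileFacets(file_id=1234)
doc.classifications.append("contract:nda")  # Layer 2
doc.values["contract:nda"] = {              # Layer 3
    "counterparty": "Acme Corp",
    "jurisdiction": ["FR"],  # multi-select values are arrays
    "signed": True,
}
```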
You define Layer 1 once at the company level. You apply Layers 2 and 3 to individual files. The workflow maps to three API operations:
  1. Build your classification trees: create content types and their attributes (POST /api/v3/content-types), or start from a ready-made starter kit (legal, finance, healthcare, tech, manufacturing); the available kits are listed by GET /api/v3/content-types/templates
  2. Classify files & set values: assign a content type to each file and fill in its attribute values (POST /api/v3/files/{id}/facets)
  3. Query by metadata: filter files by content type and attribute values (GET /api/v3/files)
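The three calls can be sketched with Python's standard library alone. The endpoints come from the list above. The JSON body shapes are assumptions for illustration, with field names borrowed from the glossary below; check the API reference for the exact request schemas. The base URL is also an assumption, and authentication headers are omitted. The code only builds urllib.request.Request objects; pass one to urllib.request.urlopen to actually send it.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.invalid"  # hypothetical instance URL


def json_request(method, path, body=None):
    """Build (but do not send) a JSON request against the v3 API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(f"{BASE_URL}{path}", data=data, method=method,
                   headers={"Content-Type": "application/json"})


# 1. Build the tree: one content type with its attributes (body shape assumed).
create_type = json_request("POST", "/api/v3/content-types", {
    "code": "contract",
    "label": "Contract",
    "description": "A legally binding agreement between two or more parties.",
    "attributes": [
        {"name": "counterparty", "label": "Counterparty", "attribute_type": "text"},
        {"name": "signed", "label": "Signed", "attribute_type": "boolean"},
    ],
})

# 2. Classify file 1234 and set its values (body shape assumed).
classify = json_request("POST", "/api/v3/files/1234/facets", {
    "content_type": "contract:nda",
    "values": {"counterparty": "Acme Corp", "signed": True},
})

# 3. Query by metadata: no body; the filter goes in the query string.
query = json_request("GET", "/api/v3/files?content_type=contract%3Anda")
```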

Glossary

Content Type

A Content Type is a node in your company’s classification tree: a named, hierarchical label that describes what a document is. Content types live in a tree up to 4 levels deep. A node’s path joins the codes of each level with a colon (:), as in contract:nda. Each Content Type has:
  • code: a short kebab-case identifier, unique among siblings (nda, service-agreement)
  • label: the human-readable name ("Non-Disclosure Agreement")
  • description: explains what this type means, used by AI to classify documents automatically
  • inherit_attributes: whether documents classified here also get the parent node’s attributes
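Codes are unique among siblings, so joining the codes from the root down gives every node a unique path. A few helpers cover the common path manipulations. This is a local sketch, not part of any LightOn SDK, and the only limit it enforces is the 4-level depth stated above.

```python
SEPARATOR = ":"
MAX_DEPTH = 4


def depth(path):
    """Number of levels in a path: 'contract' -> 1, 'contract:nda' -> 2."""
    return len(path.split(SEPARATOR))


def child_path(parent_path, code):
    """Append a child code to a parent path ('' means the root level)."""
    path = f"{parent_path}{SEPARATOR}{code}" if parent_path else code
    if depth(path) > MAX_DEPTH:
        raise ValueError(f"{path!r} exceeds {MAX_DEPTH} levels")
    return path


def ancestors(path):
    """All ancestor paths, nearest first: 'a:b:c' -> ['a:b', 'a']."""
    parts = path.split(SEPARATOR)
    return [SEPARATOR.join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
```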

Attribute

An Attribute is a custom field attached to a Content Type. It describes what data a document of that type should carry.
| Field | Meaning |
|---|---|
| name | machine name, e.g. jurisdiction |
| label | display name, e.g. "Jurisdiction" |
| attribute_type | the data type (see table below) |
| required | whether this field must have a value |
| choices | for select/multi-select: the allowed values |
| description | explains what this field means, used by AI to extract values |
Available attribute types:
| Type | attribute_type | What you store |
|---|---|---|
| Short text | "text" | Any string |
| Rich text | "rich-text" | Markdown string |
| Number | "number" | Stored as a float (e.g. 50000, "1.5") |
| Date | "date" | ISO 8601 string: "2025-01-31" |
| Boolean | "boolean" | true or false |
| Single option | "select" | One string from a fixed list (choices) |
| Multiple options | "multi-select" | Array of strings from a fixed list (choices) |
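A client might check a value against an attribute definition before sending it, using the fields and types from the two tables above. In this sketch, the validation rules are inferred from the "What you store" column; for example, numeric strings such as "1.5" are accepted for numbers. The server's own validation may be stricter or looser, and the Attribute class is illustrative, not an SDK type.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Attribute:
    """Mirrors the attribute fields in the table above (illustrative only)."""
    name: str
    label: str
    attribute_type: str
    required: bool = False
    choices: list[str] = field(default_factory=list)
    description: str = ""


def normalize(attr, value):
    """Return `value` in its stored form for `attr`, or raise ValueError."""
    t = attr.attribute_type
    if value is None:
        if attr.required:
            raise ValueError(f"{attr.name} is required")
        return None
    if t in ("text", "rich-text"):  # rich-text is a Markdown string
        if not isinstance(value, str):
            raise ValueError(f"{attr.name} expects a string")
        return value
    if t == "number":
        if isinstance(value, bool):  # bool is a subclass of int: reject it
            raise ValueError(f"{attr.name} expects a number")
        try:
            return float(value)  # accepts 50000 or "1.5"
        except (TypeError, ValueError):
            raise ValueError(f"{attr.name} expects a number") from None
    if t == "date":
        if not isinstance(value, str):
            raise ValueError(f"{attr.name} expects an ISO 8601 date string")
        return date.fromisoformat(value).isoformat()  # e.g. "2025-01-31"
    if t == "boolean":
        if not isinstance(value, bool):
            raise ValueError(f"{attr.name} expects true or false")
        return value
    if t == "select":
        if value not in attr.choices:
            raise ValueError(f"{value!r} is not one of {attr.choices}")
        return value
    if t == "multi-select":
        if not isinstance(value, list) or any(v not in attr.choices for v in value):
            raise ValueError(f"{attr.name} expects a list drawn from {attr.choices}")
        return list(value)
    raise ValueError(f"unknown attribute_type {t!r}")
```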

Classification

A Classification is the link between a specific file and a Content Type path. It answers the question: “What type is this document?” One file can have multiple classifications. A single document might be both a contract:nda and a contract:data-processing-agreement at the same time.

Attribute Value

An Attribute Value is the actual data stored for a specific attribute on a specific classification of a specific file. When a file is classified as contract:nda and you set counterparty = "Acme Corp", you’ve created an Attribute Value: {file: #1234, path: "contract:nda", name: "counterparty", value: "Acme Corp"}. Values are scoped by content type path. If a file has two classifications, each classification has its own set of values. They don’t mix.
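Keying values by file, content type path and attribute name is what keeps classifications apart. The same attribute name can hold different values under two classifications of one file. Here is a minimal in-memory sketch of that keying. The store is hypothetical, not the API, and the second counterparty value is made up for the example.

```python
from collections import defaultdict

# Hypothetical store: (file_id, content type path) -> {attribute name: value}
values = defaultdict(dict)


def set_value(file_id, path, name, value):
    values[(file_id, path)][name] = value


def get_values(file_id, path):
    """Values of one classification of one file; other paths never leak in."""
    return dict(values.get((file_id, path), {}))


set_value(1234, "contract:nda", "counterparty", "Acme Corp")
set_value(1234, "contract:data-processing-agreement", "counterparty", "Acme Hosting")
```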

Inherited Attributes

When a Content Type node has inherit_attributes: true, files classified at that node automatically have access to the attributes defined on its parent, as with contract:nda here:
contract              ← defines: counterparty (text), jurisdiction (multi-select)
  contract:nda        ← inherit_attributes: true → also exposes: counterparty, jurisdiction
If contract:nda also defines its own attribute is_mutual (boolean), then a file classified as contract:nda sees all three fields: counterparty and jurisdiction (inherited from contract) plus is_mutual (defined directly on contract:nda).
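The lookup can be sketched as a walk up the tree. Start at the classified node and collect its own attributes. Then keep moving to the parent as long as the current node has inherit_attributes: true. This sketch assumes that inheritance chains across several levels in this way. The hypothetical tree below mirrors the contract / contract:nda example.

```python
# Hypothetical tree: path -> (inherit_attributes, own attribute names)
TREE = {
    "contract": (False, ["counterparty", "jurisdiction"]),
    "contract:nda": (True, ["is_mutual"]),
}


def parent(path):
    return path.rsplit(":", 1)[0] if ":" in path else None


def effective_attributes(path):
    """Attributes visible on a file classified at `path`, ancestors first."""
    chain = [path]
    node = path
    # Climb while the current node inherits from its parent.
    while TREE[node][0] and parent(node) is not None:
        node = parent(node)
        chain.append(node)
    names = []
    for p in reversed(chain):  # root-most node first
        names.extend(TREE[p][1])
    return names
```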

Why it matters

| Without Facets | With Facets |
|---|---|
| Find NDAs → search “NDA” in file names | Filter ?content_type=contract:nda: precise, instant |
| Find French contracts → full-text search for “France” | Filter ?attribute=jurisdiction:FR: structured, no false positives |
| Find unsigned contracts → manual review | Filter ?attribute=signed:false |
| Build a dashboard of contracts by type | Group by content_type: structured counts, no parsing |
| AI answers “what NDAs do we have with Acme?” | BM25 index enriched with metadata: finds the right documents |
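To illustrate what the filters in the table select, here is a local, in-memory approximation of ?content_type= and ?attribute=name:value applied to a few hypothetical file records. It rests on three assumptions, none of them documented behavior:

  • content_type matches exactly;
  • the attribute filter splits on the first colon;
  • booleans are written as true/false.

```python
FILES = [  # hypothetical records, one classification each for brevity
    {"id": 1, "content_type": "contract:nda",
     "values": {"counterparty": "Acme Corp", "signed": True}},
    {"id": 2, "content_type": "contract:nda",
     "values": {"counterparty": "Globex", "signed": False}},
    {"id": 3, "content_type": "contract:service-agreement",
     "values": {"counterparty": "Acme Corp", "signed": False}},
]


def as_query_string(value):
    """Render a stored value as it would appear in a filter."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def matches(values, flt):
    name, _, wanted = flt.partition(":")  # split on the first colon only
    return name in values and as_query_string(values[name]) == wanted


def filter_files(files, content_type=None, attributes=()):
    """IDs of files matching the content type and every name:value filter."""
    return [f["id"] for f in files
            if (content_type is None or f["content_type"] == content_type)
            and all(matches(f["values"], flt) for flt in attributes)]
```

Combining a content type with several attribute filters narrows the result to files that satisfy all of them. For example, filter_files(FILES, "contract:nda", ["counterparty:Acme Corp", "signed:true"]) keeps only file 1.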
Facets also enrich the BM25 lexical search index automatically. When you set attribute values on a file, LightOn reindexes that file’s text to include its content type labels and attribute values, so searches can match on that metadata without any extra work on your side.
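The exact text LightOn adds to the index is not documented on this page, so the following is only a hypothetical illustration of the idea. It appends the content type label and "name: value" pairs to the document text. After that, a keyword that never appears in the body can still match. The tokenizer is a deliberately crude stand-in for a real BM25 analyzer.

```python
def enriched_text(body, type_label, values):
    """Hypothetical enrichment: body text plus the type label and attribute values."""
    lines = [body, type_label]
    for name, value in values.items():
        if isinstance(value, list):  # multi-select: join the selected options
            value = ", ".join(value)
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


def tokens(text):
    """Crude tokenizer: lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,:;!?\"'()").lower() for w in text.split()}


body = "The parties agree to keep the shared information confidential."
text = enriched_text(body, "Non-Disclosure Agreement",
                     {"counterparty": "Acme Corp", "jurisdiction": ["FR"]})
found_before = "acme" in tokens(body)  # the body never names the counterparty
found_after = "acme" in tokens(text)   # the enriched text does
```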

Limits at a glance

| Area | Constraint |
|---|---|
| Tree depth | 4 levels max |
| Code format | Lowercase alphanumeric + hyphens (kebab-case) |
| Path length | 768 characters max |
| Attribute types | 7: text, number, date, boolean, select, multi-select, rich-text |
| Reserved names | Some attribute names are reserved by the system and rejected on creation |
| Classifications per tree | 1 per file per tree (multiple trees allowed) |
| Type immutability | attribute_type cannot be changed after creation |
| Filterable types | text, number, date, boolean, select (multi-select and rich-text are not filterable) |
| Scope | Content types are company-scoped, not workspace-scoped |
For the full list, see the Rules & constraints reference.
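Several of these limits are easy to check client-side before calling the API. This minimal sketch makes one assumption about codes. It reads "kebab-case" as lowercase letters and digits in hyphen-separated groups, with no leading, trailing or doubled hyphens. Reserved attribute names are not checked, because that list is not given here.

```python
import re

CODE_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # assumed kebab-case rule
MAX_DEPTH = 4
MAX_PATH_LENGTH = 768
FILTERABLE_TYPES = {"text", "number", "date", "boolean", "select"}


def check_path(path):
    """Return a list of limit violations for a content type path (empty if valid)."""
    problems = []
    codes = path.split(":")
    if len(codes) > MAX_DEPTH:
        problems.append(f"depth {len(codes)} exceeds {MAX_DEPTH}")
    if len(path) > MAX_PATH_LENGTH:
        problems.append(f"path is {len(path)} chars, max {MAX_PATH_LENGTH}")
    problems += [f"invalid code {c!r}" for c in codes if not CODE_RE.fullmatch(c)]
    return problems


def check_type_change(old_type, new_type):
    """attribute_type is immutable: any change is rejected."""
    if new_type != old_type:
        raise ValueError("attribute_type cannot be changed after creation")


def is_filterable(attribute_type):
    return attribute_type in FILTERABLE_TYPES
```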

Next steps

  • Building a classification tree: create content types and custom attributes
  • Classifying files & setting metadata: classify files and set attribute values
  • Filtering documents by metadata: query by content type and attribute value