Organizing documents with metadata

LightOn’s Facets let you organise documents with tree-based content types and custom attributes. Classify files by type, set attribute values, and query by those fields instead of relying on full-text search alone. You build classification trees once at the company level, then classify and enrich documents as they flow in, via the API or automatically with AI.

“Give me all NDAs, signed, valid in France, where the counterparty is Acme.”

That query doesn’t rely on finding those exact words in the document body. It relies on structured metadata you’ve attached to the file: a type (nda), a flag (signed = true), a jurisdiction, a counterparty name. This works for any document type: contracts, invoices, HR policies, technical specs, whatever your app needs.

Workspaces, tags, or facets?

Facets aren’t the only way to organise documents. Workspaces are containers that isolate a team’s or customer’s files, and tags are flat labels that group files into collections (even across workspaces) with zero schema to design. The three compose: a file lives in one workspace, can carry several tags, and can be classified with facets.

	Workspaces	Tags	Facets
What it is	A container; every file lives in exactly one	Flat, reusable labels	Typed, hierarchical metadata with a schema
A file belongs to	exactly one workspace	many tags, across workspaces	many content types, across workspaces
Best for	Isolating teams, customers, tenants	Cross-cutting collections (project, topic)	Precise structured queries
Setup cost	Create a workspace	Create a tag	Design a content-type tree
Scope a query with	`workspace_id`	`tag_id`	`content_type` / `attribute`
Access control	Yes: API keys can be scoped to a workspace with a per-key role	No: not a permission boundary	No: not a permission boundary

Only workspaces are a permission boundary: API keys can be scoped to specific workspaces, so segment data that needs different permission levels with workspaces rather than facets. Reach for the simplest layer that solves your problem. The rest of this tutorial covers facets.

How it works

Facets are built on three layers:

Layer 1: Schema (define once)
  "A contract has: counterparty (text), jurisdiction (multi-select), signed (boolean)"

Layer 2: Classification (per file)
  "This file is a contract:nda"

Layer 3: Values (per file, per classification)
  "This file's counterparty is Acme Corp, jurisdiction is FR, signed is true"

You define Layer 1 once at the company level. You apply Layers 2 and 3 to individual files. The workflow maps to three API operations:

Build your classification trees: create content types and their attributes (POST /api/v3/content-types), or adopt a ready-made starter kit (legal, finance, healthcare, tech, manufacturing) via GET /api/v3/content-types/templates
Classify files & set values: assign a content type to each file and fill in its attribute values (POST /api/v3/files/{id}/facets)
Query by metadata: filter files by content type and attribute values (GET /api/v3/files)
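
As a rough illustration of how the three operations fit together, the sketch below builds (but does not send) one request description per step. The endpoint paths come from the list above. The helper names and the request body shapes (`attributes`, `content_type`, `values`) are assumptions made for illustration, not the documented schema.

```python
def create_content_type(code, label, description, attributes):
    """Step 1: describe a POST /api/v3/content-types call (body shape assumed)."""
    return {
        "method": "POST",
        "path": "/api/v3/content-types",
        "json": {
            "code": code,
            "label": label,
            "description": description,
            "attributes": attributes,  # assumed: attributes sent with the type
        },
    }


def classify_file(file_id, content_type, values):
    """Step 2: describe a POST /api/v3/files/{id}/facets call (body shape assumed)."""
    return {
        "method": "POST",
        "path": f"/api/v3/files/{file_id}/facets",
        "json": {"content_type": content_type, "values": values},
    }


def query_files(content_type):
    """Step 3: describe a GET /api/v3/files call filtered by content type."""
    return {
        "method": "GET",
        "path": "/api/v3/files",
        "params": {"content_type": content_type},
    }


signed = {"name": "signed", "label": "Signed",
          "attribute_type": "boolean", "required": False}
steps = [
    create_content_type("contract", "Contract", "Any legal agreement", [signed]),
    classify_file(1234, "contract:nda", {"signed": True}),
    query_files("contract:nda"),
]
```

Each dictionary can be handed to the HTTP client of your choice, together with your own base URL and authentication.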

Glossary

Content Type

A Content Type is a node in your company’s classification tree: a named, hierarchical label that describes what a document is. Content types live in a tree up to 4 levels deep, and a node’s path joins the codes of each level with : as the separator (for example, contract:nda). Each Content Type has:

code: a short kebab-case identifier, unique among siblings (nda, service-agreement)
label: the human-readable name ("Non-Disclosure Agreement")
description: explains what this type means, used by AI to classify documents automatically
inherit_attributes: whether documents classified here also get the parent node’s attributes
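
To make the path rules concrete, here is a minimal client-side sketch that joins codes into a path and checks them against the limits listed in this page (kebab-case codes, 4 levels, 768 characters). The function name `build_path` and the exact regular expression are assumptions; the server remains the authority on what it accepts.

```python
import re

MAX_DEPTH = 4          # "Tree depth: 4 levels max"
MAX_PATH_LENGTH = 768  # "Path length: 768 characters max"
# Assumed reading of "lowercase alphanumeric + hyphens (kebab-case)".
CODE_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_path(codes):
    """Join content type codes with ':' after validating each one."""
    if not 1 <= len(codes) <= MAX_DEPTH:
        raise ValueError(f"a path needs 1 to {MAX_DEPTH} levels, got {len(codes)}")
    for code in codes:
        if not CODE_PATTERN.fullmatch(code):
            raise ValueError(f"code {code!r} is not kebab-case")
    path = ":".join(codes)
    if len(path) > MAX_PATH_LENGTH:
        raise ValueError(f"path exceeds {MAX_PATH_LENGTH} characters")
    return path


nda_path = build_path(["contract", "nda"])
```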

Attribute

An Attribute is a custom field attached to a Content Type. It describes what data a document of that type should carry.

Field	Meaning
`name`	machine name, e.g. `jurisdiction`
`label`	display name, e.g. `"Jurisdiction"`
`attribute_type`	the data type (see table below)
`required`	whether this field must have a value
`choices`	for select/multi-select: the allowed values
`description`	explains what this field means, used by AI to extract values

Available attribute types:

Type	`attribute_type`	What you store
Short text	`"text"`	Any string
Rich text	`"rich-text"`	Markdown string
Number	`"number"`	Stored as a float (e.g. `50000`, `"1.5"`)
Date	`"date"`	ISO 8601 string: `"2025-01-31"`
Boolean	`"boolean"`	`true` or `false`
Single option	`"select"`	One string from a fixed list (`choices`)
Multiple options	`"multi-select"`	Array of strings from a fixed list (`choices`)
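
The table above can be read as a set of per-type checks. The sketch below is one possible client-side pre-check before sending values; `validate_value` is a hypothetical helper, and the rules it applies (for instance, accepting numeric strings for `number`, as the `"1.5"` example suggests) are assumptions rather than the API's actual validation.

```python
from datetime import date


def validate_value(attribute_type, value, choices=None):
    """Return True if `value` looks acceptable for `attribute_type`."""
    allowed = choices or []
    if attribute_type in ("text", "rich-text"):
        return isinstance(value, str)
    if attribute_type == "number":
        if isinstance(value, bool):  # bool is an int subclass; reject it
            return False
        try:
            float(value)  # assumed: anything convertible to float is fine
            return True
        except (TypeError, ValueError):
            return False
    if attribute_type == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)  # e.g. "2025-01-31"
            return True
        except ValueError:
            return False
    if attribute_type == "boolean":
        return isinstance(value, bool)
    if attribute_type == "select":
        return isinstance(value, str) and value in allowed
    if attribute_type == "multi-select":
        return isinstance(value, list) and all(
            isinstance(item, str) and item in allowed for item in value
        )
    raise ValueError(f"unknown attribute_type: {attribute_type!r}")
```

For example, `validate_value("select", "FR", choices=["FR", "DE"])` passes, while `validate_value("boolean", "true")` does not, because the string is not a real boolean.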

Classification

A Classification is the link between a specific file and a Content Type path. It answers the question: “What type is this document?” One file can have multiple classifications. A single document might be both a contract:nda and a contract:data-processing-agreement at the same time.

Attribute Value

An Attribute Value is the actual data stored for a specific attribute on a specific classification of a specific file. When a file is classified as contract:nda and you set counterparty = "Acme Corp", you’ve created an Attribute Value: {file: #1234, path: "contract:nda", name: "counterparty", value: "Acme Corp"}. Values are scoped by content type path. If a file has two classifications, each classification has its own set of values. They don’t mix.
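
The scoping rule can be pictured as a store keyed by file and content type path. The following in-memory model is purely illustrative (the class `FacetStore` is hypothetical and says nothing about how LightOn stores values); it only shows that the same attribute name on two classifications of one file holds two independent values.

```python
from collections import defaultdict


class FacetStore:
    """Toy model: attribute values keyed by (file_id, content type path)."""

    def __init__(self):
        self._values = defaultdict(dict)

    def set_value(self, file_id, path, name, value):
        self._values[(file_id, path)][name] = value

    def get_values(self, file_id, path):
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._values.get((file_id, path), {}))


store = FacetStore()
store.set_value(1234, "contract:nda", "counterparty", "Acme Corp")
store.set_value(1234, "contract:data-processing-agreement", "counterparty", "Acme Ltd")

nda_values = store.get_values(1234, "contract:nda")
dpa_values = store.get_values(1234, "contract:data-processing-agreement")
```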

Inherited Attributes

When a Content Type node has inherit_attributes: true, files classified at that node automatically have access to the attributes defined on its parent node, in addition to its own.

contract              ← defines: counterparty (text), jurisdiction (multi-select)
  contract:nda        ← inherit_attributes: true → also exposes: counterparty, jurisdiction

If contract:nda also defines its own attribute is_mutual (boolean), then a file classified as contract:nda sees all three fields: counterparty and jurisdiction (inherited from contract) plus is_mutual (defined directly on contract:nda).
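
The resolution described above can be sketched as a walk up the path. This is a simplified model, not the service's implementation: the `TREE` dictionary is a hypothetical stand-in for your schema, the flag is read on the classified node as in the diagram, and the assumption that inheritance keeps chaining upward while each node opts in is mine.

```python
# Hypothetical in-memory schema mirroring the example tree.
TREE = {
    "contract": {
        "inherit_attributes": False,
        "attributes": ["counterparty", "jurisdiction"],
    },
    "contract:nda": {
        "inherit_attributes": True,
        "attributes": ["is_mutual"],
    },
}


def effective_attributes(tree, path):
    """List the attribute names visible to a file classified at `path`."""
    node = tree[path]
    names = []
    if node["inherit_attributes"] and ":" in path:
        parent_path = path.rsplit(":", 1)[0]
        # Assumed: the parent's own inheritance setting is honoured in turn.
        names.extend(effective_attributes(tree, parent_path))
    names.extend(node["attributes"])
    return names


nda_fields = effective_attributes(TREE, "contract:nda")
```

Under these assumptions, `nda_fields` holds the two inherited names followed by `is_mutual`.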

Why it matters

Without Facets	With Facets
Find NDAs → search “NDA” in file names	Filter `?content_type=contract:nda`: precise, instant
Find French contracts → full-text search for “France”	Filter `?attribute=jurisdiction:FR`: structured, no false positives
Find unsigned contracts → manual review	Filter `?attribute=signed:false`
Build a dashboard of contracts by type	Group by `content_type`: structured counts, no parsing
AI answers “what NDAs do we have with Acme?”	BM25 index enriched with metadata: finds the right documents
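
The filters in the right-hand column are plain query parameters. As a minimal sketch, the helpers below assemble such a query string and refuse attribute types that this page lists as non-filterable. The function names are hypothetical, and combining `content_type` and `attribute` in a single request is an assumption.

```python
from urllib.parse import urlencode

# Filterable types as listed under "Limits at a glance".
FILTERABLE_TYPES = {"text", "number", "date", "boolean", "select"}


def attribute_filter(name, value, attribute_type):
    """Format one `name:value` filter, rejecting non-filterable types."""
    if attribute_type not in FILTERABLE_TYPES:
        raise ValueError(f"{attribute_type!r} attributes are not filterable")
    if isinstance(value, bool):
        value = "true" if value else "false"  # same spelling as `signed:false`
    return f"{name}:{value}"


def files_query(content_type=None, attribute=None):
    """Build the path and query string for GET /api/v3/files."""
    params = {}
    if content_type is not None:
        params["content_type"] = content_type
    if attribute is not None:
        params["attribute"] = attribute
    # Keep ':' unescaped so the result reads like the examples in the table.
    query = urlencode(params, safe=":")
    return "/api/v3/files" + (f"?{query}" if query else "")


unsigned_ndas = files_query(
    content_type="contract:nda",
    attribute=attribute_filter("signed", False, "boolean"),
)
```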

Facets also enrich the BM25 lexical search index automatically. When you set attribute values on a file, LightOn reindexes that file’s text to include the content type labels and attribute values. Search queries that hit this index benefit from the enrichment without any extra work on your side.

Limits at a glance

Area	Constraint
Tree depth	4 levels max
Code format	Lowercase alphanumeric + hyphens (`kebab-case`)
Path length	768 characters max
Attribute types	7: `text`, `number`, `date`, `boolean`, `select`, `multi-select`, `rich-text`
Reserved names	Some attribute names are reserved by the system and rejected on creation
Classifications per tree	1 per file per tree (multiple trees allowed)
Type immutability	`attribute_type` cannot be changed after creation
Filterable types	`text`, `number`, `date`, `boolean`, `select`. `multi-select` and `rich-text` are not filterable
Scope	Content types are company-scoped, not workspace-scoped

For the full list, see the Rules & constraints reference.

Next steps

Building a classification tree

Create content types and custom attributes

Classifying files & setting metadata

Classify files and set attribute values

Filtering documents by metadata

Query by content type and attribute value