Lyceum Serverless Inference - LightOn Developers

Lyceum serves open models through an OpenAI-compatible serverless endpoint. Pair them with LightOn search to build retrieval-augmented generation (RAG) pipelines in which retrieval stays on LightOn's infrastructure and generation runs on Lyceum. The flow is:

1. Search LightOn for the passages most relevant to the user's question.
2. Pack those passages into the model's context window.
3. Call the Lyceum model to generate an answer grounded in the retrieved content.

Prerequisites

A LightOn API key, created in the Console → API Keys section and stored as LIGHTON_API_KEY.
A Lyceum API key, stored as LYCEUM_API_KEY. The examples read both keys from environment variables.
At least one workspace with indexed documents on LightOn.
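The examples on this page read both keys with os.environ[...] and stop with a KeyError on the first one that is missing. A small pre-flight check like this sketch reports every missing variable at once; missing_env_vars is an illustrative name, not part of either API.

```python
import os

# Environment variables the examples on this page expect.
REQUIRED_VARS = ("LIGHTON_API_KEY", "LYCEUM_API_KEY")


def missing_env_vars(names=REQUIRED_VARS) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]
```

Call it at startup and exit with a readable message, for example `raise SystemExit(f"Missing: {missing}")`, when the returned list is not empty.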

Installation

pip install requests openai

The openai package serves only as the client: Lyceum's endpoint is OpenAI-compatible, so you point the client's base_url at Lyceum and call it as usual.
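Under the hood, the openai client sends a POST to {base_url}/chat/completions with a bearer token. The sketch below builds that same request with the standard library, which can help when debugging outside the SDK. It assumes Lyceum follows the standard OpenAI request and response shapes; build_chat_request and chat are illustrative names.

```python
import json
import os
import urllib.request

LYCEUM_BASE_URL = "https://api.lyceum.technology/api/v2/external/serverless"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request the openai client would send for a single-turn chat."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{LYCEUM_BASE_URL}/chat/completions",  # path the client appends to base_url
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, model: str = "openbmb/MiniCPM-V-4_5") -> str:
    """Send the request and read the reply, assuming the OpenAI response shape."""
    request = build_chat_request(os.environ["LYCEUM_API_KEY"], model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.loads(response.read())
    return payload["choices"][0]["message"]["content"]
```

If a raw request like this works but the SDK call does not, the problem is usually the base_url or the key rather than the model.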

Full example

import os
import requests
from openai import OpenAI

LIGHTON_API_KEY = os.environ["LIGHTON_API_KEY"]
LYCEUM_API_KEY = os.environ["LYCEUM_API_KEY"]

lyceum = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key=LYCEUM_API_KEY,
)


def search(query: str, workspace_id: list[int] | None = None, max_results: int = 5) -> list[dict]:
    payload = {"query": query, "max_results": max_results}
    if workspace_id:
        payload["workspace_id"] = workspace_id

    response = requests.post(
        "https://api.lighton.ai/api/v3/search",
        headers={"Authorization": f"Bearer {LIGHTON_API_KEY}"},
        json=payload,
        timeout=30,  # fail fast instead of hanging on network problems
    )
    response.raise_for_status()
    return response.json()["results"]


def answer(question: str, workspace_id: list[int] | None = None, model: str = "openbmb/MiniCPM-V-4_5") -> str:
    results = search(question, workspace_id=workspace_id)

    # Label each passage with its source file and starting page so the model can cite it.
    context = "\n\n".join(
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r.get("content")
    )

    completion = lyceum.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant. Answer the user's question using only "
                    "the provided context. If the context does not contain enough information, "
                    "say so.\n\nContext:\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content


print(answer("What is our data retention policy?"))
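The full example concatenates every returned passage. If you raise max_results, the packed context can outgrow the model's window. This sketch of step 2 stops adding passages once a character budget is reached. Characters are only a rough proxy for tokens, and the 12,000 default is an arbitrary assumption to tune per model; the result fields are the ones the example above reads.

```python
def pack_context(results: list[dict], max_chars: int = 12_000) -> str:
    """Join search results into a cited context block, keeping the highest-ranked
    passages and stopping before the character budget is exceeded."""
    parts: list[str] = []
    used = 0
    for r in results:  # results are assumed to be ordered best-first
        content = r.get("content")
        if not content:
            continue
        source = r.get("source") or {}
        part = f"[{source.get('filename', 'unknown')}, p.{source.get('page_start', '?')}]\n{content}"
        # Count the blank-line separator that join() will add between passages.
        cost = len(part) + (2 if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(part)
        used += cost
    return "\n\n".join(parts)
```

Inside answer(), `context = pack_context(results)` would replace the join. The loop stops at the first passage that does not fit instead of skipping ahead, so a short low-ranked passage never takes the place of a higher-ranked one.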

LightOn search as a tool

Instead of always searching before calling the model, you can expose LightOn search as a tool and let the model decide when to call it. The model issues a lighton_search tool call when it needs context, your code runs the search and feeds the results back, and the model then produces a final answer. This pattern requires a model that supports tool calling.

import json
import os
import requests
from openai import OpenAI

LIGHTON_API_KEY = os.environ["LIGHTON_API_KEY"]
LYCEUM_API_KEY = os.environ["LYCEUM_API_KEY"]

lyceum = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key=LYCEUM_API_KEY,
)

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "lighton_search",
        "description": (
            "Search the company knowledge base for passages relevant to a query. "
            "Returns ranked excerpts with their source filename and page numbers."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural-language search query.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Number of passages to return (1–50, default 5).",
                    "default": 5,
                },
            },
            "required": ["query"],
        },
    },
}


def run_search(query: str, max_results: int = 5) -> str:
    response = requests.post(
        "https://api.lighton.ai/api/v3/search",
        headers={"Authorization": f"Bearer {LIGHTON_API_KEY}"},
        json={"query": query, "max_results": max_results},
        timeout=30,  # fail fast instead of hanging on network problems
    )
    response.raise_for_status()
    results = response.json()["results"]
    # Format passages with their source so the model can cite them.
    passages = [
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r.get("content")
    ]
    return "\n\n".join(passages) if passages else "No results found."


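def answer(question: str, model: str = "openbmb/MiniCPM-V-4_5", max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": question}]

    # Bound the number of round trips so a model that keeps calling the tool cannot loop forever.
    for _ in range(max_rounds):
        completion = lyceum.chat.completions.create(
            model=model,
            tools=[SEARCH_TOOL],
            messages=messages,
        )
        message = completion.choices[0].message

        # No tool calls means the model has produced its final answer.
        if not message.tool_calls:
            return message.content or ""

        # Keep the assistant turn (with its tool calls) in the history, then answer each call.
        messages.append(message)
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            # Pass only the parameters declared in SEARCH_TOOL; ignore anything else the model sends.
            result = run_search(args["query"], args.get("max_results", 5))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })

    raise RuntimeError(f"No final answer after {max_rounds} rounds of tool calls.")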


print(answer("What is our data retention policy?"))

The loop handles models that issue several search calls, either together in one turn or across successive turns, before producing a final answer.
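The loop trusts the model's tool arguments. Open models sometimes emit malformed JSON or out-of-range values, so here is a sketch of a more defensive handler: it parses the arguments, clamps max_results to the 1–50 range stated in the tool description, and returns an error string to the model instead of raising. handle_tool_call is an illustrative name, and search_fn stands in for run_search.

```python
import json
from typing import Callable


def handle_tool_call(name: str, raw_arguments: str, search_fn: Callable[[str, int], str]) -> str:
    """Execute one tool call and always return a string for the 'tool' message."""
    if name != "lighton_search":
        return f"Error: unknown tool {name!r}."
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc.msg})."
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object."
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        return "Error: 'query' must be a non-empty string."
    try:
        max_results = int(args.get("max_results", 5))
    except (TypeError, ValueError):
        max_results = 5  # fall back to the schema default
    # Clamp to the range advertised in the tool schema.
    max_results = max(1, min(max_results, 50))
    return search_fn(query, max_results)
```

In the loop, `result = handle_tool_call(call.function.name, call.function.arguments, run_search)` would replace the json.loads and run_search lines. Returning the error as tool output gives the model a chance to retry with corrected arguments.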

Multimodal input

Several models on Lyceum (such as openbmb/MiniCPM-V-4_5) are vision-language models that accept images alongside text. Add an image_url content part to the user message to have the model describe or reason over an image:

response = lyceum.chat.completions.create(
    model="openbmb/MiniCPM-V-4_5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "<image-url>"}},
            ],
        }
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
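For a local file instead of a public URL, the OpenAI chat format also accepts a base64 data URL in the same image_url field. This sketch assumes Lyceum's vision models accept data URLs as the OpenAI format does; image_to_data_url and image_message are illustrative names.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(question: str, path: str) -> dict:
    """Build a user message that pairs a text question with a local image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

Pass `[image_message("What's in this image?", "diagram.png")]` as messages, where diagram.png is a placeholder path. Base64 inflates the payload by about a third, so resize large images first.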

You can combine this with LightOn search to ground answers about an image in your indexed documents.
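One way to combine them, sketched below: put the retrieved passages in the system message and the image in the user message. It assumes results is a list of dicts in the shape the full example reads; grounded_image_messages is an illustrative name, and its return value is meant to be passed as messages= to lyceum.chat.completions.create.

```python
def grounded_image_messages(question: str, image_url: str, results: list[dict]) -> list[dict]:
    """Build chat messages that ground a question about an image in LightOn search results."""
    context = "\n\n".join(
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r.get("content")
    )
    return [
        {
            "role": "system",
            "content": (
                "Answer the user's question about the image using only the provided context. "
                "If the context does not contain enough information, say so.\n\nContext:\n" + context
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

The search call in the full example takes a text query, so retrieve with the question text (for example `search(question)`) and let the model relate the passages to the image.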

Scoping retrieval to a workspace

Pass workspace_id to the answer function from the full example to limit search to specific workspaces; it forwards the list to LightOn search. This is useful in multi-tenant products where each customer's data lives in a dedicated workspace.

answer("Summarize the onboarding checklist", workspace_id=[42])
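The tool-calling variant does not take a workspace_id. In a multi-tenant product you usually want the workspace fixed by your code rather than chosen by the model. This sketch assumes the search endpoint accepts the same workspace_id list the full example sends: a factory returns a search function bound to one tenant's workspaces, leaving only query and max_results under the model's control. make_scoped_search and build_search_payload are hypothetical names, and the sketch uses the standard library's urllib so it has no extra dependencies.

```python
import json
import os
import urllib.request
from typing import Callable

SEARCH_URL = "https://api.lighton.ai/api/v3/search"


def build_search_payload(query: str, max_results: int, workspace_id: list[int]) -> dict:
    """Request body for LightOn search with the workspace fixed by the caller."""
    return {"query": query, "max_results": max_results, "workspace_id": list(workspace_id)}


def make_scoped_search(workspace_id: list[int]) -> Callable[[str, int], list[dict]]:
    """Return a search function that can only query the given workspaces."""
    scope = list(workspace_id)  # copy so later changes to the caller's list cannot widen the scope

    def scoped_search(query: str, max_results: int = 5) -> list[dict]:
        request = urllib.request.Request(
            SEARCH_URL,
            data=json.dumps(build_search_payload(query, max_results, scope)).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {os.environ['LIGHTON_API_KEY']}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read())["results"]

    return scoped_search
```

Create one scoped function per request, for example `make_scoped_search([42])`, and format its results the way run_search does before returning them as tool output.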

Choosing a model

Lyceum's catalog includes a wide range of hosted models. Pass the model ID as the model argument. Some of the available models:

| Model | Notes |
| --- | --- |
| `meta-llama/Llama-3.3-70B-Instruct` | Strong general-purpose model, good default choice |
| `Qwen/Qwen3-235B-A22B-Instruct-2507` | Large MoE model for demanding reasoning |
| `Qwen/Qwen3-32B` | Capable mid-size model, lower latency |
| `google/gemma-3-27b-it` | Compact, efficient instruction-tuned model |
| `openai/gpt-oss-120b` | Open-weight GPT model |
| `Qwen/Qwen2.5-VL-72B-Instruct` | Vision-language model, accepts image input |
| `openbmb/MiniCPM-V-4_5` | Lightweight vision-language model |

You can list the models available to your key at any time:

print([m.id for m in lyceum.models.list().data])

Check the Lyceum documentation for the current model list and pricing.
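Because availability can differ by key and change over time, you may prefer to select a model at startup rather than hard-code one. This sketch picks the first preferred model your key can use; the preference order is only an example drawn from the table above, and pick_model is an illustrative name.

```python
def pick_model(available: list[str], preferred: list[str]) -> str:
    """Return the first model in `preferred` that appears in `available`."""
    available_set = set(available)
    for model in preferred:
        if model in available_set:
            return model
    raise LookupError(f"None of {preferred} is available; got {sorted(available_set)}")


# Example preference order; adjust to your latency and quality needs.
PREFERRED_MODELS = [
    "meta-llama/Llama-3.3-70B-Instruct",
    "Qwen/Qwen3-32B",
    "openbmb/MiniCPM-V-4_5",
]
```

Usage: `model = pick_model([m.id for m in lyceum.models.list().data], PREFERRED_MODELS)`. Failing loudly with the available list makes a missing model obvious instead of surfacing later as an API error.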

Streaming responses

Lyceum’s endpoint supports streaming. Enable it by passing stream=True and iterating over the response:

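stream = lyceum.chat.completions.create(
    model="openbmb/MiniCPM-V-4_5",
    messages=[{"role": "user", "content": "What is our data retention policy?"}],
    stream=True,
)

for chunk in stream:
    # Some OpenAI-compatible servers send chunks with an empty choices list; skip them.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()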
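In a RAG application you often want to show tokens as they arrive and also keep the complete answer, for logging or caching. This sketch separates the two concerns: a generator yields the text fragments, and a wrapper prints and collects them. It relies only on the chunk.choices[0].delta.content shape used above; stream_text and print_and_collect are illustrative names.

```python
from typing import Iterable, Iterator


def stream_text(stream: Iterable) -> Iterator[str]:
    """Yield the non-empty text deltas from an OpenAI-style chat completion stream."""
    for chunk in stream:
        if not chunk.choices:  # e.g. a usage-only or keep-alive chunk
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta


def print_and_collect(stream: Iterable) -> str:
    """Print fragments as they arrive and return the full answer."""
    parts = []
    for text in stream_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Pass it the object returned by `lyceum.chat.completions.create(..., stream=True)`, with messages built from the retrieved context as in the full example.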
