Scaleway’s Generative APIs expose hosted models (Llama, Mistral, and others) through an OpenAI-compatible endpoint. Pair them with LightOn search to build RAG pipelines where the retrieval stays on LightOn’s infrastructure and the generation runs on Scaleway. The flow is:
  1. Search LightOn for the passages most relevant to the user’s question.
  2. Pack those passages into the model’s context window.
  3. Call the Scaleway model to generate an answer grounded in the retrieved content.

Prerequisites

  • A LightOn API key (exported as CONSOLE_API_KEY in the code below), available in the LightOn Console under API Keys.
  • A Scaleway API key with access to Generative APIs, available in the Scaleway console under IAM → API Keys.
  • At least one workspace with indexed documents on LightOn.

Installation

pip install requests openai
The openai package is used here only as a client library; it talks to Scaleway's OpenAI-compatible endpoint once base_url points at it.

Full example

import os
import requests
from openai import OpenAI

CONSOLE_API_KEY = os.environ["CONSOLE_API_KEY"]
SCALEWAY_API_KEY = os.environ["SCALEWAY_API_KEY"]

scaleway = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=SCALEWAY_API_KEY,
)


def search(query: str, workspace_id: list[int] | None = None, max_results: int = 5) -> list[dict]:
    payload = {"query": query, "max_results": max_results}
    if workspace_id:
        payload["workspace_id"] = workspace_id

    response = requests.post(
        "https://api.lighton.ai/api/v3/search",
        headers={"Authorization": f"Bearer {CONSOLE_API_KEY}"},
        json=payload,
        timeout=30,  # requests waits indefinitely unless a timeout is set
    )
    response.raise_for_status()
    return response.json()["results"]


def answer(question: str, workspace_id: list[int] | None = None, model: str = "llama-3.3-70b-instruct") -> str:
    results = search(question, workspace_id=workspace_id)

    context = "\n\n".join(
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r["content"]
    )

    completion = scaleway.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant. Answer the user's question using only "
                    "the provided context. If the context does not contain enough information, "
                    "say so.\n\nContext:\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content


print(answer("What is our data retention policy?"))
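
The full example joins every returned passage into the prompt, which works for a handful of short results but can overflow the model's context window once max_results grows. Below is a minimal sketch of a budgeted packing step. It assumes results arrive in relevance order and uses a character count as a rough stand-in for tokens. The names pack_context and max_chars are illustrative and are not part of either API.

```python
def pack_context(results: list[dict], max_chars: int = 12_000) -> str:
    """Join passages in ranked order until the character budget is used up."""
    blocks: list[str] = []
    used = 0
    for r in results:
        content = r.get("content")
        if not content:
            continue  # skip empty passages, as the full example does
        source = r.get("source", {})
        header = f"[{source.get('filename', 'unknown')}, p.{source.get('page_start', '?')}]"
        block = f"{header}\n{content}"
        # Every block after the first also costs the "\n\n" separator.
        cost = len(block) + (2 if blocks else 0)
        if used + cost > max_chars:
            break  # assumed ranked input: stop at the first passage that does not fit
        blocks.append(block)
        used += cost
    return "\n\n".join(blocks)
```

Inside answer, the join expression could then be replaced with context = pack_context(results). A character budget is only an approximation; a tokenizer matching the chosen model would give a tighter bound.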

LightOn search as a tool

Instead of always searching before calling the model, you can expose LightOn search as a tool and let the model decide when to call it. The model issues a lighton_search tool call when it needs context; your code executes the search and feeds the results back; the model then produces a final answer.
import json
import os
import requests
from openai import OpenAI

CONSOLE_API_KEY = os.environ["CONSOLE_API_KEY"]
SCALEWAY_API_KEY = os.environ["SCALEWAY_API_KEY"]

scaleway = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=SCALEWAY_API_KEY,
)

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "lighton_search",
        "description": (
            "Search the company knowledge base for passages relevant to a query. "
            "Returns ranked excerpts with their source filename and page numbers."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural-language search query.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Number of passages to return (1–50, default 5).",
                    "default": 5,
                },
            },
            "required": ["query"],
        },
    },
}


def run_search(query: str, max_results: int = 5) -> str:
    response = requests.post(
        "https://api.lighton.ai/api/v3/search",
        headers={"Authorization": f"Bearer {CONSOLE_API_KEY}"},
        json={"query": query, "max_results": max_results},
        timeout=30,  # requests waits indefinitely unless a timeout is set
    )
    response.raise_for_status()
    results = response.json()["results"]
    passages = [
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r["content"]
    ]
    return "\n\n".join(passages) if passages else "No results found."


def answer(question: str, model: str = "llama-3.3-70b-instruct") -> str:
    messages = [{"role": "user", "content": question}]

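    # Cap the number of tool rounds so a model that keeps calling the tool cannot loop forever.
    for _ in range(5):
        completion = scaleway.chat.completions.create(
            model=model,
            tools=[SEARCH_TOOL],
            messages=messages,
        )
        choice = completion.choices[0]

        # Any finish reason other than "tool_calls" means the model produced its final answer.
        if choice.finish_reason != "tool_calls":
            return choice.message.content

        # Keep the assistant's tool-call message in the history, then answer each call.
        messages.append(choice.message)
        for call in choice.message.tool_calls:
            args = json.loads(call.function.arguments)
            result = run_search(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })

    raise RuntimeError("Model did not produce a final answer after 5 tool rounds.")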


print(answer("What is our data retention policy?"))
The loop handles the case where the model issues multiple search calls in sequence before producing a final answer.
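
Tool arguments are generated by the model, so they may be malformed JSON, carry unexpected keys, or request a max_results outside the 1–50 range given in the tool description. Below is a minimal sketch of a validation step that could run before run_search. The helper name parse_search_args is hypothetical, and raising ValueError on a missing query is one possible policy among several.

```python
import json


def parse_search_args(raw: str) -> dict:
    """Turn a tool call's JSON argument string into safe keyword arguments."""
    try:
        args = json.loads(raw or "{}")
    except json.JSONDecodeError:
        args = {}
    if not isinstance(args, dict):
        args = {}

    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("tool call is missing a non-empty 'query'")

    try:
        max_results = int(args.get("max_results", 5))
    except (TypeError, ValueError, OverflowError):
        max_results = 5  # fall back to the documented default
    # Clamp to the 1-50 range advertised in the tool description.
    max_results = max(1, min(50, max_results))

    # Only the two known keys are returned, so extra keys never reach run_search.
    return {"query": query.strip(), "max_results": max_results}
```

In the loop, run_search(**parse_search_args(call.function.arguments)) would replace the direct json.loads call. Catching the ValueError and returning its message as the tool result lets the model retry with a corrected call instead of crashing the request.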

Scoping retrieval to a workspace

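Pass workspace_id, a list of workspace IDs, to limit search to specific workspaces. This is useful in multi-tenant products where each customer's data lives in a dedicated workspace. The call below uses the answer function from the full example; the tool-calling variant does not accept workspace_id as written.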
answer("Summarize the onboarding checklist", workspace_id=[42])
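
In the full example, search only adds workspace_id to the payload when the list is non-empty, so passing None or an empty list sends an unscoped query. In a multi-tenant product it is safer to fail than to fall through to that case. Below is a minimal sketch of such a guard. The TENANT_WORKSPACES mapping, the tenant names, and the helper workspace_for are all hypothetical.

```python
# Hypothetical mapping from tenant identifier to LightOn workspace IDs.
TENANT_WORKSPACES: dict[str, list[int]] = {
    "acme": [42],
    "globex": [43, 44],
}


def workspace_for(tenant: str) -> list[int]:
    """Return the workspace IDs a tenant may search, or raise if none are configured."""
    workspaces = TENANT_WORKSPACES.get(tenant)
    if not workspaces:
        # Refuse rather than return an empty list, which search() would treat as "no filter".
        raise PermissionError(f"no workspace configured for tenant {tenant!r}")
    return list(workspaces)  # copy so callers cannot mutate the mapping
```

A scoped call then looks like answer("Summarize the onboarding checklist", workspace_id=workspace_for("acme")).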

Choosing a model

Scaleway’s catalog includes several hosted models. Pass the model name to the model parameter:
| Model | Notes |
| --- | --- |
| llama-3.3-70b-instruct | Strong reasoning, good default choice |
| llama-3.1-8b-instruct | Faster and cheaper, suitable for simpler queries |
| mistral-nemo-instruct-2407 | Compact Mistral, low latency |
| mixtral-8x7b-instruct-v0.1 | MoE model, good for longer contexts |
Check the Scaleway documentation for the current model list and regional availability.

Streaming responses

Scaleway’s endpoint supports streaming. Enable it by passing stream=True and iterating over the response:
stream = scaleway.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[...],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
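
Applications often need the complete answer as well as the live output, for example to log it or append it to the conversation history. Below is a minimal sketch of a helper that does both. The name collect_stream is illustrative, and the guard for chunks without choices is a defensive assumption about OpenAI-compatible streams rather than documented Scaleway behavior.

```python
from typing import Iterable


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Print streamed deltas as they arrive and return the full text."""
    parts: list[str] = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensive: some streams may emit chunks with no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)
```

Passing the stream object from the snippet above, as in text = collect_stream(stream), prints tokens as they arrive and leaves the full answer in text.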