Scaleway’s Generative APIs expose hosted models (Llama, Mistral, and others) through an OpenAI-compatible endpoint. Pair them with LightOn search to build retrieval-augmented generation (RAG) pipelines in which retrieval stays on LightOn’s infrastructure and generation runs on Scaleway.
The flow is:
- Search LightOn for the passages most relevant to the user’s question.
- Pack those passages into the model’s context window.
- Call the Scaleway model to generate an answer grounded in the retrieved content.
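Packing is where the context window becomes a hard limit: joining every returned passage works for a handful of short excerpts but can overflow a smaller model's window. The sketch below shows one way to keep the highest-ranked passages under a budget. It assumes characters as a rough stand-in for tokens, and the helper name pack_passages is hypothetical rather than part of either API.

```python
def pack_passages(passages: list[str], max_chars: int, separator: str = "\n\n") -> str:
    """Greedily join ranked passages until the character budget is reached.

    Passages are assumed to arrive best-first (search results are ranked),
    so the most relevant ones are kept when the budget runs out.
    Characters are only a rough proxy for tokens; use a real tokenizer
    if you need precise limits.
    """
    packed: list[str] = []
    used = 0
    for passage in passages:
        # Every passage except the first is preceded by a separator.
        cost = len(passage) + (len(separator) if packed else 0)
        if used + cost > max_chars:
            break  # stop at the first passage that does not fit
        packed.append(passage)
        used += cost
    return separator.join(packed)


passages = ["A" * 40, "B" * 40, "C" * 40]
print(pack_passages(passages, max_chars=85))  # keeps A and B: 40 + 2 + 40 = 82 <= 85
```

Stopping at the first passage that does not fit preserves the ranking order; skipping it and trying later, shorter passages would fill the budget more tightly at the cost of including less relevant text.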
Prerequisites
- A LightOn CONSOLE_API_KEY, available in the Console → API Keys section.
- A Scaleway API key with access to Generative APIs, available in the Scaleway console under IAM → API Keys.
- At least one workspace with indexed documents on LightOn.
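Both examples read their keys with os.environ[...], which fails with a bare KeyError naming only the first missing variable. If you prefer a clearer startup error, a small check like this one works; require_env is a hypothetical helper, not part of either SDK.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, or raise one error listing every missing name."""
    missing = [name for name in names if not os.environ.get(name)]  # unset or empty
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}


try:
    keys = require_env("CONSOLE_API_KEY", "SCALEWAY_API_KEY")
    print("API keys found.")
except RuntimeError as err:
    print(err)
```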
Installation
```bash
pip install requests openai
```
The openai package is used here only for its client; Scaleway’s endpoint is fully compatible with it.
Full example
```python
import os

import requests
from openai import OpenAI

CONSOLE_API_KEY = os.environ["CONSOLE_API_KEY"]
SCALEWAY_API_KEY = os.environ["SCALEWAY_API_KEY"]

# Scaleway's endpoint is OpenAI-compatible, so the standard client works
# once it points at Scaleway's base URL.
scaleway = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=SCALEWAY_API_KEY,
)


def search(query: str, workspace_id: list[int] | None = None, max_results: int = 5) -> list[dict]:
    """Return the top passages from LightOn search, optionally scoped to workspaces."""
    payload = {"query": query, "max_results": max_results}
    if workspace_id:
        payload["workspace_id"] = workspace_id
    response = requests.post(
        "https://api.lighton.ai/api/v3/search",
        headers={"Authorization": f"Bearer {CONSOLE_API_KEY}"},
        json=payload,
        timeout=30,  # seconds; avoids hanging forever on a stalled connection
    )
    response.raise_for_status()
    return response.json()["results"]


def answer(question: str, workspace_id: list[int] | None = None, model: str = "llama-3.3-70b-instruct") -> str:
    """Retrieve context from LightOn, then ask the Scaleway model to answer from it."""
    results = search(question, workspace_id=workspace_id)
    # Label each passage with its source file and page so answers can be traced back.
    context = "\n\n".join(
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r["content"]
    )
    completion = scaleway.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant. Answer the user's question using only "
                    "the provided context. If the context does not contain enough information, "
                    "say so.\n\nContext:\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content


print(answer("What is our data retention policy?"))
```
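The full example discards which files the answer came from. If you want to show citations next to the answer, one option is to build the context and a source list in the same pass. The sketch below assumes the result shape the example already relies on (content, source.filename, source.page_start); format_context is a hypothetical helper and the sample results are made up.

```python
def format_context(results: list[dict]) -> tuple[str, list[str]]:
    """Build the context block and a de-duplicated list of cited sources.

    Assumes each result looks like:
    {"content": str, "source": {"filename": str, "page_start": int}}
    """
    blocks: list[str] = []
    sources: list[str] = []
    for r in results:
        if not r.get("content"):
            continue  # skip empty passages, as the full example does
        label = f"{r['source']['filename']}, p.{r['source']['page_start']}"
        blocks.append(f"[{label}]\n{r['content']}")
        if label not in sources:
            sources.append(label)
    return "\n\n".join(blocks), sources


# Made-up sample results, only to illustrate the shape.
sample = [
    {"content": "Logs are kept for 90 days.", "source": {"filename": "policy.pdf", "page_start": 3}},
    {"content": "", "source": {"filename": "empty.pdf", "page_start": 1}},
    {"content": "Backups are kept for 1 year.", "source": {"filename": "policy.pdf", "page_start": 3}},
]
context, sources = format_context(sample)
print(sources)  # ['policy.pdf, p.3']
```

Inside answer(), you could then return both the completion text and sources so the caller can render them as citations.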
Search as a tool
Instead of always searching before calling the model, you can expose LightOn search as a tool and let the model decide when to call it. When the model needs context, it issues a lighton_search tool call; your code runs the search and sends the results back, and the model then produces the final answer.
```python
import json
import os

import requests
from openai import OpenAI

CONSOLE_API_KEY = os.environ["CONSOLE_API_KEY"]
SCALEWAY_API_KEY = os.environ["SCALEWAY_API_KEY"]

scaleway = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=SCALEWAY_API_KEY,
)

# Tool definition in the OpenAI tools format; "parameters" is a JSON Schema.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "lighton_search",
        "description": (
            "Search the company knowledge base for passages relevant to a query. "
            "Returns ranked excerpts with their source filename and page numbers."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural-language search query.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Number of passages to return (1–50, default 5).",
                    "default": 5,
                },
            },
            "required": ["query"],
        },
    },
}


def run_search(query: str, max_results: int = 5) -> str:
    """Run a LightOn search and format the hits as one string for the tool message."""
    response = requests.post(
        "https://api.lighton.ai/api/v3/search",
        headers={"Authorization": f"Bearer {CONSOLE_API_KEY}"},
        json={"query": query, "max_results": max_results},
        timeout=30,  # seconds
    )
    response.raise_for_status()
    results = response.json()["results"]
    passages = [
        f"[{r['source']['filename']}, p.{r['source']['page_start']}]\n{r['content']}"
        for r in results
        if r["content"]
    ]
    return "\n\n".join(passages) if passages else "No results found."


def answer(question: str, model: str = "llama-3.3-70b-instruct", max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    # Cap the number of model turns so a model that keeps calling the tool cannot loop forever.
    for _ in range(max_rounds):
        completion = scaleway.chat.completions.create(
            model=model,
            tools=[SEARCH_TOOL],
            messages=messages,
        )
        choice = completion.choices[0]
        # Checking tool_calls directly is more robust than relying on finish_reason alone,
        # which an OpenAI-compatible server may not always set to "tool_calls".
        if not choice.message.tool_calls:
            return choice.message.content
        # Keep the assistant's tool-call message in the history, then answer each call.
        messages.append(choice.message)
        for call in choice.message.tool_calls:
            args = json.loads(call.function.arguments)
            result = run_search(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    raise RuntimeError(f"No final answer after {max_rounds} rounds of tool calls.")


print(answer("What is our data retention policy?"))
```
The loop handles the case where the model issues multiple search calls in sequence before producing a final answer.
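In the loop, json.loads and run_search(**args) trust the model's output: malformed arguments, an unexpected parameter, or an unknown tool name raises an exception and aborts the request. One option is to route every call through a small dispatcher that returns errors to the model as tool output instead, so it can retry. The TOOLS registry and dispatch_tool_call are hypothetical names, and fake_search stands in for run_search so the sketch runs offline.

```python
import json
from typing import Callable


def fake_search(query: str, max_results: int = 5) -> str:
    # Offline stand-in for run_search; replace it with the real function.
    return f"{max_results} passages for {query!r}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[..., str]] = {"lighton_search": fake_search}


def dispatch_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call and always return a string for the tool message."""
    func = TOOLS.get(name)
    if func is None:
        return f"Error: unknown tool {name!r}."
    try:
        args = json.loads(arguments or "{}")
    except json.JSONDecodeError as err:
        return f"Error: arguments are not valid JSON ({err.msg})."
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object."
    try:
        return func(**args)
    except TypeError as err:
        # Typically raised when the model passes unexpected or missing parameters.
        return f"Error: bad arguments for {name}: {err}"


print(dispatch_tool_call("lighton_search", '{"query": "retention", "max_results": 3}'))
# 3 passages for 'retention'
```

Inside the loop, result = dispatch_tool_call(call.function.name, call.function.arguments) would then replace the json.loads and run_search lines.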
Scoping retrieval to a workspace
Pass workspace_id, a list of workspace IDs, to the answer() function from the full example to limit search to those workspaces. This is useful in multi-tenant products where each customer’s data lives in a dedicated workspace.
```python
answer("Summarize the onboarding checklist", workspace_id=[42])
```
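In a multi-tenant product, the workspace list should come from your own tenant records rather than from user input. It is also worth failing closed: search() in the full example omits workspace_id when it is empty, which presumably widens the search to every workspace the key can reach. A minimal sketch, assuming a hypothetical in-memory TENANT_WORKSPACES mapping and the payload fields used by search() (query, max_results, workspace_id):

```python
# Hypothetical mapping from your own tenant IDs to LightOn workspace IDs.
TENANT_WORKSPACES: dict[str, list[int]] = {
    "acme": [42],
    "globex": [7, 8],
}


def build_search_payload(query: str, tenant_id: str, max_results: int = 5) -> dict:
    """Build a search payload that is always restricted to the tenant's workspaces."""
    workspace_ids = TENANT_WORKSPACES.get(tenant_id)
    if not workspace_ids:
        # Fail closed: an unknown tenant must never fall back to an unscoped search.
        raise PermissionError(f"No workspaces configured for tenant {tenant_id!r}")
    return {"query": query, "max_results": max_results, "workspace_id": workspace_ids}


print(build_search_payload("onboarding checklist", "acme"))
# {'query': 'onboarding checklist', 'max_results': 5, 'workspace_id': [42]}
```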
Choosing a model
Scaleway’s catalog includes several hosted models. Pass the model name to the model parameter:
| Model | Notes |
|---|---|
| llama-3.3-70b-instruct | Strong reasoning, good default choice |
| llama-3.1-8b-instruct | Faster and cheaper, suitable for simpler queries |
| mistral-nemo-instruct-2407 | Compact Mistral model, low latency |
| mixtral-8x7b-instruct-v0.1 | Mixture-of-experts (MoE) model, good for longer contexts |
Check the Scaleway documentation for the current model list and regional availability.
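If you serve both quick lookups and harder synthesis questions, one option is to pick the model per request. The heuristic below is purely illustrative: the word-count threshold and cue words are arbitrary assumptions, not Scaleway guidance, and pick_model is a hypothetical helper that only returns names from the table above.

```python
DEFAULT_MODEL = "llama-3.3-70b-instruct"
FAST_MODEL = "llama-3.1-8b-instruct"

# Arbitrary, illustrative words suggesting the question needs more reasoning.
COMPLEX_CUES = {"compare", "why", "summarize", "explain", "difference"}


def pick_model(question: str, max_simple_words: int = 12) -> str:
    """Send short lookup-style questions to the faster model, everything else to the default."""
    # Lowercase and strip surrounding punctuation so "Why?" matches "why".
    words = [w.strip("?.,!;:") for w in question.lower().split()]
    if len(words) <= max_simple_words and not COMPLEX_CUES.intersection(words):
        return FAST_MODEL
    return DEFAULT_MODEL


print(pick_model("What is our data retention policy?"))  # llama-3.1-8b-instruct
print(pick_model("Summarize the onboarding checklist"))  # llama-3.3-70b-instruct
```

You would then call answer(question, model=pick_model(question)).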
Streaming responses
Scaleway’s endpoint supports streaming. Enable it by passing stream=True and iterating over the response:
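```python
stream = scaleway.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[...],  # the same system + user messages as in the full example
    stream=True,
)
for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```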