Context & prompts¶
Context injection¶
Context is additional information injected into the LLM's input for a single response. It's not visible to the user and doesn't persist between turns. This is the UNI equivalent of RAG (Retrieval-Augmented Generation).
@uni.message_context
def inject_weather(event: uni.MessageEvent) -> str | None:
    if "weather" in event.message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None
Return a string to inject context, or None to skip. The LLM sees the context alongside the user's message.
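The skip-on-`None` contract means several context handlers can coexist, each contributing only when relevant. A minimal sketch of how a runtime might combine them (the dispatch logic below is illustrative, not UNI's actual implementation):

```python
def gather_context(handlers, message: str) -> str:
    """Run each handler; join non-None results into one context block."""
    parts = []
    for handler in handlers:
        result = handler(message)
        if result is not None:  # None means "nothing to inject this turn"
            parts.append(result)
    return "\n".join(parts)

def weather_handler(message):
    if "weather" in message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None

def time_handler(message):
    if "time" in message.lower():
        return "Current time: 14:30"
    return None

context = gather_context([weather_handler, time_handler], "What's the weather?")
```

Handlers that return `None` simply drop out, so adding a new context source never disturbs the others.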
Preloading¶
The preload callback fires as soon as the user starts speaking, before the message is transcribed. This lets you fetch data in parallel with speech-to-text so it's ready by the time you need it:
@uni.cache(ttl="30m")
def fetch_weather_data():
    return {"temp": 22, "conditions": "sunny"}

@uni.message_context(preload=fetch_weather_data)
def inject_weather(event: uni.MessageEvent, weather_data: dict) -> str | None:
    if "weather" in event.message.lower():
        return f"Current weather: {weather_data['temp']}°C, {weather_data['conditions']}"
    return None
The return value from the preload function is passed as the second argument to the context handler.
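The timing benefit can be sketched with a plain thread pool: the fetch starts as speech begins and is awaited only once the transcript is ready, so the two latencies overlap instead of stacking (`concurrent.futures` here stands in for UNI's internal scheduling; the sleeps simulate network and speech-to-text delay):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_weather_data():
    time.sleep(0.1)  # simulate a slow network call
    return {"temp": 22, "conditions": "sunny"}

def transcribe_speech():
    time.sleep(0.1)  # simulate speech-to-text latency
    return "what's the weather like?"

with ThreadPoolExecutor() as pool:
    preload = pool.submit(fetch_weather_data)  # kicks off when speech starts
    message = transcribe_speech()              # runs concurrently with the fetch
    weather_data = preload.result()            # ready (or nearly so) by now
```

With both operations at ~100 ms, the total wait is roughly 100 ms rather than 200 ms.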
Prompt modifiers¶
Prompt modifiers alter the system prompt or user prompt before they reach the LLM. Use context injection when the content depends on the user's message. Use prompt modifiers when the content should always be present, or when you need to transform the prompt itself.
System prompt¶
Alter the system prompt that shapes UNI's personality and behavior. This runs once daily (or on reset), not per message.
@uni.system_prompt_modifier
def add_emotive_instructions(event: uni.SystemPromptEvent) -> str:
    return event.prompt + """
When speaking, you can use these emotive tags:
<laugh>, <sigh>, <gasp>, <whisper>
"""
Ordering
Control execution order with the priority parameter. Lower numbers run earlier. Default: 100.
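The effect of `priority` can be sketched in a few lines (the registry below is illustrative; UNI's decorator handles this internally):

```python
def make_registry():
    modifiers = []

    def register(func, priority=100):  # 100 mirrors the documented default
        modifiers.append((priority, func))

    def apply_all(prompt):
        # Lower priority numbers run earlier
        for _, func in sorted(modifiers, key=lambda pair: pair[0]):
            prompt = func(prompt)
        return prompt

    return register, apply_all

register, apply_all = make_registry()
register(lambda p: p + " [persona]", priority=10)  # runs first
register(lambda p: p + " [emotes]")                # default 100, runs later
result = apply_all("You are UNI.")
```

Each modifier receives the prompt as transformed by everything that ran before it, so priorities matter whenever one modifier depends on another's output.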
User prompt¶
Alter user messages after speech-to-text. Useful for prepending metadata like timestamps or speaker identity:
from datetime import datetime

@uni.user_prompt_modifier
def prepend_timestamp(event: uni.UserPromptEvent) -> str:
    return datetime.now().isoformat() + " " + event.prompt
User prompt modifiers support the same priority parameter.
LLM prompts¶
Run standalone LLM prompts for background tasks like summarization, classification, or data extraction. These use the lightweight "tasks" model, not the main conversation model.
summary = uni.prompt_llm(
    system_prompt="Summarize this article in 2-3 sentences. Answer with only the summary.",
    user_prompt=long_article_text
)
Structured output¶
Force JSON output with formatting="json":
result = uni.prompt_llm(
    system_prompt="""
    Extract key information. Respond with JSON:
    {
        "summary": "brief summary",
        "key_points": ["point 1", "point 2"],
        "sentiment": "positive/negative/neutral"
    }
    """,
    user_prompt=document,
    formatting="json"
)
print(result["summary"])
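Model output is only as reliable as the prompt, so it is worth guarding against omitted keys before using the parsed result (a defensive sketch; the key names and defaults are the ones from the prompt above, and the helper is hypothetical, not part of the UNI API):

```python
def validate_extraction(result: dict) -> dict:
    """Fill in safe defaults for any keys the model omitted."""
    return {
        "summary": result.get("summary", ""),
        "key_points": result.get("key_points", []),
        "sentiment": result.get("sentiment", "neutral"),
    }

# Example: the model dropped "key_points" from its response
safe = validate_extraction({"summary": "A short piece.", "sentiment": "positive"})
```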
Or enforce a specific schema with a Pydantic model:
from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

alert = uni.prompt_llm(
    system_prompt="""
    Parse this weather alert. Respond with JSON:
    {
        "severity": "low/medium/high",
        "title": "Alert title",
        "description": "Detailed description"
    }
    """,
    user_prompt=alert_text,
    formatting=WeatherAlert.model_json_schema()
)
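When you pass a schema, the parsed response can also be validated back into the model, giving downstream code typed attributes instead of dict lookups. This is standard Pydantic v2 usage; whether `uni.prompt_llm` returns a parsed dict or a raw string is an assumption here, so the payload below is hard-coded:

```python
from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

# Suppose the LLM returned this parsed JSON payload:
payload = {
    "severity": "high",
    "title": "Storm warning",
    "description": "Gale-force winds expected tonight.",
}

alert = WeatherAlert.model_validate(payload)
```

Validation raises immediately on a malformed response, which is usually preferable to a `KeyError` deep inside later logic.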
Role-based access¶
For finer control, use uni.llm_chat to talk to any configured LLM role ("main", "tasks", "sleep", "embeddings", or plugin-defined roles):
messages = [
    {"role": "system", "content": "You write concise haikus about weather."},
    {"role": "user", "content": "Describe today's forecast in Seattle."},
]

for chunk in uni.llm_chat("tasks", messages):
    print(chunk["message"]["content"], end="")
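Since `uni.llm_chat` streams, you can accumulate the chunks whenever you need the complete reply rather than printing it incrementally (the fake generator below stands in for `uni.llm_chat` and follows the same chunk shape as the loop above):

```python
def fake_llm_chat(messages):
    # Stand-in for uni.llm_chat: yields chunks in the same shape
    for piece in ["Rain taps ", "the window, ", "soft gray morning."]:
        yield {"message": {"content": piece}}

reply = "".join(chunk["message"]["content"] for chunk in fake_llm_chat([]))
```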
Embeddings¶
Generate vector embeddings for semantic search:
vector = uni.llm_embeddings("Cozy coffee shop with lots of natural light")
Track which model produced a vector so you can invalidate caches when the model changes:
vector = uni.llm_embeddings(text)
model_id = uni.llm_embeddings_model()
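Semantic search over these vectors typically ranks candidates by cosine similarity. A dependency-free sketch (the toy 3-dimensional vectors stand in for real embedding vectors from `uni.llm_embeddings`, which are much longer):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in practice these come from uni.llm_embeddings(text)
query = [0.1, 0.9, 0.2]
docs = {"cafe": [0.2, 0.8, 0.1], "garage": [0.9, 0.1, 0.3]}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

This ranking is only meaningful when every vector came from the same model, which is exactly why tracking the model ID matters.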
@uni.on_embeddings_model_changed
def handle_model_change(event: uni.EmbeddingsModelChangedEvent):
    if event.new_model:
        recompute_stored_embeddings()
For data storage, encryption, and other helpers, see Utilities. For tools the LLM can call during conversations, see Tools & cards.