Context & prompts¶
Context injection¶
Context is additional information injected into the LLM's input for a single response. It's not visible to the user and doesn't persist between turns.
@uni.message_context
def inject_weather(event: uni.MessageEvent) -> str | None:
    if "weather" in event.message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None
Return a string to inject context, or None to skip. The LLM sees the context alongside the user's message.
Preloading¶
The preload callback fires as soon as the user starts speaking, before the message is transcribed. This lets you pre-fetch data so it's ready by the time the user finishes speaking:
@uni.cache(ttl="30m")
def fetch_weather_data():
    return {"temp": 22, "conditions": "sunny"}

@uni.message_context(preload=fetch_weather_data)
def inject_weather(event: uni.MessageEvent, weather_data: dict) -> str | None:
    if "weather" in event.message.lower():
        # Format the preloaded dict, since the handler returns a string
        return f"Current weather: {weather_data['temp']}°C, {weather_data['conditions']}"
    return None
The return value from the preload function is passed as the second argument to the context handler.
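To make the timing concrete, here is a minimal sketch of the general pattern in plain Python. It is not UNI's implementation: `slow_fetch`, `on_speech_start`, and `on_transcript` are hypothetical names, and a thread pool stands in for whatever scheduling UNI actually uses.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

executor = ThreadPoolExecutor(max_workers=1)

def slow_fetch() -> dict:
    # Hypothetical stand-in for a slow network call.
    time.sleep(0.05)
    return {"temp": 22, "conditions": "sunny"}

def on_speech_start() -> Future:
    # Start the fetch without blocking; the user is still talking.
    return executor.submit(slow_fetch)

def on_transcript(message: str, pending: Future) -> Optional[str]:
    # Skip the context entirely if the message is not about weather.
    if "weather" not in message.lower():
        return None
    # result() returns immediately if the fetch finished, otherwise it waits.
    data = pending.result()
    return f"Current weather: {data['temp']}°C, {data['conditions']}"

pending = on_speech_start()
context = on_transcript("What's the weather like?", pending)
print(context)
```

The fetch overlaps with the time the user spends speaking, so the handler usually finds the data already available.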
Progressive preloading¶
When a pre-fetch depends on what the user is saying (typically semantic search), return a callable from your preload. UNI calls it on every partial transcription:
def preload_memories():
    def update(partial_text: str, current_data):
        if len(partial_text) < 10:
            return current_data
        return search_memories(partial_text)
    return update

@uni.message_context(preload=preload_memories)
def inject_memories(event: uni.MessageEvent, results) -> str | None:
    if not results:
        return None
    formatted = "\n".join(f"- {record.content}" for record in results)
    return "Possibly relevant memories:\n" + formatted
preload_memories runs once. The returned update runs for each partial with two arguments:
| Argument | Description |
|---|---|
| partial_text | Transcription so far (speculative, may change as the user talks) |
| current_data | Value returned by the previous update call (None on the first) |
Each return value becomes the next current_data. The handler receives the final value once the transcript is confirmed.
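The chaining of `current_data` can be simulated without UNI. The sketch below is an illustration only: `simulate` is a hypothetical driver that mimics the calling sequence described above, and the search is replaced by a placeholder string.

```python
def preload_memories():
    def update(partial_text, current_data):
        # Ignore very short partials and keep whatever we had before.
        if len(partial_text) < 10:
            return current_data
        # Placeholder for a real search call.
        return f"results for {partial_text!r}"
    return update

def simulate(partials):
    update = preload_memories()   # the preload itself runs once
    current = None                # the first update call sees None
    for text in partials:
        # Each return value is fed into the next call as current_data.
        current = update(text, current)
    return current                # the value the handler would receive

final = simulate(["what did", "what did I say", "what did I say about Rome"])
print(final)
```

The first partial is shorter than 10 characters, so it leaves `current_data` as None; the later partials each replace the previous result.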
Updaters run synchronously on each partial, so keep them fast. A large-corpus semantic search may need its own debounce or threading.
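One way to limit how often an expensive search runs is a simple time-based throttle wrapped around the updater. This is a sketch under assumptions: `make_debounced_updater`, `min_interval`, and `clock` are hypothetical names, and strictly speaking this is a leading-edge rate limit rather than a true debounce.

```python
import time

def make_debounced_updater(search, min_interval=0.3, clock=time.monotonic):
    last_run = None  # time of the last search, None until the first one

    def update(partial_text, current_data):
        nonlocal last_run
        now = clock()
        # Too soon since the last search: reuse the previous results.
        if last_run is not None and now - last_run < min_interval:
            return current_data
        last_run = now
        return search(partial_text)

    return update

# Demo with a fake clock so the behaviour is deterministic.
ticks = iter([0.0, 0.1, 0.5])
searched = []

def fake_search(text):
    searched.append(text)
    return f"results for {text!r}"

update = make_debounced_updater(fake_search, clock=lambda: next(ticks))
data = None
for partial in ["tell me", "tell me about", "tell me about Rome"]:
    data = update(partial, data)
print(searched)
```

A trade-off of this approach is that the last partial may be skipped, so the handler can receive results for a slightly older transcript.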
Prompt modifiers¶
Prompt modifiers change the system or user prompt before it reaches the LLM.
System prompt¶
Alter the system prompt that shapes UNI's personality and behavior. This runs once daily (or on reset), not per message.
@uni.system_prompt_modifier
def add_emotive_instructions(event: uni.SystemPromptEvent) -> str:
    return event.prompt + """
When speaking, you can use these emotive tags:
<laugh>, <sigh>, <gasp>, <whisper>
"""
Ordering
Control execution order with the priority parameter. Lower numbers run earlier. Default: 100.
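The ordering rule can be illustrated with a small stand-alone registry. This page does not show how `priority` is passed to the decorator, so the sketch avoids guessing at that syntax: `register` and `apply_modifiers` are hypothetical helpers, and breaking ties by registration order is an assumption.

```python
modifiers = []  # entries are (priority, registration index, function)

def register(func, priority=100):
    # Default priority mirrors the documented default of 100.
    modifiers.append((priority, len(modifiers), func))
    return func

def apply_modifiers(prompt):
    # Lower priority numbers run earlier; ties keep registration order.
    for _, _, func in sorted(modifiers, key=lambda m: (m[0], m[1])):
        prompt = func(prompt)
    return prompt

register(lambda p: p + " [default]")
register(lambda p: p + " [early]", priority=10)
register(lambda p: p + " [late]", priority=200)

print(apply_modifiers("Base prompt."))
```

Because each modifier receives the output of the previous one, a low-priority-number modifier sees the original prompt and a high one sees everything added before it.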
User prompt¶
Alter user messages after speech-to-text. Useful for prepending metadata like timestamps or speaker identity:
from datetime import datetime

@uni.user_prompt_modifier
def prepend_timestamp(event: uni.UserPromptEvent) -> str:
    return datetime.now().isoformat() + " " + event.prompt
User prompt modifiers support the same priority parameter.
LLM prompts¶
Run standalone LLM prompts for background tasks like summarization, classification, or data extraction. These use the lightweight "tasks" model, not the main conversation model.
summary = uni.prompt_llm(
    system_prompt="Summarize this article in 2-3 sentences. Answer with only the summary.",
    user_prompt=long_article_text
)
Structured output¶
Force JSON output with formatting="json":
result = uni.prompt_llm(
    system_prompt="""
    Extract key information. Respond with JSON:
    {
        "summary": "brief summary",
        "key_points": ["point 1", "point 2"],
        "sentiment": "positive/negative/neutral"
    }
    """,
    user_prompt=document,
    formatting="json"
)
print(result["summary"])
Or enforce a specific schema with a Pydantic model:
from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

alert = uni.prompt_llm(
    system_prompt="""
    Parse this weather alert. Respond with JSON:
    {
        "severity": "low/medium/high",
        "title": "Alert title",
        "description": "Detailed description"
    }
    """,
    user_prompt=alert_text,
    formatting=WeatherAlert.model_json_schema()
)
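If the result comes back as a plain dict (an assumption based on the `result["summary"]` access in the JSON example; this page does not state the return type when a schema is passed), it can be validated against the model before use. The sketch below uses Pydantic v2 and a hypothetical `StrictWeatherAlert` that narrows `severity` to the three values named in the comment above.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError

class StrictWeatherAlert(BaseModel):
    # Literal rejects any severity outside these three values.
    severity: Literal["low", "medium", "high"]
    title: str
    description: str

def parse_alert(raw: dict) -> Optional[StrictWeatherAlert]:
    # Return a validated model, or None if the data does not fit the schema.
    try:
        return StrictWeatherAlert.model_validate(raw)
    except ValidationError:
        return None

good = parse_alert({"severity": "high", "title": "Storm", "description": "Gale-force winds."})
bad = parse_alert({"severity": "extreme", "title": "Storm", "description": "Unknown."})
print(good, bad)
```

Validating at this boundary keeps malformed model output from spreading into the rest of the plugin.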
Full control¶
For full chat instead of one-off prompts, use uni.llm_chat to talk to any configured LLM role ("main", "tasks", "sleep", "embeddings", or plugin-defined roles):
messages = [
    {"role": "system", "content": "You write concise haikus about weather."},
    {"role": "user", "content": "Describe today's forecast in Seattle."},
]
for chunk in uni.llm_chat("tasks", messages):
    print(chunk["message"]["content"], end="")
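When the full reply is needed as a single string rather than printed piece by piece, the chunks can be joined. In this sketch `fake_llm_chat` is a hypothetical stand-in for `uni.llm_chat` that yields chunks in the shape used in the loop above; the chunk contents are made up for the demo.

```python
def fake_llm_chat(role, messages):
    # Stand-in generator: yields chunks shaped like {"message": {"content": ...}}.
    for piece in ["Grey clouds ", "drift inland, ", "rain by noon."]:
        yield {"message": {"content": piece}}

def collect(chunks):
    # Concatenate the content of every streamed chunk into one string.
    return "".join(chunk["message"]["content"] for chunk in chunks)

text = collect(fake_llm_chat("tasks", []))
print(text)
```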
Embeddings¶
Generate vector embeddings for semantic search:
vector = uni.llm_embeddings("Cozy coffee shop with lots of natural light")
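Semantic search over such vectors is commonly done by ranking stored items by cosine similarity to the query vector. The sketch below assumes an embedding is a flat sequence of floats (this page does not specify the return type) and uses tiny made-up vectors in place of real embeddings; `cosine_similarity` and `rank` are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_vec, items):
    # items is a list of (label, vector); the most similar item comes first.
    return sorted(items, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)

# Toy 2-dimensional vectors standing in for real embeddings.
stored = [("cafe", [1.0, 0.1]), ("garage", [0.0, 1.0]), ("bakery", [0.9, 0.3])]
ranked = [label for label, _ in rank([1.0, 0.0], stored)]
print(ranked)
```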
Track which model produced a vector so you can invalidate caches when the model changes:
vector = uni.llm_embeddings(text)
model_id = uni.llm_embeddings_model()
@uni.on_embeddings_model_changed
def handle_model_change(event: uni.EmbeddingsModelChangedEvent):
    if event.new_model:
        recompute_stored_embeddings()
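A cache that remembers which model produced its vectors might look like the following. This is a sketch, not part of UNI: `EmbeddingCache` is a hypothetical class, and `fake_embed` stands in for a real embeddings call so the example runs on its own.

```python
class EmbeddingCache:
    def __init__(self, embed, model_id):
        self._embed = embed        # callable mapping text to a vector
        self._model_id = model_id  # model that produced the cached vectors
        self._vectors = {}

    def get(self, text):
        # Compute the vector only on a cache miss.
        if text not in self._vectors:
            self._vectors[text] = self._embed(text)
        return self._vectors[text]

    def on_model_changed(self, new_model_id):
        # Vectors from different models are not comparable, so drop them all.
        if new_model_id != self._model_id:
            self._model_id = new_model_id
            self._vectors.clear()

calls = []

def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

cache = EmbeddingCache(fake_embed, "model-a")
cache.get("hello")
cache.get("hello")               # served from the cache
cache.on_model_changed("model-b")
cache.get("hello")               # recomputed after the model change
print(len(calls))
```

In a plugin, `on_model_changed` could be called from the model-change handler shown above, with the new identifier taken from the event.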
For data storage, encryption, and other helpers, see Utilities. For tools the LLM can call during conversations, see Tools & cards.