Context & prompts¶
Context injection¶
Context is additional information injected into the LLM's input for a single response. It's not visible to the user and doesn't persist between turns. This is the UNI equivalent of RAG (Retrieval-Augmented Generation).
@uni.message_context
def inject_weather(event: uni.MessageEvent) -> str | None:
    if "weather" in event.message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None
Return a string to inject context, or None to skip. The LLM sees the context alongside the user's message.
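The skip-on-`None` contract means several context handlers can coexist, each contributing only when relevant. A minimal sketch of how a runtime might combine them (the dispatch logic below is illustrative, not UNI's actual implementation):

```python
def gather_context(handlers, message: str) -> str:
    """Run each handler; join non-None results into one context block."""
    parts = []
    for handler in handlers:
        result = handler(message)
        if result is not None:  # None means "nothing to inject this turn"
            parts.append(result)
    return "\n".join(parts)

def weather_handler(message):
    if "weather" in message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None

def time_handler(message):
    if "time" in message.lower():
        return "Current time: 14:30"
    return None

context = gather_context([weather_handler, time_handler], "What's the weather?")
```

Handlers that return `None` simply drop out, so adding a new context source never disturbs the others.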
Preloading¶
The preload callback fires as soon as the user starts speaking, before the message is transcribed. This lets you fetch data in parallel with speech-to-text so it's ready by the time you need it:
@uni.cache(ttl="30m")
def fetch_weather_data():
    return {"temp": 22, "conditions": "sunny"}

@uni.message_context(preload=fetch_weather_data)
def inject_weather(event: uni.MessageEvent, weather_data: dict) -> str | None:
    if "weather" in event.message.lower():
        return f"Current weather: {weather_data['temp']}°C, {weather_data['conditions']}"
    return None
The return value from the preload function is passed as the second argument to the context handler.
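The timing benefit can be sketched with a plain thread pool: the fetch starts as speech begins and is awaited only once the transcript is ready, so the two latencies overlap instead of stacking (`concurrent.futures` here stands in for UNI's internal scheduling; the sleeps simulate network and speech-to-text delay):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_weather_data():
    time.sleep(0.1)  # simulate a slow network call
    return {"temp": 22, "conditions": "sunny"}

def transcribe_speech():
    time.sleep(0.1)  # simulate speech-to-text latency
    return "what's the weather like?"

with ThreadPoolExecutor() as pool:
    preload = pool.submit(fetch_weather_data)  # kicks off when speech starts
    message = transcribe_speech()              # runs concurrently with the fetch
    weather_data = preload.result()            # ready (or nearly so) by now
```

With both operations at ~100 ms, the total wait is roughly 100 ms rather than 200 ms.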
Prompt modifiers¶
Prompt modifiers alter the system prompt or user prompt before they reach the LLM. Use context injection when the content depends on the user's message. Use prompt modifiers when the content should always be present, or when you need to transform the prompt itself.
System prompt¶
Alter the system prompt that shapes UNI's personality and behavior. This runs once daily (or on reset), not per message.
@uni.system_prompt_modifier
def add_emotive_instructions(event: uni.SystemPromptEvent) -> str:
    return event.prompt + """
When speaking, you can use these emotive tags:
<laugh>, <sigh>, <gasp>, <whisper>
"""
Ordering
Control execution order with the priority parameter. Lower numbers run earlier. Default: 100.
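The effect of `priority` can be sketched in a few lines (the registry below is illustrative; UNI's decorator handles this internally):

```python
def make_registry():
    modifiers = []

    def register(func, priority=100):  # 100 mirrors the documented default
        modifiers.append((priority, func))

    def apply_all(prompt):
        # Lower priority numbers run earlier
        for _, func in sorted(modifiers, key=lambda pair: pair[0]):
            prompt = func(prompt)
        return prompt

    return register, apply_all

register, apply_all = make_registry()
register(lambda p: p + " [persona]", priority=10)  # runs first
register(lambda p: p + " [emotes]")                # default 100, runs later
result = apply_all("You are UNI.")
```

Each modifier receives the prompt as transformed by everything that ran before it, so priorities matter whenever one modifier depends on another's output.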
User prompt¶
Alter user messages after speech-to-text. Useful for prepending metadata like timestamps or speaker identity:
from datetime import datetime

@uni.user_prompt_modifier
def prepend_timestamp(event: uni.UserPromptEvent) -> str:
    return datetime.now().isoformat() + " " + event.prompt
User prompt modifiers support the same priority parameter.
LLM prompts¶
Run standalone LLM prompts for background tasks like summarization, classification, or data extraction. These use the lightweight "tasks" model, not the main conversation model.
summary = uni.prompt_llm(
    system_prompt="Summarize this article in 2-3 sentences. Answer with only the summary.",
    user_prompt=long_article_text
)
Structured output¶
Force JSON output with formatting="json":
result = uni.prompt_llm(
    system_prompt="""
    Extract key information. Respond with JSON:
    {
        "summary": "brief summary",
        "key_points": ["point 1", "point 2"],
        "sentiment": "positive/negative/neutral"
    }
    """,
    user_prompt=document,
    formatting="json"
)
print(result["summary"])
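Model output is only as reliable as the prompt, so it is worth guarding against omitted keys before using the parsed result (a defensive sketch; the key names and defaults are the ones from the prompt above, and the helper is hypothetical, not part of the UNI API):

```python
def validate_extraction(result: dict) -> dict:
    """Fill in safe defaults for any keys the model omitted."""
    return {
        "summary": result.get("summary", ""),
        "key_points": result.get("key_points", []),
        "sentiment": result.get("sentiment", "neutral"),
    }

# Example: the model dropped "key_points" from its response
safe = validate_extraction({"summary": "A short piece.", "sentiment": "positive"})
```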
Or enforce a specific schema with a Pydantic model:
from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

alert = uni.prompt_llm(
    system_prompt="""
    Parse this weather alert. Respond with JSON:
    {
        "severity": "low/medium/high",
        "title": "Alert title",
        "description": "Detailed description"
    }
    """,
    user_prompt=alert_text,
    formatting=WeatherAlert.model_json_schema()
)
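When you pass a schema, the parsed response can also be validated back into the model, giving downstream code typed attributes instead of dict lookups. This is standard Pydantic v2 usage; whether `uni.prompt_llm` returns a parsed dict or a raw string is an assumption here, so the payload below is hard-coded:

```python
from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

# Suppose the LLM returned this parsed JSON payload:
payload = {
    "severity": "high",
    "title": "Storm warning",
    "description": "Gale-force winds expected tonight.",
}

alert = WeatherAlert.model_validate(payload)
```

Validation raises immediately on a malformed response, which is usually preferable to a `KeyError` deep inside later logic.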
Role-based access¶
For finer control, use uni.llm_chat to talk to any configured LLM role ("main", "tasks", "sleep", "embeddings", or plugin-defined roles):
messages = [
    {"role": "system", "content": "You write concise haikus about weather."},
    {"role": "user", "content": "Describe today's forecast in Seattle."},
]

for chunk in uni.llm_chat("tasks", messages):
    print(chunk["message"]["content"], end="")
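Since `uni.llm_chat` streams, you can accumulate the chunks whenever you need the complete reply rather than printing it incrementally (the fake generator below stands in for `uni.llm_chat` and follows the same chunk shape as the loop above):

```python
def fake_llm_chat(messages):
    # Stand-in for uni.llm_chat: yields chunks in the same shape
    for piece in ["Rain taps ", "the window, ", "soft gray morning."]:
        yield {"message": {"content": piece}}

reply = "".join(chunk["message"]["content"] for chunk in fake_llm_chat([]))
```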
Embeddings¶
Generate vector embeddings for semantic search:
vector = uni.llm_embeddings("Cozy coffee shop with lots of natural light")
Track which model produced a vector so you can invalidate caches when the model changes:
vector = uni.llm_embeddings(text)
model_id = uni.llm_embeddings_model()
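Semantic search over these vectors typically ranks candidates by cosine similarity. A dependency-free sketch (the toy 3-dimensional vectors stand in for real embedding vectors from `uni.llm_embeddings`, which are much longer):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in practice these come from uni.llm_embeddings(text)
query = [0.1, 0.9, 0.2]
docs = {"cafe": [0.2, 0.8, 0.1], "garage": [0.9, 0.1, 0.3]}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

This ranking is only meaningful when every vector came from the same model, which is exactly why tracking the model ID matters.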
@uni.on_embeddings_model_changed
def handle_model_change(event: uni.EmbeddingsModelChangedEvent):
    if event.new_model:
        recompute_stored_embeddings()
For data storage, encryption, and other helpers, see Utilities. For tools the LLM can call during conversations, see Tools & cards.