Context & prompts

Context injection

Context is additional information injected into the LLM's input for a single response. It's not visible to the user and doesn't persist between turns.

Injecting weather context
@uni.message_context
def inject_weather(event: uni.MessageEvent) -> str | None:
    if "weather" in event.message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None

Return a string to inject context, or None to skip. The LLM sees the context alongside the user's message.

Preloading

The preload callback fires as soon as the user starts speaking, before the message is transcribed. This lets you pre-fetch data so it's ready by the time the user finishes speaking:

Preloading weather data
@uni.cache(ttl="30m")
def fetch_weather_data():
    return {"temp": 22, "conditions": "sunny"}

@uni.message_context(preload=fetch_weather_data)
def inject_weather(event: uni.MessageEvent, weather_data: dict) -> str | None:
    if "weather" in event.message.lower():
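        # Format the preloaded dict as text: the handler returns a string or None.
        return f"Current weather: {weather_data['temp']}°C, {weather_data['conditions']}"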
    return None

The return value from the preload function is passed as the second argument to the context handler.

Progressive preloading

When a pre-fetch depends on what the user is saying (typically semantic search), return a callable from your preload. UNI calls it on every partial transcription:

Searching memories as the user speaks
def preload_memories():
    def update(partial_text: str, current_data):
        if len(partial_text) < 10:
            return current_data
        return search_memories(partial_text)

    return update

@uni.message_context(preload=preload_memories)
def inject_memories(event: uni.MessageEvent, results) -> str | None:
    if not results:
        return None
    formatted = "\n".join(f"- {record.content}" for record in results)
    return "Possibly relevant memories:\n" + formatted

preload_memories runs once. The returned update runs for each partial with two arguments:

Argument      Description
partial_text  Transcription so far (speculative, may change as the user talks)
current_data  Value returned by the previous update call (None on the first)

Each return value becomes the next current_data. The handler receives the final value once the transcript is confirmed.
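
The call sequence described above can be mimicked outside UNI with a small stand-alone simulation. The run_preload driver and the stubbed search_memories below are hypothetical stand-ins, not part of the UNI API; they only illustrate how each return value feeds the next call under the rules stated here.

```python
# Stand-alone simulation of the progressive-preload call sequence.
# run_preload and search_memories are hypothetical stand-ins, not UNI APIs.

def search_memories(query: str) -> list[str]:
    # Stub search: pretend every query matches a single record.
    return [f"match for {query!r}"]

def preload_memories():
    def update(partial_text: str, current_data):
        if len(partial_text) < 10:
            return current_data  # too short to search; keep the previous value
        return search_memories(partial_text)
    return update

def run_preload(preload, partials):
    updater = preload()   # the preload itself runs once
    current = None        # the first update call receives None
    for text in partials:
        current = updater(text, current)  # each result feeds the next call
    return current        # final value, as handed to the context handler

final = run_preload(
    preload_memories,
    ["what", "what did I", "what did I say about Paris"],
)
print(final)
```

The first partial is shorter than 10 characters, so it leaves current_data as None; the later partials each replace it with a fresh search result.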

Updaters run synchronously on each partial, so keep them fast. A large-corpus semantic search may need its own debounce or threading.
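
One way to keep an expensive updater cheap is to rate-limit it. The sketch below is a simple throttle (a close cousin of a debounce, not a true debounce), written as a plain Python wrapper. The throttled helper, its min_interval default, and the injectable clock are assumptions made for illustration and are not provided by UNI.

```python
import time

def throttled(update, min_interval: float = 0.3, clock=time.monotonic):
    """Wrap an updater so the real work runs at most once per min_interval seconds."""
    last_run = None

    def wrapper(partial_text, current_data):
        nonlocal last_run
        now = clock()
        if last_run is not None and now - last_run < min_interval:
            return current_data  # too soon: reuse the previous result
        last_run = now
        return update(partial_text, current_data)

    return wrapper

# Usage with a fake clock so the behaviour is deterministic.
ticks = iter([0.0, 0.1, 0.5])
calls = []

def slow_update(text, data):
    calls.append(text)      # record which partials triggered real work
    return text.upper()

update = throttled(slow_update, min_interval=0.3, clock=lambda: next(ticks))
first = update("a", None)     # runs: no previous call
second = update("ab", first)  # skipped: only 0.1 s since the last run
third = update("abc", second) # runs: 0.5 s since the last run
print(calls)
```

A throttle like this can skip the last partial, so the handler may receive a result computed from slightly older text. Whether that trade-off is acceptable depends on how quickly the underlying search returns.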

Prompt modifiers

Prompt modifiers change the system or user prompt before it reaches the LLM.

System prompt

Alter the system prompt that shapes UNI's personality and behavior. This runs once daily (or on reset), not per message.

Adding TTS emotive tags
@uni.system_prompt_modifier
def add_emotive_instructions(event: uni.SystemPromptEvent) -> str:
    return event.prompt + """

When speaking, you can use these emotive tags:
<laugh>, <sigh>, <gasp>, <whisper>
"""

Ordering

Control execution order with the priority parameter. Lower numbers run earlier. Default: 100.
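
The source does not show the exact call syntax for priority, so the sketch below does not use the UNI decorators. It is a stand-alone illustration of the ordering rule itself, with a hypothetical modifier registry, assuming each modifier receives the output of the one before it.

```python
# Stand-alone illustration of "lower numbers run earlier"; not UNI's implementation.
modifiers = []  # (priority, function) pairs, in registration order

def modifier(priority: int = 100):
    def register(fn):
        modifiers.append((priority, fn))
        return fn
    return register

@modifier(priority=200)  # registered first, but runs last
def wrap(prompt: str) -> str:
    return "[SYSTEM]\n" + prompt + "\n[/SYSTEM]"

@modifier(priority=50)   # registered second, but runs first
def add_rule(prompt: str) -> str:
    return prompt + "\nBe brief."

def apply_modifiers(prompt: str) -> str:
    # Sort by priority only; sorted() is stable, so ties keep registration order.
    for _, fn in sorted(modifiers, key=lambda pair: pair[0]):
        prompt = fn(prompt)
    return prompt

print(apply_modifiers("Base prompt."))
```

Because add_rule has the lower number, its text ends up inside the wrapper even though wrap was registered first.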

User prompt

Alter user messages after speech-to-text. Useful for prepending metadata like timestamps or speaker identity:

Prepending a timestamp
from datetime import datetime

@uni.user_prompt_modifier
def prepend_timestamp(event: uni.UserPromptEvent) -> str:
    # Prefix the transcribed message with an ISO 8601 local timestamp.
    return datetime.now().isoformat() + " " + event.prompt

User prompt modifiers support the same priority parameter.

LLM prompts

Run standalone LLM prompts for background tasks like summarization, classification, or data extraction. These use the lightweight "tasks" model, not the main conversation model.

Summarizing an article
summary = uni.prompt_llm(
    system_prompt="Summarize this article in 2-3 sentences. Answer with only the summary.",
    user_prompt=long_article_text
)

Structured output

Force JSON output with formatting="json":

Extracting structured data
result = uni.prompt_llm(
    system_prompt="""
    Extract key information. Respond with JSON:
    {
        "summary": "brief summary",
        "key_points": ["point 1", "point 2"],
        "sentiment": "positive/negative/neutral"
    }
    """,
    user_prompt=document,
    formatting="json"
)

print(result["summary"])

Or enforce a specific schema with a Pydantic model:

Schema-enforced output
from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

alert = uni.prompt_llm(
    system_prompt="""
    Parse this weather alert. Respond with JSON:
    {
        "severity": "low/medium/high",
        "title": "Alert title",
        "description": "Detailed description"
    }
    """,
    user_prompt=alert_text,
    formatting=WeatherAlert.model_json_schema()
)

Full control

For full chat instead of one-off prompts, use uni.llm_chat to talk to any configured LLM role ("main", "tasks", "sleep", "embeddings", or plugin-defined roles):

Streaming from a specific role
messages = [
    {"role": "system", "content": "You write concise haikus about weather."},
    {"role": "user", "content": "Describe today's forecast in Seattle."},
]

for chunk in uni.llm_chat("tasks", messages):
    print(chunk["message"]["content"], end="")
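
If the full reply is needed as one string rather than printed piece by piece, the chunks can be joined. The sketch below uses a hypothetical fake_llm_chat generator in place of uni.llm_chat and assumes every chunk has the chunk["message"]["content"] shape shown above.

```python
def fake_llm_chat(role, messages):
    # Hypothetical stand-in that yields chunks shaped like the ones above.
    for piece in ["Grey clouds ", "drift over ", "Puget Sound"]:
        yield {"message": {"content": piece}}

def collect(stream) -> str:
    # Join the content of every streamed chunk into a single string.
    return "".join(chunk["message"]["content"] for chunk in stream)

text = collect(fake_llm_chat("tasks", []))
print(text)
```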

Embeddings

Generate vector embeddings for semantic search:

Generating embeddings
vector = uni.llm_embeddings("Cozy coffee shop with lots of natural light")
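
Semantic search over stored vectors usually comes down to ranking by a similarity measure. The sketch below uses cosine similarity in plain Python and assumes each embedding is a sequence of floats; the three-dimensional vectors are made-up toy values, not real model output.

```python
import math

def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def rank(query_vec, stored):
    """Return stored keys sorted from most to least similar to query_vec."""
    return sorted(
        stored,
        key=lambda key: cosine_similarity(query_vec, stored[key]),
        reverse=True,
    )

# Toy vectors for illustration only.
stored = {
    "cafe": [0.9, 0.1, 0.0],
    "gym": [0.0, 0.2, 0.9],
    "bakery": [0.8, 0.3, 0.1],
}
ranking = rank([1.0, 0.0, 0.0], stored)
print(ranking)
```

For larger collections a vector index or a numeric library would likely be a better fit than a pure-Python loop.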

Track which model produced a vector so you can invalidate caches when the model changes:

Tracking the embedding model
vector = uni.llm_embeddings(text)
model_id = uni.llm_embeddings_model()

Reacting to model changes
@uni.on_embeddings_model_changed
def handle_model_change(event: uni.EmbeddingsModelChangedEvent):
    if event.new_model:
        recompute_stored_embeddings()
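
Storing the model identifier next to each vector makes stale entries easy to find. This is a minimal sketch with a hypothetical in-memory cache layout and made-up model names, assuming the identifier can be compared with plain equality.

```python
def stale_keys(cache: dict, current_model: str) -> list[str]:
    """Return keys whose vectors were produced by a different model."""
    return [key for key, entry in cache.items() if entry["model"] != current_model]

# Hypothetical cache layout: each entry records the model that produced it.
cache = {
    "note-1": {"model": "embed-v1", "vector": [0.1, 0.2]},
    "note-2": {"model": "embed-v2", "vector": [0.3, 0.4]},
}
outdated = stale_keys(cache, "embed-v2")
print(outdated)  # ['note-1']
```

Only the entries returned here would need recomputing after a model change, which avoids re-embedding everything.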

For data storage, encryption, and other helpers, see Utilities. For tools the LLM can call during conversations, see Tools & cards.