Context & prompts¶

Context injection¶

Context is additional information injected into the LLM's input for a single response. It's not visible to the user and doesn't persist between turns.

Injecting weather context

@uni.message_context
def inject_weather(event: uni.MessageEvent) -> str | None:
    if "weather" in event.message.lower():
        return "Current weather: 22°C, sunny in Amsterdam"
    return None

Return a string to inject context, or None to skip. The LLM sees the context alongside the user's message.

Preloading¶

The preload callback fires as soon as the user starts speaking, before the message is transcribed. This lets you pre-fetch data so it's ready by the time the user finishes speaking:

Preloading weather data

@uni.cache(ttl="30m")
def fetch_weather_data():
    return {"temp": 22, "conditions": "sunny"}

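@uni.message_context(preload=fetch_weather_data)
def inject_weather(event: uni.MessageEvent, weather_data: dict) -> str | None:
    if "weather" in event.message.lower():
        # The handler must return a string, so format the preloaded dict as text.
        return f"Current weather: {weather_data['temp']}°C, {weather_data['conditions']}"
    return None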

The return value from the preload function is passed as the second argument to the context handler.

Progressive preloading¶

When a pre-fetch depends on what the user is saying (typically semantic search), return a callable from your preload function. UNI calls that callable on every partial transcription:

Searching memories as the user speaks

def preload_memories():
    def update(partial_text: str, current_data):
        if len(partial_text) < 10:
            return current_data
        return search_memories(partial_text)

    return update

@uni.message_context(preload=preload_memories)
def inject_memories(event: uni.MessageEvent, results) -> str | None:
    if not results:
        return None
    formatted = "\n".join(f"- {record.content}" for record in results)
    return "Possibly relevant memories:\n" + formatted

preload_memories runs once. UNI then calls the returned update function on each partial transcription, passing two arguments:

Argument	Description
`partial_text`	Transcription so far (speculative, may change as the user talks)
`current_data`	Value returned by the previous `update` call (`None` on the first)

Each return value becomes the next current_data. The handler receives the final value once the transcript is confirmed.
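The call sequence can be modelled in plain Python. This is only an illustration of the contract described above, not UNI's implementation; `run_progressive_preload` and the stub `search_memories` are hypothetical names.

```python
def search_memories(query: str) -> list[str]:
    """Hypothetical stand-in for a real semantic search."""
    return [f"memory matching: {query}"]


def preload_memories():
    def update(partial_text: str, current_data):
        if len(partial_text) < 10:
            return current_data  # too short to search, keep the previous value
        return search_memories(partial_text)

    return update


def run_progressive_preload(preload, partials):
    """Model of the contract: call preload once, then the updater per partial."""
    update = preload()   # runs once
    current_data = None  # the first update call receives None
    for partial_text in partials:
        # Each return value becomes the next current_data.
        current_data = update(partial_text, current_data)
    return current_data  # final value, which the context handler would receive


partials = ["what", "what did I", "what did I say about Lisbon"]
final = run_progressive_preload(preload_memories, partials)
```

The first partial is shorter than ten characters, so the updater returns `None` unchanged. The later partials each trigger a search, and only the last result survives.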

Updaters run synchronously on every partial, so keep them fast. A semantic search over a large corpus may need its own debouncing or a background thread.
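One way to keep an updater cheap is a time-based throttle inside the closure (a simpler relative of debouncing). The sketch below assumes a gap of at least 0.3 seconds between searches is acceptable; `make_throttled_updater`, `min_interval`, `clock`, and the stub `search_memories` are hypothetical and not part of UNI.

```python
import time


def search_memories(query: str) -> list[str]:
    """Hypothetical stand-in for an expensive semantic search."""
    return [f"memory matching: {query}"]


def make_throttled_updater(search, min_interval: float = 0.3, clock=time.monotonic):
    """Return an updater that runs `search` at most once per `min_interval` seconds."""
    last_run = None  # time of the most recent search, None before the first one

    def update(partial_text: str, current_data):
        nonlocal last_run
        now = clock()
        if last_run is not None and now - last_run < min_interval:
            return current_data  # called too soon: keep the previous results
        last_run = now
        return search(partial_text)

    return update


def preload_memories():
    # Build an updater that carries its own timer.
    return make_throttled_updater(search_memories)
```

The trade-off is that the last partial may fall inside the quiet window and be skipped, so the results can lag slightly behind the final transcript.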

Prompt modifiers¶

Prompt modifiers change the system or user prompt before it reaches the LLM.

System prompt¶

Alter the system prompt that shapes UNI's personality and behavior. This runs once daily (or on reset), not per message.

Adding TTS emotive tags

@uni.system_prompt_modifier
def add_emotive_instructions(event: uni.SystemPromptEvent) -> str:
    return event.prompt + """

When speaking, you can use these emotive tags:
<laugh>, <sigh>, <gasp>, <whisper>
"""

Ordering

Control execution order with the priority parameter. Lower numbers run earlier. Default: 100.
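The ordering rule can be illustrated without UNI. The sketch below is a plain-Python model and assumes each modifier receives the prompt produced by the one before it; the chaining behaviour, the tie-breaking rule, and every name here are assumptions, not UNI's API.

```python
from typing import Callable

Modifier = Callable[[str], str]


def apply_modifiers(prompt: str, modifiers: list[tuple[int, Modifier]]) -> str:
    """Run modifiers in ascending priority order, feeding each the previous output."""
    # sorted() is stable, so equal priorities keep their registration order
    # (an assumption about tie-breaking, not documented behaviour).
    for _, modify in sorted(modifiers, key=lambda pair: pair[0]):
        prompt = modify(prompt)
    return prompt


modifiers = [
    (100, lambda p: p + " [default]"),  # the documented default priority
    (10, lambda p: p + " [early]"),     # lower number, runs first
    (200, lambda p: p + " [late]"),     # higher number, runs last
]
result = apply_modifiers("Base prompt.", modifiers)
```

Under these assumptions, a modifier that must see the untouched prompt gets a low number, and one that post-processes the others' output gets a high one.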

User prompt¶

Alter user messages after speech-to-text. Useful for prepending metadata like timestamps or speaker identity:

Prepending a timestamp

from datetime import datetime

@uni.user_prompt_modifier
def prepend_timestamp(event: uni.UserPromptEvent) -> str:
    # Prefix the transcribed message with an ISO 8601 timestamp.
    return datetime.now().isoformat() + " " + event.prompt

User prompt modifiers support the same priority parameter.

LLM prompts¶

Run standalone LLM prompts for background tasks like summarization, classification, or data extraction. These use the lightweight "tasks" model, not the main conversation model.

Summarizing an article

summary = uni.llm_prompt(
    system_prompt="Summarize this article in 2-3 sentences. Answer with only the summary.",
    user_prompt=long_article_text
)

Structured output¶

Force JSON output with formatting="json":

Extracting structured data

result = uni.llm_prompt(
    system_prompt="""
    Extract key information. Respond with JSON:
    {
        "summary": "brief summary",
        "key_points": ["point 1", "point 2"],
        "sentiment": "positive/negative/neutral"
    }
    """,
    user_prompt=document,
    formatting="json"
)

print(result["summary"])

Or enforce a specific schema with a Pydantic model:

Schema-enforced output

from pydantic import BaseModel

class WeatherAlert(BaseModel):
    severity: str  # low, medium, high
    title: str
    description: str

alert = uni.llm_prompt(
    system_prompt="""
    Parse this weather alert. Respond with JSON:
    {
        "severity": "low/medium/high",
        "title": "Alert title",
        "description": "Detailed description"
    }
    """,
    user_prompt=alert_text,
    formatting=WeatherAlert.model_json_schema()
)

Full control¶

For full chat instead of one-off prompts, use uni.llm_chat to talk to any configured LLM role ("main", "tasks", "sleep", "embeddings", or plugin-defined roles):

Streaming from a specific role

messages = [
    {"role": "system", "content": "You write concise haikus about weather."},
    {"role": "user", "content": "Describe today's forecast in Seattle."},
]

for chunk in uni.llm_chat("tasks", messages):
    print(chunk["message"]["content"], end="")
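If you need the full reply rather than incremental output, the chunks can be accumulated. The sketch below uses a hypothetical `fake_stream` generator in place of `uni.llm_chat` and assumes every chunk has the `chunk["message"]["content"]` shape shown above; `collect_reply` is also a hypothetical helper.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for uni.llm_chat("tasks", messages)."""
    for piece in ["Grey clouds ", "over the Sound, ", "rain by noon."]:
        yield {"message": {"content": piece}}


def collect_reply(chunks: Iterable[dict]) -> str:
    """Join the content of every streamed chunk into one string."""
    return "".join(chunk["message"]["content"] for chunk in chunks)


reply = collect_reply(fake_stream())
```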

Embeddings¶

Generate vector embeddings for semantic search:

Generating embeddings

vector = uni.llm_embed("Cozy coffee shop with lots of natural light")
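For semantic search, embeddings are typically compared by cosine similarity. The sketch below assumes `uni.llm_embed` returns a flat sequence of floats, which the text above does not state. The vectors are made-up three-dimensional stand-ins, and `cosine_similarity` and `rank` are hypothetical helpers.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float], stored: dict[str, Sequence[float]]) -> list[str]:
    """Return the stored keys ordered from most to least similar to the query."""
    return sorted(
        stored,
        key=lambda key: cosine_similarity(query_vec, stored[key]),
        reverse=True,
    )


# Made-up vectors; real ones would come from the embedding model.
stored = {
    "coffee shop": [0.9, 0.1, 0.0],
    "car repair": [0.0, 0.2, 0.9],
}
best_first = rank([1.0, 0.0, 0.0], stored)
```

A linear scan like this is fine for small collections; larger ones usually call for a vector index.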

Track which model produced a vector so you can invalidate caches when the model changes:

Tracking the embedding model

vector = uni.llm_embed(text)
model_id = uni.llm_embed_model()

Reacting to model changes

@uni.on_embed_model_changed
def handle_model_change(event: uni.EmbedModelChangedEvent):
    if event.new_model:
        recompute_stored_embeddings()
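One way to combine the two pieces is to store the model identifier next to each vector and re-embed only the entries produced by a different model. The sketch below is a minimal in-memory version; `EmbeddingStore` is hypothetical, and its `embed` and `current_model` callables stand in for `uni.llm_embed` and `uni.llm_embed_model`.

```python
from typing import Callable, Sequence


class EmbeddingStore:
    """In-memory cache that records which model produced each vector."""

    def __init__(
        self,
        embed: Callable[[str], Sequence[float]],
        current_model: Callable[[], str],
    ):
        self._embed = embed
        self._current_model = current_model
        # text -> (model_id, vector)
        self._entries: dict[str, tuple[str, Sequence[float]]] = {}

    def get(self, text: str) -> Sequence[float]:
        model_id = self._current_model()
        cached = self._entries.get(text)
        if cached is None or cached[0] != model_id:
            # Missing, or produced by another model: compute and record the model.
            cached = (model_id, self._embed(text))
            self._entries[text] = cached
        return cached[1]

    def recompute_stale(self) -> int:
        """Re-embed entries whose recorded model differs; return how many changed."""
        model_id = self._current_model()
        stale = [t for t, (m, _) in self._entries.items() if m != model_id]
        for text in stale:
            self._entries[text] = (model_id, self._embed(text))
        return len(stale)
```

A method like `recompute_stale` is the kind of work a handler such as `handle_model_change` could trigger.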

For data storage, encryption, and other helpers, see Utilities. For tools the LLM can call during conversations, see Tools & cards.