Voice & appearance¶
Custom TTS¶
Replace the default text-to-speech engine with a custom implementation. The engine should stream audio in real time rather than waiting for full synthesis.
Extend uni.TtsEngine and register via @uni.tts:
```python
import logging
from collections.abc import Iterator

import uni_plugin_sdk as uni

logger = logging.getLogger(__name__)


@uni.tts
class MyCustomTts(uni.TtsEngine):
    def __init__(self):
        self._sample_rate = 24000
        self._bytes_per_sample = 2  # 16-bit PCM

    @property
    def sample_rate(self) -> int:
        """Sample rate in Hz."""
        return self._sample_rate

    @property
    def bytes_per_sample(self) -> int:
        """Bytes per sample (2 for 16-bit PCM)."""
        return self._bytes_per_sample

    def load(self) -> None:
        """Load resources needed by the TTS engine."""
        logger.info("Custom TTS loaded")

    def synthesize(self, text: str) -> Iterator[bytes]:
        """Generate and stream audio for the given text."""
        for chunk in generate_audio_chunks(text):
            yield chunk

    def shutdown(self) -> None:
        """Release resources held by the TTS engine."""
        pass
```
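The `generate_audio_chunks` helper above is a placeholder for your actual synthesis pipeline. As a hedged sketch only, here is one hypothetical stand-in that yields 20 ms chunks of 16-bit PCM at the engine's 24 kHz sample rate (a 440 Hz tone, one chunk per character; a real engine would run a synthesis model instead):

```python
import numpy as np

SAMPLE_RATE = 24000  # matches MyCustomTts.sample_rate
CHUNK_MS = 20        # 20 ms per chunk, a common streaming granularity


def generate_audio_chunks(text: str):
    """Hypothetical stand-in: yield one 440 Hz tone chunk per character."""
    samples_per_chunk = SAMPLE_RATE * CHUNK_MS // 1000  # 480 samples
    t = np.arange(samples_per_chunk) / SAMPLE_RATE
    tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
    for _ in text:
        yield tone.tobytes()  # 16-bit PCM -> 2 bytes per sample


chunks = list(generate_audio_chunks("hi"))
```

Yielding small fixed-size chunks keeps latency low: playback can start as soon as the first chunk arrives.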
Custom avatars¶
Avatars control UNI's visual representation. They can react to TTS audio levels (pulse/lip sync) and support expressions ("happy", "listening", "surprised").
Since avatars render client-side, they're built in JavaScript. The JS module exposes a mount(context) function. On the Python side, use @uni.avatar to register it.
PNG-based avatar example¶
```python
from pathlib import Path
from typing import Any

import uni_plugin_sdk as uni

images_dir = Path(__file__).parent / "static" / "images"


@uni.avatar(module="~/avatar.js")
def create_avatar() -> dict[str, Any]:
    return {
        "expressions": [f.stem for f in images_dir.glob("*.png")]
    }
```
The object returned from create_avatar is passed to your JavaScript module as context.config.
Automatic expressions
If expressions is present in the returned object, UNI triggers them automatically during interactions based on sentiment analysis. Use descriptive names.
```javascript
/**
 * @typedef {Object} AvatarContext
 * @property {HTMLElement} container - The container element.
 * @property {Object} config - Config object from the server.
 * @property {string} path - Static directory path of the plugin.
 * @property {function(src: string): Promise<void>} loadScript - Load a static script.
 * @property {function(eventName: string, callback: Function): void} on - Register event listener.
 */

/**
 * @param {AvatarContext} context
 * @returns {Promise<Function|undefined>}
 */
export const mount = async (context) => {
  const getImagePath = (fileName) => {
    return `${context.path}/expressions/${fileName}.png`;
  };

  const img = document.createElement("img");
  img.src = getImagePath("default");
  img.style.transition = "transform 0.2s ease-out";
  context.container.appendChild(img);

  // Swap the image on expression changes; pulse with the TTS audio level.
  context.on("expression", (name) => (img.src = getImagePath(name)));
  context.on("audio", (lv) => (img.style.transform = `scale(${1 + lv * 0.1})`));

  // Preload all expression images so swaps are instant.
  await Promise.all(
    context.config.expressions.map((name) => {
      return new Promise((resolve) => {
        const preload = new Image();
        preload.onload = preload.onerror = resolve;
        preload.src = getImagePath(name);
      });
    })
  );

  // Unmount callback.
  return () => img.remove();
};
```
Advanced example
Check out the included uni_avatar_live2d plugin for an example with user-provided assets and external libraries.
Expressions¶
Avatars can optionally support expressions (emotes). They're triggered automatically:
| Expression | Trigger |
|---|---|
| "default" | Active by default (otherwise the first one is used) |
| "sleeping" | Active during the sleep cycle |
| "listening" | Active while the user is speaking |
| Other names | Auto-fire while UNI speaks (sentiment-based) |
Naming matters
Use descriptive expression names so sentiment-based triggering works correctly.
Audio effects¶
Post-process TTS output in real time with effects like pitch shift or reverb. Multiple effects can be active at once.
Extend uni.AudioEffect and register via @uni.audio_effect:
```python
import numpy as np

import uni_plugin_sdk as uni


@uni.audio_effect
class EchoEffect(uni.AudioEffect):
    def __init__(self):
        self.delay_samples = 0
        self.decay = 0.5
        self.buffer: np.ndarray | None = None
        self._pos = 0  # write position, persisted across chunks

    def configure(self, sample_rate: int, bytes_per_sample: int) -> None:
        """Configure according to the TTS audio format."""
        self.delay_samples = int(0.2 * sample_rate)  # 200 ms delay line
        self.buffer = np.zeros(self.delay_samples, dtype=np.int16)
        self._pos = 0

    def apply(self, chunk: bytes) -> bytes:
        """Apply the effect to a chunk of 16-bit PCM audio."""
        samples = np.frombuffer(chunk, dtype=np.int16)
        output = samples.copy()
        for i in range(len(samples)):
            echo = self.buffer[self._pos]
            output[i] = np.clip(output[i] + echo * self.decay, -32768, 32767)
            self.buffer[self._pos] = samples[i]
            self._pos = (self._pos + 1) % self.delay_samples
        return output.astype(np.int16).tobytes()

    def reset(self) -> None:
        """Reset internal state between audio streams."""
        self.buffer.fill(0)
        self._pos = 0
```

Note the persistent write position `_pos`: without it the delay line would restart at every chunk boundary and the echo would not be continuous across chunks.
Wake word detection¶
Provide custom wake word detection by extending uni.WakeWordEngine and registering via @uni.wake_word:
```python
import uni_plugin_sdk as uni


@uni.wake_word
class MyWakeWordEngine(uni.WakeWordEngine):
    def load(self) -> None:
        config = uni.get_config()
        words = config.plugin.get("words", str, "").split(",")
        sensitivity = config.plugin.get("sensitivity", float, 0.5)
        self._detector = load_model(words, sensitivity)  # your model loader

    def detect(self, audio_chunk: bytes, sample_rate: int) -> bool:
        return self._detector.detected(audio_chunk, sample_rate)

    def shutdown(self) -> None:
        self._detector.close()
```
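A real detect implementation would run a trained model (the load_model call above stands in for one). As a hedged sketch of the detect contract only, here is a hypothetical energy gate; it trips on RMS loudness, not on actual wake words, so it is useful only to illustrate the chunk-in, bool-out shape:

```python
import numpy as np


class EnergyGate:
    """Toy stand-in for a wake-word model: trips on RMS loudness only."""

    def __init__(self, threshold: float = 1000.0):
        self.threshold = threshold

    def detected(self, audio_chunk: bytes, sample_rate: int) -> bool:
        # Interpret the chunk as 16-bit PCM and compute its RMS energy.
        samples = np.frombuffer(audio_chunk, dtype=np.int16).astype(np.float64)
        if samples.size == 0:
            return False
        return float(np.sqrt(np.mean(samples ** 2))) >= self.threshold


gate = EnergyGate()
quiet = np.zeros(480, dtype=np.int16).tobytes()
loud = np.full(480, 5000, dtype=np.int16).tobytes()
```

Whatever model you use, detect is called repeatedly with small audio chunks, so it should be fast and keep any sliding-window state internally.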