Description
Sidekick combines several focused services into a single real-time pipeline. Each component handles one stage of the loop from user speech to synchronized video output. The components are easy to inspect, extend, or swap out, so you can adapt the pipeline to your own characters or media sources.
- Groq Llama 3.3 70B for fast, context-aware dialogue generation.
- ElevenLabs TTS for generating speech matched to the character's style.
- Whisper STT (with optional MLX acceleration) for low-latency transcription.
- Decart Lip Sync model for aligning generated audio with video frames.
- Pipecat pipeline coordinating STT -> LLM -> TTS -> video streaming -> lip sync -> WebRTC transport.
- Sidekick WebRTC server managing signaling and pipeline lifecycle.
- Browser WebRTC client sending microphone audio and rendering synchronized video.
- VideoFileStreamer looping video frames at a fixed 25 FPS for consistent lip-sync alignment.
- Async processors handling interruptions, context aggregation, and media flow.
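
The stage ordering listed above can be sketched with plain `asyncio`, without any of the real services. This is only an illustration under stated assumptions: it does not use the Pipecat API, and `make_stage`, `looped_frame_index`, `run_pipeline`, and all payload values are hypothetical names and stand-ins. It shows data passing through the stages in the documented order, and how a fixed 25 FPS rate might map elapsed time to a frame in a looping clip.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

FPS = 25  # frame rate from the component list: one frame every 40 ms

Payload = Dict[str, object]
Stage = Callable[[Payload], Awaitable[Payload]]


def make_stage(name: str, key: str, value: object) -> Stage:
    """Build a stub stage (hypothetical) that records its name and adds one field."""
    async def stage(payload: Payload) -> Payload:
        await asyncio.sleep(0)  # yield to the event loop, as an async processor would
        trace = list(payload.get("trace", []))
        trace.append(name)
        return {**payload, key: value, "trace": trace}
    return stage


def looped_frame_index(elapsed_seconds: float, clip_frames: int) -> int:
    """Map elapsed playback time to a frame index in a clip that loops forever."""
    return int(elapsed_seconds * FPS) % clip_frames


# Stub stages in the documented order; every value is a placeholder, not real output.
STAGES: List[Stage] = [
    make_stage("stt", "transcript", "hello"),
    make_stage("llm", "reply", "hi there"),
    make_stage("tts", "audio", b"\x00\x00"),
    make_stage("video", "frame_index", looped_frame_index(4.5, 100)),
    make_stage("lip_sync", "synced", True),
    make_stage("transport", "sent", True),
]


async def run_pipeline(mic_audio: bytes) -> Payload:
    """Pass one chunk of microphone audio through every stage in order."""
    payload: Payload = {"mic_audio": mic_audio}
    for stage in STAGES:
        payload = await stage(payload)
    return payload


result = asyncio.run(run_pipeline(b"\x01\x02"))
print(result["trace"])
```

Because each stage only reads and returns a payload, replacing a stub with a real service call would not change the ordering logic, which is the property the description attributes to the pipeline.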