Changelog
2024-11-21
Natural Conversations
The new bot_profile called natural_conversation_2024_11 enables more natural dialogue with the bot. This profile uses two specialized LLMs working together:
- A “judge” LLM that determines when users have finished speaking
- A conversation LLM that maintains the full context of the dialogue
The judge LLM analyzes both:
- The most recent conversation turn
- The completeness of the user’s speech
This allows users to pause and think without being interrupted. While this approach works well in most scenarios, we’ve implemented a timeout mechanism as a fallback in cases where the judge LLM incorrectly determines speech completion.
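To try it, select the new profile when starting your bot. Below is a minimal sketch in Python; it assumes the standard Daily Bots start endpoint and request shape, so check the API reference for the exact fields to use:

```python
# Minimal sketch: start a Daily Bots session with the natural conversation
# profile. Assumes the https://api.daily.co/v1/bots/start endpoint and the
# standard request shape; see the API reference for the authoritative fields.
import os
import requests

response = requests.post(
    "https://api.daily.co/v1/bots/start",
    headers={"Authorization": f"Bearer {os.environ['DAILY_BOTS_API_KEY']}"},
    json={
        "bot_profile": "natural_conversation_2024_11",
        "max_duration": 300,
        "services": {"llm": "anthropic", "tts": "cartesia"},
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```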
We’re actively improving this feature and welcome your feedback! Join our community on Discord to share your experience.
Noise cancellation
You can now enable noise cancellation for the Daily service. This feature eliminates background sounds, including voices, from a participant’s audio stream. By doing so, it enhances the bot’s ability to accurately understand the participant’s voice and prevents interruptions caused by background noise or conversations. Check out the Daily service page for more information.
New TTS Service: Rime
Rime is now available as a TTS service. Learn how to get started by visiting the Rime TTS service page.
New TTS service_option: voice
You can now specify voice as a service_option for your TTS service. Using the voice service_option initializes the TTS service with the specified voice before joining. You can still specify voice as a config option to change it at runtime.
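For instance, a start request might set the initial voice through service_options while keeping the same option available in config for runtime changes. This is a sketch; the service name and voice ID are placeholders:

```python
# Sketch: initialize the TTS voice via service_options; the same "voice"
# option remains available in config for changes at runtime.
# "cartesia" and the voice ID below are placeholders.
start_request = {
    "services": {"tts": "cartesia"},
    # Applied before the bot joins:
    "service_options": {"cartesia": {"voice": "your-voice-id"}},
    # Still changeable at runtime via config:
    "config": [
        {"service": "tts", "options": [{"name": "voice", "value": "your-voice-id"}]}
    ],
}
```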
2024-11-10
Recording your Daily Bots sessions
You can now record your Daily Bots sessions. To learn more about how to get started, check out the Recording guide and the recording API reference docs.
New REST API endpoints for Twilio-connected bots
You can now configure and manage your Twilio bot configurations using new REST API endpoints. Check out the Twilio endpoints reference docs for more information.
New Integrated TTS Services
Deepgram’s Aura and ElevenLabs are now integrated TTS services. You can use them without needing to bring your own key. Check out their docs pages for more information on how to use each one:
New Claude models supported
Anthropic is an integrated LLM service. We now support additional Claude 3.5 models. See the Anthropic LLM service page for more information.
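For example, one of the newer models can be selected through the LLM model option; this is a sketch, so confirm the option placement and supported model names on the Anthropic service page:

```python
# Sketch: select a Claude 3.5 model for the integrated Anthropic service.
anthropic_request = {
    "services": {"llm": "anthropic"},
    "config": [
        {
            "service": "llm",
            "options": [{"name": "model", "value": "claude-3-5-sonnet-20241022"}],
        }
    ],
}
```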
New Daily service options
daily is listed as a new supported service. Using daily service_options will let you enable different features for your bot. The first feature we’ve added is to mute the bot when a third participant joins the call. This turns your bot into a listener that can continue to transcribe the call. Check out the Daily service page for more information.
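As a rough illustration, these features are toggled under the daily key in service_options. The option name below is a placeholder, not the documented flag; the Daily service page lists the real ones:

```python
# Sketch only: "mute_on_third_participant" is a placeholder name, not the
# documented option. See the Daily service page for the actual flag.
service_options = {
    "daily": {
        "mute_on_third_participant": True,
    }
}
```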
Fixes and improvements
- Fixed an issue where Anthropic function calling was not working reliably.
- Fixed an issue where the bot’s TTS output could overlap.
- Fixed an issue where PlayHT was reporting incorrect TTFB metrics.
- Fixed an issue where Azure TTS was not initializing to non-English languages correctly.
- Improved the Azure TTS service to support websocket-based communication for lower-latency TTS.
- Improved the reliability and consistency of the started and stopped speaking events.
2024-10-25
New LLM Services
We’ve added two new LLM providers: Google Gemini and Grok. Both providers require you to bring your own key. Check out their docs pages for more information on how to use each one:
New STT Services
We’ve also added two new STT providers: AssemblyAI and Gladia. Both providers require you to bring your own key. Check out their docs pages for more information on how to use each one:
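Because both sets of providers are bring-your-own-key, the start request pairs the service selection with matching api_keys entries. A sketch (the service identifiers are illustrative; check each provider’s docs page):

```python
# Sketch: select bring-your-own-key LLM and STT providers and supply keys.
# Service identifiers are illustrative; see each provider's docs page.
import os

start_request = {
    "services": {"llm": "gemini", "stt": "gladia"},
    "api_keys": {
        "gemini": os.environ["GEMINI_API_KEY"],
        "gladia": os.environ["GLADIA_API_KEY"],
    },
}
```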
New Service Options available
Service options allow you to configure a service provider at initialization time. To give you more control over the initial state of these services, we’ve added a number of new service_options to each service.
- STT providers: You can initialize their language and model (where available) parameters.
- LLM providers: You can initialize their model parameter.
- TTS providers: You can initialize their sample_rate.
See the Service Options page to learn more.
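For instance, the STT and LLM options might be initialized like this (a sketch; provider names and values are examples, and the TTS sample_rate case is shown in the next entry):

```python
# Sketch: initialize the STT language/model and the LLM model via
# service_options. Provider names and values are examples only.
service_options = {
    "deepgram": {"language": "en", "model": "nova-2"},     # STT
    "anthropic": {"model": "claude-3-5-sonnet-20240620"},  # LLM
}
```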
Control the sample rate of TTS services
You can now control the sample rate of TTS services. This is useful if you want to change the quality of the audio output. The default sample rate is 24000. You can change this by setting the sample_rate service option in the service_options object. For example:
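(The snippet below is a sketch; "cartesia" stands in for whichever TTS service you’ve selected.)

```python
# Sketch: lower the TTS output sample rate from the default 24000 to 16000.
service_options = {
    "cartesia": {"sample_rate": 16000}
}
```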
More Deepgram STT options
We’ve added support for more Deepgram options. You can now configure any of the available body parameters from Deepgram’s live audio API as service_options. Refer to the Deepgram STT service page for more information.
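As an illustration, ordinary parameters from Deepgram’s live audio API can be passed straight through (the parameter names below are Deepgram’s own, not Daily Bots-specific):

```python
# Sketch: pass Deepgram live-audio API parameters through as service_options.
service_options = {
    "deepgram": {
        "model": "nova-2",
        "language": "en-US",
        "smart_format": True,
        "endpointing": 300,  # ms of silence before an utterance is finalized
    }
}
```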
2024-10-18
A quick follow-on update after our big deploy from a few days ago. We’ve added a few more TTS services:
2024-10-16
This update is so big, it warranted the creation of the Daily Bots Changelog! You can reference this page in the future to see what changes each time we ship an update to Daily Bots.
In today’s release:
New Text-to-Speech Services
We’ve added several new TTS providers. Take a look at their docs pages for more details on how to use each one:
Service Configuration Options
We’ve added a lot of “pass-through” options for many of the supported services. For example, the OpenAI LLM service now supports configuring things like temperature, top_p, and many other configuration options. Refer to the newly-updated service pages in the API reference for more information.
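For example, those pass-through options can be set alongside the rest of the LLM configuration; this is a sketch, so see the OpenAI service page for the full list and exact placement:

```python
# Sketch: pass OpenAI generation parameters through the LLM config.
openai_llm_config = {
    "service": "llm",
    "options": [
        {"name": "model", "value": "gpt-4o"},
        {"name": "temperature", "value": 0.7},
        {"name": "top_p", "value": 0.9},
    ],
}
```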
Bot Profiles: OpenAI Realtime API Beta and Other Updates
We’ve added an initial beta of a bot profile that uses OpenAI’s new Realtime API. You can try using OpenAI’s speech-to-speech model in Daily Bots by using the openai_realtime_beta_2024_10 bot profile. The underlying API itself is still very much in beta and changing quite a bit, so expect the bot profile to change too. Bot profiles are documented here.
We’ve also added new bot profiles for voice and vision: voice_2024_10 and vision_2024_10. These profiles support the just-released RTVI 0.2.0 and offer more events and callbacks. For example, you can now get the bot’s LLM output directly, or the streamed word-by-word speech fed to the TTS service. More information on RTVI 0.2.0 will be available soon; we’re in the process of moving those docs over to the Pipecat docs page.
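Trying any of these is a matter of changing the bot_profile value in your start request, as in this sketch:

```python
# Sketch: choose a bot profile when starting the bot.
start_request = {
    "bot_profile": "voice_2024_10",  # or "vision_2024_10",
                                     # or "openai_realtime_beta_2024_10"
    "max_duration": 300,
}
```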