2024-11-21

Natural Conversations

The new bot_profile called natural_conversation_2024_11 enables more natural dialogue with the bot. This profile uses two specialized LLMs working together:

  • A “judge” LLM that determines when users have finished speaking
  • A conversation LLM that maintains the full context of the dialogue

The judge LLM analyzes both:

  • The most recent conversation turn
  • The completeness of the user’s speech

This allows users to pause and think without being interrupted. While this approach works well in most scenarios, we’ve implemented a timeout mechanism as a fallback in cases where the judge LLM incorrectly determines speech completion.
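
For reference, here’s a minimal sketch of a start request body that opts into this profile. It assumes the usual bot_profile, max_duration, and services fields; the anthropic and cartesia service names are only illustrative placeholders for whatever services you’ve configured.

{
  "bot_profile": "natural_conversation_2024_11",
  "max_duration": 300,
  "services": {
    "llm": "anthropic",
    "tts": "cartesia"
  }
}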

We’re actively improving this feature and welcome your feedback! Join our community on Discord to share your experience.

Noise cancellation

You can now enable noise cancellation for the Daily service. This feature eliminates background sounds, including voices, from a participant’s audio stream. By doing so, it enhances the bot’s ability to accurately understand the participant’s voice and prevents interruptions caused by background noise or conversations. Check out the Daily service page for more information.
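
If this toggle follows the same service_options pattern as the daily service’s other options (see the 2024-11-10 entry below), enabling it might look roughly like the sketch here. The noise_cancellation key is an assumed name, so confirm the exact option on the Daily service page.

{
  "service_options": {
    "daily": {
      "noise_cancellation": true
    }
  }
}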

New TTS Service: Rime

Rime is now available as a TTS service. Learn how to get started by visiting the Rime TTS service page.

New TTS service_option: voice

You can now specify voice as a service_option for your TTS service. The voice service_option initializes the TTS service with the specified voice before the bot joins. You can still change the voice at runtime using the voice config option.
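
As a sketch, this follows the same service_options shape shown elsewhere in this changelog; the azure_tts key and the voice ID below are only illustrative, so use the values your TTS provider documents.

{
  "service_options": {
    "azure_tts": {
      "voice": "en-US-JennyNeural"
    }
  }
}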

2024-11-10

Recording your Daily Bots sessions

You can now record your Daily Bots sessions. To learn more about how to get started, check out the Recording guide and the recording API reference docs.

New REST API endpoints for Twilio-connected bots

You can now configure and manage your Twilio bot configurations using new REST API endpoints. Check out the Twilio endpoints reference docs for more information.

New Integrated TTS Services

Deepgram’s Aura and ElevenLabs are now integrated TTS services. You can use them without needing to bring your own key. Check out the Deepgram and ElevenLabs TTS service pages for more information on how to use each one.

New Claude models supported

Anthropic is an integrated LLM service. We now support additional Claude 3.5 models. See the Anthropic LLM service page for more information.
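
For example, selecting a specific Claude 3.5 model at runtime might look roughly like the config snippet below. The config shape follows the format Daily Bots uses elsewhere, and the model string shown is Anthropic’s claude-3-5-sonnet identifier; check the Anthropic LLM service page for the exact values Daily Bots accepts.

{
  "config": [
    {
      "service": "llm",
      "options": [
        { "name": "model", "value": "claude-3-5-sonnet-20241022" }
      ]
    }
  ]
}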

New Daily service options

daily is now listed as a supported service. Using daily service_options lets you enable additional features for your bot. The first option we’ve added mutes the bot when a third participant joins the call, turning your bot into a listener that can continue to transcribe the call. Check out the Daily service page for more information.

Fixes and improvements

  • Fixed an issue where Anthropic function calling was not working reliably.
  • Fixed an issue where the bot’s TTS output could overlap.
  • Fixed an issue where PlayHT was reporting incorrect TTFB metrics.
  • Fixed an issue where Azure TTS was not initializing to non-English languages correctly.
  • Improved the Azure TTS service to support websocket-based communication for lower-latency TTS.
  • Improved the reliability and consistency of the started and stopped speaking events.

2024-10-25

New LLM Services

We’ve added two new LLM providers: Google Gemini and Grok. Both providers require you to bring your own key. Check out the Google Gemini and Grok LLM service pages for more information on how to use each one.

New STT Services

We’ve also added two new STT providers: AssemblyAI and Gladia. Both providers require you to bring your own key. Check out the AssemblyAI and Gladia STT service pages for more information on how to use each one.
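
For bring-your-own-key providers like these, the key is supplied alongside the service selection in your start request. A rough sketch, assuming the service is identified as assemblyai and that keys go in an api_keys map keyed by service name (confirm both on the provider’s service page):

{
  "services": {
    "stt": "assemblyai"
  },
  "api_keys": {
    "assemblyai": "<your AssemblyAI API key>"
  }
}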

New Service Options available

Service options allow you to configure a service provider at initialization time. To give you more control over the initial state of these services, we’ve added a number of new service_options to each service.

  • STT providers: You can initialize their language and model (where available) parameters.
  • LLM providers: You can initialize their model parameter.
  • TTS providers: You can initialize their sample_rate.

See the Service Options page to learn more.
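
As a sketch, initializing an STT language and model plus an LLM model might look like this; the deepgram and anthropic keys and the values shown are illustrative, and the exact keys each provider accepts are listed on the Service Options page. TTS sample_rate is covered in the next item.

{
  "service_options": {
    "deepgram": {
      "language": "fr",
      "model": "nova-2"
    },
    "anthropic": {
      "model": "claude-3-5-sonnet-20241022"
    }
  }
}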

Control the sample rate of TTS services

You can now control the sample rate of TTS services. This is useful if you want to change the quality of the audio output. The default sample rate is 24000 Hz. You can change this by setting the sample_rate service option in the service_options object. For example:

{
  "service_options": {
    "azure_tts": {
      "sample_rate": 48000
    }
  }
}

More Deepgram STT options

We’ve added support for more Deepgram options. You can now configure any of the available body parameters from Deepgram’s live audio API as service_options. Refer to the Deepgram STT service page for more information.
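
For instance, passing a couple of Deepgram live-audio parameters through might look roughly like this. smart_format and profanity_filter are parameters from Deepgram’s live audio API; the deepgram service_options key follows the pattern used above and is an assumption, so check the Deepgram STT service page for the exact key.

{
  "service_options": {
    "deepgram": {
      "model": "nova-2",
      "smart_format": true,
      "profanity_filter": false
    }
  }
}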

2024-10-18

A quick follow-on update after our big deploy from a few days ago. We’ve added a few more TTS services.


2024-10-16

This update is so big, it warranted the creation of the Daily Bots Changelog! You can reference this page in the future to see what changes each time we ship an update to Daily Bots.

In today’s release:

New Text-to-Speech Services

We’ve added several new TTS providers. Take a look at their docs pages for more details on how to use each one.

Service Configuration Options

We’ve added a lot of “pass-through” options for many of the supported services. For example, the OpenAI LLM service now supports configuring things like temperature, top_p, and many other configuration options. Refer to the newly-updated service pages in the API reference for more information.
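
For example, tuning the OpenAI LLM’s sampling behavior might look like the snippet below. The config shape follows the format Daily Bots uses elsewhere; the exact option names and accepted values are documented on the OpenAI LLM service page.

{
  "config": [
    {
      "service": "llm",
      "options": [
        { "name": "temperature", "value": 0.7 },
        { "name": "top_p", "value": 0.9 }
      ]
    }
  ]
}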

Bot Profiles: OpenAI Realtime API Beta and Other Updates

We’ve added an initial beta of a bot profile that uses OpenAI’s new Realtime API. You can try using OpenAI’s speech-to-speech model in Daily Bots by using the openai_realtime_beta_2024_10 bot profile. The underlying API itself is still very much in beta and changing quite a bit, so expect the bot profile to change too. Bot profiles are documented here.

We’ve also added new bot profiles for voice and vision: voice_2024_10 and vision_2024_10. These profiles support the just-released RTVI 0.2.0 and offer more events and callbacks. For example, you can now get the bot’s LLM output directly, or the streamed word-by-word speech fed to the TTS service. More information on RTVI 0.2.0 will be available soon; we’re in the process of moving those docs over to the Pipecat docs page.