2025-01-08

Gemini Multimodal Live: now supports video input

Gemini Multimodal Live now supports video input. You can pass either camera video or screen share video from your client to Daily Bots to generate multimodal responses. Send only one of the two at a time: camera video or screen share video, not both. To get started:

Rime: new integrated TTS service

Rime is now an integrated TTS service. You can use Rime without needing to bring your own key. Check out the Rime TTS service page to learn how to get started.

Noise cancellation enabled by default

Noise cancellation is now enabled by default for the Daily service at no additional cost. This feature eliminates background sounds, including voices, from a participant’s audio stream. It enhances the bot’s ability to accurately understand the participant’s voice and prevents interruptions caused by background noise or conversations.

2024-12-24

Model support for OpenAI Realtime

OpenAI announced new models for their OpenAI Realtime API. We’ve added support for these models in Daily Bots. When using the openai_realtime_beta_2024_10 bot profile, you can now specify the model you want to use. The default model is gpt-4o-realtime-preview-2024-12-17. Learn more in the Daily Bots docs.

ElevenLabs: new model, more languages

ElevenLabs released their new Flash 2.5 model. Daily Bots now supports this model, which provides latencies of 150 to 200 ms. eleven_flash_v2_5 is now the default model for ElevenLabs. You can specify the model in the ElevenLabs configuration.

ElevenLabs also added support for more languages. You can now use ElevenLabs with Arabic, Croatian, Filipino, and Tamil.

See the docs to get started.

PlayHT: reliability improvements, more languages

PlayHT has improved their websocket API to provide more robust connections. We’ve updated our PlayHT integration to take advantage of these improvements. You can now expect more reliable connections and better performance when using PlayHT.

PlayHT also added support for more languages. You can now use PlayHT with Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian, Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, and Xhosa.

Bugfixes

  • Fixed an issue that could cause the bot to stop talking if there was a user interruption before getting any audio from the TTS service.

2024-12-11

Gemini Multimodal Live 🎉

We’ve partnered with Google to support their Gemini Multimodal Live API. The new bot profile, gemini_multimodal_live_2024_12, lets you use the Gemini Multimodal Live API with your Daily Bots. The Gemini Multimodal Live API is a speech-to-speech API that can generate text, audio, and video responses. You can use this bot profile to create bots that generate multimodal responses in real time.

Along with this new bot profile comes a new speech-to-speech LLM model, gemini_live. You can use the gemini_live model along with the gemini_multimodal_live_2024_12 bot profile to generate multimodal responses using the Gemini Multimodal Live API.
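As a rough sketch, a start request for this profile might look like the following. The gemini_multimodal_live_2024_12 and gemini_live names come from the announcement above; placing gemini_live under a services map keyed by llm is an assumption, so confirm the exact shape on the service page.

```json
{
  "bot_profile": "gemini_multimodal_live_2024_12",
  "services": {
    "llm": "gemini_live"
  }
}
```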

Learn more at the Gemini Live service page.

Gemini 2.0 support

More Gemini news! We’ve updated the Gemini LLM to support Gemini 2.0 Flash. This is now the default LLM model. To set it explicitly, just pass in models/gemini-2.0-flash-exp to the model parameter for the gemini service.
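For illustration, assuming the model parameter is set through service_options under a gemini key (the key mirrors the service name and is an assumption), the explicit setting might look like this:

```json
{
  "service_options": {
    "gemini": {
      "model": "models/gemini-2.0-flash-exp"
    }
  }
}
```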

Learn more at the Gemini service page.

New LLM Service: Azure

We’ve added Azure’s OpenAI LLM service to Daily Bots. Check out the Azure LLM service page to learn how to get started.

2024-11-21

Natural Conversations

The new bot_profile called natural_conversation_2024_11 enables more natural dialogue with the bot. This profile uses two specialized LLMs working together:

  • A “judge” LLM that determines when users have finished speaking
  • A conversation LLM that maintains the full context of the dialogue

The judge LLM analyzes both:

  • The most recent conversation turn
  • The completeness of the user’s speech

This allows users to pause and think without being interrupted. While this approach works well in most scenarios, we’ve implemented a timeout mechanism as a fallback in cases where the judge LLM incorrectly determines speech completion.

We’re actively improving this feature and welcome your feedback! Join our community on Discord to share your experience.

Noise cancellation

You can now enable noise cancellation for the Daily service. This feature eliminates background sounds, including voices, from a participant’s audio stream. By doing so, it enhances the bot’s ability to accurately understand the participant’s voice and prevents interruptions caused by background noise or conversations. Check out the Daily service page for more information.

New TTS Service: Rime

Rime is now available as a TTS service. Learn how to get started by visiting the Rime TTS service page.

New TTS service_option: voice

You can now specify voice as a service_option for your TTS service. Using the voice service_option initializes the TTS service with the specified voice before joining. With this change, you can still specify voice as a config option during runtime.
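A minimal sketch of the two places a voice can be set. The azure_tts key follows the sample_rate example in this changelog; the config list shape, the tts service label, and the placeholder voice values are assumptions for illustration only.

```json
{
  "service_options": {
    "azure_tts": {
      "voice": "<initial-voice-id>"
    }
  },
  "config": [
    {
      "service": "tts",
      "options": [
        { "name": "voice", "value": "<runtime-voice-id>" }
      ]
    }
  ]
}
```

In this sketch the service_options value applies before the bot joins, while the config value is the one you would update during the session.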

2024-11-10

Recording your Daily Bots sessions

You can now record your Daily Bots sessions. To learn more about how to get started, check out the Recording guide and the recording API reference docs.

New REST API endpoints for Twilio-connected bots

You can now configure and manage your Twilio bot configurations using new REST API endpoints. Check out the Twilio endpoints reference docs for more information.

New Integrated TTS Services

Deepgram’s Aura and ElevenLabs are now integrated TTS services. You can use them without needing to bring your own key. Check out their docs pages for more information on how to use each one:

New Claude models supported

Anthropic is an integrated LLM service. We now support additional Claude 3.5 models. See the Anthropic LLM service page for more information.

New Daily service options

daily is now a supported service. Its service_options let you enable different features for your bot. The first feature mutes the bot when a third participant joins the call, which turns your bot into a listener that can continue to transcribe the call. Check out the Daily service page for more information.

Fixes and improvements

  • Fixed an issue where Anthropic function calling was not working reliably.
  • Fixed an issue where the bot’s TTS output could overlap.
  • Fixed an issue where PlayHT was reporting incorrect TTFB metrics.
  • Fixed an issue where Azure TTS was not initializing to non-English languages correctly.
  • Improved the Azure TTS service to support websocket-based communication for lower-latency TTS.
  • Improved the reliability and consistency of the started and stopped speaking events.

2024-10-25

New LLM Services

We’ve added two new LLM providers: Google Gemini and Grok. Both providers require you to bring your own key. Check out their docs pages for more information on how to use each one:

New STT Services

We’ve also added two new STT providers: AssemblyAI and Gladia. Both providers require you to bring your own key. Check out their docs pages for more information on how to use each one:

New Service Options available

Service options allow you to configure a service provider at initialization time. To give you more control over the initial state of these services, we’ve added a number of new service_options to each service.

  • STT providers: You can initialize their language and model (where available) parameters.
  • LLM providers: You can initialize their model parameter.
  • TTS providers: You can initialize their sample_rate.
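Put together, an initialization block covering one provider of each kind might look like the sketch below. The deepgram and gemini keys and the placeholder model name are assumptions; azure_tts and sample_rate follow the sample rate example in this changelog.

```json
{
  "service_options": {
    "deepgram": {
      "language": "en-US",
      "model": "nova-2"
    },
    "gemini": {
      "model": "<model-name>"
    },
    "azure_tts": {
      "sample_rate": 24000
    }
  }
}
```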

See the Service Options page to learn more.

Control the sample rate of TTS services

You can now control the sample rate of TTS services, which is useful if you want to change the quality of the audio output. The default sample rate is 24000 Hz. To change it, set the sample_rate service option in the service_options object. For example:

{
  "service_options": {
    "azure_tts": {
      "sample_rate": 48000
    }
  }
}

More Deepgram STT options

We’ve added support for more Deepgram options. You can now configure any of the available body parameters from Deepgram’s live audio API as service_options. Refer to the Deepgram STT service page for more information.
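As an illustration, here are a few Deepgram live audio parameters passed as service_options. The deepgram key is an assumption; smart_format, punctuate, and endpointing are Deepgram streaming parameters, and the values shown are examples rather than recommendations.

```json
{
  "service_options": {
    "deepgram": {
      "smart_format": true,
      "punctuate": true,
      "endpointing": 300
    }
  }
}
```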

2024-10-18

A quick follow-on update after our big deploy from a few days ago. We’ve added a few more TTS services:


2024-10-16

This update is so big, it warranted the creation of the Daily Bots Changelog! You can reference this page in the future to see what changes each time we ship an update to Daily Bots.

In today’s release:

New Text-to-Speech Services

We’ve added several new TTS providers. Take a look at their docs pages for more details on how to use each one:

Service Configuration Options

We’ve added a lot of “pass-through” options for many of the supported services. For example, the OpenAI LLM service now supports configuring things like temperature, top_p, and many other configuration options. Refer to the newly-updated service pages in the API reference for more information.
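A rough sketch of passing such options to the OpenAI LLM service. temperature and top_p are named above; the config list shape, the llm service label, and the values are assumptions, so check the service page for the exact format.

```json
{
  "config": [
    {
      "service": "llm",
      "options": [
        { "name": "temperature", "value": 0.7 },
        { "name": "top_p", "value": 0.9 }
      ]
    }
  ]
}
```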

Bot Profiles: OpenAI Realtime API Beta and Other Updates

We’ve added an initial beta of a bot profile that uses OpenAI’s new Realtime API. You can try using OpenAI’s speech-to-speech model in Daily Bots by using the openai_realtime_beta_2024_10 bot profile. The underlying API itself is still very much in beta and changing quite a bit, so expect the bot profile to change too. Bot profiles are documented here.

We’ve also added new bot profiles for voice and vision: voice_2024_10 and vision_2024_10. These profiles support the just-released RTVI 0.2.0 and offer more events and callbacks. For example, you can now get the bot’s LLM output directly, or the streamed word-by-word speech fed to the TTS service. More information on RTVI 0.2.0 will be available soon; we’re in the process of moving those docs over to the Pipecat docs page.