Overview

The DailyVoiceClient is the component you will primarily use to interact with your Daily Bot. This client is part of the RTVI suite of client libraries, all of which follow the same API design detailed below. This component:

  • Provides a start() method that handshakes and authenticates with your bot service.
  • Configures your bot services.
  • Manages your media connections.
  • Provides methods, callbacks and events for interfacing with your bot.

The DailyVoiceClient is an extension of the RTVI VoiceClient that uses Daily’s transport for voice and video communication with the bot. A Daily transport is required for Daily Bots. This page covers the most common parameters, methods, and callbacks you will use with the DailyVoiceClient, along with the specific API expectations of Daily Bots. For full reference material and installation instructions, visit the docs for your corresponding library.

Source Docs

API Reference

Constructor Parameters

baseUrl
string
required

Handshake URL to your hosted endpoint that triggers authentication, transport session creation and bot instantiation.

The VoiceClient will send a JSON POST request to this URL and pass the local configuration (config) as a body param.

The DailyVoiceClient expects this endpoint to return the response from the Daily Bots endpoint. Once that response is returned, the client establishes the connection automatically.

services
Object <{ [key: string]: string }>
required

A key-value object of service registrations, each entry representing an available service (such as OpenAI, ElevenLabs, or a local LLM) that is made available in your bot file.

While the keys provided can technically be anything, the Daily Bots endpoint expects an "llm" and "tts", each with a value matching what’s currently supported.

Example:

{
  "llm": "anthropic",
  "tts": "cartesia"
}
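
Because the Daily Bots endpoint expects both an "llm" and a "tts" entry, it can help to check the services object before constructing the client. The following is a minimal sketch of such a check; missingServiceKeys and REQUIRED_SERVICE_KEYS are hypothetical names, not part of the DailyVoiceClient API.

```typescript
// Hypothetical helper: not part of the DailyVoiceClient API.
type Services = { [key: string]: string };

// Keys the Daily Bots endpoint is described as expecting.
const REQUIRED_SERVICE_KEYS = ["llm", "tts"] as const;

// Returns the required keys that are missing or mapped to an empty string.
function missingServiceKeys(services: Services): string[] {
  return REQUIRED_SERVICE_KEYS.filter((key) => {
    const value = services[key];
    return typeof value !== "string" || value.length === 0;
  });
}

const services: Services = { llm: "anthropic", tts: "cartesia" };
console.log(missingServiceKeys(services)); // no keys missing
console.log(missingServiceKeys({ llm: "anthropic" })); // "tts" is missing
```

The check only confirms that the keys exist; whether a given value is currently supported is decided by the Daily Bots endpoint.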

config
Array <VoiceClientConfigOption[]>

Pipeline configuration object for your registered services. Must contain a valid VoiceClientConfig array.

Client config is passed to the bot at startup, and can be overridden in your server-side endpoint (where sensitive information, such as API keys, can be provided).

See configuration
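
One way a server-side endpoint might override the client config is sketched below. The VoiceClientConfigOption shape shown here (a service name plus a list of name/value options) is an assumption modelled on the RTVI config format, so check it against your library's types. applyOverrides is a hypothetical helper and the option values are placeholders.

```typescript
// Assumed shapes; verify against the types exported by your library.
type ConfigOption = { name: string; value: unknown };
type VoiceClientConfigOption = { service: string; options: ConfigOption[] };

// Hypothetical server-side helper: options in `overrides` replace client
// options with the same service and name; everything else is kept.
// The inputs are copied, never mutated.
function applyOverrides(
  clientConfig: VoiceClientConfigOption[],
  overrides: VoiceClientConfigOption[]
): VoiceClientConfigOption[] {
  const merged = clientConfig.map((entry) => ({
    service: entry.service,
    options: entry.options.map((option) => ({ ...option })),
  }));
  for (const override of overrides) {
    let target = merged.find((entry) => entry.service === override.service);
    if (!target) {
      target = { service: override.service, options: [] };
      merged.push(target);
    }
    for (const option of override.options) {
      const existing = target.options.find((o) => o.name === option.name);
      if (existing) existing.value = option.value;
      else target.options.push({ ...option });
    }
  }
  return merged;
}

// Placeholder values for illustration only.
const clientConfig: VoiceClientConfigOption[] = [
  { service: "llm", options: [{ name: "model", value: "client-model" }] },
];
const serverOverrides: VoiceClientConfigOption[] = [
  { service: "llm", options: [{ name: "model", value: "server-model" }] },
  { service: "tts", options: [{ name: "voice", value: "server-voice" }] },
];
const mergedConfig = applyOverrides(clientConfig, serverOverrides);
console.log(JSON.stringify(mergedConfig, null, 2));
```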

callbacks
{ callback:()=>void }

An object of callback functions, keyed by callback name. See: callbacks

enableMic
boolean
default: true

Enable user’s local microphone device.

enableCamera
boolean
default: false

Enable user’s local webcam device.

customHeaders
Object <{ [key: string]: string }>

Custom HTTP headers to include in the initial start web request to the baseUrl.

customBodyParams
Object

Custom properties to pass through in the body of the request sent to baseUrl when start() is called.
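
To make the relationship between baseUrl, config, customHeaders and customBodyParams concrete, the sketch below assembles the kind of JSON POST request described above. It is an illustration only: buildStartRequest is a hypothetical helper, the example URL is a placeholder, and the exact body layout (including services in the body and spreading customBodyParams at the top level) is an assumption rather than the client's documented wire format.

```typescript
type StartRequestInput = {
  baseUrl: string;
  services: { [key: string]: string };
  config?: unknown[];
  customHeaders?: { [key: string]: string };
  customBodyParams?: { [key: string]: unknown };
};

// Hypothetical helper: the real client's body layout may differ.
function buildStartRequest(input: StartRequestInput) {
  return {
    url: input.baseUrl,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Custom headers are merged in after the defaults.
        ...(input.customHeaders ?? {}),
      },
      body: JSON.stringify({
        services: input.services,
        config: input.config ?? [],
        // Assumption: custom params sit at the top level of the body.
        ...(input.customBodyParams ?? {}),
      }),
    },
  };
}

const startRequest = buildStartRequest({
  baseUrl: "https://example.com/api/start", // placeholder URL
  services: { llm: "anthropic", tts: "cartesia" },
  customHeaders: { "X-Session": "demo" },
  customBodyParams: { userId: "demo-user" },
});
console.log(startRequest.init.body);
// The result could then be passed to fetch(startRequest.url, startRequest.init).
```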

Methods

initDevices()

Initializes media device selection, allowing users to test and switch media devices before starting the conversation.

start()

Sets up and starts the conversation.

disconnect()

Stops the conversation and tears down all network connections.

enableMic(enable: boolean)

Enables or disables the user’s microphone, based on the provided enable value.

enableCam(enable: boolean)

Enables or disables the user’s camera, based on the provided enable value.

tracks()

Returns all available MediaStreamTracks for the user and bot.

// Return type
{
  local: {
    audio?: MediaStreamTrack;
    video?: MediaStreamTrack;
  },
  bot?: {
    audio?: MediaStreamTrack;
    video?: MediaStreamTrack;
  }
}
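
Since every track in the return type is optional and bot may be absent entirely, callers need to guard each access. A minimal sketch of that guarding follows; availableTracks is a hypothetical helper, and the type is made generic over the track type only so the example can run outside a browser (in a browser, T would be MediaStreamTrack).

```typescript
// Mirrors the return type above, generic over the track type.
type Tracks<T> = {
  local: { audio?: T; video?: T };
  bot?: { audio?: T; video?: T };
};

// Hypothetical helper: lists which of the four possible tracks are present.
function availableTracks<T>(tracks: Tracks<T>): string[] {
  const found: string[] = [];
  if (tracks.local.audio) found.push("local.audio");
  if (tracks.local.video) found.push("local.video");
  // `bot` may be undefined before the bot has joined, so use optional chaining.
  if (tracks.bot?.audio) found.push("bot.audio");
  if (tracks.bot?.video) found.push("bot.video");
  return found;
}

// Stand-in objects take the place of real MediaStreamTracks here.
const present = availableTracks({
  local: { audio: { id: "mic" } },
  bot: { audio: { id: "bot-voice" } },
});
console.log(present); // local.audio and bot.audio are present
```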

Callbacks

botReady: Bot is connected and ready to receive messages
transcript: STT transcript (both local and remote) flagged with partial, final or sentence
config: Bot configuration
error: Bot initialization error
errorResponse: Error response from the bot in response to an action
configAvailable: Configuration options available on the bot
configUpdated: Configuration options have changed successfully
configError: Configuration options failed to change
actionsAvailable: Actions available on the bot
metrics: RTVI reporting metrics
userTranscription: Local user speech to text
ttsText: Bot speech to text
userStartedSpeaking: User started speaking
userStoppedSpeaking: User stopped speaking
botStartedSpeaking: Bot started speaking
botStoppedSpeaking: Bot stopped speaking
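
As an illustration of how the four speaking callbacks might be combined, the sketch below tracks who is currently speaking. createSpeakingTracker is a hypothetical helper. The handler keys mirror the names in the list above, and the exact keys your library expects in the callbacks parameter (for example, with an "on" prefix) are an assumption to verify against its reference docs.

```typescript
type Speaker = "user" | "bot";

// Hypothetical tracker: handler names mirror the callback list above.
function createSpeakingTracker() {
  const speaking = new Set<Speaker>();
  return {
    // Handlers intended to be wired into the client's callbacks parameter.
    callbacks: {
      userStartedSpeaking: () => { speaking.add("user"); },
      userStoppedSpeaking: () => { speaking.delete("user"); },
      botStartedSpeaking: () => { speaking.add("bot"); },
      botStoppedSpeaking: () => { speaking.delete("bot"); },
    },
    // Reports whether the given participant is currently speaking.
    isSpeaking: (who: Speaker): boolean => speaking.has(who),
  };
}

// Simulate the events the client would emit during a conversation.
const tracker = createSpeakingTracker();
tracker.callbacks.userStartedSpeaking();
console.log(tracker.isSpeaking("user"), tracker.isSpeaking("bot"));
tracker.callbacks.userStoppedSpeaking();
tracker.callbacks.botStartedSpeaking();
console.log(tracker.isSpeaking("user"), tracker.isSpeaking("bot"));
```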

Example Code