The DailyVoiceClient takes a list of services along with the JSON configuration used to set up those services. Because the client is an open source component, the services and their configurations are highly flexible. Daily Bots, however, accepts a pre-defined set of services and a pre-defined set of keys for configuring them.

The full set of configuration options is detailed in the RTVI API Reference. This page will cover the format Daily Bots expects for the services and configuration.

Services

For the voice-to-voice and vision-and-voice bot profiles, Daily Bots expects the services object to contain the following keys:

{
  "services": {
    "tts": "<tts service name>",
    "llm": "<llm service name>"
  }
}

For the list of accepted service name values, see the supported services page. The list will be expanded as more services are added.

Configuration

The configuration object is where you set the options for your services. Pass it as part of your DailyVoiceClient constructor; you can also update it at any time using updateConfig().

Note that the config is a list, because order matters: the configuration is applied in the order in which it is defined. For example, if you want to configure a TTS service with a voice and an LLM service with some new prompting, specify the TTS service first so that the new voice is applied before the prompting. The same ordering applies to the options within each service. Applying configuration in a fixed order makes the behavior deterministic across all RTVI implementations.
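To make the ordering rule concrete, the sketch below flattens a config list into the sequence in which its entries would be applied: services in list order, then options in list order within each service. `applicationOrder` is a hypothetical helper written for this illustration and is not part of any RTVI client; the option values are placeholders.

```typescript
type ConfigOption = { name: string; value: unknown };
type ServiceConfig = { service: string; options: ConfigOption[] };

// Hypothetical helper: list "service.option" steps in the order they are defined.
function applicationOrder(config: ServiceConfig[]): string[] {
  const steps: string[] = [];
  for (const entry of config) {
    for (const option of entry.options) {
      steps.push(`${entry.service}.${option.name}`);
    }
  }
  return steps;
}

// TTS is listed first, so its voice is applied before any LLM option.
const orderedConfig: ServiceConfig[] = [
  { service: "tts", options: [{ name: "voice", value: "<voice id>" }] },
  {
    service: "llm",
    options: [
      { name: "model", value: "<model name>" },
      { name: "initial_messages", value: [] },
    ],
  },
];

const order = applicationOrder(orderedConfig);
// order is ["tts.voice", "llm.model", "llm.initial_messages"]
```

Swapping the two entries in `orderedConfig` would put the LLM steps ahead of `tts.voice`, which is the situation the paragraph above advises against.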

For Daily Bots, you must provide configuration options for both your llm and tts services. The general format of the configuration object is as follows:

{
  "config": [
    {
      "service": "<service>", // "tts" or "llm"
      "options": [
        {
          "name": "<option name>",
          "value": "<option value>"
        }
      ]
    },
    {
      "service": "<service>", // "tts" or "llm"
      "options": [
        {
          "name": "<option name>",
          "value": "<option value>"
        }
      ]
    }
  ]
}
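The format above can be written down as TypeScript types, together with a check that both required services have an entry. This is a minimal sketch: the type names and the `missingServices` helper are hypothetical and introduced here only for illustration, and the option values are placeholders.

```typescript
type ConfigOption = { name: string; value: unknown };
type ServiceConfig = { service: "tts" | "llm"; options: ConfigOption[] };

// Hypothetical helper: return the required services that have no entry in the config.
function missingServices(config: ServiceConfig[]): string[] {
  const present = new Set<string>(config.map((entry) => entry.service));
  return ["tts", "llm"].filter((name) => !present.has(name));
}

const fullConfig: ServiceConfig[] = [
  { service: "tts", options: [{ name: "voice", value: "<voice id>" }] },
  { service: "llm", options: [{ name: "model", value: "<model name>" }] },
];

// A config with only the tts entry is incomplete for Daily Bots.
const ttsOnlyConfig: ServiceConfig[] = fullConfig.slice(0, 1);

const missingInFull = missingServices(fullConfig);
const missingInTtsOnly = missingServices(ttsOnlyConfig);
```

Running a check like this before constructing the client is one way to catch a forgotten service entry early; it does not validate the options themselves.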

TTS Options

For TTS services, there’s typically only one option set: the voice. The value for the voice is defined by the service you are using.

voice
string
required

The voice you want your TTS service to use. This is a required field. For Cartesia, you can view the list of available voices and get their IDs here (an account is required). Once you get to the playground, click “Library” -> “Default Voices”.

Example:

{
  "name": "voice",
  "value": "79a125e8-cd45-4c13-8a67-188112f4dd22"
}

LLM Options

For LLM services, the options and their format are generally defined by the service and model you are using. Each option expects a "name" and a "value" field, which allows the bot to dynamically apply the option to any LLM and model. For Daily Bots, a few options are required or commonly used; they are detailed below:

model
string
required

The model you want to use for the LLM service. This is a required field. When using an integrated service without an API key, the model string must match one of the options outlined in the supported services page.

Example:

{
  "name": "model",
  "value": "claude-3-5-sonnet-20240620"
}

initial_messages
Array[LLM messages]
required

The initial set of messages to prompt the LLM service with. This is a required field when providing the first configuration. For subsequent configuration updates, use the “messages” option. The format of the messages is defined by the service and model you are using, but each message generally sets a role and content.

Examples:
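As a minimal sketch, an initial_messages option could look like the following. It uses the role/content shape common to chat-style LLM APIs; whether a given service accepts a "system" role and plain string content is an assumption to check against that service's documentation, and the prompt text is a placeholder.

```typescript
type LLMMessage = { role: string; content: string };

// Placeholder prompt; the exact message format depends on the service and model.
const initialMessages: LLMMessage[] = [
  {
    role: "system",
    content: "You are a friendly assistant. Keep your answers short.",
  },
];

// The option as it would appear in the "options" list of the llm service.
const initialMessagesOption = {
  name: "initial_messages",
  value: initialMessages,
};
```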

messages
Array[LLM messages]
required

Same as initial_messages, but used for subsequent configuration updates. This is a required field when providing a configuration update. For the first configuration, use the “initial_messages” option.
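The split between the two option names can be sketched with a small helper that picks the name depending on whether the configuration is the first one or an update. `messagesOption` is hypothetical, and the message shape is the same assumed role/content form used by chat-style LLM APIs. How the resulting list is handed to updateConfig() is left out, since its exact signature is not covered on this page.

```typescript
type LLMMessage = { role: string; content: string };
type ConfigOption = { name: string; value: unknown };

// Hypothetical helper: "initial_messages" for the first config, "messages" for updates.
function messagesOption(messages: LLMMessage[], isFirstConfig: boolean): ConfigOption {
  return {
    name: isFirstConfig ? "initial_messages" : "messages",
    value: messages,
  };
}

const newPrompt: LLMMessage[] = [
  { role: "system", content: "You are now a pirate. Answer in pirate speak." },
];

// Config list for a later update: only the llm service, using "messages".
const updatePayload = [
  { service: "llm", options: [messagesOption(newPrompt, false)] },
];
```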

run_on_config
bool

run_on_config is a boolean field that forces the bot to talk first. Without this setting, the bot will not begin speaking until the user does. This is an optional field and defaults to false.

IMPORTANT: This field typically should be listed last in your configuration to ensure the bot does not start speaking before it receives its initial messages. Otherwise, fun bot hallucinations may occur.
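One way to guard against a misplaced run_on_config is to reorder the options before sending them. The sketch below moves run_on_config to the end while leaving every other option in its original relative order; `runOnConfigLast` is a hypothetical helper, not something provided by the client, and the option values are placeholders.

```typescript
type ConfigOption = { name: string; value: unknown };

// Hypothetical helper: keep all other options in order and put run_on_config last.
function runOnConfigLast(options: ConfigOption[]): ConfigOption[] {
  const others = options.filter((option) => option.name !== "run_on_config");
  const runOnConfig = options.filter((option) => option.name === "run_on_config");
  return [...others, ...runOnConfig];
}

// run_on_config is listed too early here, before the initial messages.
const llmOptions: ConfigOption[] = [
  { name: "model", value: "<model name>" },
  { name: "run_on_config", value: true },
  { name: "initial_messages", value: [] },
];

const reordered = runOnConfigLast(llmOptions).map((option) => option.name);
// reordered is ["model", "initial_messages", "run_on_config"]
```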

enable_prompt_caching
bool

Currently only works with Anthropic Claude 3.5 Sonnet and Claude 3 Haiku

Setting this field to true enables Anthropic’s prompt caching feature, which lets the service cache a prompt prefix and reuse it on later requests instead of reprocessing it each time. This is an optional field and defaults to false.

tools
Array[Tool Definition]

Currently only works with Anthropic and OpenAI

This field describes to the LLM all the tools it has access to and how to call them. This feature is also referred to as “function calling”. The format for describing each tool depends heavily on the service, but it typically requires you to give your tool a name, a description, and an object detailing the set of parameters your tool expects to receive. For more information on setting up tool calling, see our full tutorial. This is an optional field and defaults to an empty array.

Examples:
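As a minimal sketch, a tools option for an Anthropic model could look like the following. The tool definition follows Anthropic's tool format (a name, a description, and a JSON Schema under input_schema); that Daily Bots passes this shape through unchanged is an assumption, and the get_current_weather tool itself is hypothetical. OpenAI's format differs: it nests the definition under a "function" key and names the schema "parameters".

```typescript
// Hypothetical tool in Anthropic's format: name, description, and a JSON Schema for the input.
const getWeatherTool = {
  name: "get_current_weather",
  description: "Get the current weather for a given location.",
  input_schema: {
    type: "object",
    properties: {
      location: {
        type: "string",
        description: "The city and state, e.g. San Francisco, CA",
      },
    },
    required: ["location"],
  },
};

// The option as it would appear in the "options" list of the llm service.
const toolsOption = {
  name: "tools",
  value: [getWeatherTool],
};
```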