Tidying up your Next.js project

The default project template includes additional files that we don’t need for this tutorial.

Let’s delete the following files:

  • public/next.svg
  • public/vercel.svg

We’ll remove the placeholder content from /app/page.tsx:

Note the "use client"; directive at the very top of the file. This instructs Next.js to render this code only on the client side (in the browser). This matters because we will be using the Daily Voice Client, which is a browser-based client.
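After removing the placeholder content, page.tsx might look something like this (a minimal sketch; your markup may differ):

```typescript
"use client";

export default function Home() {
  // We'll build our voice UI here in the following steps
  return <main />;
}
```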

Creating the Voice Client

The RTVI Daily package includes a DailyVoiceClient class that we can use to connect to our bot.

Let’s create a new instance of the DailyVoiceClient in our Home component and provide it with some initial configuration:
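A sketch of what that instantiation might look like, assuming `DailyVoiceClient` is imported from the RTVI Daily package and a constructor accepting `baseUrl`, `services`, and `config` (the system prompt below is a placeholder; the voice ID and model are the example values discussed next):

```typescript
"use client";

import { DailyVoiceClient } from "realtime-ai-daily";

export default function Home() {
  const voiceClient = new DailyVoiceClient({
    // The route that returns the auth bundle and starts the bot
    baseUrl: "/api",
    // Which provider handles each function of the bot
    services: {
      llm: "together",
      tts: "cartesia",
    },
    // Per-service configuration; order matters (see below)
    config: [
      {
        service: "tts",
        options: [
          { name: "voice", value: "79a125e8-cd45-4c13-8a67-188112f4dd22" },
        ],
      },
      {
        service: "llm",
        options: [
          {
            name: "model",
            value: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
          },
          {
            name: "initial_messages",
            value: [
              {
                role: "system",
                content: "You are a helpful assistant. Keep responses brief.",
              },
            ],
          },
          { name: "run_on_config", value: true },
        ],
      },
    ],
  });

  // ...
}
```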

baseUrl

The baseUrl option is the URL that the Daily Voice Client will use to request an authentication bundle (a Daily room URL and token) from the Daily Bots API, and start a bot in the process. In this case, we are using the /api route we just created (/app/api/route.ts).

services

The services map defines which providers you’d like to use for each function of your bot, in this case the LLM and text-to-speech (TTS) services.

In the above example, we are using Together with Llama 3.1 70B and Cartesia for TTS.

You can read more about which services we currently support, as well as how to provide your own API keys here.

config

The config array contains the configuration for each service and the options you’d like to pass to them.

In the above example, we are setting the voice for the TTS service to 79a125e8-cd45-4c13-8a67-188112f4dd22 and the model for the LLM service to meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo.

The service string must match that of a service you have defined in your services map.

Why is config an array?

At first, it might seem a little strange to use an array rather than a map for the config object.

The RTVI standard is designed with flexibility in mind, and passing an array allows you to define the order in which you want your config options to run.

Daily Bots are built on Pipecat, an open-source pipeline orchestration framework that allows you to chain together multiple services.

Pipecat works by dispatching ‘frames’ in a specific order, and the order in which you define options in the config array will determine the sequence in which they are dispatched.

This allows for a high degree of flexibility in how you structure your bot’s logic.

Remember: the order in which you define your service and config options in the array is important!

We also define { name: "run_on_config", value: true } in our config.

This instructs the bot to run the LLM service as soon as it connects. If you’d prefer that the user starts the conversation instead, you can remove this option (or set its value to false).

To emphasize the importance of config ordering, let’s consider run_on_config as an example.

If we were to move the run_on_config option above initial_messages in our config array, the Pipecat bot would consume this and run the LLM service before configuring the initial messages. This would result in a garbled response from the LLM, as it has no context messages to work with.
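To make the ordering concrete, here is a sketch of the LLM service’s options in the correct order, with initial_messages defined before run_on_config (the system prompt is a placeholder):

```typescript
// Options are dispatched top-to-bottom, so the context is configured
// before the LLM is instructed to run.
const llmOptions = [
  { name: "model", value: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" },
  {
    name: "initial_messages",
    value: [{ role: "system", content: "You are a helpful assistant." }],
  },
  // Moving this above initial_messages would run the LLM with no
  // context messages, producing a garbled response.
  { name: "run_on_config", value: true },
];
```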

Adding an audio component

We need a way to hear our bot once we have connected. The realtime-ai-react package includes a handy VoiceClientAudio component that will subscribe to the bot’s incoming audio track.

In order to use this, we’ll need to wrap our project inside a VoiceClientProvider context.

Using this context means we can now use the hooks and components provided by realtime-ai-react to interact with our bot.
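Putting both together, the wrapping might look like this, assuming realtime-ai-react exports VoiceClientProvider (taking a voiceClient prop) and VoiceClientAudio as described (a sketch; your layout may differ):

```typescript
"use client";

import { VoiceClientProvider, VoiceClientAudio } from "realtime-ai-react";

export default function Home() {
  // `voiceClient` is the DailyVoiceClient instance we created earlier
  return (
    <VoiceClientProvider voiceClient={voiceClient}>
      <main>{/* app content goes here */}</main>
      {/* Subscribes to the bot's incoming audio track and plays it */}
      <VoiceClientAudio />
    </VoiceClientProvider>
  );
}
```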

Starting the session

Now that we have our DailyVoiceClient instance set up, we can start the session by calling the start method.

Let’s create a new App.tsx component in app/ and add the following code:

And update our page.tsx to use this component, making sure to wrap it in the VoiceClientProvider:

Calling the start() method

We’ll add a button to connect to the Daily Bot (and disable it once we’re connected):
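A sketch of what App.tsx might contain, assuming the client is available from a useVoiceClient hook in realtime-ai-react (the hook name and connected-state handling are assumptions; a transport-state hook or callbacks could be used instead):

```typescript
"use client";

import { useState } from "react";
import { useVoiceClient } from "realtime-ai-react";

export default function App() {
  const voiceClient = useVoiceClient();
  const [connected, setConnected] = useState(false);

  async function connect() {
    if (!voiceClient || connected) return;
    // Resolves once the bot enters the 'ready' state
    await voiceClient.start();
    setConnected(true);
  }

  return (
    <button disabled={connected} onClick={connect}>
      {connected ? "Connected" : "Connect"}
    </button>
  );
}
```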

Click the button, and you should hear your bot say hello!

Please ensure you accept your browser’s permissions to use your microphone.

Why is start() async?

The start method returns a promise, so you can call it with or without await. If called with await, the promise resolves once the bot enters a ‘ready’ state (indicating that the bot is connected and ready to receive input).

Awaiting the call also allows us to try / catch any errors that may occur during the connection process.

If you’d rather not await it, you can invoke start in a fire-and-forget style and rely on the various events and callbacks provided by RTVI instead.
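A sketch of both styles (the event names are assumptions based on the realtime-ai package’s VoiceEvent enum; check the RTVI docs for the exact names):

```typescript
import { VoiceEvent } from "realtime-ai";

// Awaited: resolves when the bot is ready; errors can be caught directly.
try {
  await voiceClient.start();
  console.log("Bot is ready");
} catch (e) {
  console.error("Failed to connect:", e);
}

// Fire-and-forget: rely on events instead of the returned promise.
voiceClient.start();
voiceClient.on(VoiceEvent.BotReady, () => console.log("Bot is ready"));
voiceClient.on(VoiceEvent.Error, (e) => console.error(e));
```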

Wrapping up

Let’s add a bit more functionality to our app. We’ll add an onscreen transcription of the bot’s responses, and disable the start button once we’re connected.
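As a starting point, a transcript component might look something like this, assuming realtime-ai-react provides a useVoiceClientEvent hook and realtime-ai emits a BotTranscript event (the hook, event name, and payload shape are assumptions; consult the RTVI docs for the exact API):

```typescript
"use client";

import { useCallback, useState } from "react";
import { useVoiceClientEvent } from "realtime-ai-react";
import { VoiceEvent } from "realtime-ai";

export default function BotTranscript() {
  const [lines, setLines] = useState<string[]>([]);

  // Append each bot transcript fragment as it arrives
  useVoiceClientEvent(
    VoiceEvent.BotTranscript,
    useCallback((text: string) => {
      setLines((prev) => [...prev, text]);
    }, [])
  );

  return (
    <div>
      {lines.map((line, i) => (
        <p key={i}>{line}</p>
      ))}
    </div>
  );
}
```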