Together AI

Supported service: llm
Key: together
Integrated: Yes

Service options

model

string

default:"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"

The name of the model to query. Available integrated models:

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
meta-llama/Llama-3.2-3B-Instruct-Turbo
meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo

{
  "service_options": {
    "together": {
      "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
    }
  }
}
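If you assemble `service_options` in code rather than by hand, checking the model name against the integrated list above can catch typos early. The sketch below is written in Python. `build_together_service_options` is a hypothetical helper, not part of any SDK, and it only reproduces the JSON shape shown above.

```python
import json

# Integrated Together AI models, copied from the list above.
INTEGRATED_MODELS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    "meta-llama/Llama-3.2-3B-Instruct-Turbo",
    "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
}

# Default model documented for the `model` service option.
DEFAULT_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"


def build_together_service_options(model: str = DEFAULT_MODEL) -> dict:
    """Hypothetical helper: build a service_options payload for the `together` key."""
    if model not in INTEGRATED_MODELS:
        raise ValueError(f"{model!r} is not in the integrated model list")
    return {"service_options": {"together": {"model": model}}}


print(json.dumps(build_together_service_options("meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"), indent=2))
```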

Configuration options

model

string

The name of the model to query. Available models:

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
meta-llama/Llama-3.2-3B-Instruct-Turbo
meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo

{
  "name": "model",
  "value": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
}

frequency_penalty

float

A number between -2.0 and 2.0 where a positive value decreases the likelihood of repeating tokens that have already been mentioned.

{
  "name": "frequency_penalty",
  "value": 0.7
}
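This page does not state Together AI's exact formula. Assuming it follows the OpenAI-style definition, each token's logit is reduced by `frequency_penalty` multiplied by the number of times that token has already appeared, so frequently repeated tokens are penalized more. The function name and the toy logits below are illustrative only.

```python
from collections import Counter


def apply_frequency_penalty(logits: dict[str, float], generated: list[str], penalty: float) -> dict[str, float]:
    """Assumption (OpenAI-style): new_logit = logit - penalty * count_so_far."""
    counts = Counter(generated)  # tokens never generated have a count of 0
    return {tok: logit - penalty * counts[tok] for tok, logit in logits.items()}


logits = {"the": 2.0, "cat": 1.5, "dog": 1.0}
history = ["the", "cat", "the"]
print(apply_frequency_penalty(logits, history, 0.7))
```

A negative penalty works the other way and raises the logits of tokens that have already been used.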

max_tokens

integer

The maximum number of tokens to generate.

{
  "name": "max_tokens",
  "value": 4096
}

presence_penalty

float

A number between -2.0 and 2.0 where a positive value increases the likelihood of a model talking about new topics.

{
  "name": "presence_penalty",
  "value": 1.1
}
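Under the same OpenAI-style assumption, `presence_penalty` is a flat, one-time deduction for any token that has appeared at least once, no matter how often it appeared. That is what nudges the model toward new topics rather than only away from heavy repetition. The helper below is a hypothetical illustration.

```python
def apply_presence_penalty(logits: dict[str, float], generated: list[str], penalty: float) -> dict[str, float]:
    """Assumption (OpenAI-style): subtract `penalty` once from every token already seen."""
    seen = set(generated)
    return {tok: logit - (penalty if tok in seen else 0.0) for tok, logit in logits.items()}


logits = {"the": 2.0, "cat": 1.5, "dog": 1.0}
history = ["the", "cat", "the"]
print(apply_presence_penalty(logits, history, 1.1))
```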

temperature

float

A decimal number from 0 to 1 that determines the degree of randomness in the response. Lower values (closer to 0) make the output more focused and deterministic, which suits question answering or summarization. Values closer to 1 introduce more randomness in the output.

{
  "name": "temperature",
  "value": 0.9
}
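Temperature is conventionally applied by dividing the model's logits by the temperature before the softmax. A low value sharpens the distribution toward the most likely token, and a value near 1 leaves it closer to the model's raw probabilities. The sketch below shows that standard technique. It rejects a temperature of 0 because dividing by zero is undefined, whereas providers commonly treat 0 as greedy decoding; how Together AI handles that edge case is not covered here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
for t in (0.2, 0.9):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```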

top_k

integer

An integer that limits the number of choices for the next predicted token. At each step, only the top_k most probable tokens are considered, and all others are discarded. This can speed up generation and can improve the quality of the generated text by focusing on the most likely options.

{
  "name": "top_k",
  "value": 42
}
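Top-k filtering keeps the k most probable tokens and renormalizes their probabilities so they sum to 1 before sampling. The sketch below illustrates the general technique on a toy distribution. The function name and values are illustrative and are not taken from Together AI's implementation.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_filter(probs, 2))
```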

top_p

float

A probability between 0 and 1 (also called the nucleus parameter) that dynamically adjusts the number of choices for each predicted token based on cumulative probability. The model keeps the smallest set of most likely tokens whose combined probability reaches top_p and filters out the rest. This technique helps maintain diversity and generate more fluent and natural-sounding text.

{
  "name": "top_p",
  "value": 0.5
}
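Nucleus (top-p) filtering sorts tokens by probability, keeps adding them until their cumulative probability reaches top_p, and then renormalizes the kept set. The sketch below illustrates the general technique. Names and values are illustrative, not Together AI's code.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_p_filter(probs, 0.5))
print(top_p_filter(probs, 0.7))
```

Unlike top_k, the number of surviving tokens varies: a confident distribution may keep only one token, while a flat one keeps many.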

extra

object

A dictionary that can contain any additional parameters supported by Together AI that you want to pass to the API. Refer to the Together AI docs for more information on each of these configuration options.

{
  "name": "extra",
  "value": {
    "repetition_penalty": 12,
    "min_p": 0.9
  }
}
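Each configuration option above is expressed as a `{"name": ..., "value": ...}` object, with `extra` carrying a nested dictionary as its value. If you keep your settings in a plain dictionary, a small conversion produces that shape. `to_config_options` is a hypothetical helper, and where the resulting list sits in a full bot configuration is not covered on this page, so it is left out here.

```python
import json


def to_config_options(settings: dict) -> list[dict]:
    """Hypothetical helper: turn {"temperature": 0.9, ...} into [{"name": ..., "value": ...}, ...]."""
    return [{"name": name, "value": value} for name, value in settings.items()]


options = to_config_options({
    "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    "temperature": 0.9,
    "max_tokens": 4096,
    "extra": {"repetition_penalty": 12, "min_p": 0.9},
})
print(json.dumps(options, indent=2))
```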

Function Calling

Models served by Together AI, including its versions of Llama 3.1, support function calling through the OpenAI-compatible interface. For examples of that approach, see the OpenAI service page or the function calling tutorial.
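As a rough illustration of the OpenAI-style format mentioned above, a tool is described by a name, a description, and a JSON Schema for its parameters. The `get_current_weather` function and its fields are hypothetical. The OpenAI service page remains the reference for how tool definitions are attached to a bot.

```python
import json

# Hypothetical tool described in the OpenAI "tools" format (type / function / parameters).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {  # JSON Schema describing the function's arguments
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g. San Francisco, CA"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}

tools = [weather_tool]
print(json.dumps(tools, indent=2))
```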
