• Supported service: llm
  • Key: together
  • Integrated: Yes

Service options

model
string
default: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"

The name of the model to query. Available integrated models:

  • meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
  • meta-llama/Llama-3.2-3B-Instruct-Turbo
  • meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
  • meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
{
  "service_options": {
    "together": {
      "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
    }
  }
}

Configuration options

model
string

The name of the model to query. Available models:

  • meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
  • meta-llama/Llama-3.2-3B-Instruct-Turbo
  • meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
  • meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
{
  "name": "model",
  "value": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
}
frequency_penalty
float

A number between -2.0 and 2.0. Positive values penalize tokens in proportion to how often they have already appeared in the text, which makes repetition less likely.

{
  "name": "frequency_penalty",
  "value": 0.7
}
max_tokens
integer

The maximum number of tokens to generate.

{
  "name": "max_tokens",
  "value": 4096
}
presence_penalty
float

A number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the text, which makes the model more likely to talk about new topics.

{
  "name": "presence_penalty",
  "value": 1.1
}
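
To make the difference between frequency_penalty and presence_penalty concrete, the following toy sketch adjusts a handful of logits using the formula published in the OpenAI API documentation. Whether Together AI applies exactly this formula is an assumption, and apply_penalties and the sample values are hypothetical names chosen for illustration.

```python
from collections import Counter


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Return a copy of `logits` adjusted for tokens that were already generated.

    Assumed formula (from the OpenAI API docs, not confirmed for Together AI):
        logit[t] -= count[t] * frequency_penalty + (count[t] > 0) * presence_penalty
    """
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            # The frequency term grows with each repeat; the presence term is a flat cost.
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted


logits = {"cat": 2.0, "dog": 2.0, "bird": 2.0}
history = ["cat", "cat", "dog"]
adjusted = apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=1.0)
print(adjusted)
```

In this sketch the token that appeared twice is penalized more than the token that appeared once, and the unseen token is left unchanged.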
temperature
float

A decimal number from 0 to 1 that determines the degree of randomness in the response. A value closer to 0 makes the output more focused and predictable, which is appropriate for question answering or summarization. A value closer to 1 introduces more randomness in the output.

{
  "name": "temperature",
  "value": 0.9
}
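
As a rough illustration of what temperature does, the sketch below divides a few made-up logits by the temperature before applying a softmax. This is the textbook formulation and not necessarily how Together AI implements sampling. The function name and values are hypothetical, and the sketch rejects a temperature of exactly 0 to avoid dividing by zero (samplers commonly treat that case as always picking the most likely token).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.2)  # low temperature: mass concentrates on the top token
flat = softmax_with_temperature(logits, 1.0)   # higher temperature: mass is spread more evenly
print(sharp, flat)
```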
top_k
integer

An integer that limits the number of candidates for the next predicted token. At each step, only the top_k most probable tokens are considered. This can speed up generation and can improve the quality of the generated text by focusing on the most likely options.

{
  "name": "top_k",
  "value": 42
}
top_p
float

A probability between 0 and 1 (also called the nucleus parameter) that dynamically adjusts the number of candidates for each predicted token based on cumulative probability. The most likely tokens are kept until their combined probability reaches this threshold, and all less likely tokens are filtered out. This technique helps maintain diversity and generate more fluent and natural-sounding text.

{
  "name": "top_p",
  "value": 0.5
}
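
The two filters described above can be sketched on a toy distribution as follows. This is a minimal illustration of the general top-k and nucleus techniques, not Together AI's implementation. The function names and the probabilities are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


probs = {"the": 0.5, "a": 0.25, "this": 0.125, "that": 0.125}
by_k = top_k_filter(probs, 2)    # fixed number of candidates
by_p = top_p_filter(probs, 0.7)  # number of candidates depends on the distribution
print(by_k, by_p)
```

Note that top_k always keeps the same number of candidates, while top_p keeps fewer candidates when the model is confident and more when the distribution is flat.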
extra
object

A dictionary of any additional parameters supported by Together AI that you want to pass through to the API. Refer to the Together AI docs for details on each of these parameters.

{
  "name": "extra",
  "value": {
    "repetition_penalty": 12,
    "min_p": 0.9
  }
}
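
When several of these options are set together, it can help to generate the name/value objects programmatically and check the ranges documented on this page before sending them. The sketch below is one possible way to do that. build_options and DOCUMENTED_RANGES are hypothetical helpers, and how the resulting list is embedded in a request is not shown here.

```python
import json

# Ranges as documented on this page; options without a documented range are not checked.
DOCUMENTED_RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 1.0),
}


def build_options(**settings):
    """Turn keyword settings into a list of {"name": ..., "value": ...} objects."""
    options = []
    for name, value in settings.items():
        if name in DOCUMENTED_RANGES:
            low, high = DOCUMENTED_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        options.append({"name": name, "value": value})
    return options


options = build_options(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    temperature=0.9,
    max_tokens=4096,
    extra={"min_p": 0.9},
)
print(json.dumps(options, indent=2))
```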

Function Calling

Together AI’s models, including its versions of Llama 3.1, support function calling using the OpenAI interface. For examples of how to use that approach, see the OpenAI service page or the function calling tutorial.
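
For orientation, a tool definition in the OpenAI function calling format looks roughly like the sketch below. The get_current_weather tool, its parameters, and the handle_tool_call dispatcher are hypothetical and only illustrate the shape of the data. How tools are registered with this service is covered on the pages referenced above, not here.

```python
import json

# Hypothetical tool; the name, description, and parameters are illustrative only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Paris"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}


def handle_tool_call(name, arguments_json):
    """Dispatch a tool call; in the OpenAI format, arguments arrive as a JSON string."""
    args = json.loads(arguments_json)
    if name == "get_current_weather":
        # Stubbed result; a real handler would query a weather service.
        return {
            "location": args["location"],
            "unit": args.get("unit", "celsius"),
            "temperature": None,
        }
    raise ValueError(f"unknown tool: {name}")


result = handle_tool_call("get_current_weather", '{"location": "Paris"}')
print(result)
```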