Chat Completions API

The chat completions endpoint supports multi-turn conversations with KrosAI’s language models, which makes it well suited to chatbots, virtual assistants, and other interactive applications.

Create Chat Completion

POST /v1/chat/completions

Request Body

messages (array, required): The conversation history, as an array of message objects (see Message Object below).

model (string, required): The ID of the model to use. Currently supported: KrosMLingual1.0.1.

max_tokens (integer, default: 100): The maximum number of tokens to generate.

temperature (number, default: 0.7): Controls the randomness of the output. Accepts values between 0 and 1; lower values produce more deterministic output.

Message Object

role (string, required): The role of the message author. Must be one of: system, user, or assistant.

content (string, required): The content of the message.
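The parameter and message constraints above can be checked on the client before a request is sent. The following is an illustrative sketch, not an official client: `build_chat_request` is a hypothetical helper, not part of any KrosAI SDK, and the non-empty `messages` check is an assumption rather than a documented rule.

```python
# Hypothetical client-side helper; constraints mirror the parameter tables above.
ALLOWED_ROLES = {"system", "user", "assistant"}
DEFAULT_MODEL = "KrosMLingual1.0.1"


def build_chat_request(messages, model=DEFAULT_MODEL, max_tokens=100, temperature=0.7):
    """Validate inputs and return a request body for POST /v1/chat/completions."""
    # ASSUMPTION: an empty conversation is treated as invalid (not documented).
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"messages[{i}] must be an object")
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}].content must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if (isinstance(temperature, bool)
            or not isinstance(temperature, (int, float))
            or not 0 <= temperature <= 1):
        raise ValueError("temperature must be a number between 0 and 1")
    return {
        "model": model,
        # Copy only the documented message fields.
        "messages": [{"role": m["role"], "content": m["content"]} for m in messages],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

Calling it with the system and user messages from the example request and `max_tokens=50` yields the same body as that example.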

Example Request

{
  "model": "KrosMLingual1.0.1",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that translates English to Yoruba."
    },
    {
      "role": "user",
      "content": "Translate: I love you"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 50
}
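The example body can be posted using only Python's standard library. This is a minimal sketch that rests on two assumptions. First, this page does not give the API host, so you must supply `base_url` yourself. Second, sending the key in an `Authorization: Bearer` header is assumed; the page only says that a missing or invalid key returns 401. Both function names are hypothetical.

```python
import json
import urllib.request


def make_chat_request(base_url, api_key, body):
    """Build (but do not send) a POST /v1/chat/completions request.

    ASSUMPTIONS: base_url is your KrosAI API host (not documented here),
    and the API key is sent as a Bearer token.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def send_chat_request(request, timeout=30):
    """Send the request and decode the JSON response body into a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` for error statuses such as 400, 401, and 429, so callers can inspect `err.code`.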

Example Response

{
  "id": "chatcmpl-456def",
  "object": "chat.completion",
  "created": 1677649420,
  "model": "KrosMLingual1.0.1",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Mo nife re"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 3,
    "total_tokens": 23
  }
}
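A client usually needs three things from the response: the reply text, the finish reason, and the token usage. The sketch below extracts them and reads only fields that appear in the example response. `parse_chat_response` and `ChatReply` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class ChatReply:
    content: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_chat_response(raw):
    """Return the first choice and token usage from a chat.completion response.

    `raw` may be the JSON text/bytes of the response or an already-decoded dict.
    """
    data = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    if data.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    choice = data["choices"][0]  # the example response contains a single choice
    usage = data.get("usage", {})
    return ChatReply(
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )
```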

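Conversation context comes from the messages array: the model uses the system, user, and assistant messages included in each request. To continue a conversation, append the assistant's reply and the next user message to the array, then send the full history again.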
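The sketch below runs one conversational turn. `send` is a hypothetical callable that posts a message list to `/v1/chat/completions` and returns the assistant's `content` string. The point to notice is that every call receives the whole history.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              send: Callable[[List[Message]], str]) -> str:
    """Append the user's message, send the full history, and record the reply.

    `send` is a hypothetical function that performs the HTTP call and returns
    the assistant's content string.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(list(history))  # the entire conversation is sent on every turn
    history.append({"role": "assistant", "content": reply})
    return reply
```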

Error Responses

400 Bad Request (object): Invalid request parameters or message format.

401 Unauthorized (object): Invalid or missing API key.

429 Too Many Requests (object): Rate limit exceeded.
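A common way to handle 429 responses is to retry with exponentially increasing delays. Errors such as 400 and 401 should fail immediately, because retrying them will not help. The sketch assumes requests are made with `urllib`, so errors arrive as `urllib.error.HTTPError`. `call_with_backoff` and its retry limits are illustrative choices, not documented KrosAI values.

```python
import time
import urllib.error


def call_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429, wait and retry with exponentially growing delays.

    Other HTTP errors (e.g. 400, 401) are re-raised immediately, since an
    invalid request or a bad API key will not succeed on retry.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... with the defaults
```

Passing `sleep` as a parameter keeps the function easy to test. Adding random jitter to the delay is a common refinement when many clients retry at once.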

Best Practices

  1. System Messages: Use system messages to set the behavior and context for your assistant.
  2. Message History: Keep message history concise to stay within token limits.
  3. Temperature: Use a lower temperature (0.2 to 0.4) for more focused, deterministic responses.
  4. Rate Limits: Handle 429 Too Many Requests responses, for example by retrying with exponential backoff.
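One way to keep message history concise is to drop the oldest turns before each request. This page does not document KrosAI's tokenizer, so the sketch approximates tokens with a whitespace word count. Treat the budget as rough. `trim_history` and `approx_tokens` are hypothetical helpers. The sketch always keeps system messages and the most recent message.

```python
def approx_tokens(text):
    """Crude placeholder for a token count: the number of whitespace-separated words."""
    return len(text.split())


def trim_history(messages, max_tokens, count_tokens=approx_tokens):
    """Drop the oldest non-system messages until the estimated total fits.

    Assumes system messages come first in the conversation; they are always
    kept, as is the most recent message (even if it alone exceeds the budget).
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    while len(rest) > 1 and total(system + rest) > max_tokens:
        rest.pop(0)  # remove the oldest non-system message
    return system + rest
```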