v1.70.1-stable - Gemini Realtime API Support

Krrish Dholakia
Ishaan Jaffer

New Models / Updated Models

  • Gemini (VertexAI + Google AI Studio)
    • /chat/completion
      • Handle audio input - PR (audio example after this list)
      • Fix maximum recursion depth issue when using deeply nested response schemas with Vertex AI by increasing DEFAULT_MAX_RECURSE_DEPTH from 10 to 100 in constants - PR
      • Capture reasoning tokens in streaming mode - PR
  • Google AI Studio
    • /realtime
      • Gemini Multimodal Live API support
      • Audio input/output support, optional param mapping, accurate usage calculation - PR
  • VertexAI
    • /chat/completion
      • Fix Llama streaming error where the model response was nested in the returned streaming chunk - PR
  • Ollama
    • /chat/completion
      • Fix structured responses - PR
  • Bedrock
    • /chat/completion
      • Handle thinking_blocks when assistant.content is None - PR
      • Fix to only allow accepted fields in the tool JSON schema - PR
      • Add Bedrock Sonnet prompt caching cost information
      • Mistral Pixtral support - PR
      • Tool caching support - PR
    • /messages
      • Allow using dynamic AWS params - PR (credentials example after this list)
  • Nvidia NIM
    • /chat/completion
      • Add tools, tool_choice, parallel_tool_calls support - PR (tool-calling example after this list)
  • Novita AI
    • New provider added for /chat/completion routes - PR
  • Azure
  • Cohere
    • /embeddings
      • Migrate embeddings to use /v2/embed - adds support for the output_dimensions param - PR (embedding example after this list)
  • Anthropic
  • VLLM
    • /chat/completion
      • Support embedding input as a list of integers - PR
  • OpenAI
    • /chat/completion
      • Fix base64 file data input handling - PR
      • Add `supports_pdf_input` to all vision models - PR
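
To illustrate the Gemini audio input support above, here is a minimal sketch, assuming litellm accepts the OpenAI-style `input_audio` content part for Gemini models (file path and model name are placeholders):

```python
import base64

import litellm

# Read a local audio clip and base64-encode it (path is a placeholder).
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = litellm.completion(
    model="gemini/gemini-2.0-flash",  # any audio-capable Gemini model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this clip."},
                # OpenAI-style audio content part, translated for Gemini.
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```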
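
The dynamic AWS params item above targets the Anthropic-format /messages route; here is the same per-request credential pattern through `litellm.completion`, as a rough sketch (model name and all credential values are placeholders):

```python
import litellm

# Per-request (dynamic) AWS credentials instead of environment variables.
response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "Hello from Bedrock"}],
    aws_access_key_id="AKIA...",        # placeholder
    aws_secret_access_key="...",        # placeholder
    aws_region_name="us-west-2",
)
print(response.choices[0].message.content)
```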
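
For the Nvidia NIM tool-calling support, a sketch using the standard OpenAI tools format (the model name is an example; any tool-capable NIM model should work):

```python
import litellm

# A standard OpenAI-format tool definition.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = litellm.completion(
    model="nvidia_nim/meta/llama-3.1-8b-instruct",  # example model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```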
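
And a sketch of the Cohere /v2/embed migration, assuming `output_dimensions` is passed through as an optional param (model name and dimension value are examples):

```python
import litellm

# Cohere embeddings now route through /v2/embed.
response = litellm.embedding(
    model="cohere/embed-v4.0",
    input=["hello world"],
    output_dimensions=256,  # assumption: forwarded to Cohere's v2 embed API
)
print(len(response.data[0]["embedding"]))
```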

LLM API Endpoints

  • Responses API
    • Fix delete API support - PR
  • Rerank API
    • /v2/rerank is now registered as an `llm_api_route`, enabling non-admins to call it - PR (example after this list)
  • Realtime API
    • Gemini Multimodal Live API support - PR
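
Since /v2/rerank is now an `llm_api_route`, a non-admin virtual key can call it directly on the proxy; a rough sketch (base URL, key, and model are placeholders):

```python
import requests

resp = requests.post(
    "http://localhost:4000/v2/rerank",
    headers={"Authorization": "Bearer sk-my-virtual-key"},
    json={
        "model": "cohere/rerank-english-v3.0",
        "query": "What is the capital of France?",
        "documents": [
            "Paris is the capital of France.",
            "Berlin is in Germany.",
        ],
    },
)
print(resp.json())
```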

Spend Tracking Improvements

  • /chat/completion, /messages
    • Anthropic - Web search tool cost tracking - PR
    • Groq - Update model max tokens + cost information - PR
  • /audio/speech
    • Azure - Add gpt-4o-mini-tts pricing - PR
  • /audio/transcription
    • Proxy - Fix tracking spend by tag - PR (example after this list)
  • /embeddings
    • Azure AI - Add cohere embed v4 pricing - PR
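
A sketch of the spend-by-tag fix in action, assuming tags are still passed in the request's `metadata` as in earlier releases (URL, key, model, and tag names are placeholders):

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-my-virtual-key")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    # Tags ride along in metadata; the proxy attributes spend to each tag.
    extra_body={"metadata": {"tags": ["jobs:nightly-batch"]}},
)
print(response.choices[0].message.content)
```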

Management Endpoints / UI

Logging / Alerting Integrations

  • StandardLoggingPayload
    • Log any `x-` headers in requester metadata - PR (example after this list)
    • Guardrail tracing is now in the standard logging payload - PR
  • Generic API Logger
    • Support passing the `application/json` header
  • Arize Phoenix
    • Fix: URL-encode OTEL_EXPORTER_OTLP_TRACES_HEADERS for the Phoenix integration - PR
    • Add guardrail tracing to OTEL and Arize Phoenix - PR
  • PagerDuty
    • PagerDuty alerting is now a free feature - PR
  • Alerting
    • Sending Slack alerts on virtual key/user/team updates is now free - PR
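
To illustrate the `x-` header logging change above, a sketch of a proxy request carrying custom `x-` headers that should then surface in the StandardLoggingPayload requester metadata (header names and values are examples):

```python
import requests

resp = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-my-virtual-key",
        # Custom x- headers, captured in requester metadata.
        "x-team": "search-infra",
        "x-request-source": "cron",
    },
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.status_code)
```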

Guardrails

  • Guardrails
    • New /apply_guardrail endpoint for directly testing a guardrail - PR (example after this list)
  • Lakera
    • /v2 endpoints support - PR
  • Presidio
    • Fix handling of message content in the Presidio guardrail integration - PR
    • Allow specifying PII Entities Config - PR
  • Aim Security
    • Support anonymization in Aim Guardrails - PR
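
A sketch of the new /apply_guardrail endpoint; the request body below is an assumption based on the endpoint's purpose, so check the docs for the exact schema (guardrail name and text are placeholders):

```python
import requests

# Exercise a configured guardrail directly, without a full LLM call.
resp = requests.post(
    "http://localhost:4000/apply_guardrail",
    headers={"Authorization": "Bearer sk-my-virtual-key"},
    json={
        "guardrail_name": "presidio-pii",      # placeholder guardrail
        "text": "My phone number is 555-0100", # text to run through it
    },
)
print(resp.json())
```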

Performance / Load Balancing / Reliability Improvements

General Proxy Improvements

  • Authentication
    • Handle `Bearer $LITELLM_API_KEY` in the `x-litellm-api-key` custom header - PR (example after this list)
  • New enterprise pip package - `litellm-enterprise` - fixes an issue where the enterprise folder was not found when using the pip package
  • Proxy CLI
    • Add `models import` command - PR
  • OpenWebUI
    • Configure LiteLLM to parse user headers from Open WebUI
  • LiteLLM Proxy w/ LiteLLM SDK
    • Option to force/always use the LiteLLM proxy when calling via the LiteLLM SDK (sketch after this list)
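
To illustrate the authentication change above, a sketch of sending `Bearer <key>` in the `x-litellm-api-key` custom header (URL and key are placeholders):

```python
import requests

# The proxy now accepts "Bearer <key>" (not just the bare key) in the
# x-litellm-api-key custom header.
resp = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={"x-litellm-api-key": "Bearer sk-my-virtual-key"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.status_code)
```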
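
And a sketch of the force-proxy option for the SDK; the flag name below is an assumption for illustration, so check the docs for the exact setting:

```python
import litellm

# Assumption: a global toggle that routes all SDK calls through the proxy;
# the actual flag/env var name may differ.
litellm.use_litellm_proxy = True  # hypothetical setting

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    api_base="http://localhost:4000",  # proxy URL (placeholder)
    api_key="sk-my-virtual-key",       # proxy virtual key (placeholder)
)
print(response.choices[0].message.content)
```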

New Contributors