Agent Options

The repository contains two voice-agent implementations.
File | Stack | Best use
agent.py | Sarvam STT + Sarvam-M LLM + Sarvam TTS | Indian-language pipeline with separate speech components
agent_realtime.py | OpenAI Realtime | Low-latency all-in-one realtime conversation
Run only one agent service at a time, chosen for the calling flow you want.

Sarvam Agent

Command:
python agent.py start
Pipeline:
Lead audio
  -> Sarvam STT
  -> Sarvam-M LLM
  -> Sarvam TTS
  -> LiveKit audio response
Key settings:
SARVAM_API_KEY=your_sarvam_api_key
STT_LANGUAGE=en-IN
STT_MODEL=saarika:v2.5
STT_FLUSH_SIGNAL=true
MIN_ENDPOINTING_DELAY_SECONDS=0.65
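The settings above are plain environment variables, so they can be read and type-checked once at startup. The following is a minimal sketch, not the code in agent.py: the `SarvamSettings` class and `load_sarvam_settings` function are hypothetical names, and using the example values above as fallback defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SarvamSettings:
    """Hypothetical container for the Sarvam agent settings."""
    api_key: str
    stt_language: str
    stt_model: str
    stt_flush_signal: bool
    min_endpointing_delay_seconds: float


def load_sarvam_settings(env: Optional[Mapping[str, str]] = None) -> SarvamSettings:
    """Read the documented variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("SARVAM_API_KEY", "")
    if not api_key:
        raise RuntimeError("SARVAM_API_KEY is not set")
    # Assumption: the documented example values double as defaults.
    flush_raw = env.get("STT_FLUSH_SIGNAL", "true").strip().lower()
    return SarvamSettings(
        api_key=api_key,
        stt_language=env.get("STT_LANGUAGE", "en-IN"),
        stt_model=env.get("STT_MODEL", "saarika:v2.5"),
        stt_flush_signal=flush_raw in {"1", "true", "yes"},
        min_endpointing_delay_seconds=float(
            env.get("MIN_ENDPOINTING_DELAY_SECONDS", "0.65")
        ),
    )
```

Failing fast on a missing API key keeps a misconfigured service from accepting calls it cannot transcribe.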
Qualification fields:
  • machine_interest
  • location
  • budget
  • first_product
  • units_per_day
  • local_demand
  • new_or_expand
  • partnership
  • operators_needed
  • own_brand
  • seriousness
  • factory_visit
  • video_demo

OpenAI Realtime Agent

Command:
python agent_realtime.py start
Pipeline:
Lead audio
  -> OpenAI Realtime session
  -> model handles listening, reasoning, and speaking
  -> LiveKit audio response
Key settings:
OPENAI_API_KEY=your_openai_api_key
OPENAI_REALTIME_MODEL=gpt-4o-realtime-preview
OPENAI_REALTIME_TRANSCRIBE_MODEL=whisper-1
OPENAI_REALTIME_TEMPERATURE=0.5
CALL_MAX_DURATION_SECONDS=240
Qualification fields:
  • product_interest: carbon cleaning, battery regeneration, car wash, franchise, or unsure
  • location
  • budget: below 10 lakhs, 10-25 lakhs, or >25 lakhs
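Because product_interest and budget are limited to the options listed above, extracted values can be checked against those options before saving. This is an illustrative sketch only: `validate_realtime_qualification` is a hypothetical helper, and treating the option labels as the literal stored strings is an assumption.

```python
from typing import Any, Dict

# Option labels copied from the field list above; assumed to be the stored values.
PRODUCT_INTEREST_OPTIONS = (
    "carbon cleaning",
    "battery regeneration",
    "car wash",
    "franchise",
    "unsure",
)
BUDGET_OPTIONS = ("below 10 lakhs", "10-25 lakhs", ">25 lakhs")


def validate_realtime_qualification(raw: Dict[str, Any]) -> Dict[str, str]:
    """Keep only fields whose values match the documented options."""
    cleaned: Dict[str, str] = {}

    product = raw.get("product_interest")
    if isinstance(product, str) and product.strip().lower() in PRODUCT_INTEREST_OPTIONS:
        cleaned["product_interest"] = product.strip().lower()

    budget = raw.get("budget")
    if isinstance(budget, str) and budget.strip().lower() in BUDGET_OPTIONS:
        cleaned["budget"] = budget.strip().lower()

    # location is free text, so only require a non-empty string.
    location = raw.get("location")
    if isinstance(location, str) and location.strip():
        cleaned["location"] = location.strip()

    return cleaned
```

For example, an input with product_interest "Car Wash", location " Pune ", and budget "50 lakhs" keeps the first two fields (lowercased and trimmed) and drops the unrecognized budget.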

Save Behavior

Both agents:
  1. Build a transcript/history during the call.
  2. Extract structured qualification from the transcript.
  3. Remove null values.
  4. Save the JSON to calls.qualification.
  5. Mark the call as called.
  6. Hang up or finish the room.
If the participant disconnects first, the agent attempts a partial save. If no useful answers exist, it marks the call as failed and can schedule a retry.
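The null-removal and status steps above can be sketched as a pure function. This is a hedged illustration rather than the agents' actual code: `strip_nulls` and `build_save_record` are hypothetical names, the record layout is invented for the example, and treating an empty result as failed regardless of how the call ended is an assumption.

```python
import json
from typing import Any, Dict, Optional


def strip_nulls(qualification: Dict[str, Any]) -> Dict[str, Any]:
    """Drop fields the extractor left as None (step 3)."""
    return {key: value for key, value in qualification.items() if value is not None}


def build_save_record(
    qualification: Dict[str, Any], participant_disconnected: bool = False
) -> Dict[str, Optional[Any]]:
    """Return the values to store for one call.

    The 'status' strings follow the wording above ("called", "failed");
    the 'partial' and 'retry' keys are illustrative assumptions.
    """
    cleaned = strip_nulls(qualification)
    if cleaned:
        return {
            "status": "called",
            "qualification": json.dumps(cleaned),  # JSON for calls.qualification
            "partial": participant_disconnected,
            "retry": False,
        }
    # No useful answers: mark the call failed and flag it for a possible retry.
    return {
        "status": "failed",
        "qualification": None,
        "partial": participant_disconnected,
        "retry": True,
    }
```

Keeping this logic free of database and LiveKit calls makes the save decision easy to unit test separately from the call handling.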

Language Behavior

agent_realtime.py contains language-detection and reply guidance for English and Indian languages, including handling for local-language and romanized input. It also normalizes saved qualification values to English-only structured data so that downstream CRM payloads stay consistent.

Call Limits

OpenAI Realtime mode enforces a maximum call duration:
CALL_MAX_DURATION_SECONDS=240
When the maximum duration is reached, the agent saves available qualification data and hangs up.
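One way to enforce such a limit is to wrap the conversation in a timeout. The sketch below uses the standard-library `asyncio.wait_for`. It is not the implementation in agent_realtime.py: `run_with_call_limit`, `max_call_seconds`, and the `save_and_hang_up` callback are hypothetical names, and the fallback of 240 seconds simply mirrors the example value above.

```python
import asyncio
import os
from typing import Awaitable, Callable, Mapping, Optional


def max_call_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read CALL_MAX_DURATION_SECONDS; the 240 fallback is an assumption."""
    env = os.environ if env is None else env
    return float(env.get("CALL_MAX_DURATION_SECONDS", "240"))


async def run_with_call_limit(
    conversation: Awaitable[None],
    save_and_hang_up: Callable[[], Awaitable[None]],
    max_seconds: float,
) -> bool:
    """Await the conversation; on timeout, cancel it, then save and hang up.

    Returns True when the duration limit was hit, False otherwise.
    """
    try:
        await asyncio.wait_for(conversation, timeout=max_seconds)
        return False
    except asyncio.TimeoutError:
        # wait_for has already cancelled the conversation at this point.
        await save_and_hang_up()
        return True
```

`asyncio.wait_for` cancels the awaited conversation when the timeout expires, so the save callback runs only after the conversation has stopped.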