If I were to set up a Voice AI infrastructure with today’s tools, here’s how I’d do it: I would use a provider like Twilio or Telnyx for the telephony edge, LiveKit for the real-time session information and media bridge, and VoxGraph.ai for prompt management. That leaves the Voice AI provider free to focus on the business logic and on marketing to its specific niche. Phone number management is still an open issue if you are handling inbound calls.



High-Level Call Flow:
Carrier gets the call
→ LiveKit runs the real-time session
→ The Voice AI provider decides what to say/do
→ AI stack turns speech into action and action into speech
→ Business systems complete the job
→ VoxGraph measures whether the whole thing is actually good
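
To make the hand-offs concrete, here is a minimal sketch of that flow as an ordered list of stages in Python. The `Stage` class, `CALL_FLOW`, and `describe` are names I made up for illustration; none of them come from Twilio, Telnyx, LiveKit, or VoxGraph, and the "STT, LLM, TTS" breakdown of the AI stack is my assumption about what sits in that layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One hop in the call flow (hypothetical type, for illustration only)."""
    owner: str           # which layer handles this hop
    responsibility: str  # what that layer does, per the flow above


# Ordered hand-offs taken from the high-level flow. Vendor names are the
# examples used in this post, not requirements.
CALL_FLOW = [
    Stage("Carrier (e.g. Twilio or Telnyx)", "gets the call"),
    Stage("LiveKit", "runs the real-time session"),
    Stage("Voice AI provider", "decides what to say/do"),
    # Assumption: the "AI stack" means speech-to-text, an LLM, and text-to-speech.
    Stage("AI stack (STT, LLM, TTS)", "turns speech into action and action into speech"),
    Stage("Business systems", "complete the job"),
    Stage("VoxGraph", "measures whether the whole thing is actually good"),
]


def describe(flow):
    """Render the flow as one numbered line per hop, in order."""
    return [f"{i}. {s.owner}: {s.responsibility}" for i, s in enumerate(flow, start=1)]


if __name__ == "__main__":
    print("\n".join(describe(CALL_FLOW)))
```

Keeping the stages as plain data makes it easy to see who owns each hop and to swap one vendor without touching the rest.
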

Detailed Call Flow:
call.started
caller.identified
prompt.version.loaded
stt.partial_received
llm.response_started
tts.playback_started
booking.lookup_started
booking.completed
human_escalation_requested
call.ended
post_call_summary_generated
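
One way to put these events to work is to record each one with a timestamp and then compute the gaps between them. The sketch below is a rough in-memory version under my own assumptions: `CallTrace`, `record`, `first`, and `elapsed_ms` are hypothetical names, not part of any vendor SDK, and the timestamps in the usage example are made up.

```python
from dataclasses import dataclass, field

# Event names copied verbatim from the detailed call flow above.
KNOWN_EVENTS = {
    "call.started",
    "caller.identified",
    "prompt.version.loaded",
    "stt.partial_received",
    "llm.response_started",
    "tts.playback_started",
    "booking.lookup_started",
    "booking.completed",
    "human_escalation_requested",
    "call.ended",
    "post_call_summary_generated",
}


@dataclass
class CallTrace:
    """Hypothetical in-memory trace for one call (not a vendor type)."""
    call_id: str
    events: list = field(default_factory=list)  # (name, timestamp_ms) tuples

    def record(self, name, timestamp_ms):
        """Append an event, rejecting names that are not in the flow."""
        if name not in KNOWN_EVENTS:
            raise ValueError(f"unknown event: {name}")
        self.events.append((name, timestamp_ms))

    def first(self, name):
        """Timestamp of the first occurrence of `name`, or None if absent."""
        for event_name, ts in self.events:
            if event_name == name:
                return ts
        return None

    def elapsed_ms(self, start, end):
        """Milliseconds from the first `start` event to the first `end` event."""
        t0, t1 = self.first(start), self.first(end)
        if t0 is None or t1 is None:
            return None  # one of the two events never happened on this call
        return t1 - t0


if __name__ == "__main__":
    # Made-up timestamps, in milliseconds since the call started.
    trace = CallTrace("call-001")
    trace.record("call.started", 0)
    trace.record("stt.partial_received", 1200)
    trace.record("llm.response_started", 1550)
    trace.record("tts.playback_started", 1900)
    trace.record("call.ended", 60000)
    print(trace.elapsed_ms("stt.partial_received", "tts.playback_started"))
```

With a trace like this, the gap between `stt.partial_received` and `tts.playback_started` gives a rough read on how long the caller waits for a reply, and a missing `booking.completed` after `booking.lookup_started` flags a call where the job did not finish.
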