The voice agent that holds when the pilot ends.
The demo is forgiving. Real call volume isn't. Most voice AI pilots die quietly in week three when the queue bugs, the cold-start latency, and the silent hallucinations all show up at once.
A voice agent you can ship. An eval layer that keeps it honest.
End-to-end turn latency
Swap STT, LLM, TTS per engagement
Runs inside your environment
Monitored and evaluated
Most voice AI works in the demo and breaks in the call.
The demo conditions are forgiving. Real call volume isn't. Six places the pilot breaks once production starts.
Turn detection is brittle
The caller interrupts, the agent talks over them, and the call becomes a standoff. Most stacks use voice activity detection designed for a single speaker.
The agent hallucinates
An LLM with a TTS bolted on invents balances, order numbers, and policy details. The customer believes them. The complaint arrives three days later.
Escalation is a coin flip
The agent does not know when to hand off. The customer gives up before reaching a human, and the containment metric still counts it as a win.
You cannot see the turn
Session recordings, yes. Per-turn reasoning, latency per stage, escalation rationale, tool-call accuracy — no. You cannot debug what you cannot see.
400ms feels like a dropped call
Every extra hop in the pipeline reads as a frozen agent. The customer fills the silence, the turn-taking collapses, the recording becomes unusable for evals.
The pilot holds. The real day does not
Queue bugs, cold-start latency, missing rate-limit handling. The integration tests missed all of it because they tested the trace, not the conversation.
The models are not the hard part. Shipping a production loop around them is.
Hope-based deployment is not an option for a live call.
Every Lisa deployment ships with an eval layer that tests the whole conversation, not just the trace. Built for voice, wired into the CI loop, runs in your environment.
Reliability, tested semantically.
A test that passes on "Please enter your phone number" still passes on "Could you provide your contact number?" The simulator matches intent, not exact strings. Change the prompt wording and the tests still hold.
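A minimal sketch of intent-level assertions, assuming a pluggable classifier. The keyword table, `keyword_intent`, and `assert_intent` are hypothetical stand-ins: a real simulator would classify intent with an LLM or embedding model rather than keyword overlap. The point is that both phrasings resolve to the same intent, so one assertion covers both, while an exact-string test would break on the rewording.

```python
from typing import Callable, Optional

# Hypothetical intent table; a real simulator would use an LLM or embedding
# model to classify intent instead of keyword overlap.
INTENT_KEYWORDS: dict[str, set[str]] = {
    "ask_phone_number": {"phone", "contact", "number"},
    "ask_account_id": {"account", "id", "identifier"},
}

def keyword_intent(utterance: str) -> Optional[str]:
    """Toy classifier: return the intent whose keywords overlap the utterance most."""
    words = {w.strip("?.,!").lower() for w in utterance.split()}
    best, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = intent, score
    return best

def assert_intent(
    utterance: str,
    expected: str,
    classify: Callable[[str], Optional[str]] = keyword_intent,
) -> None:
    """Pass when the agent's line carries the expected intent, whatever the wording."""
    actual = classify(utterance)
    assert actual == expected, f"expected {expected!r}, got {actual!r} for {utterance!r}"

# Both phrasings from the text satisfy the same test.
assert_intent("Please enter your phone number", "ask_phone_number")
assert_intent("Could you provide your contact number?", "ask_phone_number")
```

Because the classifier is passed in, swapping the toy keyword matcher for a model-backed one leaves every test definition unchanged.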
A firewall on every tool call.
An agent can call a CRM endpoint, get a 200 OK, and still write the wrong data. Seven layers of validation sit on every tool output, among them exact match, regex, JSON schema, numeric tolerance, semantic equivalence, and LLM-as-judge. Wrong data is stopped before it reaches the customer or the system of record.
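A minimal sketch of how deterministic validation layers can be stacked on a single tool output, using only the Python standard library. Every name here is hypothetical, `has_shape` is a simplified stand-in for a real JSON Schema validator, and the semantic-equivalence and LLM-as-judge layers are left out because they would be model-backed checks plugged in the same way.

```python
import math
import re
from dataclasses import dataclass
from typing import Any, Callable

Check = Callable[[Any], bool]

def exact(expected: Any) -> Check:
    """Layer: value must equal the expected value."""
    return lambda value: value == expected

def matches(pattern: str) -> Check:
    """Layer: value must be a string that fully matches the regex."""
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.fullmatch(value) is not None

def has_shape(required: dict[str, Any]) -> Check:
    """Layer: simplified stand-in for JSON Schema (required keys and their types)."""
    return lambda value: isinstance(value, dict) and all(
        key in value and isinstance(value[key], typ) for key, typ in required.items()
    )

def close_to(expected: float, abs_tol: float) -> Check:
    """Layer: number must lie within an absolute tolerance of the expected value."""
    return lambda value: isinstance(value, (int, float)) and math.isclose(
        value, expected, abs_tol=abs_tol
    )

def on_field(key: str, check: Check) -> Check:
    """Apply a layer to one field of a dict-shaped tool output."""
    return lambda out: isinstance(out, dict) and key in out and check(out[key])

@dataclass
class ToolOutputFirewall:
    layers: dict[str, Check]  # layer name -> check, run in insertion order

    def failures(self, output: Any) -> list[str]:
        """Names of every layer the output fails; an empty list means it may pass."""
        return [name for name, check in self.layers.items() if not check(output)]

# Hypothetical CRM write result and expectations.
firewall = ToolOutputFirewall({
    "json_shape": has_shape({"customer_id": str, "balance": (int, float), "status": str}),
    "status_exact": on_field("status", exact("updated")),
    "id_regex": on_field("customer_id", matches(r"C-\d{5}")),
    "balance_tolerance": on_field("balance", close_to(120.50, abs_tol=0.005)),
})

good = {"customer_id": "C-10293", "balance": 120.50, "status": "updated"}
bad = {"customer_id": "10293", "balance": 1205.0, "status": "updated"}
print(firewall.failures(good))  # []
print(firewall.failures(bad))   # ['id_regex', 'balance_tolerance']
```

The `bad` output is well-formed JSON with the right status, which is exactly the "200 OK, wrong data" case: only the regex and tolerance layers catch it. Reporting every failed layer by name, rather than a single pass/fail bit, makes the blocked call easy to debug.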
Cost per conversation, not per token.
Token counts don't tell you whether the business goal was hit. Per-conversation cost aggregated across turns, waste detection on redundant tool calls, model-swap ROI measured in unit economics. You get the number the CFO reads, not the one the engineers debug with.
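A minimal sketch of rolling per-turn costs up to the conversation, flagging repeated tool calls, and computing cost per resolved conversation. It assumes each turn records its STT, LLM, and TTS cost and its tool calls as (name, canonical-arguments) pairs; the `Turn` record and all three functions are hypothetical. `Decimal` is used so that small per-turn costs add up exactly.

```python
from collections import Counter
from dataclasses import dataclass, field
from decimal import Decimal

@dataclass
class Turn:
    stt_cost: Decimal
    llm_cost: Decimal
    tts_cost: Decimal
    # Each tool call as (tool name, canonical JSON of its arguments).
    tool_calls: list[tuple[str, str]] = field(default_factory=list)

def conversation_cost(turns: list[Turn]) -> Decimal:
    """Total cost of one conversation, summed across every turn and stage."""
    return sum((t.stt_cost + t.llm_cost + t.tts_cost for t in turns), Decimal(0))

def redundant_tool_calls(turns: list[Turn]) -> dict[tuple[str, str], int]:
    """Identical tool calls repeated within one conversation, with the number of extra calls."""
    counts = Counter(call for t in turns for call in t.tool_calls)
    return {call: n - 1 for call, n in counts.items() if n > 1}

def cost_per_resolution(conversations: list[list[Turn]], resolved: list[bool]) -> Decimal:
    """Unit economics: total spend divided by conversations that reached the business goal."""
    total = sum((conversation_cost(c) for c in conversations), Decimal(0))
    n_resolved = sum(resolved)
    return total / n_resolved if n_resolved else Decimal("Infinity")

# Hypothetical two-turn call that looks up the same balance twice.
lookup = ("get_balance", '{"account": "A1"}')
turns = [
    Turn(Decimal("0.002"), Decimal("0.010"), Decimal("0.004"), [lookup]),
    Turn(Decimal("0.002"), Decimal("0.012"), Decimal("0.004"), [lookup]),
]
print(conversation_cost(turns))     # 0.034
print(redundant_tool_calls(turns))  # {('get_balance', '{"account": "A1"}'): 1}
```

For a model swap, one way to compare is to run `cost_per_resolution` over the same call set under both configurations: a model that is cheaper per token but resolves fewer calls can still come out more expensive per resolved conversation.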
What makes Lisa different
Not a voice model with a phone number bolted on. A production stack built for live turn-taking, modular at every layer.
Not a chatbot with a phone number
Real-time voice system from the start. Turn-aware recognition, live reasoning, and speech synthesis in one session loop.
Knows when to stop
Escalates when automation should end. The goal is customer resolution, not containment metrics.
Modular, not monolithic
Swap STT, TTS, or LLM providers without touching the rest of the system. Adopt better models as they ship.
Observable from day one
Voice-session monitoring and agent quality evaluation built in.
of calls resolved without a human
Run Lisa against your own call script.
Bring a real call script. We configure Lisa against it, run it through the eval layer, and hand you the trace for every turn before anything ships.
Voice AI Orchestration — The Details
Pipecat orchestrates every turn. Every module — telephony edge, STT, LLM, TTS — is swappable. Adopt a better model the week it ships without rewriting the loop.
The call hits the telephony edge and the audio stream is handed to Pipecat. The edge is swappable — any SIP or media-stream provider that speaks RTP.
Pipecat routes the live audio to the STT module. Speech is transcribed in real time and turn boundaries are detected — start of turn, end of turn, eager end of turn. The STT provider is swappable.
The LLM module reasons over the request. If Lisa can answer directly, she does. If the request needs live data — account balance, order status, claim details — she calls the matching tool. The LLM is swappable.
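A minimal sketch of that routing step, assuming the LLM's output has already been parsed into either a direct reply or a tool call. `Reply`, `ToolCall`, the `get_order_status` tool, and the `TOOLS` registry are hypothetical; each LLM provider has its own tool-calling format, and in a live loop the tool result would normally go back to the LLM to be phrased before it reaches TTS.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Union

@dataclass
class Reply:
    """The LLM answered directly."""
    text: str

@dataclass
class ToolCall:
    """The LLM asked for live data through a named tool."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)

def get_order_status(order_id: str) -> str:
    # Hypothetical tool; a real one would query the order system.
    return {"A100": "shipped"}.get(order_id, "not found")

TOOLS: dict[str, Callable[..., str]] = {"get_order_status": get_order_status}

def handle_decision(decision: Union[Reply, ToolCall]) -> str:
    """Return the direct answer, or run the matching tool and return its result."""
    if isinstance(decision, Reply):
        return decision.text
    tool = TOOLS.get(decision.name)
    if tool is None:
        raise KeyError(f"no tool registered as {decision.name!r}")
    return tool(**decision.arguments)

print(handle_decision(Reply("We are open until 6 pm.")))
print(handle_decision(ToolCall("get_order_status", {"order_id": "A100"})))  # shipped
```

Failing loudly on an unknown tool name, instead of letting the model improvise an answer, keeps a misrouted request from turning into an invented order status.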
The TTS module synthesizes Lisa's response into natural speech, streamed back through the call in real time. Pipecat handles barge-in and interruption. The TTS provider is swappable.
If the customer has another question, the call continues with another turn. Once the customer is satisfied, the call ends. If escalation is needed, Lisa hands off cleanly. Session state is deleted on teardown.
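A framework-free sketch of one session's turn loop, assuming each module sits behind a small interface so any provider that implements it can be swapped in. This is not Pipecat's API (Pipecat composes a pipeline of frame processors); the `STT`, `LLM`, and `TTS` protocols, the `Session` class, and the fake providers are hypothetical and exist only to show where the swap point and the teardown step sit.

```python
from dataclasses import dataclass, field
from typing import Protocol

class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLM(Protocol):
    def respond(self, history: list[str], user_text: str) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

@dataclass
class Session:
    stt: STT
    llm: LLM
    tts: TTS
    history: list[str] = field(default_factory=list)

    def turn(self, audio: bytes) -> bytes:
        """One turn: caller audio in, agent audio out."""
        user_text = self.stt.transcribe(audio)
        reply = self.llm.respond(self.history, user_text)
        self.history += [user_text, reply]
        return self.tts.synthesize(reply)

    def close(self) -> None:
        """Teardown: delete the temporary session state."""
        self.history.clear()

# Fake providers for illustration only; any object with the same method works.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")

class ShoutLLM:
    def respond(self, history: list[str], user_text: str) -> str:
        return user_text.upper()

class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

session = Session(EchoSTT(), ShoutLLM(), BytesTTS())
audio_out = session.turn(b"where is my order")
session.close()
```

Swapping a provider means passing a different object to `Session`; the loop itself does not change, which is the property the modular design is after.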
Pipecat manages turn awareness for live conversation — start of turn, end of turn, eager end of turn, turn resumed. Configurable thresholds for end-of-turn detection and response timing make the interaction feel natural rather than robotic.
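A toy sketch of how configurable thresholds can turn per-frame voice-activity flags into the four turn events, assuming 20 ms frames and one boolean speech flag per frame. The names and threshold values are hypothetical, not Pipecat defaults, and production turn detection typically combines VAD with a learned end-of-turn model, which this does not attempt.

```python
from dataclasses import dataclass
from enum import Enum

class TurnEvent(Enum):
    START = "start_of_turn"
    EAGER_END = "eager_end_of_turn"
    RESUMED = "turn_resumed"
    END = "end_of_turn"

@dataclass(frozen=True)
class TurnConfig:
    frame_ms: int = 20       # duration of one audio frame
    start_ms: int = 60       # continuous speech needed to open a turn
    eager_end_ms: int = 200  # silence that triggers a speculative end of turn
    end_ms: int = 700        # silence that confirms the end of turn

def detect_turns(speech: list[bool], cfg: TurnConfig = TurnConfig()) -> list[tuple[int, TurnEvent]]:
    """Map per-frame voice-activity flags to (frame index, turn event) pairs."""
    events: list[tuple[int, TurnEvent]] = []
    in_turn = eager = False
    speech_run = silence_run = 0
    for i, is_speech in enumerate(speech):
        if is_speech:
            speech_run, silence_run = speech_run + 1, 0
            if not in_turn and speech_run * cfg.frame_ms >= cfg.start_ms:
                in_turn = True
                events.append((i, TurnEvent.START))
            elif eager:
                # Caller kept talking after a speculative end: the turn is not over.
                eager = False
                events.append((i, TurnEvent.RESUMED))
        else:
            speech_run, silence_run = 0, silence_run + 1
            if not in_turn:
                continue
            silent_ms = silence_run * cfg.frame_ms
            if not eager and silent_ms >= cfg.eager_end_ms:
                eager = True
                events.append((i, TurnEvent.EAGER_END))
            if silent_ms >= cfg.end_ms:
                in_turn = eager = False
                events.append((i, TurnEvent.END))
    return events
```

The trade-off lives in the two silence thresholds. One plausible use of the eager event is to let the LLM start drafting a reply early and discard the draft on `turn_resumed`; raising `end_ms` makes the agent less likely to talk over a pausing caller at the cost of a slower response.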
When automation should stop, Lisa hands off cleanly and the customer reaches a person without re-explaining. When a call ends, the live session is closed and temporary state is deleted. Lisa runs on-premises or on Azure, AWS, or GCP, inside the customer boundary. Pricing is matched to scale and use case.