Show HN: Vogent – Better Building Blocks for Voice AI

(vogent.ai)

25 points by jag729 a day ago | 7 comments

Hi HN! Excited to share some stuff we’ve been building.

We spent the last year building voice agents to automate individual call tasks for companies with large call centers. The STT-LLM-TTS-VAD cycle is mostly solved at this point, but the last-mile work of getting these agents to perform well was frustrating. We ended up building a lot of band-aids, and we decided to put them together into their own end-to-end product.

Vogent is a platform for building and serving Voice AI agents, with a focus on providing higher-level building blocks that make it easy to get a voice agent working quickly. You can check out the docs at https://docs.vogent.ai

It supports the typical design process of a voice agent (choosing/prompting a model, selecting a voice, and hosting on a phone number or accessing via API), but it has additional pieces that make voice agents performant quickly, like (among other things):

- A drag-and-drop agent builder

Under the hood, each node feeds the model only the context relevant to that node's goal (e.g., asking a particular question and probing conversationally for the answer). Once the goal is achieved, the model calls a function with the outcome, which transitions the agent to the appropriate next node. This makes it easy to build voice agents that need the structure of a multi-step talk track but the flexibility to accomplish each task conversationally.
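To make the node idea concrete, here's a minimal Python sketch of the pattern. This is not Vogent's actual implementation or API. `Node`, `TalkTrack`, `report_outcome`, and the `END` sentinel are all hypothetical names made up for illustration. It only shows the two mechanics described above: the prompt is built from the current node alone, and an outcome reported by the model drives the transition.

```python
from dataclasses import dataclass
from typing import Dict, List

END = "end"  # hypothetical sentinel meaning "the talk track is finished"


@dataclass
class Node:
    name: str
    goal: str                    # the only instruction the model sees in this node
    transitions: Dict[str, str]  # outcome label -> name of the next node (or END)


class TalkTrack:
    """Hypothetical multi-step talk track; each step is handled conversationally."""

    def __init__(self, nodes: List[Node], start: str) -> None:
        self.nodes = {node.name: node for node in nodes}
        self.current = start

    def prompt_context(self) -> str:
        # Build the prompt from the current node only; other nodes' goals are
        # deliberately left out. Raises KeyError once the track has finished.
        node = self.nodes[self.current]
        outcomes = ", ".join(sorted(node.transitions))
        return (
            f"Goal: {node.goal}\n"
            f"When done, call report_outcome with one of: {outcomes}"
        )

    def report_outcome(self, outcome: str) -> str:
        # Stands in for the function the model would call when the goal is met.
        node = self.nodes[self.current]
        if outcome not in node.transitions:
            raise ValueError(f"unknown outcome {outcome!r} for node {node.name!r}")
        self.current = node.transitions[outcome]
        return self.current

    @property
    def finished(self) -> bool:
        return self.current == END


track = TalkTrack(
    [
        Node("confirm_identity", "Confirm you are speaking with the account holder.",
             {"confirmed": "collect_dob", "wrong_person": END}),
        Node("collect_dob", "Ask for the caller's date of birth.",
             {"provided": END, "refused": END}),
    ],
    start="confirm_identity",
)
print(track.prompt_context())      # only mentions the identity-check goal
track.report_outcome("confirmed")  # moves to the date-of-birth node
print(track.prompt_context())
```

In a real agent, `report_outcome` would be exposed to the LLM as a tool/function call, and the conversation inside each node would be free-form until the model decides the goal is met.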

- Voices that are trained to spell

Off-the-shelf voices (e.g. Eleven, Cartesia) sound much more artificial when they spell. It might sound like an edge case, but this killed almost every engagement we had. We ended up recruiting Upworkers with different accents, having them spell a few thousand phrases, and training our own voices by modifying open-source architectures. Choose “Carlos” for a spelling-optimized voice right now; we’re adding a lot more soon.

- An IVR detection model

This detector uses the audio stream to predict whether a line came from an IVR or a human, and switches between different LLMs based on the result (so you can have independent IVR navigation and conversational models).
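As a rough illustration of the routing (again, not Vogent's actual code), the sketch below picks one of two responders per turn based on a classifier score. `SpeakerRouter`, the `classify` callable, the 0.5 threshold, and both stub models are assumptions for the example; the real detector is a trained audio model that isn't shown here.

```python
from typing import Callable

Classifier = Callable[[bytes], float]  # audio chunk -> assumed probability it's an IVR
Responder = Callable[[str], str]       # transcript -> reply text


class SpeakerRouter:
    """Hypothetical router: send each turn to the IVR model or the conversational model."""

    def __init__(
        self,
        classify: Classifier,
        ivr_model: Responder,
        human_model: Responder,
        threshold: float = 0.5,  # assumed cutoff; a real system would tune this
    ) -> None:
        self.classify = classify
        self.ivr_model = ivr_model
        self.human_model = human_model
        self.threshold = threshold

    def respond(self, audio: bytes, transcript: str) -> str:
        # Score the audio, then hand the transcript to whichever model fits.
        p_ivr = self.classify(audio)
        model = self.ivr_model if p_ivr >= self.threshold else self.human_model
        return model(transcript)


# Stubs standing in for the detector and the two LLMs.
def fake_classify(audio: bytes) -> float:
    return 0.9 if audio.startswith(b"IVR") else 0.1


def ivr_navigator(transcript: str) -> str:
    return "press 2"


def conversational_agent(transcript: str) -> str:
    return "Hi, I'm calling about an order."


router = SpeakerRouter(fake_classify, ivr_navigator, conversational_agent)
print(router.respond(b"IVR-menu-audio", "For billing, press 2."))
print(router.respond(b"human-audio", "Hello, who is this?"))
```

Keeping the two models separate means the IVR navigator can be prompted for terse menu actions while the conversational model is prompted for the actual talk track.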

- Model versioning and counterfactuals

Vogent enables model versioning and testing against past dials within the product.

Any feedback would be appreciated. Please also feel free to join our Discord: https://discord.gg/JmThYcyG

dvaun a day ago

I keep seeing similar aesthetics for landing pages and startup sites. Is there some common template that’s become popular for this purpose?

jag729 a day ago

I used Framer for this; I’m not too sure where the trend for this particular style started, but my guess is Cursor’s landing page (unless they were in turn emulating someone else)