Overview of real-time voice agent design optimization (preview)

[This article is prerelease documentation and is subject to change.]

The articles in this guide walk you through best practices for building real-time voice agents using Microsoft technologies. They're a practical resource for people who need to design and build real-time agents (RTA) and want a way to choose the right approach based on the customer journey.

In addition to this guide, Microsoft has developed templates based on real-world customer implementations. You can find them on the Dynamics 365 Contact Center Forward Deployed Engineering GitHub.

Important

  • This is a preview feature.
  • Preview features aren’t meant for production use and might have restricted functionality. These features are subject to supplemental terms of use, and are available before an official release so that customers can get early access and provide feedback.

Select the orchestration and voice mode

When you build an AI agent, start with one principle:

  • Every agent's design begins by deciding how the conversation is controlled.

For voice agents, there's a second, equally important decision:

  • How speech is handled end-to-end.

Conversation orchestration and speech mode shape cost, latency, flexibility, compliance, and operational complexity.

The following diagram summarizes the selection choices.

Screenshot of orchestration model selection with Classic, Hybrid, and Generative options, plus Basic and Streaming speech mode panels.
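
As an illustration only, the two decisions can be written down as a small data model. The following is a minimal sketch in Python, assuming the option names shown in the diagram (Classic, Hybrid, and Generative orchestration; Basic and Streaming speech modes). The class and function names are hypothetical and aren't part of any Microsoft SDK. The sketch lists pairings so that you can compare them; it doesn't indicate which pairings are supported or recommended.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Orchestration(Enum):
    """How the conversation is controlled (names taken from the diagram)."""
    CLASSIC = "Classic"
    HYBRID = "Hybrid"
    GENERATIVE = "Generative"


class SpeechMode(Enum):
    """How speech is handled end-to-end (names taken from the diagram)."""
    BASIC = "Basic"
    STREAMING = "Streaming"


@dataclass(frozen=True)
class VoiceAgentDesign:
    """Hypothetical record of the two design decisions for one agent."""
    orchestration: Orchestration
    speech_mode: SpeechMode

    def label(self) -> str:
        return (
            f"{self.orchestration.value} orchestration, "
            f"{self.speech_mode.value} speech"
        )


def candidate_designs() -> list[VoiceAgentDesign]:
    """Enumerate every orchestration and speech mode pairing for comparison."""
    return [
        VoiceAgentDesign(orchestration, speech_mode)
        for orchestration, speech_mode in product(Orchestration, SpeechMode)
    ]


if __name__ == "__main__":
    for design in candidate_designs():
        print(design.label())
```

Recording both choices in one place makes it easier to evaluate each candidate against cost, latency, flexibility, compliance, and operational complexity before you commit to a design.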