Introduction
Speech transcription and synthesis are useful capabilities in many scenarios, including:
- Documenting spoken conversations in calls and meetings.
- Generating captions for videos or presentations.
- Creating audible user interfaces to improve application accessibility.
- Developing hands-free AI assistants that read text messages or emails aloud.
In this module, we'll explore how to use speech-capable generative AI models in Microsoft Foundry to convert speech to text and text to speech.
Note
We recognize that different people like to learn in different ways. You can choose to complete this module in a video-based format, or you can read the content as text and images. The text contains more detail than the videos, so in some cases you might want to use it as supplemental material to the video presentation.