Hi Sankar Ramakrishnan, Prathap
During Voice Live API interactions, you are charged for:
- Text tokens
- Audio input tokens
- Audio output tokens
- Cached audio and text context
- Any Azure Speech components used (standard or custom avatar, audio models, etc.)
Cost will differ based on scenarios.
For a scenario without a custom text to speech model or avatars, you can use the Speech pricing guide to estimate the cost incurred.
The attached pricing is for one of the supported regions (East US).
Example scenario for usage in East US:
| Token Type | Monthly Tokens |
|---|---|
| Text Input | 30,000,000 |
| Text Output | 50,000,000 |
| Audio Input | 100,000,000 |
| Audio Output | 120,000,000 |
Cached split (30% of input tokens)
- Cached Text Input: 9M
- Cached Audio Input: 30M
Cost Calculation (rates are in USD per 1M tokens)
- Text Input (non‑cached): 21M × $4.40 = $92.40
- Cached Text Input: 9M × $1.375 = $12.375, rounded to $12.38
- Text Output: 50M × $17.60 = $880.00
Total Text Cost: $92.40 + $12.38 + $880.00 = $984.78
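If you want to rerun the text-cost arithmetic above with your own token volumes, a minimal Python sketch could look like the following. The three rates and the 30% cached share are copied from this example (East US) and may not match your region or current pricing. The function and variable names (`line_cost`, `text_cost`, `costs`) are made up for illustration and are not part of any Azure SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates in USD per 1 million tokens, copied from the example above (East US).
# Assumption: check the Speech pricing page for your own region before relying on them.
RATE_TEXT_INPUT = Decimal("4.40")
RATE_TEXT_INPUT_CACHED = Decimal("1.375")
RATE_TEXT_OUTPUT = Decimal("17.60")

MILLION = Decimal(1_000_000)
CENT = Decimal("0.01")


def line_cost(tokens: int, rate_per_million: Decimal) -> Decimal:
    """Cost of one line item, rounded half-up to the nearest cent."""
    raw = Decimal(tokens) / MILLION * rate_per_million
    return raw.quantize(CENT, rounding=ROUND_HALF_UP)


def text_cost(input_tokens: int, output_tokens: int, cached_fraction: Decimal) -> dict:
    """Split text input into cached and non-cached tokens and price each part."""
    cached = int(Decimal(input_tokens) * cached_fraction)
    non_cached = input_tokens - cached
    items = {
        "text_input_non_cached": line_cost(non_cached, RATE_TEXT_INPUT),
        "text_input_cached": line_cost(cached, RATE_TEXT_INPUT_CACHED),
        "text_output": line_cost(output_tokens, RATE_TEXT_OUTPUT),
    }
    # Sum the rounded line items, as the manual calculation does.
    items["total"] = sum(items.values(), Decimal("0"))
    return items


# Monthly text volumes from the example table, with 30% of input cached.
costs = text_cost(30_000_000, 50_000_000, Decimal("0.30"))
for name, value in costs.items():
    print(f"{name}: ${value}")
```

`Decimal` is used instead of floats so that the half-cent in the cached line (12.375) rounds the same way as in the manual calculation. The audio portion can be estimated the same way once you have the audio input, cached audio input, and audio output rates for your region.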
Please take a minute to accept this answer if you found it helpful.
Thank you.