
Can someone help me configure the Voice Live SDK to receive animation blendshapes and viseme_id?

Dadong Hu 0 Reputation points
2026-02-25T17:09:02.8933333+00:00

I tried adding this, but no animation data is received.

  modalities: ["text", "audio", 'animation'],
  outputAudioTimestampYypes: ["word"],
  animation: {
    modelName: "default",
    outputs: ["viseme_id"]
  },
Azure AI Speech

An Azure service that integrates speech processing into apps and services.


2 answers

  1. Anshika Varshney 7,995 Reputation points Microsoft External Staff Moderator
    2026-02-26T15:32:17.4266667+00:00

    Hi Dadong Hu,

    I’ve seen this issue come up when trying to wire Voice Live SDK into a React application, especially for the first time. In most cases it’s not a single bug, but a combination of setup and environment gaps. Below are the steps that usually resolve it.

    Things to verify first

    1. SDK availability for React. Voice Live doesn't provide a full "drop-in" React SDK in the same way as some other Azure client libraries. For JavaScript/TypeScript, the official support is currently via preview SDKs and samples, and most working React apps use:
      • A WebSocket-based integration
      • A small Node.js proxy to handle authentication securely
      This is consistent with the official Voice Live SDK documentation and samples. [learn.microsoft.com], [github.com]
    2. Do not expose keys in the browser. If you are trying to connect directly from React using your Azure key, the connection will fail or behave unpredictably.
      • Browser apps should not call Voice Live directly with keys
      • Use a backend (Node/Express, Azure Functions, etc.) as a proxy that creates the Voice Live session
      This is a common root cause for connection and auth errors in React setups. [iloveagents.ai]
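    As a concrete sketch of the proxy idea, a backend could build the upstream connection along these lines. Note that the `/voice-live/realtime` path and `api-version` value below are assumptions for illustration; verify the exact endpoint shape against the current Voice Live documentation.

    ```javascript
    // Hypothetical backend helper: builds the upstream Voice Live WebSocket URL
    // and attaches the API key server-side, so the browser never sees the key.
    // The path and api-version here are assumptions -- check the docs.
    function buildVoiceLiveConnection(resourceName, apiKey, apiVersion = "2025-05-01-preview") {
      const url =
        `wss://${resourceName}.cognitiveservices.azure.com` +
        `/voice-live/realtime?api-version=${encodeURIComponent(apiVersion)}`;
      // The key goes into a header on the server side only; the React client
      // talks to this backend, never to Azure directly.
      return { url, headers: { "api-key": apiKey } };
    }
    ```

    The backend would open a WebSocket to `url` with those headers and relay messages to the browser over its own socket, so the key stays in an environment variable on the server.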

    Troubleshooting steps:

    1. Start from an official sample. Before integrating into React, validate your Azure resource and credentials by running one of the official samples (JavaScript/TypeScript or Node) from the Voice Live samples repository.
      • This confirms your resource, endpoint, and model are correct
      • It removes React/browser variables from the equation [github.com]
    2. Validate the endpoint and model. Double-check:
      • Endpoint format: https://<resource-name>.cognitiveservices.azure.com
      • The model name is supported by Voice Live (for example gpt-4o or supported realtime models)
      A mismatched endpoint or unsupported model is a very common failure point. [learn.microsoft.com]
    3. Add a backend proxy. If you're calling from React:
      • Create a small backend service to open the Voice Live WebSocket
      • Let React communicate only with your backend
      This pattern is explicitly recommended for browser-based Voice Live apps and avoids CORS and key-exposure issues.
    4. Check logs during session creation. Enable logging on both:
      • The backend proxy
      • The browser console
      Look for:
      • Authentication failures
      • WebSocket close codes
      • Model or modality errors (audio/text mismatch)
    5. Confirm the audio configuration. Make sure:
      • The input/output audio format matches what Voice Live expects (for example PCM16)
      • Microphone permissions are granted in the browser
      Audio format mismatches can cause silent failures even when the connection succeeds.
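    For the "WebSocket close codes" part of step 4, labelling the standard close codes makes logs much easier to read. The labels below come from the WebSocket specification (RFC 6455), not from Voice Live itself; the service may use additional codes, so treat this as a starting point.

    ```javascript
    // Map standard RFC 6455 WebSocket close codes to readable labels for logging.
    function describeCloseCode(code) {
      const known = {
        1000: "normal closure",
        1006: "abnormal closure (no close frame; often network or TLS issues)",
        1007: "invalid payload data",
        1008: "policy violation (check authentication and request headers)",
        1011: "server internal error",
      };
      return known[code] ?? `unrecognized close code ${code}`;
    }

    // In the browser or the backend proxy:
    //   ws.addEventListener("close", (e) => console.warn("closed:", describeCloseCode(e.code)));
    ```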

    After validating the setup with an official sample and moving the Voice Live connection behind a backend proxy, React integrations usually work as expected. This approach has resolved the issue in most similar cases I've seen.

    If the problem still persists, please share:

    • The exact error message
    • Where the failure happens (session creation vs. audio streaming)
    • Whether you're calling Voice Live directly or via a backend

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    Thank you!


  2. Q&A Assist
    2026-02-25T17:09:11.0933333+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    To configure the Voice Live SDK to receive animation blendshapes and viseme IDs, ensure that your configuration is set up correctly. Based on the context provided, here's how you can do it:

    1. Modalities: Make sure you include animation in the modalities array, which you have done correctly:
         modalities: ["text", "audio", "animation"],
      
    2. Output Audio Timestamp Types: Ensure that you have the correct spelling for outputAudioTimestampTypes, as it seems there is a typo in your configuration (outputAudioTimestampYypes). It should be:
         outputAudioTimestampTypes: ["word"],
      
    3. Animation Configuration: Your animation configuration looks good, but ensure that the modelName is correctly set to a valid model that supports the outputs you want:
         animation: {
             modelName: "default",
             outputs: ["viseme_id"]
         },
      
    4. Check for Responses: After setting this up, you should receive viseme alignment in the response when the audio is generated. The service will return messages like response.animation_viseme.delta and response.animation_viseme.done when all viseme messages are returned.
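    The viseme messages named in step 4 can be consumed with a small dispatcher like the one below. The event shape (a `type` field plus a `viseme_id` payload on the delta events) is an assumption based on the message names above; verify the exact schema against the Voice Live reference before relying on it.

    ```javascript
    // Hypothetical consumer for the viseme events named above: accumulates
    // viseme IDs from response.animation_viseme.delta messages and reports
    // the full sequence when response.animation_viseme.done arrives.
    function createVisemeCollector(onDone) {
      const visemes = [];
      return function handleMessage(rawJson) {
        const event = JSON.parse(rawJson);
        switch (event.type) {
          case "response.animation_viseme.delta":
            visemes.push(event.viseme_id); // assumed payload field name
            break;
          case "response.animation_viseme.done":
            onDone([...visemes]);
            visemes.length = 0; // reset for the next response
            break;
          default:
            break; // ignore audio/text events here
        }
      };
    }
    ```

    Wire it to the session socket, e.g. `ws.addEventListener("message", (e) => handle(e.data))`, and drive your blendshape/viseme animation from the collected sequence.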

    If you are still not receiving animation data, double-check the service's documentation for any additional requirements or constraints regarding the animation outputs and ensure that your session is properly established with the Voice Live API.



