Use the Text to Speech API


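As with the Speech to Text API, Azure Speech in Foundry Tools provides a Text to Speech API for speech synthesis.

As with speech recognition, in practice most interactive speech-enabled applications are built with the Azure Speech SDK.

The pattern for implementing speech synthesis is similar to the pattern for speech recognition: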

Figure: a SpeechSynthesizer object is created from a SpeechConfig and an AudioConfig, and its SpeakTextAsync method is used to call the Speech API.

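Use a SpeechConfig object to encapsulate the information required to connect to your Azure Speech resource; specifically, its location and key.
Optionally, use an AudioConfig to define the output device for the synthesized speech. By default, this is the default system speaker, but you can also specify an audio file, or explicitly set this value to null to process the returned audio stream object directly.
Use the SpeechConfig and AudioConfig to create a SpeechSynthesizer object. This object is a proxy client for the Text to Speech API.
Use the methods of the SpeechSynthesizer object to call the underlying API functions. For example, the SpeakTextAsync() method uses the Azure Speech service to convert text to spoken audio.
Process the response from the Azure Speech service. For the SpeakTextAsync method, the result is a SpeechSynthesisResult object that contains the following properties: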
- AudioData
- Properties
- Reason
- ResultId

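When speech is synthesized successfully, the Reason property is set to the SynthesizingAudioCompleted enumeration value, and the AudioData property contains the audio stream (which, depending on the AudioConfig, may already have been sent automatically to a speaker or a file).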
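The result properties described above can be inspected with a small helper. The following is a minimal sketch, not part of the SDK: summarize_result and the FakeReason stand-in are hypothetical names introduced for illustration, and the helper assumes that reason values are Python Enum members (so that .name is available) and that the Python result exposes reason, audio_data, result_id, and cancellation_details attributes. It uses duck typing so it can be tried without calling the service.

```python
from enum import Enum
from types import SimpleNamespace


def summarize_result(result) -> str:
    """Describe a synthesis result using its reason, audio_data, and result_id.

    Hypothetical helper: works on any object exposing those attributes.
    """
    reason = result.reason.name  # assumes reason is an Enum member
    if reason == "SynthesizingAudioCompleted":
        return "{}: completed, {} bytes of audio".format(
            result.result_id, len(result.audio_data))
    if reason == "Canceled":
        details = result.cancellation_details
        return "{}: canceled ({})".format(result.result_id, details.reason.name)
    return "{}: {}".format(result.result_id, reason)


# Stand-in objects for a quick local check; a real call would pass the
# object returned by speech_synthesizer.speak_text_async(text).get()
class FakeReason(Enum):
    SynthesizingAudioCompleted = 1
    Canceled = 2


ok = SimpleNamespace(reason=FakeReason.SynthesizingAudioCompleted,
                     audio_data=b"\x00" * 32, result_id="abc")
print(summarize_result(ok))
```

In application code, comparing result.reason against speechsdk.ResultReason members directly is the more usual choice; the name comparison here only keeps the sketch independent of the SDK.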

Example - synthesizing text as speech

The following Python example uses Azure Speech in Foundry Tools to generate spoken output from text.

import azure.cognitiveservices.speech as speechsdk

# Speech config encapsulates the connection to the resource
# (KEY and ENDPOINT are placeholders for your resource's key and endpoint)
speech_config = speechsdk.SpeechConfig(subscription=KEY, endpoint=ENDPOINT)

# Audio output config determines where to send the audio stream (defaults to speaker)
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

# Use speech synthesizer to synthesize text as speech
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                                 audio_config=audio_config)
text = "My voice is my password!"
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

# Did it succeed?
if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    # Yes!
    print("Speech synthesized for text [{}]".format(text))
elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
    # No - try to find out why not
    cancellation_details = speech_synthesis_result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
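As a variation on the example above, the audio can be kept in memory instead of being played, by passing None as the audio configuration and reading the result's audio data yourself. The following is a rough sketch under stated assumptions: save_audio and synthesize_to_file are hypothetical helper names, the key and endpoint are supplied by the caller, and the output file is only playable as WAV if the service's default output format is a RIFF/WAV one.

```python
from pathlib import Path


def save_audio(audio_data: bytes, path: str) -> int:
    """Write synthesized audio bytes to path and return the number of bytes written."""
    return Path(path).write_bytes(audio_data)


def synthesize_to_file(text: str, key: str, endpoint: str, path: str) -> bool:
    """Synthesize text without playing it and save the returned audio to path."""
    # Imported here so save_audio can be used without the SDK installed
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, endpoint=endpoint)
    # audio_config=None: nothing is sent to a speaker or file automatically;
    # the audio is available on the result object instead
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                              audio_config=None)
    result = synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        save_audio(result.audio_data, path)
        return True
    return False


# Example call (requires a real key and endpoint):
# synthesize_to_file("My voice is my password!", KEY, ENDPOINT, "output.wav")
```

If the goal is only to write a file, passing a file name to the audio output configuration, as described in the steps earlier, lets the SDK handle the writing; handling the bytes yourself is mainly useful when the audio needs further processing.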
