Tutorial: Build a voice-to-text note-taking app - Foundry Local

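Install packages (C#)

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and go to the sample, run the following commands: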

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/cs/tutorial-voice-to-text

If you're developing or shipping on Windows, use the Windows commands. The Windows package integrates with the Windows ML runtime and provides a broader range of hardware acceleration with the same API surface.

Windows:

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

Cross-platform:

dotnet add package Microsoft.AI.Foundry.Local
dotnet add package OpenAI

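The C# sample in the GitHub repository is a preconfigured project. If you're building from scratch, see the Foundry Local SDK reference for details on setting up a C# project with Foundry Local.

Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Open Program.cs and add the following code to load the speech model and transcribe an audio file. The snippet assumes the catalog and ct variables from the SDK initialization code, which appears in the complete listing later in this section.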

// Load the speech-to-text model
var speechModel = await catalog.GetModelAsync("whisper-tiny")
    ?? throw new Exception("Speech model not found");

await speechModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading speech model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await speechModel.LoadAsync();
Console.WriteLine("Speech model loaded.");

// Transcribe the audio file
var audioClient = await speechModel.GetAudioClientAsync();
var transcriptionText = new StringBuilder();

Console.WriteLine("\nTranscription:");
var audioResponse = audioClient
    .TranscribeAudioStreamingAsync("meeting-notes.wav", ct);
await foreach (var chunk in audioResponse)
{
    Console.Write(chunk.Text);
    transcriptionText.Append(chunk.Text);
}
Console.WriteLine();

// Unload the speech model to free memory
await speechModel.UnloadAsync();

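The GetAudioClientAsync method returns a client for audio operations. The TranscribeAudioStreamingAsync method streams transcription chunks as they become available. The code accumulates the text so that it can be passed to the chat model in the next step.

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.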

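Summarize the transcription

Next, use a chat model to organize the raw transcription into structured notes. Load the qwen2.5-0.5b model and send the transcription as context, along with a system prompt that instructs the model to produce clean, summarized notes.

Add the following code after the transcription step: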

// Load the chat model for summarization
var chatModel = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Chat model not found");

await chatModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading chat model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await chatModel.LoadAsync();
Console.WriteLine("Chat model loaded.");

// Summarize the transcription into organized notes
var chatClient = await chatModel.GetChatClientAsync();
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a note-taking assistant. Summarize " +
                  "the following transcription into organized, " +
                  "concise notes with bullet points."
    },
    new ChatMessage
    {
        Role = "user",
        Content = transcriptionText.ToString()
    }
};

var chatResponse = await chatClient.CompleteChatAsync(messages, ct);
var summary = chatResponse.Choices[0].Message.Content;
Console.WriteLine($"\nSummary:\n{summary}");

// Clean up
await chatModel.UnloadAsync();
Console.WriteLine("\nDone. Models unloaded.");

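The system prompt shapes the model's output format. By instructing the model to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.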

Combine into a complete app

Replace the contents of Program.cs with the following complete code, which transcribes an audio file and then summarizes the transcription:

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;
using System.Text;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(
        Microsoft.Extensions.Logging.LogLevel.Information
    );
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

var catalog = await mgr.GetCatalogAsync();

// Load the speech-to-text model
var speechModel = await catalog.GetModelAsync("whisper-tiny")
    ?? throw new Exception("Speech model not found");

await speechModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading speech model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await speechModel.LoadAsync();
Console.WriteLine("Speech model loaded.");

// Transcribe the audio file
var audioClient = await speechModel.GetAudioClientAsync();
var transcriptionText = new StringBuilder();

Console.WriteLine("\nTranscription:");
var audioResponse = audioClient
    .TranscribeAudioStreamingAsync("meeting-notes.wav", ct);
await foreach (var chunk in audioResponse)
{
    Console.Write(chunk.Text);
    transcriptionText.Append(chunk.Text);
}
Console.WriteLine();

// Unload the speech model to free memory
await speechModel.UnloadAsync();

// Load the chat model for summarization
var chatModel = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Chat model not found");

await chatModel.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading chat model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await chatModel.LoadAsync();
Console.WriteLine("Chat model loaded.");

// Summarize the transcription into organized notes
var chatClient = await chatModel.GetChatClientAsync();
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a note-taking assistant. Summarize " +
                  "the following transcription into organized, " +
                  "concise notes with bullet points."
    },
    new ChatMessage
    {
        Role = "user",
        Content = transcriptionText.ToString()
    }
};

var chatResponse = await chatClient.CompleteChatAsync(messages, ct);
var summary = chatResponse.Choices[0].Message.Content;
Console.WriteLine($"\nSummary:\n{summary}");

// Clean up
await chatModel.UnloadAsync();
Console.WriteLine("\nDone. Models unloaded.");

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

dotnet run

You see output similar to the following:

Downloading speech model: 100.00%
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content with streaming output, then passes the accumulated text to the chat model, which extracts the key points and organizes them into structured notes.

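Install packages (JavaScript)

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and go to the sample, run the following commands: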

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/js/tutorial-voice-to-text

If you're developing or shipping on Windows, use the Windows command. The Windows package integrates with the Windows ML runtime and provides a broader range of hardware acceleration with the same API surface.

Windows:

npm install foundry-local-sdk-winml openai

Cross-platform:

npm install foundry-local-sdk openai

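Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Create a file named app.js.

Add the following code to load the speech model and transcribe an audio file. The snippet assumes the manager, path, and __dirname definitions from the SDK initialization code, which appears in the complete listing later in this section.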

// Load the speech-to-text model
const speechModel = await manager.catalog.getModel('whisper-tiny');
await speechModel.download((progress) => {
    process.stdout.write(
        `\rDownloading speech model: ${progress.toFixed(2)}%`
    );
});
console.log('\nSpeech model downloaded.');

await speechModel.load();
console.log('Speech model loaded.');

// Transcribe the audio file
const audioClient = speechModel.createAudioClient();
const transcription = await audioClient.transcribe(
    path.join(__dirname, 'meeting-notes.wav')
);
console.log(`\nTranscription:\n${transcription.text}`);

// Unload the speech model to free memory
await speechModel.unload();

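The createAudioClient method returns a client for audio operations. The transcribe method takes a file path and returns an object whose text property contains the transcribed content.

Note

Replace 'meeting-notes.wav' with the name of your audio file. The code resolves it relative to the script's directory through __dirname. Supported formats include WAV, MP3, and FLAC.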

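Summarize the transcription

Next, use a chat model to organize the raw transcription into structured notes. Load the qwen2.5-0.5b model and send the transcription as context, along with a system prompt that instructs the model to produce clean, summarized notes.

Add the following code after the transcription step: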

// Load the chat model for summarization
const chatModel = await manager.catalog.getModel('qwen2.5-0.5b');
await chatModel.download((progress) => {
    process.stdout.write(
        `\rDownloading chat model: ${progress.toFixed(2)}%`
    );
});
console.log('\nChat model downloaded.');

await chatModel.load();
console.log('Chat model loaded.');

// Summarize the transcription into organized notes
const chatClient = chatModel.createChatClient();
const messages = [
    {
        role: 'system',
        content: 'You are a note-taking assistant. Summarize ' +
                 'the following transcription into organized, ' +
                 'concise notes with bullet points.'
    },
    {
        role: 'user',
        content: transcription.text
    }
];

const response = await chatClient.completeChat(messages);
const summary = response.choices[0]?.message?.content;
console.log(`\nSummary:\n${summary}`);

// Clean up
await chatModel.unload();
console.log('\nDone. Models unloaded.');

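The system prompt shapes the model's output format. By instructing the model to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.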

Combine into a complete app

Replace the contents of app.js with the following complete code, which transcribes an audio file and then summarizes the transcription:

import { FoundryLocalManager } from 'foundry-local-sdk';
import { fileURLToPath } from 'url';
import path from 'path';

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Load the speech-to-text model
const speechModel = await manager.catalog.getModel('whisper-tiny');
await speechModel.download((progress) => {
    process.stdout.write(
        `\rDownloading speech model: ${progress.toFixed(2)}%`
    );
});
console.log('\nSpeech model downloaded.');

await speechModel.load();
console.log('Speech model loaded.');

// Transcribe the audio file
const audioClient = speechModel.createAudioClient();
const transcription = await audioClient.transcribe(
    path.join(__dirname, 'meeting-notes.wav')
);
console.log(`\nTranscription:\n${transcription.text}`);

// Unload the speech model to free memory
await speechModel.unload();

// Load the chat model for summarization
const chatModel = await manager.catalog.getModel('qwen2.5-0.5b');
await chatModel.download((progress) => {
    process.stdout.write(
        `\rDownloading chat model: ${progress.toFixed(2)}%`
    );
});
console.log('\nChat model downloaded.');

await chatModel.load();
console.log('Chat model loaded.');

// Summarize the transcription into organized notes
const chatClient = chatModel.createChatClient();
const messages = [
    {
        role: 'system',
        content: 'You are a note-taking assistant. Summarize ' +
                 'the following transcription into organized, ' +
                 'concise notes with bullet points.'
    },
    {
        role: 'user',
        content: transcription.text
    }
];

const response = await chatClient.completeChat(messages);
const summary = response.choices[0]?.message?.content;
console.log(`\nSummary:\n${summary}`);

// Clean up
await chatModel.unload();
console.log('\nDone. Models unloaded.');

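Note

Replace 'meeting-notes.wav' with the name of your audio file. The code resolves it relative to the script's directory through __dirname. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

node app.js

You see output similar to the following: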

Downloading speech model: 100.00%
Speech model downloaded.
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model downloaded.
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content, then passes the text to the chat model, which extracts the key points and organizes them into structured notes.

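Install packages (Python)

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and go to the sample, run the following commands: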

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/python/tutorial-voice-to-text

If you're developing or shipping on Windows, use the Windows command. The Windows package integrates with the Windows ML runtime and provides a broader range of hardware acceleration with the same API surface.

Windows:

pip install foundry-local-sdk-winml openai

Cross-platform:

pip install foundry-local-sdk openai

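Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Create a file named app.py.

Add the following code to load the speech model and transcribe an audio file. The snippet assumes the manager object from the SDK initialization code, which appears in the complete listing later in this section.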

# Load the speech-to-text model
speech_model = manager.catalog.get_model("whisper-tiny")
speech_model.download(
    lambda progress: print(
        f"\rDownloading speech model: {progress:.2f}%",
        end="",
        flush=True,
    )
)
print()
speech_model.load()
print("Speech model loaded.")

# Transcribe the audio file
audio_client = speech_model.get_audio_client()
transcription = audio_client.transcribe("meeting-notes.wav")
print(f"\nTranscription:\n{transcription.text}")

# Unload the speech model to free memory
speech_model.unload()

The get_audio_client method returns a client for audio operations. The transcribe method takes a file path and returns an object whose text property contains the transcribed content.
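Because transcribe takes a file path, one option is to validate the path before spending time on the model download. The following is a minimal sketch and not part of the Foundry Local SDK: check_audio_path and SUPPORTED_EXTENSIONS are hypothetical names, and the extension set simply mirrors the formats this tutorial lists as supported (WAV, MP3, and FLAC).

```python
from pathlib import Path

# Assumption: mirrors the formats the tutorial lists as supported.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac"}


def check_audio_path(audio_path: str) -> Path:
    """Return the path as a Path object, or raise if it looks unusable.

    Hypothetical helper; it only inspects the file name and the file system.
    """
    path = Path(audio_path)
    # Compare the extension case-insensitively so "A.WAV" is accepted.
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path
```

Calling check_audio_path("meeting-notes.wav") before speech_model.download(...) would surface a wrong path or extension right away instead of after the model has loaded.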

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

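Summarize the transcription

Next, use a chat model to organize the raw transcription into structured notes. Load the qwen2.5-0.5b model and send the transcription as context, along with a system prompt that instructs the model to produce clean, summarized notes.

Add the following code after the transcription step: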

# Load the chat model for summarization
chat_model = manager.catalog.get_model("qwen2.5-0.5b")
chat_model.download(
    lambda progress: print(
        f"\rDownloading chat model: {progress:.2f}%",
        end="",
        flush=True,
    )
)
print()
chat_model.load()
print("Chat model loaded.")

# Summarize the transcription into organized notes
client = chat_model.get_chat_client()
messages = [
    {
        "role": "system",
        "content": "You are a note-taking assistant. "
                   "Summarize the following transcription "
                   "into organized, concise notes with "
                   "bullet points.",
    },
    {"role": "user", "content": transcription.text},
]

response = client.complete_chat(messages)
summary = response.choices[0].message.content
print(f"\nSummary:\n{summary}")

# Clean up
chat_model.unload()
print("\nDone. Models unloaded.")

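The system prompt shapes the model's output format. By instructing the model to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.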
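Since the system prompt is what controls the output format, it can be convenient to keep it in one place and build the message list from it. The sketch below is one way to do that. build_messages and SYSTEM_PROMPT are hypothetical names, and the empty-transcript check is an added assumption rather than something the tutorial code does.

```python
# Same instruction text as the tutorial's system message.
SYSTEM_PROMPT = (
    "You are a note-taking assistant. Summarize the following "
    "transcription into organized, concise notes with bullet points."
)


def build_messages(transcript: str, system_prompt: str = SYSTEM_PROMPT) -> list[dict]:
    """Build the system + user message list used for summarization."""
    text = transcript.strip()
    # Assumption: an empty transcript is treated as an error here,
    # because there would be nothing for the chat model to summarize.
    if not text:
        raise ValueError("Transcription is empty; nothing to summarize.")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]
```

With this helper, the call in the tutorial would become client.complete_chat(build_messages(transcription.text)), and trying a different note style only requires passing another system_prompt.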

Combine into a complete app

Replace the contents of app.py with the following complete code, which transcribes an audio file and then summarizes the transcription:

from foundry_local_sdk import Configuration, FoundryLocalManager


def main():
    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance

    # Download and register all execution providers.
    current_ep = ""
    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()

    # Load the speech-to-text model
    speech_model = manager.catalog.get_model("whisper-tiny")
    speech_model.download(
        lambda progress: print(
            f"\rDownloading speech model: {progress:.2f}%",
            end="",
            flush=True,
        )
    )
    print()
    speech_model.load()
    print("Speech model loaded.")

    # Transcribe the audio file
    audio_client = speech_model.get_audio_client()
    transcription = audio_client.transcribe("meeting-notes.wav")
    print(f"\nTranscription:\n{transcription.text}")

    # Unload the speech model to free memory
    speech_model.unload()

    # Load the chat model for summarization
    chat_model = manager.catalog.get_model("qwen2.5-0.5b")
    chat_model.download(
        lambda progress: print(
            f"\rDownloading chat model: {progress:.2f}%",
            end="",
            flush=True,
        )
    )
    print()
    chat_model.load()
    print("Chat model loaded.")

    # Summarize the transcription into organized notes
    client = chat_model.get_chat_client()
    messages = [
        {
            "role": "system",
            "content": "You are a note-taking assistant. "
                       "Summarize the following transcription "
                       "into organized, concise notes with "
                       "bullet points.",
        },
        {"role": "user", "content": transcription.text},
    ]

    response = client.complete_chat(messages)
    summary = response.choices[0].message.content
    print(f"\nSummary:\n{summary}")

    # Clean up
    chat_model.unload()
    print("\nDone. Models unloaded.")


if __name__ == "__main__":
    main()
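The listing above passes two near-identical lambdas to download. As a sketch of one way to remove that repetition, the helper below builds the callback from a label. make_progress_printer is a hypothetical name, and the sketch assumes, as the lambdas above do, that the callback receives a single percentage value.

```python
import sys
from typing import Callable, TextIO


def make_progress_printer(
    label: str, stream: TextIO = sys.stdout
) -> Callable[[float], None]:
    """Return a callback that rewrites one console line with the progress."""

    def report(progress: float) -> None:
        # "\r" moves the cursor back to the start of the line, so each
        # update overwrites the previous one instead of adding a new line.
        stream.write(f"\rDownloading {label}: {progress:.2f}%")
        stream.flush()

    return report
```

With this helper, the two calls would read speech_model.download(make_progress_printer("speech model")) and chat_model.download(make_progress_printer("chat model")). The print() that follows each download is still needed to end the progress line.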

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

python app.py

You see output similar to the following:

Downloading speech model: 100.00%
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model: 100.00%
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content, then passes the text to the chat model, which extracts the key points and organizes them into structured notes.
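Each model is unloaded after use to free memory, but an exception during transcription or summarization would skip the unload call in the listing. A context manager is one way to guarantee the cleanup. This is a sketch under assumptions: loaded and FakeModel are hypothetical names, FakeModel only stands in for an SDK model object, and the only methods relied on are load and unload as used in this tutorial.

```python
from contextlib import contextmanager


@contextmanager
def loaded(model):
    """Load a model on entry and always unload it on exit."""
    model.load()
    try:
        yield model
    finally:
        # Runs whether the body finished normally or raised.
        model.unload()


class FakeModel:
    """Stand-in for an SDK model object; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def load(self):
        self.calls.append("load")

    def unload(self):
        self.calls.append("unload")


fake = FakeModel()
try:
    with loaded(fake):
        # Simulate a failure in the middle of the work.
        raise RuntimeError("transcription failed")
except RuntimeError:
    pass

print(fake.calls)  # ['load', 'unload']
```

In the tutorial code, the same pattern would wrap each stage, for example with loaded(speech_model): followed by the transcription calls, so that the speech model is released before the chat model is loaded even when a step fails.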

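Install packages (Rust)

Samples repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and go to the sample, run the following commands: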

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/rust/tutorial-voice-to-text

If you're developing or shipping on Windows, use the Windows commands. The Windows package integrates with the Windows ML runtime and provides a broader range of hardware acceleration with the same API surface.

Windows:

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow

Cross-platform:

cargo add foundry-local-sdk
cargo add tokio --features full
cargo add tokio-stream anyhow

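Transcribe an audio file

In this step, you load a speech-to-text model and transcribe an audio file. The Foundry Local SDK uses the whisper model alias to select the best Whisper variant for your hardware.

Open src/main.rs and add the following code inside the main function to load the speech model and transcribe an audio file. The snippet assumes the manager value and the std::io imports from the complete listing later in this section.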

// Load the speech-to-text model
let speech_model = manager
    .catalog()
    .get_model("whisper-tiny")
    .await?;

if !speech_model.is_cached().await? {
    println!("Downloading speech model...");
    speech_model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

speech_model.load().await?;
println!("Speech model loaded.");

// Transcribe the audio file
let audio_client = speech_model.create_audio_client();
let transcription = audio_client
    .transcribe("meeting-notes.wav")
    .await?;
println!("\nTranscription:\n{}", transcription.text);

// Unload the speech model to free memory
speech_model.unload().await?;

The create_audio_client method returns a client for audio operations. The transcribe method takes a file path and returns an object whose text field contains the transcribed content.

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

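Summarize the transcription

Next, use a chat model to organize the raw transcription into structured notes. Load the qwen2.5-0.5b model and send the transcription as context, along with a system prompt that instructs the model to produce clean, summarized notes.

Add the following code inside the main function, after the transcription step: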

// Load the chat model for summarization
let chat_model = manager
    .catalog()
    .get_model("qwen2.5-0.5b")
    .await?;

if !chat_model.is_cached().await? {
    println!("Downloading chat model...");
    chat_model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

chat_model.load().await?;
println!("Chat model loaded.");

// Summarize the transcription into organized notes
let client = chat_model
    .create_chat_client()
    .temperature(0.7)
    .max_tokens(512);

let messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from(
        "You are a note-taking assistant. Summarize \
         the following transcription into organized, \
         concise notes with bullet points.",
    )
    .into(),
    ChatCompletionRequestUserMessage::from(
        transcription.text.as_str(),
    )
    .into(),
];

let response = client
    .complete_chat(&messages, None)
    .await?;
let summary = response.choices[0]
    .message
    .content
    .as_deref()
    .unwrap_or("");
println!("\nSummary:\n{}", summary);

// Clean up
chat_model.unload().await?;
println!("\nDone. Models unloaded.");

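The system prompt shapes the model's output format. By instructing the model to produce "organized, concise notes with bullet points," you get structured content rather than a raw paraphrase.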

Combine into a complete app

Replace the contents of src/main.rs with the following complete code, which transcribes an audio file and then summarizes the transcription:

use foundry_local_sdk::{
    ChatCompletionRequestMessage,
    ChatCompletionRequestSystemMessage,
    ChatCompletionRequestUserMessage,
    FoundryLocalConfig, FoundryLocalManager,
};
use std::io::{self, Write};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(
        FoundryLocalConfig::new("note-taker"),
    )?;

    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();

    // Load the speech-to-text model
    let speech_model = manager
        .catalog()
        .get_model("whisper-tiny")
        .await?;

    if !speech_model.is_cached().await? {
        println!("Downloading speech model...");
        speech_model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    speech_model.load().await?;
    println!("Speech model loaded.");

    // Transcribe the audio file
    let audio_client = speech_model.create_audio_client();
    let transcription = audio_client
        .transcribe("meeting-notes.wav")
        .await?;
    println!("\nTranscription:\n{}", transcription.text);

    // Unload the speech model to free memory
    speech_model.unload().await?;

    // Load the chat model for summarization
    let chat_model = manager
        .catalog()
        .get_model("qwen2.5-0.5b")
        .await?;

    if !chat_model.is_cached().await? {
        println!("Downloading chat model...");
        chat_model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    chat_model.load().await?;
    println!("Chat model loaded.");

    // Summarize the transcription into organized notes
    let client = chat_model
        .create_chat_client()
        .temperature(0.7)
        .max_tokens(512);

    let messages: Vec<ChatCompletionRequestMessage> = vec![
        ChatCompletionRequestSystemMessage::from(
            "You are a note-taking assistant. Summarize \
             the following transcription into organized, \
             concise notes with bullet points.",
        )
        .into(),
        ChatCompletionRequestUserMessage::from(
            transcription.text.as_str(),
        )
        .into(),
    ];

    let response = client
        .complete_chat(&messages, None)
        .await?;
    let summary = response.choices[0]
        .message
        .content
        .as_deref()
        .unwrap_or("");
    println!("\nSummary:\n{}", summary);

    // Clean up
    chat_model.unload().await?;
    println!("\nDone. Models unloaded.");

    Ok(())
}

Note

Replace "meeting-notes.wav" with the path to your audio file. Supported formats include WAV, MP3, and FLAC.

Run the note taker:

cargo run

You see output similar to the following:

Downloading speech model...
  100.0%
Speech model loaded.

Transcription:
OK so let's get started with the weekly sync. First, the backend
API is nearly done. Sarah finished the authentication endpoints
yesterday. We still need to add rate limiting before we go to
staging. On the frontend, the dashboard redesign is about seventy
percent complete. Jake, can you walk us through the new layout?
Great. The charts look good. I think we should add a filter for
date range though. For testing, we have about eighty percent code
coverage on the API. We need to write integration tests for the
new auth flow before Friday. Let's plan to do a full regression
test next Tuesday before the release. Any blockers? OK, sounds
like we are in good shape. Let's wrap up.

Downloading chat model...
  100.0%
Chat model loaded.

Summary:
- **Backend API**: Authentication endpoints complete. Rate limiting
  still needed before staging deployment.
- **Frontend**: Dashboard redesign 70% complete. New chart layout
  reviewed. Action item: add a date range filter.
- **Testing**: API code coverage at 80%. Integration tests for the
  auth flow due Friday. Full regression test scheduled for next
  Tuesday before release.
- **Status**: No blockers reported. Team is on track.

Done. Models unloaded.

The application first transcribes the audio content, then passes the text to the chat model, which extracts the key points and organizes them into structured notes.
