Tutorial: Build a multi-turn chat assistant with Foundry Local

Sample repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample, run:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/cs/tutorial-chat-assistant

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime and provides the same API surface area with a wider breadth of hardware acceleration.

Windows

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

Cross-platform

dotnet add package Microsoft.AI.Foundry.Local
dotnet add package OpenAI

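The C# samples in the GitHub repository are preconfigured projects. If you're building from scratch, see the Foundry Local SDK reference for details on setting up a C# project with Foundry Local.

Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Open Program.cs and replace its contents with the following code to initialize the SDK and select a model: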

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

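The GetModelAsync method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The DownloadAsync method fetches the model weights to your local cache, and LoadAsync prepares the model for inference.

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds: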

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

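Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.

Implement multi-turn conversation

A chat assistant needs to maintain context across multiple exchanges. To do that, keep a list of all messages (system, user, and assistant) and send the full list with each request. The model uses this history to generate contextually relevant responses.

Add a conversation loop that:

1. Reads user input from the console.
2. Adds the user message to the history.
3. Sends the full history to the model.
4. Adds the assistant's response to the history for the next turn.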

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

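Each call to the chat client (CompleteChatAsync, or CompleteChatStreamingAsync as in the loop above) receives the full message history. That is how the model "remembers" previous turns: no state is stored between calls.

Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. If your loop calls CompleteChatAsync, replace that call with CompleteChatStreamingAsync to stream the response token by token.

The streaming part of the conversation loop looks like this: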

// Stream the response token by token
Console.Write("Assistant: ");
var fullResponse = string.Empty;
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    var content = chunk.Choices[0].Message.Content;
    if (!string.IsNullOrEmpty(content))
    {
        Console.Write(content);
        Console.Out.Flush();
        fullResponse += content;
    }
}
Console.WriteLine("\n");

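The streaming version accumulates the full response so that it can be added to the conversation history after the stream completes.

Complete code

Replace the contents of Program.cs with the following complete code: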

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "foundry_local_samples",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Download and register all execution providers.
var currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
    if (epName != currentEp)
    {
        if (currentEp != "") Console.WriteLine();
        currentEp = epName;
    }
    Console.Write($"\r  {epName.PadRight(30)}  {percent,6:F1}%");
});
if (currentEp != "") Console.WriteLine();

// Select and load a model from the catalog
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync("qwen2.5-0.5b")
    ?? throw new Exception("Model not found");

await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f) Console.WriteLine();
});

await model.LoadAsync();
Console.WriteLine("Model loaded and ready.");

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Start the conversation with a system prompt
var messages = new List<ChatMessage>
{
    new ChatMessage
    {
        Role = "system",
        Content = "You are a helpful, friendly assistant. Keep your responses " +
                  "concise and conversational. If you don't know something, say so."
    }
};

Console.WriteLine("\nChat assistant ready! Type 'quit' to exit.\n");

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userInput) ||
        userInput.Equals("quit", StringComparison.OrdinalIgnoreCase) ||
        userInput.Equals("exit", StringComparison.OrdinalIgnoreCase))
    {
        break;
    }

    // Add the user's message to conversation history
    messages.Add(new ChatMessage { Role = "user", Content = userInput });

    // Stream the response token by token
    Console.Write("Assistant: ");
    var fullResponse = string.Empty;
    var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
    await foreach (var chunk in streamingResponse)
    {
        var content = chunk.Choices[0].Message.Content;
        if (!string.IsNullOrEmpty(content))
        {
            Console.Write(content);
            Console.Out.Flush();
            fullResponse += content;
        }
    }
    Console.WriteLine("\n");

    // Add the complete response to conversation history
    messages.Add(new ChatMessage { Role = "assistant", Content = fullResponse });
}

// Clean up - unload the model
await model.UnloadAsync();
Console.WriteLine("Model unloaded. Goodbye!");

Run the chat assistant:

dotnet run

You see output similar to the following:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from previous turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.

Sample repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample, run:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/js/tutorial-chat-assistant

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime and provides the same API surface area with a wider breadth of hardware acceleration.

Windows

npm install foundry-local-sdk-winml openai

Cross-platform

npm install foundry-local-sdk openai

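Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Create a file named index.js.

Add the following code to initialize the SDK and select a model: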

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

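The getModel method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The download method fetches the model weights to your local cache, and load prepares the model for inference.

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds: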

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

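Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.

Implement multi-turn conversation

A chat assistant needs to maintain context across multiple exchanges. To do that, keep a list of all messages (system, user, and assistant) and send the full list with each request. The model uses this history to generate contextually relevant responses.

Add a conversation loop that:

1. Reads user input from the console.
2. Adds the user message to the history.
3. Sends the full history to the model.
4. Adds the assistant's response to the history for the next turn.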

// askQuestion is the readline-based prompt helper defined in the complete code.
while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

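Each call to the chat client (completeChat, or completeStreamingChat as in the loop above) receives the full message history. That is how the model "remembers" previous turns: no state is stored between calls.

Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. If your loop calls completeChat, replace that call with completeStreamingChat to stream the response token by token.

The streaming part of the conversation loop looks like this: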

// Stream the response token by token
process.stdout.write('Assistant: ');
let fullResponse = '';
for await (const chunk of chatClient.completeStreamingChat(messages)) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
        process.stdout.write(content);
        fullResponse += content;
    }
}
console.log('\n');

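The streaming version accumulates the full response so that it can be added to the conversation history after the stream completes.

Complete code

Create a file named index.js and add the following complete code: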

import { FoundryLocalManager } from 'foundry-local-sdk';
import * as readline from 'readline';

// Initialize the Foundry Local SDK
const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});

// Download and register all execution providers.
let currentEp = '';
await manager.downloadAndRegisterEps((epName, percent) => {
    if (epName !== currentEp) {
        if (currentEp !== '') process.stdout.write('\n');
        currentEp = epName;
    }
    process.stdout.write(`\r  ${epName.padEnd(30)}  ${percent.toFixed(1).padStart(5)}%`);
});
if (currentEp !== '') process.stdout.write('\n');

// Select and load a model from the catalog
const model = await manager.catalog.getModel('qwen2.5-0.5b');

await model.download((progress) => {
    process.stdout.write(`\rDownloading model: ${progress.toFixed(2)}%`);
});
console.log('\nModel downloaded.');

await model.load();
console.log('Model loaded and ready.');

// Create a chat client
const chatClient = model.createChatClient();

// Start the conversation with a system prompt
const messages = [
    {
        role: 'system',
        content: 'You are a helpful, friendly assistant. Keep your responses ' +
                 'concise and conversational. If you don\'t know something, say so.'
    }
];

// Set up readline for console input
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

const askQuestion = (prompt) => new Promise((resolve) => rl.question(prompt, resolve));

console.log('\nChat assistant ready! Type \'quit\' to exit.\n');

while (true) {
    const userInput = await askQuestion('You: ');
    if (userInput.trim().toLowerCase() === 'quit' ||
        userInput.trim().toLowerCase() === 'exit') {
        break;
    }

    // Add the user's message to conversation history
    messages.push({ role: 'user', content: userInput });

    // Stream the response token by token
    process.stdout.write('Assistant: ');
    let fullResponse = '';
    for await (const chunk of chatClient.completeStreamingChat(messages)) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
            fullResponse += content;
        }
    }
    console.log('\n');

    // Add the complete response to conversation history
    messages.push({ role: 'assistant', content: fullResponse });
}

// Clean up - unload the model
await model.unload();
console.log('Model unloaded. Goodbye!');
rl.close();

Run the chat assistant:

node index.js

You see output similar to the following:

Downloading model: 100.00%
Model downloaded.
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from previous turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.

Sample repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample, run:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/python/tutorial-chat-assistant

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime and provides the same API surface area with a wider breadth of hardware acceleration.

Windows

pip install foundry-local-sdk-winml openai

Cross-platform

pip install foundry-local-sdk openai

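Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Create a file named main.py.

Add the following code to initialize the SDK and select a model: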

# Initialize the Foundry Local SDK
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

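# Download and register all execution providers.
# Note: this fragment is meant to run inside a function (main() in the complete
# code). The nonlocal statement below is a syntax error at module level.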
current_ep = ""
def ep_progress(ep_name: str, percent: float):
    nonlocal current_ep
    if ep_name != current_ep:
        if current_ep:
            print()
        current_ep = ep_name
    print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

manager.download_and_register_eps(progress_callback=ep_progress)
if current_ep:
    print()

# Select and load a model from the catalog
model = manager.catalog.get_model("qwen2.5-0.5b")
model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
print()
model.load()
print("Model loaded and ready.")

# Get a chat client
client = model.get_chat_client()

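The get_model method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The download method fetches the model weights to your local cache, and load prepares the model for inference.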
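
The progress callback passed to download can be tried out without the SDK. The following is a minimal sketch under stated assumptions: fake_download and format_progress are hypothetical helpers, not part of foundry-local-sdk, and the callback is assumed to receive a percentage between 0 and 100, as the tutorial code's formatting suggests.

```python
from typing import Callable, List


def fake_download(on_progress: Callable[[float], None], steps: int = 4) -> None:
    """Hypothetical stand-in for model.download: reports progress in equal steps."""
    for i in range(1, steps + 1):
        on_progress(100.0 * i / steps)


def format_progress(progress: float) -> str:
    """Build the same single-line status string that the tutorial prints."""
    return f"\rDownloading model: {progress:.2f}%"


# Collect the status strings instead of printing directly, so they are easy to inspect.
seen: List[str] = []
fake_download(lambda p: seen.append(format_progress(p)))

# The leading "\r" moves the cursor back to column 0, so each update
# overwrites the previous one on the same console line.
for line in seen:
    print(line, end="", flush=True)
print()  # finish the line once the download is complete
```

Because every update starts with a carriage return and is printed without a newline, the console shows one status line that updates in place. The final print() moves to a new line, which is why the tutorial code calls print() right after download.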

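Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds: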

# Start the conversation with a system prompt
messages = [
    {
        "role": "system",
        "content": "You are a helpful, friendly assistant. Keep your responses "
                   "concise and conversational. If you don't know something, say so."
    }
]

Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.
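
One way to experiment with personas is to keep the candidate prompts in a dictionary and build a fresh history from the one you pick. The sketch below is illustrative only: PERSONAS, new_conversation, and the pirate and teacher prompt wording are assumptions made for this example, not part of the SDK or the sample repository.

```python
from typing import Dict, List

# Hypothetical persona prompts; only "default" comes from the tutorial.
PERSONAS: Dict[str, str] = {
    "default": "You are a helpful, friendly assistant. Keep your responses "
               "concise and conversational. If you don't know something, say so.",
    "pirate": "You are a pirate. Answer every question in pirate speak.",
    "teacher": "You are a patient teacher. Explain concepts step by step.",
}


def new_conversation(persona: str = "default") -> List[Dict[str, str]]:
    """Return a fresh history whose first message is the chosen system prompt."""
    if persona not in PERSONAS:
        raise ValueError(f"Unknown persona: {persona!r}")
    return [{"role": "system", "content": PERSONAS[persona]}]


messages = new_conversation("teacher")
print(messages[0]["role"], "->", messages[0]["content"])
```

Starting each experiment from a new list keeps one persona's turns from leaking into the next, because the history is the only context the model receives.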

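Implement multi-turn conversation

A chat assistant needs to maintain context across multiple exchanges. To do that, keep a list of all messages (system, user, and assistant) and send the full list with each request. The model uses this history to generate contextually relevant responses.

Add a conversation loop that:

1. Reads user input from the console.
2. Adds the user message to the history.
3. Sends the full history to the model.
4. Adds the assistant's response to the history for the next turn.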

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ("quit", "exit"):
        break

    # Add the user's message to conversation history
    messages.append({"role": "user", "content": user_input})

    # Stream the response token by token
    print("Assistant: ", end="", flush=True)
    full_response = ""
    for chunk in client.complete_streaming_chat(messages):
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            full_response += content
    print("\n")

    # Add the complete response to conversation history
    messages.append({"role": "assistant", "content": full_response})

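Each call to the chat client (complete_chat, or complete_streaming_chat as in the loop above) receives the full message history. That is how the model "remembers" previous turns: no state is stored between calls.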
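
The effect of statelessness can be seen without loading a model. In the sketch below, fake_complete_chat is a hypothetical stand-in for the chat client: it has no memory of its own and can only report what it finds in the list it is given. It is an illustration of the pattern, not the SDK's behavior.

```python
from typing import Dict, List

Message = Dict[str, str]


def fake_complete_chat(messages: List[Message]) -> str:
    """Hypothetical stateless model: it can only use what is in `messages`."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    if not user_turns:
        return "(No user message in the history.)"
    if len(user_turns) > 1:
        return f"(I can see {len(user_turns)} user turns, starting with {user_turns[0]!r}.)"
    return f"(I can see only this turn: {user_turns[-1]!r}.)"


history: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]

for user_input in ["What is photosynthesis?", "Why is it important?"]:
    history.append({"role": "user", "content": user_input})
    reply = fake_complete_chat(history)  # the full history goes in on every call
    history.append({"role": "assistant", "content": reply})

# Sending only the latest message loses the earlier context.
isolated_reply = fake_complete_chat([{"role": "user", "content": "Why is it important?"}])

print(history[-1]["content"])
print(isolated_reply)
```

After two turns the history holds five messages (system, user, assistant, user, assistant). The second call sees both questions only because the caller passed them in. The isolated call sees one.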

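Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. If your loop calls complete_chat, replace that call with complete_streaming_chat to stream the response token by token.

The streaming part of the conversation loop looks like this: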

# Stream the response token by token
print("Assistant: ", end="", flush=True)
full_response = ""
for chunk in client.complete_streaming_chat(messages):
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content
print("\n")

The streaming version accumulates the full response so that it can be added to the conversation history after the stream completes.
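
The accumulate-then-append pattern can be checked in isolation with fake chunks. This sketch assumes only the chunk shape that the tutorial code reads (chunk.choices[0].delta.content); make_chunk, fake_stream, and collect_stream are hypothetical helpers, and fake_stream stands in for client.complete_streaming_chat(messages).

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, List, Optional


def make_chunk(content: Optional[str]) -> SimpleNamespace:
    """Build an object shaped like the chunks the tutorial reads."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def fake_stream(pieces: Iterable[Optional[str]]) -> Iterator[SimpleNamespace]:
    """Hypothetical stand-in for a streaming chat call."""
    for piece in pieces:
        yield make_chunk(piece)


def collect_stream(stream: Iterable[SimpleNamespace]) -> str:
    """Print each piece as it arrives and return the accumulated text."""
    parts: List[str] = []
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:  # skip chunks that carry no text (None or empty string)
            print(content, end="", flush=True)
            parts.append(content)
    print()
    return "".join(parts)


messages = [{"role": "user", "content": "What is photosynthesis?"}]
full_response = collect_stream(
    fake_stream(["Photo", "synthesis ", None, "converts light.", ""])
)
# Append only after the stream has finished, so the history holds the whole reply.
messages.append({"role": "assistant", "content": full_response})
```

Collecting the pieces in a list and joining them once is a small design choice that avoids repeated string concatenation. The += approach in the tutorial loop produces the same result.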

Complete code

Create a file named main.py and add the following complete code:

from foundry_local_sdk import Configuration, FoundryLocalManager


def main():
    # Initialize the Foundry Local SDK
    config = Configuration(app_name="foundry_local_samples")
    FoundryLocalManager.initialize(config)
    manager = FoundryLocalManager.instance

    # Download and register all execution providers.
    current_ep = ""
    def ep_progress(ep_name: str, percent: float):
        nonlocal current_ep
        if ep_name != current_ep:
            if current_ep:
                print()
            current_ep = ep_name
        print(f"\r  {ep_name:<30}  {percent:5.1f}%", end="", flush=True)

    manager.download_and_register_eps(progress_callback=ep_progress)
    if current_ep:
        print()

    # Select and load a model from the catalog
    model = manager.catalog.get_model("qwen2.5-0.5b")
    model.download(lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True))
    print()
    model.load()
    print("Model loaded and ready.")

    # Get a chat client
    client = model.get_chat_client()

    # Start the conversation with a system prompt
    messages = [
        {
            "role": "system",
            "content": "You are a helpful, friendly assistant. Keep your responses "
                       "concise and conversational. If you don't know something, say so."
        }
    ]

    print("\nChat assistant ready! Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in ("quit", "exit"):
            break

        # Add the user's message to conversation history
        messages.append({"role": "user", "content": user_input})

        # Stream the response token by token
        print("Assistant: ", end="", flush=True)
        full_response = ""
        for chunk in client.complete_streaming_chat(messages):
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        print("\n")

        # Add the complete response to conversation history
        messages.append({"role": "assistant", "content": full_response})

    # Clean up - unload the model
    model.unload()
    print("Model unloaded. Goodbye!")


if __name__ == "__main__":
    main()

Run the chat assistant:

python main.py

You see output similar to the following:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from previous turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.

Sample repository

The complete sample code for this article is available in the Foundry Local GitHub repository. To clone the repository and navigate to the sample, run:

git clone https://github.com/microsoft/Foundry-Local.git
cd Foundry-Local/samples/rust/tutorial-chat-assistant

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime and provides the same API surface area with a wider breadth of hardware acceleration.

Windows

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow serde_json

Cross-platform

cargo add foundry-local-sdk
cargo add tokio --features full
cargo add tokio-stream anyhow serde_json

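Browse the catalog and select a model

The Foundry Local SDK provides a model catalog that lists all available models. In this step, you initialize the SDK and select a model for your chat assistant.

Open src/main.rs and replace its contents with the following code to initialize the SDK and select a model: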

// Initialize the Foundry Local SDK
let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

// Download and register all execution providers.
manager
    .download_and_register_eps_with_progress(None, {
        let mut current_ep = String::new();
        move |ep_name: &str, percent: f64| {
            if ep_name != current_ep {
                if !current_ep.is_empty() {
                    println!();
                }
                current_ep = ep_name.to_string();
            }
            print!("\r  {:<30}  {:5.1}%", ep_name, percent);
            io::stdout().flush().ok();
        }
    })
    .await?;
println!();

// Select and load a model from the catalog
let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

if !model.is_cached().await? {
    println!("Downloading model...");
    model
        .download(Some(|progress: f64| {
            print!("\r  {progress:.1}%");
            io::stdout().flush().ok();
        }))
        .await?;
    println!();
}

model.load().await?;
println!("Model loaded and ready.");

// Create a chat client
let client = model.create_chat_client().temperature(0.7).max_tokens(512);

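The get_model method accepts a model alias, which is a short, friendly name that maps to a specific model in the catalog. The download method fetches the model weights to your local cache, and load prepares the model for inference.

Define a system prompt

A system prompt sets the assistant's personality and behavior. It's the first message in the conversation history, and the model refers to it throughout the conversation.

Add a system prompt to shape how the assistant responds: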

// Start the conversation with a system prompt
let mut messages: Vec<ChatCompletionRequestMessage> = vec![
    ChatCompletionRequestSystemMessage::from(
        "You are a helpful, friendly assistant. Keep your responses \
         concise and conversational. If you don't know something, say so.",
    )
    .into(),
];

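Tip

Experiment with different system prompts to change the assistant's behavior. For example, you can instruct it to respond as a pirate, a teacher, or a domain expert.

Implement multi-turn conversation

A chat assistant needs to maintain context across multiple exchanges. To do that, keep a vector of all messages (system, user, and assistant) and send the full list with each request. The model uses this history to generate contextually relevant responses.

Add a conversation loop that:

1. Reads user input from the console.
2. Adds the user message to the history.
3. Sends the full history to the model.
4. Adds the assistant's response to the history for the next turn.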

// Handle to standard input, locked once per read inside the loop
let stdin = io::stdin();
loop {
    print!("You: ");
    io::stdout().flush()?;

    let mut input = String::new();
    stdin.lock().read_line(&mut input)?;
    let input = input.trim();

    if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
        break;
    }

    // Add the user's message to conversation history
    messages.push(ChatCompletionRequestUserMessage::from(input).into());

    // Stream the response token by token
    print!("Assistant: ");
    io::stdout().flush()?;
    let mut full_response = String::new();
    let mut stream = client.complete_streaming_chat(&messages, None).await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        if let Some(choice) = chunk.choices.first() {
            if let Some(ref content) = choice.delta.content {
                print!("{content}");
                io::stdout().flush()?;
                full_response.push_str(content);
            }
        }
    }
    println!("\n");

    // Add the complete response to conversation history
    let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
        serde_json::json!({"role": "assistant", "content": full_response}),
    )?;
    messages.push(assistant_msg);
}

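Each call to the chat client (complete_chat, or complete_streaming_chat as in the loop above) receives the full message history. That is how the model "remembers" previous turns: no state is stored between calls.

Add streaming responses

Streaming prints each token as it's generated, which makes the assistant feel more responsive. If your loop calls complete_chat, replace that call with complete_streaming_chat to stream the response token by token.

The streaming part of the conversation loop looks like this: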

// Stream the response token by token
print!("Assistant: ");
io::stdout().flush()?;
let mut full_response = String::new();
let mut stream = client.complete_streaming_chat(&messages, None).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(choice) = chunk.choices.first() {
        if let Some(ref content) = choice.delta.content {
            print!("{content}");
            io::stdout().flush()?;
            full_response.push_str(content);
        }
    }
}
println!("\n");

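The streaming version accumulates the full response so that it can be added to the conversation history after the stream completes.

Complete code

Replace the contents of src/main.rs with the following complete code: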

use foundry_local_sdk::{
    ChatCompletionRequestMessage,
    ChatCompletionRequestSystemMessage, ChatCompletionRequestUserMessage,
    FoundryLocalConfig, FoundryLocalManager,
};
use std::io::{self, BufRead, Write};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the Foundry Local SDK
    let manager = FoundryLocalManager::create(FoundryLocalConfig::new("chat-assistant"))?;

    // Download and register all execution providers.
    manager
        .download_and_register_eps_with_progress(None, {
            let mut current_ep = String::new();
            move |ep_name: &str, percent: f64| {
                if ep_name != current_ep {
                    if !current_ep.is_empty() {
                        println!();
                    }
                    current_ep = ep_name.to_string();
                }
                print!("\r  {:<30}  {:5.1}%", ep_name, percent);
                io::stdout().flush().ok();
            }
        })
        .await?;
    println!();

    // Select and load a model from the catalog
    let model = manager.catalog().get_model("qwen2.5-0.5b").await?;

    if !model.is_cached().await? {
        println!("Downloading model...");
        model
            .download(Some(|progress: f64| {
                print!("\r  {progress:.1}%");
                io::stdout().flush().ok();
            }))
            .await?;
        println!();
    }

    model.load().await?;
    println!("Model loaded and ready.");

    // Create a chat client
    let client = model.create_chat_client().temperature(0.7).max_tokens(512);

    // Start the conversation with a system prompt
    let mut messages: Vec<ChatCompletionRequestMessage> = vec![
        ChatCompletionRequestSystemMessage::from(
            "You are a helpful, friendly assistant. Keep your responses \
             concise and conversational. If you don't know something, say so.",
        )
        .into(),
    ];

    println!("\nChat assistant ready! Type 'quit' to exit.\n");

    let stdin = io::stdin();
    loop {
        print!("You: ");
        io::stdout().flush()?;

        let mut input = String::new();
        stdin.lock().read_line(&mut input)?;
        let input = input.trim();

        if input.eq_ignore_ascii_case("quit") || input.eq_ignore_ascii_case("exit") {
            break;
        }

        // Add the user's message to conversation history
        messages.push(ChatCompletionRequestUserMessage::from(input).into());

        // Stream the response token by token
        print!("Assistant: ");
        io::stdout().flush()?;
        let mut full_response = String::new();
        let mut stream = client.complete_streaming_chat(&messages, None).await?;
        while let Some(chunk) = stream.next().await {
            let chunk = chunk?;
            if let Some(choice) = chunk.choices.first() {
                if let Some(ref content) = choice.delta.content {
                    print!("{content}");
                    io::stdout().flush()?;
                    full_response.push_str(content);
                }
            }
        }
        println!("\n");

        // Add the complete response to conversation history
        let assistant_msg: ChatCompletionRequestMessage = serde_json::from_value(
            serde_json::json!({"role": "assistant", "content": full_response}),
        )?;
        messages.push(assistant_msg);
    }

    // Clean up - unload the model
    model.unload().await?;
    println!("Model unloaded. Goodbye!");

    Ok(())
}

Run the chat assistant:

cargo run

You see output similar to the following:

Downloading model: 100.00%
Model loaded and ready.

Chat assistant ready! Type 'quit' to exit.

You: What is photosynthesis?
Assistant: Photosynthesis is the process plants use to convert sunlight, water, and carbon
dioxide into glucose and oxygen. It mainly happens in the leaves, inside structures
called chloroplasts.

You: Why is it important for other living things?
Assistant: It's essential because photosynthesis produces the oxygen that most living things
breathe. It also forms the base of the food chain — animals eat plants or eat other
animals that depend on plants for energy.

You: quit
Model unloaded. Goodbye!

Notice how the assistant remembers context from previous turns. When you ask "Why is it important for other living things?", it knows you're still talking about photosynthesis.
