Use Foundry Local native chat completions API

Important

  • Foundry Local is available in preview. Public preview releases provide early access to features that are in active development.
  • Features, approaches, and processes can change or have limited capabilities before General Availability (GA).

The native chat completions API enables you to run chat completions directly in-process, without starting a REST web server.

This article shows how to use the native chat completions API in the Foundry Local SDK. You create a console app that downloads a local model, generates a streaming chat response, and then unloads the model.

Prerequisites

  • .NET 9.0 SDK or later installed.
  • Azure role-based access control (RBAC): Not applicable.

Samples repository

You can find the sample in this article in the Foundry Local SDK Samples GitHub repository.

Set up project

Use Foundry Local in your C# project by following these Windows-specific steps:

  1. Create a new C# project and navigate into it:
    dotnet new console -n app-name
    cd app-name
    
  2. Open the app-name.csproj file and replace its contents with the following:
    <Project Sdk="Microsoft.NET.Sdk">
    
      <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
        <RootNamespace>app-name</RootNamespace>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
        <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
        <WindowsPackageType>None</WindowsPackageType>
        <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
      </PropertyGroup>
    
      <PropertyGroup Condition="'$(RuntimeIdentifier)'==''">
        <RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
      </PropertyGroup>
    
      <ItemGroup>
        <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.9.0" />
        <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
        <PackageReference Include="OpenAI" Version="2.5.0" />
      </ItemGroup>
    
    </Project>
    
  3. Create a nuget.config file in the project root with the following content so that the packages restore correctly:
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <packageSources>
        <clear />
        <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
        <add key="ORT" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json" />
      </packageSources>
      <packageSourceMapping>
        <packageSource key="nuget.org">
          <package pattern="*" />
        </packageSource>
        <packageSource key="ORT">
          <package pattern="*Foundry*" />
        </packageSource>
      </packageSourceMapping>
    </configuration>
    

Note

The Microsoft.AI.Foundry.Local NuGet package targets net8.0. With .NET's forward compatibility, it works seamlessly in projects targeting .NET 9, .NET 10, and later, with no other configuration needed. The SDK uses only .NET 8 APIs and contains no framework-specific code paths, so behavior is identical regardless of which runtime your app targets. We target .NET 8 because it's the current Long Term Support (LTS) release with the broadest install base.

Use native chat completions API

The following example demonstrates how to use the native chat completions API in Foundry Local. The code includes the following steps:

  1. Initializes a FoundryLocalManager instance with a Configuration.

  2. Gets a Model object from the model catalog using an alias.

    Note

    Foundry Local automatically selects the best variant for the model based on the available hardware of the host machine.

  3. Downloads and loads the model variant.

  4. Uses the native chat completions API to generate a response.

  5. Unloads the model.
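The automatic variant selection mentioned in step 2 can be sketched roughly as follows. This is a hypothetical, language-neutral illustration in JavaScript, not the SDK's actual logic; the variant shape (executionProvider, alias) and the provider preference order are assumptions for the sketch:

```javascript
// Hypothetical sketch: pick the first variant whose execution provider
// is available on the host, walking a fixed preference order.
const preferenceOrder = ['NvTensorRtRtx', 'CUDA', 'QNN', 'OpenVINO', 'CPU'];

function selectVariant(variants, availableProviders) {
    for (const provider of preferenceOrder) {
        if (!availableProviders.includes(provider)) continue;
        const match = variants.find((v) => v.executionProvider === provider);
        if (match) return match;
    }
    return null; // no runnable variant on this hardware
}

// Example: a model published with a GPU variant and a CPU fallback.
const variants = [
    { alias: 'qwen2.5-0.5b', executionProvider: 'CUDA' },
    { alias: 'qwen2.5-0.5b', executionProvider: 'CPU' },
];
console.log(selectVariant(variants, ['CPU']).executionProvider); // CPU
```

On a CUDA-capable machine the same call would resolve to the CUDA variant instead, which is why the article's code never needs to name a variant explicitly.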

Copy and paste the following code into a C# file named Program.cs:

using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;

CancellationToken ct = CancellationToken.None;

var config = new Configuration
{
    AppName = "app-name",
    LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

// Initialize the singleton instance.
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;

// Get the model catalog
var catalog = await mgr.GetCatalogAsync();

// Get a model using an alias
var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");

// Download the model (the method skips download if already cached)
await model.DownloadAsync(progress =>
{
    Console.Write($"\rDownloading model: {progress:F2}%");
    if (progress >= 100f)
    {
        Console.WriteLine();
    }
});

// Load the model
await model.LoadAsync();

// Get a chat client
var chatClient = await model.GetChatClientAsync();

// Create a chat message
List<ChatMessage> messages = new()
{
    new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
};

var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
    Console.Write(chunk.Choices[0].Message.Content);
    Console.Out.Flush();
}
Console.WriteLine();

// Tidy up - unload the model
await model.UnloadAsync();


Optional: list model aliases available on your device

If you don't know which model alias to use, list the models available for your hardware.

// List available models and aliases
Console.WriteLine("Available models for your hardware:");
var models = await catalog.ListModelsAsync();
foreach (var availableModel in models)
{
    foreach (var variant in availableModel.Variants)
    {
        Console.WriteLine($"  - Alias: {variant.Alias}");
    }
}
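As a side note, the nested model/variant loops above collapse naturally into a single alias list. A small JavaScript sketch with hypothetical catalog data shaped like the snippet above:

```javascript
// Hypothetical catalog entries: each model exposes a variants array
// whose entries carry an alias, mirroring the listing loop above.
const models = [
    { variants: [{ alias: 'qwen2.5-0.5b' }, { alias: 'qwen2.5-0.5b-cuda' }] },
    { variants: [{ alias: 'phi-3.5-mini' }] },
];

// flatMap collapses the two nested loops into one flat alias list.
const aliases = models.flatMap((m) => m.variants.map((v) => v.alias));
aliases.forEach((a) => console.log(`  - Alias: ${a}`));
```

The alias names here are placeholders; the real catalog determines which aliases exist on your device.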


Run the code

Run the code by using the command that matches your Windows architecture.

For x64 Windows:

dotnet run -r:win-x64

For Arm64 Windows:

dotnet run -r:win-arm64

Troubleshooting

  • Build errors referencing net9.0: Install the .NET 9.0 SDK, then rebuild the app.
  • Model not found: Run the optional model listing snippet to find an alias available on your device, then update the alias passed to GetModelAsync.
  • Slow first run: Model downloads can take time the first time you run the app.

Prerequisites

  • Node.js installed. The sample runs with the node command.

Samples repository

You can find the sample in this article in the Foundry Local SDK Samples GitHub repository.

Set up project

Use Foundry Local in your JavaScript project by following these steps:

  1. Create a new JavaScript project:
    mkdir app-name
    cd app-name
    npm init -y
    npm pkg set type=module
    
  2. Install the Foundry Local SDK package:
    npm install foundry-local-sdk
    npm install openai
    

Use native chat completions API

The following example demonstrates how to use the native chat completions API in Foundry Local. Because the native API runs in-process, there's no REST web server to manage, which simplifies deployment. The code includes the following steps:

  1. Initializes a FoundryLocalManager instance with a configuration.
  2. Gets a Model object from the model catalog using an alias.
  3. Downloads and loads the model variant.
  4. Uses the native chat completions API to generate a response.
  5. Unloads the model.

Copy and paste the following code into a JavaScript file named app.js:

import { FoundryLocalManager } from 'foundry-local-sdk';

// Initialize the Foundry Local SDK
console.log('Initializing Foundry Local SDK...');

const manager = await FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});
console.log('✓ SDK initialized successfully');

// Get the model object
const modelAlias = 'qwen2.5-0.5b'; // An alias available in the model catalog
const model = await manager.catalog.getModel(modelAlias);

// Download the model
console.log(`\nDownloading model ${modelAlias}...`);
await model.download((progress) => {
    process.stdout.write(`\rDownloading... ${progress.toFixed(2)}%`);
});
console.log('\n✓ Model downloaded');

// Load the model
console.log(`\nLoading model ${modelAlias}...`);
await model.load();
console.log('✓ Model loaded');

// Create chat client
console.log('\nCreating chat client...');
const chatClient = model.createChatClient();
console.log('✓ Chat client created');

// Example chat completion
console.log('\nTesting chat completion...');
const completion = await chatClient.completeChat([
    { role: 'user', content: 'Why is the sky blue?' }
]);

console.log('\nChat completion result:');
console.log(completion.choices[0]?.message?.content);

// Example streaming completion
console.log('\nTesting streaming completion...');
await chatClient.completeStreamingChat(
    [{ role: 'user', content: 'Write a short poem about programming.' }],
    (chunk) => {
        const content = chunk.choices?.[0]?.message?.content;
        if (content) {
            process.stdout.write(content);
        }
    }
);
console.log('\n');

// Unload the model
console.log('Unloading model...');
await model.unload();
console.log(`✓ Model unloaded`);

Run the code

Run the code by using the following command:

node app.js