Important
- Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
- Features, approaches, and processes can change or have limited capabilities before General Availability (GA).
This article explains how to use the native chat completions API in the Foundry Local SDK. The native chat completions API enables you to run chat completions directly in-process, without starting a REST web server. In this article, you create a console app that downloads a local model, generates a streaming chat response, and then unloads the model.
Prerequisites
- .NET 9.0 SDK or later installed.
- Azure role-based access control (RBAC): Not applicable.
Samples repository
You can find the sample in this article in the Foundry Local SDK Samples GitHub repository.
Set up project
Use Foundry Local in your C# project by following these Windows-specific or Cross-Platform (macOS/Linux/Windows) instructions:
- Create a new C# project and navigate into it:

  dotnet new console -n app-name
  cd app-name

- Open and edit the app-name.csproj file so that it contains:

  <Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
      <OutputType>Exe</OutputType>
      <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
      <RootNamespace>app-name</RootNamespace>
      <ImplicitUsings>enable</ImplicitUsings>
      <Nullable>enable</Nullable>
      <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
      <WindowsPackageType>None</WindowsPackageType>
      <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
    </PropertyGroup>
    <PropertyGroup Condition="'$(RuntimeIdentifier)'==''">
      <RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
    </PropertyGroup>
    <ItemGroup>
      <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.9.0" />
      <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
      <PackageReference Include="OpenAI" Version="2.5.0" />
    </ItemGroup>
  </Project>

- Create a nuget.config file in the project root with the following content so that the packages restore correctly:

  <?xml version="1.0" encoding="utf-8"?>
  <configuration>
    <packageSources>
      <clear />
      <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
      <add key="ORT" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json" />
    </packageSources>
    <packageSourceMapping>
      <packageSource key="nuget.org">
        <package pattern="*" />
      </packageSource>
      <packageSource key="ORT">
        <package pattern="*Foundry*" />
      </packageSource>
    </packageSourceMapping>
  </configuration>
Note
The Microsoft.AI.Foundry.Local NuGet package targets net8.0. With .NET's forward compatibility, it works seamlessly in projects targeting .NET 9, .NET 10, and later—no other configuration needed. The SDK uses only .NET 8 APIs and contains no framework-specific code paths, so behavior is identical regardless of which runtime your app targets. We target .NET 8 as it's the current Long Term Support (LTS) release with the broadest install base.
Use native chat completions API
The following example demonstrates how to use the native chat completions API in Foundry Local. The code includes the following steps:
- Initializes a FoundryLocalManager instance with a Configuration.
- Gets a Model object from the model catalog using an alias.
- Downloads and loads the model variant.
- Uses the native chat completions API to generate a response.
- Unloads the model.

Note
Foundry Local automatically selects the best variant for the model based on the available hardware of the host machine.
Copy and paste the following code into a C# file named Program.cs:
using Microsoft.AI.Foundry.Local;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
using Microsoft.Extensions.Logging;
CancellationToken ct = CancellationToken.None;
var config = new Configuration
{
AppName = "app-name",
LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information
};
using var loggerFactory = LoggerFactory.Create(builder =>
{
builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();
// Initialize the singleton instance.
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;
// Get the model catalog
var catalog = await mgr.GetCatalogAsync();
// Get a model using an alias
var model = await catalog.GetModelAsync("qwen2.5-0.5b") ?? throw new Exception("Model not found");
// Download the model (the method skips download if already cached)
await model.DownloadAsync(progress =>
{
Console.Write($"\rDownloading model: {progress:F2}%");
if (progress >= 100f)
{
Console.WriteLine();
}
});
// Load the model
await model.LoadAsync();
// Get a chat client
var chatClient = await model.GetChatClientAsync();
// Create a chat message
List<ChatMessage> messages = new()
{
new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
};
var streamingResponse = chatClient.CompleteChatStreamingAsync(messages, ct);
await foreach (var chunk in streamingResponse)
{
Console.Write(chunk.Choices[0].Message.Content);
Console.Out.Flush();
}
Console.WriteLine();
// Tidy up - unload the model
await model.UnloadAsync();
Optional: list model aliases available on your device
If you don't know which model alias to use, list the models available for your hardware.
// List available models and aliases
Console.WriteLine("Available models for your hardware:");
var models = await catalog.ListModelsAsync();
foreach (var availableModel in models)
{
foreach (var variant in availableModel.Variants)
{
Console.WriteLine($" - Alias: {variant.Alias}");
}
}
Run the code by using one of the following commands:
For x64 Windows, use the following command:
dotnet run -r:win-x64
For arm64 Windows, use the following command:
dotnet run -r:win-arm64
Troubleshooting
- Build errors referencing net9.0: Install the .NET 9.0 SDK, then rebuild the app.
- Model not found: Run the optional model listing snippet to find an alias available on your device, then update the alias passed to GetModelAsync.
- Slow first run: Model downloads can take time the first time you run the app.
Prerequisites
- Node.js 20 or later installed.
Samples repository
You can find the sample in this article in the Foundry Local SDK Samples GitHub repository.
Set up project
Use Foundry Local in your JavaScript project by following these Windows-specific or Cross-Platform (macOS/Linux/Windows) instructions:
- Create a new JavaScript project:

  mkdir app-name
  cd app-name
  npm init -y
  npm pkg set type=module

- Install the Foundry Local SDK package:

  npm install foundry-local-sdk
  npm install openai
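After these commands, your package.json should look roughly like the following sketch. The version numbers shown here are placeholders, not pinned requirements; the important line is "type": "module", which enables the top-level await used later in app.js:

```json
{
  "name": "app-name",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "foundry-local-sdk": "^0.3.0",
    "openai": "^4.0.0"
  }
}
```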
Use native chat completions API
The following example demonstrates how to use the native chat completions API in Foundry Local. Because the native chat completions API doesn't require a running REST web server, it simplifies deployment. The code includes the following steps:
- Initializes a FoundryLocalManager instance with a configuration.
- Gets a Model object from the model catalog using an alias.
- Downloads and loads the model variant.
- Uses the native chat completions API to generate a response.
- Unloads the model.
Copy and paste the following code into a JavaScript file named app.js:
import { FoundryLocalManager } from 'foundry-local-sdk';
// Initialize the Foundry Local SDK
console.log('Initializing Foundry Local SDK...');
const manager = await FoundryLocalManager.create({
appName: 'foundry_local_samples',
logLevel: 'info'
});
console.log('✓ SDK initialized successfully');
// Get the model object
const modelAlias = 'qwen2.5-0.5b'; // Use an alias available on your device
const model = await manager.catalog.getModel(modelAlias);
// Download the model
console.log(`\nDownloading model ${modelAlias}...`);
await model.download((progress) => {
  // \r returns the cursor to the line start so the percentage redraws in place
  process.stdout.write(`\rDownloading... ${progress.toFixed(2)}%`);
});
console.log('\n✓ Model downloaded');
// Load the model
console.log(`\nLoading model ${modelAlias}...`);
await model.load();
console.log('✓ Model loaded');
// Create chat client
console.log('\nCreating chat client...');
const chatClient = model.createChatClient();
console.log('✓ Chat client created');
// Example chat completion
console.log('\nTesting chat completion...');
const completion = await chatClient.completeChat([
{ role: 'user', content: 'Why is the sky blue?' }
]);
console.log('\nChat completion result:');
console.log(completion.choices[0]?.message?.content);
// Example streaming completion
console.log('\nTesting streaming completion...');
await chatClient.completeStreamingChat(
[{ role: 'user', content: 'Write a short poem about programming.' }],
(chunk) => {
const content = chunk.choices?.[0]?.message?.content;
if (content) {
process.stdout.write(content);
}
}
);
console.log('\n');
// Unload the model
console.log('Unloading model...');
await model.unload();
console.log(`✓ Model unloaded`);
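In a real app, you'll want the model unloaded even when a completion throws partway through. The following is a minimal sketch of that pattern; `withModel` is a hypothetical helper, and the `model` object here is a self-contained stand-in that only mirrors the load/unload surface shown above, not the actual SDK:

```javascript
// Stand-in for the SDK's model object, so the pattern runs on its own.
const model = {
  loaded: false,
  async load() { this.loaded = true; },
  async unload() { this.loaded = false; },
};

// Hypothetical helper: loads the model, runs `work`, and always unloads.
async function withModel(model, work) {
  await model.load();
  try {
    // Run the chat completions while the model is loaded.
    return await work(model);
  } finally {
    // Always unload, even if `work` throws.
    await model.unload();
  }
}

async function main() {
  // Success path: the result comes back, and the model is unloaded afterwards.
  const result = await withModel(model, async () => 'done');
  console.log(result, model.loaded); // done false

  // Failure path: the error propagates, but the model is still unloaded.
  try {
    await withModel(model, async () => { throw new Error('boom'); });
  } catch (err) {
    console.log(err.message, model.loaded); // boom false
  }
}

main();
```

With the real SDK, you would pass your downloaded model object instead of the stub and run the chat client calls inside `work`.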
Run the code
Run the code by using the following command:
node app.js