As conversations grow, the token count of the chat history can exceed model context windows or drive up costs. Compaction strategies reduce the size of conversation history while preserving important context, so agents can continue functioning over long-running interactions.
Important
The compaction framework is currently experimental. To use it, add `#pragma warning disable MAAI001` to suppress the experimental API warning.
Important
The compaction framework is currently experimental in Python. Import strategies from agent_framework._compaction.
Why compaction matters
Every call to an LLM includes the full conversation history. Without compaction:
- Token limits — Conversations eventually exceed the model's context window, causing errors.
- Cost — Larger prompts consume more tokens, increasing API costs.
- Latency — More input tokens mean slower response times.
Compaction solves these problems by selectively removing, collapsing, or summarizing older portions of the conversation.
Core concepts
Applicability: In-memory history agents only
Compaction applies only to agents that manage their own conversation history in memory. Agents that rely on service-managed context or conversation state do not benefit from compaction because the service already handles context management. Examples of service-managed agents include:
- Foundry Agents — context is managed server-side by the Azure AI Foundry service.
- Responses API with store enabled (the default) — conversation state is stored and managed by the OpenAI service.
- Copilot Studio agents — conversation context is maintained by the Copilot Studio service.
For these agent types, configuring a compaction strategy has no effect. Compaction is only relevant when the agent maintains its own in-memory message list and passes the full history to the model on each call.
Compaction operates on a MessageIndex — a structured view of the flat message list that groups messages into atomic units called MessageGroup instances. Each group tracks its message count, byte count, and estimated token count.
Message groups
A MessageGroup represents logically related messages that must be kept or removed together. For example, an assistant message containing tool calls and its corresponding tool result messages form an atomic group — removing one without the other would cause LLM API errors.
Each group has a MessageGroupKind:
| Kind | Description |
|---|---|
| `System` | One or more system messages. Always preserved during compaction. |
| `User` | A single user message that starts a new turn. |
| `AssistantText` | A plain assistant text response (no tool calls). |
| `ToolCall` | An assistant message with tool calls and the corresponding tool result messages, treated as an atomic unit. |
| `Summary` | A condensed message produced by summarization compaction. |
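The grouping rules above can be sketched in plain Python. This is a simplified illustration, not the framework's implementation; the `Msg` class and `group_messages` helper are hypothetical names chosen for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    role: str              # "system", "user", "assistant", or "tool"
    has_tool_calls: bool = False

def group_messages(messages: list[Msg]) -> list[tuple[str, list[Msg]]]:
    """Group a flat message list into (kind, messages) atomic units."""
    groups: list[tuple[str, list[Msg]]] = []
    i = 0
    while i < len(messages):
        m = messages[i]
        if m.role == "system":
            groups.append(("System", [m]))
            i += 1
        elif m.role == "user":
            groups.append(("User", [m]))
            i += 1
        elif m.role == "assistant" and m.has_tool_calls:
            # The tool-call message and its tool results form one atomic unit:
            # removing one without the other would break the LLM API contract.
            unit = [m]
            i += 1
            while i < len(messages) and messages[i].role == "tool":
                unit.append(messages[i])
                i += 1
            groups.append(("ToolCall", unit))
        else:
            groups.append(("AssistantText", [m]))
            i += 1
    return groups
```

The key invariant is that a `ToolCall` group always carries its result messages with it, so compaction can only keep or drop the unit as a whole.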
Triggers
A CompactionTrigger is a delegate that evaluates whether compaction should proceed based on current MessageIndex metrics:
public delegate bool CompactionTrigger(MessageIndex index);
The CompactionTriggers class provides common factory methods:
| Trigger | Fires when |
|---|---|
| `CompactionTriggers.Always` | Every time (unconditionally). |
| `CompactionTriggers.Never` | Never (disables compaction). |
| `CompactionTriggers.TokensExceed(maxTokens)` | Included token count exceeds the threshold. |
| `CompactionTriggers.MessagesExceed(maxMessages)` | Included message count exceeds the threshold. |
| `CompactionTriggers.TurnsExceed(maxTurns)` | Included user turn count exceeds the threshold. |
| `CompactionTriggers.GroupsExceed(maxGroups)` | Included group count exceeds the threshold. |
| `CompactionTriggers.HasToolCalls()` | At least one non-excluded tool call group exists. |
Combine triggers with CompactionTriggers.All(...) (logical AND) or CompactionTriggers.Any(...) (logical OR):
// Compact only when there are tool calls AND tokens exceed 2000
CompactionTrigger trigger = CompactionTriggers.All(
CompactionTriggers.HasToolCalls(),
CompactionTriggers.TokensExceed(2000));
Trigger vs. target
Every strategy has two predicates:
- Trigger — Controls when compaction begins. If the trigger returns `false`, the strategy is skipped entirely.
- Target — Controls when compaction stops. Strategies incrementally exclude groups and re-evaluate the target after each step, stopping as soon as the target returns `true`.
When no target is specified, it defaults to the inverse of the trigger — compaction stops as soon as the trigger condition would no longer fire.
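The trigger/target interplay amounts to a simple loop. The sketch below is illustrative, not the framework's code; it models each group by its token count only:

```python
def compact(token_counts: list[int], trigger, target=None) -> list[int]:
    """Exclude oldest groups one at a time until `target` is satisfied.

    `trigger` and `target` are predicates over the included token total.
    When no target is given, it defaults to the inverse of the trigger.
    """
    if target is None:
        target = lambda total: not trigger(total)
    included = list(token_counts)
    if not trigger(sum(included)):
        return included          # trigger didn't fire: strategy is skipped
    while included and not target(sum(included)):
        included.pop(0)          # exclude the oldest group, then re-check
    return included

# Compact when the total exceeds 100 tokens; stop once it no longer does.
over_100 = lambda total: total > 100
```

With the default target, compaction removes the minimum number of oldest groups needed to bring the conversation back under the trigger's threshold.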
Compaction operates on a flat list of Message objects. Messages are annotated with lightweight group metadata, and strategies mutate those annotations in place to mark groups as excluded before the message list is projected to the model.
Message groups
Messages are grouped into atomic units. Each group is assigned a GroupKind:
| Kind | Description |
|---|---|
| `system` | System messages. Always preserved during compaction. |
| `user` | A single user message. |
| `assistant_text` | A plain assistant text response (no function calls). |
| `tool_call` | An assistant message with function calls plus the corresponding tool result messages, treated as an atomic unit. |
Compaction strategies
A CompactionStrategy is a protocol — any async callable that accepts a list[Message] and mutates it in place, returning True when it changed anything:
class CompactionStrategy(Protocol):
async def __call__(self, messages: list[Message]) -> bool: ...
Tokenizer
Token-aware strategies accept a TokenizerProtocol implementation. The built-in CharacterEstimatorTokenizer uses a 4-character-per-token heuristic:
from agent_framework._compaction import CharacterEstimatorTokenizer
tokenizer = CharacterEstimatorTokenizer()
Pass a custom tokenizer when you need accurate token counts for a specific model's encoding.
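The 4-characters-per-token heuristic boils down to roughly the following. This is a sketch of the idea under the stated assumption of 4 characters per token; the built-in class may round or count differently:

```python
class ApproxTokenizer:
    """Estimates token count as ceil(len(text) / 4), a common rough heuristic
    for English text with OpenAI-style tokenizers."""
    CHARS_PER_TOKEN = 4

    def count_tokens(self, text: str) -> int:
        # Ceiling division so any non-empty text costs at least one token.
        return -(-len(text) // self.CHARS_PER_TOKEN)
```

A character heuristic is cheap and model-agnostic, but it can be off by tens of percent for code or non-English text, which is why token-sensitive budgets benefit from a model-specific tokenizer.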
Compaction strategies
All strategies inherit from the abstract CompactionStrategy base class. Each strategy preserves system messages and respects a MinimumPreserved floor that protects the most-recent non-system groups from removal.
Compaction strategies are imported from agent_framework._compaction.
TruncationCompactionStrategy
TruncationStrategy
The most straightforward approach: removes the oldest non-system message groups until the target condition is met.
- Respects atomic group boundaries (tool call and result messages are removed together).
- Best for hard token-budget backstops.
- `MinimumPreserved` defaults to `32`.
// Drop oldest groups when tokens exceed 32K, keeping at least 10 recent groups
TruncationCompactionStrategy truncation = new(
trigger: CompactionTriggers.TokensExceed(0x8000),
minimumPreserved: 10);
- When a `tokenizer` is provided, the metric is token count; otherwise it is included message count.
- `preserve_system` defaults to `True`.
from agent_framework._compaction import CharacterEstimatorTokenizer, TruncationStrategy
# Exclude oldest groups when tokens exceed 32,000, trimming to 16,000
truncation = TruncationStrategy(
max_n=32_000,
compact_to=16_000,
tokenizer=CharacterEstimatorTokenizer(),
)
SlidingWindowCompactionStrategy
SlidingWindowStrategy
Removes older conversation content to keep only the most recent window of exchanges, respecting logical conversation units rather than arbitrary message counts. System messages are preserved throughout.
- Best for bounding conversation length predictably.
Removes the oldest user turns and their associated response groups, operating on logical turn boundaries rather than individual groups.
- A turn starts with a user message and includes all subsequent assistant and tool-call groups until the next user message.
- `MinimumPreserved` defaults to `1` (preserves at least the most recent non-system group).
// Keep only the last 4 user turns
SlidingWindowCompactionStrategy slidingWindow = new(
trigger: CompactionTriggers.TurnsExceed(4));
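The turn-boundary rule can be sketched as follows. This is an illustrative model, not the framework's implementation; messages are represented by role strings only:

```python
def split_into_turns(roles: list[str]) -> list[list[str]]:
    """Split non-system roles into turns; each turn starts at a user message."""
    turns: list[list[str]] = []
    for role in roles:
        if role == "system":
            continue                # system messages sit outside any turn
        if role == "user" or not turns:
            turns.append([role])    # a user message opens a new turn
        else:
            turns[-1].append(role)  # assistant/tool groups join the current turn
    return turns

def keep_last_turns(roles: list[str], n: int) -> list[str]:
    """Sliding window: system messages plus the last n complete turns."""
    kept = [r for r in roles if r == "system"]
    for turn in split_into_turns(roles)[-n:]:
        kept.extend(turn)
    return kept
```

Because the window is measured in turns rather than messages, a turn containing a long tool-call chain is kept or dropped as one unit, never split in half.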
Keeps only the most recent `keep_last_groups` non-system groups, excluding everything older.
- `preserve_system` defaults to `True`.
from agent_framework._compaction import SlidingWindowStrategy
# Keep only the last 20 non-system groups
sliding_window = SlidingWindowStrategy(keep_last_groups=20)
ToolResultCompactionStrategy
Collapses older tool-call groups into compact summary messages, preserving a readable trace without the full message overhead.
- Does not touch user messages or plain assistant responses.
- Best as a first-pass strategy to reclaim space from verbose tool results.
- Replaces multi-message tool call groups (assistant call + tool results) with a short summary like `[Tool calls: get_weather, search_docs]`.
- `MinimumPreserved` defaults to `2`, ensuring the current turn's tool interactions remain visible.
// Collapse old tool results when tokens exceed 512
ToolResultCompactionStrategy toolCompaction = new(
trigger: CompactionTriggers.TokensExceed(0x200));
- Collapses into compact summary messages such as `[Tool results: get_weather: sunny, 18°C]`.
- The most recent `keep_last_tool_call_groups` tool-call groups are left untouched.
from agent_framework._compaction import ToolResultCompactionStrategy
# Collapse all but the newest tool-call group
tool_result = ToolResultCompactionStrategy(keep_last_tool_call_groups=1)
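Collapsing tool-call groups into one-line summaries can be sketched like this. The group dictionaries and the `collapse_tool_groups` helper are hypothetical shapes for illustration, not the framework's data model:

```python
def collapse_tool_groups(groups: list[dict], keep_last: int = 1) -> list[dict]:
    """Replace all but the newest `keep_last` tool-call groups with summaries."""
    tool_indexes = [i for i, g in enumerate(groups) if g["kind"] == "tool_call"]
    # Slicing with keep_last=0 would yield an empty list, so branch explicitly.
    to_collapse = set(tool_indexes[:-keep_last] if keep_last else tool_indexes)
    out = []
    for i, g in enumerate(groups):
        if i in to_collapse:
            names = ", ".join(g["tool_names"])
            out.append({"kind": "summary", "text": f"[Tool calls: {names}]"})
        else:
            out.append(g)
    return out
```

User messages and plain assistant responses pass through unchanged; only older tool-call units shrink to a readable trace line.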
SummarizationCompactionStrategy
SummarizationStrategy
Uses an LLM to summarize older portions of the conversation, replacing them with a single summary message.
- A default prompt preserves key facts, decisions, user preferences, and tool call outcomes.
- Requires a separate LLM client for summarization — a smaller, faster model is recommended.
- Best for preserving conversational context while significantly reducing token count.
- You can provide a custom summarization prompt.
- Protects system messages and the most recent `MinimumPreserved` non-system groups (default: `4`).
- Sends the older messages to a separate `IChatClient` with a summarization prompt, then inserts the summary as a `MessageGroupKind.Summary` group.
// Summarize older messages when tokens exceed 1280, keeping the last 4 groups
SummarizationCompactionStrategy summarization = new(
chatClient: summarizerChatClient,
trigger: CompactionTriggers.TokensExceed(0x500),
minimumPreserved: 4);
You can provide a custom summarization prompt:
SummarizationCompactionStrategy summarization = new(
chatClient: summarizerChatClient,
trigger: CompactionTriggers.TokensExceed(0x500),
summarizationPrompt: "Summarize the key decisions and user preferences only.");
- Triggers when included non-system message count exceeds `target_count + threshold`.
- Retains the newest `target_count` messages; summarizes everything older.
- Requires a `SupportsChatGetResponse` client.
from agent_framework._compaction import SummarizationStrategy
# Summarize when non-system message count exceeds 6, retaining the 4 newest
summarization = SummarizationStrategy(
client=summarizer_client,
target_count=4,
threshold=2,
)
Provide a custom summarization prompt:
summarization = SummarizationStrategy(
client=summarizer_client,
target_count=4,
prompt="Summarize the key decisions and user preferences only.",
)
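The replace-older-with-summary mechanics reduce to the following sketch. The `summarize` callable stands in for the LLM call, and `summarize_older` is a hypothetical helper, not framework API:

```python
from typing import Callable

def summarize_older(
    messages: list[str],
    target_count: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Replace everything older than the newest `target_count` messages
    with a single summary message produced by `summarize`."""
    if len(messages) <= target_count:
        return messages                     # nothing old enough to summarize
    older, recent = messages[:-target_count], messages[-target_count:]
    return [summarize(older)] + recent      # summary takes the older span's place
```

Unlike truncation, the summary keeps a compressed record of what was dropped, which is why this strategy costs an extra LLM call but preserves more context.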
PipelineCompactionStrategy
Composes multiple strategies into a sequential pipeline. Each strategy operates on the result of the previous one, enabling layered compaction from gentle to aggressive.
- The pipeline's own trigger is `CompactionTriggers.Always` — each child strategy evaluates its own trigger independently.
- Strategies execute in order, so put the gentlest strategies first.
PipelineCompactionStrategy pipeline = new(
new ToolResultCompactionStrategy(CompactionTriggers.TokensExceed(0x200)),
new SummarizationCompactionStrategy(summarizerChatClient, CompactionTriggers.TokensExceed(0x500)),
new SlidingWindowCompactionStrategy(CompactionTriggers.TurnsExceed(4)),
new TruncationCompactionStrategy(CompactionTriggers.TokensExceed(0x8000)));
This pipeline:
- Collapses old tool results (gentle).
- Summarizes older conversation spans (moderate).
- Keeps only the last 4 user turns (aggressive).
- Drops oldest groups if still over budget (emergency backstop).
SelectiveToolCallCompactionStrategy
Fully excludes older tool-call groups, keeping only the last `keep_last_tool_call_groups`.
- Does not touch user or plain assistant messages.
- Best when tool chatter dominates token usage and the full tool history is not needed.
from agent_framework._compaction import SelectiveToolCallCompactionStrategy
# Keep only the most recent tool-call group
selective_tool = SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=1)
TokenBudgetComposedStrategy
Composes multiple strategies into a sequential pipeline driven by a token budget. Each child strategy runs in order, stopping early once the budget is satisfied. A built-in fallback excludes the oldest groups if the strategies alone cannot reach the target.
- Strategies execute in order; place the gentlest strategies first.
- `early_stop=True` (the default) stops as soon as the token budget is satisfied.
from agent_framework._compaction import (
CharacterEstimatorTokenizer,
SelectiveToolCallCompactionStrategy,
SlidingWindowStrategy,
SummarizationStrategy,
TokenBudgetComposedStrategy,
ToolResultCompactionStrategy,
)
tokenizer = CharacterEstimatorTokenizer()
pipeline = TokenBudgetComposedStrategy(
token_budget=16_000,
tokenizer=tokenizer,
strategies=[
ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
SummarizationStrategy(client=summarizer_client, target_count=4, threshold=2),
SlidingWindowStrategy(keep_last_groups=20),
],
)
This pipeline:
- Collapses old tool results (gentle).
- Summarizes older conversation spans (moderate).
- Keeps only the last 20 groups (aggressive).
- Falls back to oldest-first exclusion if still over budget (emergency backstop).
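The budget-driven composition can be sketched as follows. This is a simplified model, not the framework's code: strategies are plain callables over a list of per-group token counts, and `halve_oldest` is a toy stand-in for a gentle child strategy:

```python
def run_budget_pipeline(tokens: list[int], strategies, budget: int) -> list[int]:
    """Run each strategy in order, stopping early once the budget is met;
    fall back to oldest-first removal if the budget is still exceeded."""
    for strategy in strategies:
        if sum(tokens) <= budget:
            return tokens            # early stop: budget already satisfied
        tokens = strategy(tokens)
    while tokens and sum(tokens) > budget:
        tokens = tokens[1:]          # fallback: drop the oldest group
    return tokens

# Toy "gentle" strategy: compress every group except the newest one.
halve_oldest = lambda ts: [t // 2 for t in ts[:-1]] + ts[-1:]
```

The fallback guarantees the budget is eventually met even when every child strategy declines to act, which is what makes this composition a safe backstop.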
Using compaction with an agent
Wrap a compaction strategy in a CompactionProvider and register it as an AIContextProvider. Pass either a single strategy or a PipelineCompactionStrategy to the constructor.
Registering with the builder API
Register the provider on the ChatClientBuilder using UseAIContextProviders. The provider runs inside the tool-calling loop, compacting messages before each LLM call.
IChatClient agentChatClient = openAIClient.GetChatClient(deploymentName).AsIChatClient();
IChatClient summarizerChatClient = openAIClient.GetChatClient(deploymentName).AsIChatClient();
PipelineCompactionStrategy compactionPipeline =
new(
new ToolResultCompactionStrategy(CompactionTriggers.TokensExceed(0x200)),
new SummarizationCompactionStrategy(summarizerChatClient, CompactionTriggers.TokensExceed(0x500)),
new SlidingWindowCompactionStrategy(CompactionTriggers.TurnsExceed(4)),
new TruncationCompactionStrategy(CompactionTriggers.TokensExceed(0x8000)));
AIAgent agent =
agentChatClient
.AsBuilder()
.UseAIContextProviders(new CompactionProvider(compactionPipeline))
.BuildAIAgent(
new ChatClientAgentOptions
{
Name = "ShoppingAssistant",
ChatOptions = new()
{
Instructions = "You are a helpful shopping assistant.",
Tools = [AIFunctionFactory.Create(LookupPrice)],
},
});
AgentSession session = await agent.CreateSessionAsync();
Console.WriteLine(await agent.RunAsync("What's the price of a laptop?", session));
Tip
Use a smaller, cheaper model (such as gpt-4o-mini) for the summarization chat client to reduce costs while maintaining summary quality.
If only one strategy is needed, pass it directly to CompactionProvider without wrapping it in a PipelineCompactionStrategy:
agentChatClient
.AsBuilder()
.UseAIContextProviders(new CompactionProvider(
new SlidingWindowCompactionStrategy(CompactionTriggers.TurnsExceed(20))))
.BuildAIAgent(...);
Registering through ChatClientAgentOptions
The provider can also be specified directly on ChatClientAgentOptions.AIContextProviders:
AIAgent agent = agentChatClient
.AsBuilder()
.BuildAIAgent(new ChatClientAgentOptions
{
AIContextProviders = [new CompactionProvider(compactionPipeline)]
});
Note
When registered through ChatClientAgentOptions, the CompactionProvider is not engaged during the tool-calling loop. Agent-level context providers run before chat history is stored, so any synthetic summary messages produced by CompactionProvider can become part of the persisted history when using ChatHistoryProvider. To compact only the in-flight request context while preserving the original stored history, register the provider on the ChatClientBuilder via UseAIContextProviders(...) instead.
Ad-hoc compaction
CompactionProvider.CompactAsync applies a strategy to an arbitrary message list without an active agent session:
IEnumerable<ChatMessage> compacted = await CompactionProvider.CompactAsync(
new TruncationCompactionStrategy(CompactionTriggers.TokensExceed(8000)),
existingMessages);
CompactionProvider is a context provider that applies compaction strategies before and after each agent run. Add it alongside a history provider in the agent's context_providers list.
- `before_strategy` — runs before the model call, compacting messages already loaded into the context.
- `after_strategy` — runs after the model call, compacting the messages stored by the history provider so the next turn starts smaller.
- `history_source_id` — the `source_id` of the history provider whose stored messages `after_strategy` should compact (defaults to `"in_memory"`).
Registering with an agent
from agent_framework import Agent, CompactionProvider, InMemoryHistoryProvider
from agent_framework._compaction import (
CharacterEstimatorTokenizer,
SlidingWindowStrategy,
SummarizationStrategy,
TokenBudgetComposedStrategy,
ToolResultCompactionStrategy,
)
tokenizer = CharacterEstimatorTokenizer()
pipeline = TokenBudgetComposedStrategy(
token_budget=16_000,
tokenizer=tokenizer,
strategies=[
ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
SummarizationStrategy(client=summarizer_client, target_count=4, threshold=2),
SlidingWindowStrategy(keep_last_groups=20),
],
)
history = InMemoryHistoryProvider()
compaction = CompactionProvider(
before_strategy=pipeline,
history_source_id=history.source_id,
)
agent = Agent(
client=client,
name="ShoppingAssistant",
instructions="You are a helpful shopping assistant.",
context_providers=[history, compaction],
)
session = agent.create_session()
print(await agent.run("What's the price of a laptop?", session=session))
Tip
Use a smaller, cheaper model (such as gpt-4o-mini) for the summarization client to reduce costs while maintaining summary quality.
If only one strategy is needed, pass it directly as before_strategy:
compaction = CompactionProvider(
before_strategy=SlidingWindowStrategy(keep_last_groups=20),
history_source_id=history.source_id,
)
Compacting persisted history after each run
Use after_strategy to compact the messages stored by the history provider so that future turns begin with a reduced context:
compaction = CompactionProvider(
before_strategy=SlidingWindowStrategy(keep_last_groups=20),
after_strategy=ToolResultCompactionStrategy(keep_last_tool_call_groups=1),
history_source_id=history.source_id,
)
Ad-hoc compaction
apply_compaction applies a strategy to an arbitrary message list outside an active agent session:
from agent_framework._compaction import apply_compaction, TruncationStrategy, CharacterEstimatorTokenizer
tokenizer = CharacterEstimatorTokenizer()
compacted = await apply_compaction(
messages,
strategy=TruncationStrategy(
max_n=8_000,
compact_to=4_000,
tokenizer=tokenizer,
),
tokenizer=tokenizer,
)
Choosing a strategy
| Strategy | Aggressiveness | Preserves context | Requires LLM | Best for |
|---|---|---|---|---|
| `ToolResultCompactionStrategy` | Low | High — only collapses tool results | No | Reclaiming space from verbose tool output |
| `SummarizationCompactionStrategy` | Medium | Medium — replaces history with a summary | Yes | Long conversations where context matters |
| `SlidingWindowCompactionStrategy` | High | Low — drops entire turns | No | Hard turn-count limits |
| `TruncationCompactionStrategy` | High | Low — drops oldest groups | No | Emergency token-budget backstops |
| `PipelineCompactionStrategy` | Configurable | Depends on child strategies | Depends | Layered compaction with multiple fallbacks |
| Strategy | Aggressiveness | Preserves context | Requires LLM | Best for |
|---|---|---|---|---|
| `ToolResultCompactionStrategy` | Low | High — collapses tool results into summary messages | No | Reclaiming space from verbose tool output |
| `SelectiveToolCallCompactionStrategy` | Low–Medium | Medium — fully excludes old tool-call groups | No | Removing tool history when results are no longer needed |
| `SummarizationStrategy` | Medium | Medium — replaces history with a summary | Yes | Long conversations where context matters |
| `SlidingWindowStrategy` | High | Low — drops oldest groups | No | Hard group-count limits |
| `TruncationStrategy` | High | Low — drops oldest groups | No | Emergency message- or token-budget backstops |
| `TokenBudgetComposedStrategy` | Configurable | Depends on child strategies | Depends | Layered compaction with a token-budget goal and multiple fallbacks |