Evaluate agents for Microsoft 365 Copilot in Copilot Studio

Important

Some of the functionality described in this release plan has not been released. Delivery timelines may change and projected functionality may not be released (see Microsoft policy). Learn more: What's new and planned

Enabled for: Admins, makers, marketers, or analysts, automatically
Public preview: Jul 2026
General availability: -

Business value

Evaluation for agents for Microsoft 365 Copilot enables enterprise-grade validation of declarative agents used in real, business-critical workflows, moving teams from manual, ad hoc testing to a scalable, standardized evaluation practice. With this feature, you can reduce production risk by detecting quality, correctness, and behavioral issues before release, and improve release confidence and iteration speed through automated, repeatable, and explainable evaluation processes.

Feature details

Evaluation for agents for Microsoft 365 Copilot (also referred to as declarative agents) provides a comprehensive framework tailored to declarative agents across the development, testing, and production-readiness stages.

It enables structured validation of quality, behavior, and reliability using automated and repeatable workflows.

Declarative agent evaluation includes the following capabilities:

Evaluation setup and inputs

Analysts can configure what data should be used in tests:

  • Create and manage custom test data.
  • Create evaluation inputs from existing conversations.
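
For illustration only, a custom test-data entry created from an existing conversation could be represented along the lines of the following sketch; the field names and shape are assumptions for this example, not the Copilot Studio schema.

    # Hypothetical shape of a custom test-data entry; field names are
    # illustrative and do not reflect the actual Copilot Studio schema.
    test_case = {
        "name": "expense-policy-lookup",
        "conversation": [  # multi-turn input captured from an existing chat
            {"role": "user", "content": "What is the meal allowance for travel?"},
            {"role": "agent", "content": "Domestic or international travel?"},
            {"role": "user", "content": "International."},
        ],
        "expected_answer": "The international meal allowance is 75 USD per day.",
        "expected_keywords": ["75 USD", "per day"],  # used by a keyword-match grader
    }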

Evaluation execution

Analysts can configure the scope or type of evaluation execution:

  • Automated evaluation runs
  • Full conversation (multi-turn) evaluation
  • Authenticated evaluation context

Evaluation methodology (graders)

Analysts can access these graders:

  • Set-level grading framework
  • Similarity grader
  • Semantic meaning comparison
  • Keyword match
  • Custom grader with configurable labels
  • AI-based quality graders
  • Tool and topic invocation grader

They can also use multiple graders per input.
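
As an informal illustration of what keyword-match and similarity grading evaluate, the following sketch shows a keyword check and a simple lexical-overlap score. This is a conceptual example only, not the product's grading logic; the semantic meaning comparison and AI-based quality graders in Copilot Studio operate at a deeper level than this word-overlap heuristic.

    def keyword_match(response: str, keywords: list[str]) -> bool:
        # Pass only if every expected keyword appears in the agent's response.
        return all(k.lower() in response.lower() for k in keywords)

    def lexical_similarity(response: str, reference: str) -> float:
        # Crude lexical overlap (Jaccard over word sets); a semantic-meaning
        # grader would compare meaning rather than surface word overlap.
        a, b = set(response.lower().split()), set(reference.lower().split())
        return len(a & b) / len(a | b) if a | b else 0.0

    # Multiple graders can be applied to the same input, each producing its own verdict.
    response = "The international meal allowance is 75 USD per day."
    print(keyword_match(response, ["75 USD", "per day"]))  # True
    print(round(lexical_similarity(response, "International meal allowance is 75 USD per day"), 2))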

Analysis and storytelling

Analysts can also use reporting tools and analysis features:

  • Aggregated result analysis
  • Drill-down views for per-test inspection
  • Activity map visualization
  • Capture and analyze user reactions

Progress tracking and comparison

Analysts can track progress and compare results across evaluation runs with:

  • Run-to-run comparisons
  • Evaluation results export
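
As a rough illustration of run-to-run comparison over exported results, the following sketch contrasts two hypothetical result sets; the data shape (test name mapped to pass/fail) is an assumption for this example, not the product's export format.

    # Hypothetical comparison of a baseline run against a candidate run.
    baseline  = {"expense-policy-lookup": True, "pto-balance": True,  "org-chart": False}
    candidate = {"expense-policy-lookup": True, "pto-balance": False, "org-chart": True}

    regressions   = [t for t, ok in baseline.items() if ok and not candidate.get(t, False)]
    newly_passing = [t for t, ok in candidate.items() if ok and not baseline.get(t, False)]

    print(f"pass rate: {sum(baseline.values())}/{len(baseline)} -> "
          f"{sum(candidate.values())}/{len(candidate)}")
    print("regressions:", regressions)
    print("newly passing:", newly_passing)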

Geographic areas

Visit the Explore Feature Geography report for Microsoft Azure areas where this feature is planned or available.

Language availability

Visit the Explore Feature Language report for information on this feature's availability.