Evaluate agents for Microsoft 365 Copilot in Copilot Studio

Important

Some of the functionality described in this release plan has not been released. Delivery timelines may change and projected functionality may not be released (see Microsoft policy). Learn more: What's new and planned

Enabled for: Admins, makers, marketers, or analysts, automatically
Public preview: Jul 2026
General availability: -

Business value

Evaluation for agents for Microsoft 365 Copilot enables enterprise-grade validation of declarative agents used in real, business-critical workflows, moving teams from manual, ad hoc testing to a scalable, standardized evaluation practice. With this feature, you can reduce production risk by detecting quality, correctness, and behavioral issues before release, and improve release confidence and iteration speed through automated, repeatable, and explainable evaluation processes.

Feature details

Evaluation for agents for Microsoft 365 Copilot (also referred to as declarative agents) provides a comprehensive framework tailored to declarative agents across the development, testing, and production-readiness stages.

It enables structured validation of quality, behavior, and reliability using automated and repeatable workflows.

Declarative agent evaluation includes the following capabilities:

Evaluation setup and inputs

Analysts can configure what data should be used in tests:

  • Create and manage custom test data.
  • Create evaluation inputs from existing conversations.
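
For illustration only, a custom test-data entry created from an existing conversation could be represented along the lines of the following sketch; the field names and shape are assumptions for this example, not the Copilot Studio schema.

    # Hypothetical shape of a custom test-data entry; field names are
    # illustrative and do not reflect the actual Copilot Studio schema.
    test_case = {
        "name": "expense-policy-lookup",
        "conversation": [  # multi-turn input captured from an existing chat
            {"role": "user", "content": "What is the meal allowance for travel?"},
            {"role": "agent", "content": "Domestic or international travel?"},
            {"role": "user", "content": "International."},
        ],
        "expected_answer": "The international meal allowance is 75 USD per day.",
        "expected_keywords": ["75 USD", "per day"],  # used by a keyword-match grader
    }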

Evaluation execution

Analysts can configure the scope or type of evaluation execution:

  • Automated evaluation runs
  • Full conversation (multi-turn) evaluation
  • Authenticated evaluation context

Evaluation methodology (graders)

Analysts can access these graders:

  • Set-level grading framework
  • Similarity grader
  • Semantic meaning comparison
  • Keyword match
  • Custom grader with configurable labels
  • AI-based quality graders
  • Tool and topic invocation grader

They can also use multiple graders per input.
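
As an informal illustration of what keyword-match and similarity grading evaluate, the following sketch shows a keyword check and a simple lexical-overlap score. This is a conceptual example only, not the product's grading logic; the semantic meaning comparison and AI-based quality graders in Copilot Studio operate at a deeper level than this word-overlap heuristic.

    def keyword_match(response: str, keywords: list[str]) -> bool:
        # Pass only if every expected keyword appears in the agent's response.
        return all(k.lower() in response.lower() for k in keywords)

    def lexical_similarity(response: str, reference: str) -> float:
        # Crude lexical overlap (Jaccard over word sets); a semantic-meaning
        # grader would compare meaning rather than surface word overlap.
        a, b = set(response.lower().split()), set(reference.lower().split())
        return len(a & b) / len(a | b) if a | b else 0.0

    # Multiple graders can be applied to the same input, each producing its own verdict.
    response = "The international meal allowance is 75 USD per day."
    print(keyword_match(response, ["75 USD", "per day"]))  # True
    print(round(lexical_similarity(response, "International meal allowance is 75 USD per day"), 2))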

Analysis and storytelling

Analysts can also use reporting tools and analysis features:

  • Aggregated result analysis
  • Drill-down views for per-test inspection
  • Activity map visualization
  • Capture and analyze user reactions

Progress tracking and comparison

Analysts can track progress and compare results across evaluation runs with:

  • Run-to-run comparisons
  • Evaluation results export
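
As a rough illustration of run-to-run comparison over exported results, the following sketch contrasts two hypothetical result sets; the data shape (test name mapped to pass/fail) is an assumption for this example, not the product's export format.

    # Hypothetical comparison of a baseline run against a candidate run.
    baseline  = {"expense-policy-lookup": True, "pto-balance": True,  "org-chart": False}
    candidate = {"expense-policy-lookup": True, "pto-balance": False, "org-chart": True}

    regressions   = [t for t, ok in baseline.items() if ok and not candidate.get(t, False)]
    newly_passing = [t for t, ok in candidate.items() if ok and not baseline.get(t, False)]

    print(f"pass rate: {sum(baseline.values())}/{len(baseline)} -> "
          f"{sum(candidate.values())}/{len(candidate)}")
    print("regressions:", regressions)
    print("newly passing:", newly_passing)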

Geographic areas

Visit the Explore Feature Geography report for Microsoft Azure areas where this feature is planned or available.

Language availability

Visit the Explore Feature Language report for information on this feature's availability.