The Microsoft 365 Copilot Agent Evaluations CLI (@microsoft/m365-copilot-eval) helps you test, measure, and improve the quality of your agents through automated prompt evaluation and AI-based scoring. This quickstart walks you through installing the Agent Evaluations tool, configuring your environment, creating your first dataset, and running an evaluation.
Note
The Agent Evaluations CLI is currently in preview. Features and functionality are subject to change.
Prerequisites
Before you begin, make sure that you have:
- A Microsoft 365 Copilot agent deployed to your tenant.
- Node.js 24.12.0 or later (use node --version to check; see the quick check after this list).
- Access to an Azure OpenAI in Foundry Models resource with GPT-4o-mini deployed.
- Microsoft Entra admin consent granted for the Agent Evaluations CLI in your tenant. If you aren't a tenant admin, ask your admin to grant consent before you run runevals for the first time. For more information, see Grant admin consent.
- Your tenant ID, Azure OpenAI endpoint, and API key. If you don't have these values, see Get values for environment variables.
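As an optional quick check, you can confirm the Node.js and npm versions from a terminal. Both commands are standard Node.js tooling and simply print the installed version:

node --version   # should report v24.12.0 or later
npm --version    # npm ships with Node.js and is used to install the CLI in Step 1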
Note
This quickstart assumes you're using a Windows development environment. Authentication support for other operating systems is coming soon.
Step 1: Install the CLI
Install the Agent Evaluations CLI globally by using npm:
npm install -g @microsoft/m365-copilot-eval
Verify the installation:
runevals --version
After installation, the runevals command is available globally on your system.
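Because the CLI is in preview, new builds ship frequently. The standard npm commands for a globally installed package apply here; no CLI-specific flags are assumed:

npm update -g @microsoft/m365-copilot-eval      # update to the latest published version
npm uninstall -g @microsoft/m365-copilot-eval   # remove the CLI if you no longer need it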
Step 2: Set up your project structure
Run the evaluation tool from your Microsoft 365 agent project directory (where your agent code lives), not from the evaluations tool repository.
cd /path/to/your-agent-project
Your agent project should include the following files and folders:
my-agent/
├── .env.local # Agent configuration (Agents Toolkit projects)
├── .env.local.user # Secrets — never committed
├── evals/
│ └── evals.json # Your test dataset (auto-discovered)
└── .evals/
└── <generated reports> # Results written here (YYYY-MM-DD_HH-MM-SS.html)
You create the evals/evals.json dataset in Step 4. The .evals/ report folder is created automatically on first run.
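If the evals folder doesn't exist yet, you can create it from the project root before moving on to Step 4. The path shown is an example; use your own project location:

cd /path/to/your-agent-project
mkdir evals            # holds evals.json (created in Step 4)
# .evals/ is created automatically by the tool on the first run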
Step 3: Configure environment variables
Choose the option that matches your project type.
Tip
If you built your agent by using Microsoft 365 Agents Toolkit, you already have .env.local with your agent configuration. Create .env.local.user in your project root for secrets.
Microsoft 365 Agents Toolkit projects
Add secrets to .env.local.user:
# .env.local.user (NOT checked in — secrets go here)
AZURE_AI_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
AZURE_AI_API_KEY="your-api-key-here"
TENANT_ID="your-tenant-id-here"
AZURE_AI_API_VERSION="2024-12-01-preview" # default
AZURE_AI_MODEL_NAME="gpt-4o-mini" # default
Add .env.local.user to your .gitignore:
# User-specific secrets — never commit
.env.local.user
env/.env.local.user
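To confirm that Git ignores the secrets file before you commit, you can run git check-ignore from the project root. The command is part of standard Git and prints the .gitignore rule that matches the path:

git check-ignore -v .env.local.user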
Step 4: Create your first dataset
Create evals/evals.json with a small set of prompts and expected responses. This example uses the simplest valid schema for single-turn evaluations.
{
"schemaVersion": "1.0.0",
"items": [
{
"prompt": "What is Microsoft 365?",
"expected_response": "Microsoft 365 is a cloud-based productivity suite that includes Office apps, cloud services, and device management."
},
{
"prompt": "How do I share a file in Microsoft Teams?",
"expected_response": "To share a file in Teams, you can upload it to a channel or chat, or share it from OneDrive with specific permissions."
}
]
}
Tip
If you skip this step, the tool offers to generate a starter file with sample prompts the first time you run runevals.
For full dataset schema, categories, and advanced patterns, see Create evaluation test suites.
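Before you run an evaluation, you can verify that the dataset parses as valid JSON and that every item has the two fields shown in the sample above. This one-liner uses only Node.js built-ins; the field names prompt and expected_response come from the sample schema in this step:

node -e "const d = JSON.parse(require('fs').readFileSync('evals/evals.json', 'utf8')); console.log(d.items.every(i => i.prompt && i.expected_response) ? 'evals.json looks valid' : 'missing prompt or expected_response');"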
Step 5: Run your first evaluation
For Agents Toolkit projects (automatically uses .env.local and .env.local.user):
runevals
For non-Agents Toolkit projects:
runevals --env dev
Step 6: Confirm successful setup
A successful run produces:
A completion message in the terminal similar to the following:

M365 Copilot Agent Evaluations CLI
Loading environment: dev
Agent ID: T_my-agent.declarativeAgent
Using prompts file: ./evals/evals.json
Running evaluations...
Evals completed successfully!
Results saved to: ./.evals/2026-04-22_14-30-45.html

An HTML report saved to ./.evals/YYYY-MM-DD_HH-MM-SS.html that opens automatically in your browser.
The report includes scores for each prompt from the following evaluators.
| Evaluator | Type | Scale | Default Threshold | Enabled by Default |
|---|---|---|---|---|
| Relevance | LLM-based | 1-5 | 3 | Yes |
| Coherence | LLM-based | 1-5 | 3 | Yes |
| Groundedness | LLM-based | 1-5 | 3 | No |
| Similarity | LLM-based | 1-5 | 3 | No |
| Citations | Count-based | >= 0 | 1 | No |
| ExactMatch | String match | boolean | N/A | No |
| PartialMatch | String match | 0.0-1.0 | 0.5 | No |
If you don't see these results, see Troubleshooting.