Custom evaluators (classic)


Note

The Microsoft Foundry SDK for evaluation and the Foundry portal are in public preview, but the APIs are generally available for model and dataset evaluation (agent evaluation remains in public preview). The Azure AI Evaluation SDK and the evaluators marked (preview) in this article are currently in public preview everywhere.

Built-in evaluators offer an easy way to monitor the quality of your application's generations. To customize your evaluations, you can create code-based or prompt-based evaluators.

Code-based evaluators

Some evaluation metrics don't require a large language model. Code-based evaluators give you the flexibility to define metrics based on functions or callable classes. For example, you can build your own code-based evaluator by creating a simple Python class that calculates the length of an answer in answer_length.py under the answer_len/ directory, as in the following example.

Code-based evaluator example: Answer length

class AnswerLengthEvaluator:
    def __init__(self):
        pass
    # A class is made callable by implementing the special method __call__
    def __call__(self, *, answer: str, **kwargs):
        return {"answer_length": len(answer)}

Run the evaluator on a single row of data by importing the callable class:

from answer_len.answer_length import AnswerLengthEvaluator

answer_length_evaluator = AnswerLengthEvaluator()
answer_length = answer_length_evaluator(answer="What is the speed of light?")

Code-based evaluator output: Answer length

{"answer_length":27}
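
The paragraph above notes that a code-based evaluator can also be a plain function. The following is a minimal sketch of a function-based equivalent. It assumes the same keyword-only `answer` input as the class and applies the function to a few rows. The `rows` list is illustrative data, not part of the original example.

```python
def answer_length_evaluator(*, answer: str, **kwargs) -> dict:
    # Same metric as AnswerLengthEvaluator, written as a plain function
    return {"answer_length": len(answer)}


# Illustrative rows; extra keys such as "query" are absorbed by **kwargs
rows = [
    {"answer": "What is the speed of light?"},
    {"answer": "About 299,792 km/s.", "query": "Speed of light?"},
]

# Evaluate each row independently, one result dict per row
results = [answer_length_evaluator(**row) for row in rows]
print(results)
```

A class is still the better choice when the evaluator needs setup, such as loading a file or a model configuration, in `__init__`.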

Prompt-based evaluators

To build your own prompt-based large language model evaluator or AI-assisted annotator, create a custom evaluator based on a Prompty file.

Prompty is a file format with the .prompty extension for developing prompt templates. A Prompty asset is a Markdown file with modified front matter. The front matter is in YAML format and contains metadata fields that define the model configuration and the expected inputs of the Prompty.
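
To make this structure concrete, the following minimal sketch splits a Prompty document into its YAML front matter and its prompt body using only the standard library. `split_prompty` is a hypothetical helper. It assumes the file opens with a line containing only `---`, and it does not parse the YAML itself. A YAML library such as PyYAML would handle that step.

```python
def split_prompty(text: str) -> tuple:
    """Return (front_matter_yaml, template_body) for a .prompty document.

    Hypothetical helper: assumes the first line is '---' and that the
    front matter ends at the next line containing only '---'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("Prompty file must start with a '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front_matter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return front_matter, body
    raise ValueError("Closing '---' for the front matter not found")


example = """---
name: Friendliness Evaluator
inputs:
  response:
    type: string
---

system:
Rate the friendliness of {{response}}.
"""

front_matter, body = split_prompty(example)
print(front_matter)
print(body)
```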

To measure the friendliness of a response, create a custom evaluator called FriendlinessEvaluator:

Prompt-based evaluator example: Friendliness evaluator

First, create a friendliness.prompty file that defines the friendliness metric and its grading rubric:

---
name: Friendliness Evaluator
description: Friendliness Evaluator to measure warmth and approachability of answers.
model:
  api: chat
  configuration:
    type: azure_openai
    azure_endpoint: ${env:AZURE_OPENAI_ENDPOINT}
    azure_deployment: gpt-4o-mini
  parameters:
    temperature: 0.1
inputs:
  response:
    type: string
outputs:
  score:
    type: int
  reason:
    type: string
---

system:
Friendliness assesses the warmth and approachability of the answer. Rate the friendliness of the response from one to five stars using the following scale:

One star: the answer is unfriendly or hostile

Two stars: the answer is mostly unfriendly

Three stars: the answer is neutral

Four stars: the answer is mostly friendly

Five stars: the answer is very friendly

Please assign a rating between 1 and 5 based on the tone and demeanor of the response.

**Example 1**
generated_query: I just don't feel like helping you! Your questions are getting very annoying.
output:
{"score": 1, "reason": "The response is not warm and resists providing helpful information."}
**Example 2**
generated_query: I'm sorry this watch is not working for you. Very happy to assist you with a replacement.
output:
{"score": 5, "reason": "The response is warm and empathetic, offering a resolution with care."}

**Here is the actual conversation to be scored:**
generated_query: {{response}}
output:
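
Prompty templates use Jinja-style `{{ }}` placeholders. At run time, `{{response}}` in the template above is replaced by the `response` input declared in the front matter. The following sketch is a simplified stand-in for that substitution step using a regular expression. It is not how Prompty actually renders templates, because real templates are processed by a templating engine that also supports loops, conditionals, and filters. `render_placeholders` is a hypothetical helper.

```python
import re


def render_placeholders(template: str, **inputs: str) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"Missing input for placeholder: {name}")
        return inputs[name]

    # \s* tolerates spaces inside the braces, e.g. {{ response }}
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


tail = "generated_query: {{response}}\noutput:"
print(render_placeholders(tail, response="I will not apologize for my behavior!"))
```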

Then create a FriendlinessEvaluator class that loads the Prompty file and parses its output as JSON:

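import os
import json
from promptflow.client import load_flow

class FriendlinessEvaluator:
    def __init__(self, model_config):
        # Locate friendliness.prompty in the same directory as this module
        current_dir = os.path.dirname(__file__)
        prompty_path = os.path.join(current_dir, "friendliness.prompty")
        # Load the Prompty file as a callable flow, overriding its model configuration
        self._flow = load_flow(source=prompty_path, model={"configuration": model_config})

    def __call__(self, *, response: str, **kwargs):
        llm_response = self._flow(response=response)
        try:
            # The prompt asks the model for a JSON object; parse it into a dict
            result = json.loads(llm_response)
        except (json.JSONDecodeError, TypeError):
            # Not a JSON string (or already parsed): return the output unchanged
            result = llm_response
        return result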
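
The JSON handling in `__call__` can be checked without calling a model by injecting a stand-in flow. In the following sketch, `ParsingEvaluator` and `fake_flow` are hypothetical. The class mirrors the parsing logic above but receives the flow as a parameter instead of loading friendliness.prompty, and `fake_flow` returns a canned reply in place of a real LLM response.

```python
import json
from typing import Any, Callable


class ParsingEvaluator:
    """Hypothetical variant of FriendlinessEvaluator with an injected flow."""

    def __init__(self, flow: Callable[..., Any]):
        self._flow = flow

    def __call__(self, *, response: str, **kwargs) -> Any:
        llm_response = self._flow(response=response)
        try:
            return json.loads(llm_response)
        except (json.JSONDecodeError, TypeError):
            # Not a JSON string: return the raw output unchanged
            return llm_response


def fake_flow(*, response: str) -> str:
    # Stand-in for the loaded Prompty flow: always returns the same JSON reply
    return '{"score": 1, "reason": "The response is hostile."}'


evaluator = ParsingEvaluator(fake_flow)
print(evaluator(response="I will not apologize for my behavior!"))
```

Catching `TypeError` as well as `json.JSONDecodeError` covers the case where the flow already returns a dict rather than a string.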

Now create an instance of your Prompty-based evaluator and run it on a single row of data:

from friendliness.friend import FriendlinessEvaluator

# model_config must already hold your Azure OpenAI model configuration
friendliness_eval = FriendlinessEvaluator(model_config)

friendliness_score = friendliness_eval(response="I will not apologize for my behavior!")

Prompt-based evaluator output: Friendliness evaluator

{
    'score': 1, 
    'reason': 'The response is hostile and unapologetic, lacking warmth or approachability.'
}
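
Because an LLM is not guaranteed to follow the rubric, you might check each result before aggregating scores. The following is a minimal sketch under that assumption. `validate_friendliness` is a hypothetical helper that enforces the 1 to 5 integer scale from the prompt and expects a string `reason`.

```python
from typing import Any


def validate_friendliness(result: Any) -> dict:
    """Return a normalized {'score': int, 'reason': str} dict or raise ValueError."""
    if not isinstance(result, dict):
        raise ValueError(f"Expected a dict, got {type(result).__name__}")
    score = result.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        raise ValueError(f"Score must be an integer from 1 to 5, got {score!r}")
    reason = result.get("reason", "")
    if not isinstance(reason, str):
        raise ValueError("Reason must be a string")
    return {"score": score, "reason": reason}


print(validate_friendliness({"score": 1, "reason": "The response is hostile."}))
```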


Last updated on 2026-05-01