Register and serve an open source embedding model


This notebook sets up the open source text embedding model e5-small-v2 in a Model Serving endpoint that can be used for Vector Search.

Download the model from the Hugging Face Hub.
Register it to the MLflow Model Registry.
Start a Model Serving endpoint to serve the model.

The e5-small-v2 model is available at https://huggingface.co/intfloat/e5-small-v2.

MIT license
Variants:

For a list of library versions included in Databricks Runtime, see the release notes for your Databricks Runtime version.

Install the Databricks SDK for Python

This notebook uses its Python client to work with serving endpoints.

%pip install -U databricks-sdk python-snappy
%pip install sentence-transformers
dbutils.library.restartPython()

Download the model

# Download model using the sentence_transformers library.
from sentence_transformers import SentenceTransformer

source_model_name = 'intfloat/e5-small-v2'  # model name on Hugging Face Hub
model = SentenceTransformer(source_model_name)

# Test the model, just to show it works.
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings)
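
For vector search, embeddings like the ones printed above are typically compared by cosine similarity. The sketch below illustrates that comparison in plain Python. The `cosine_similarity` helper and the three-dimensional vectors are illustrative assumptions, not output of e5-small-v2.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, for illustration only).
query_vec = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

sim_close = cosine_similarity(query_vec, doc_close)
sim_far = cosine_similarity(query_vec, doc_far)
print(sim_close, sim_far)
```

The same function could be applied to two rows of `embeddings` after converting them to lists, for example `cosine_similarity(embeddings[0].tolist(), embeddings[1].tolist())`.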

Register the model to MLflow

import mlflow
mlflow.set_registry_uri("databricks-uc")


# Specify the catalog and schema to use. You must have USE_CATALOG privilege on the catalog and USE_SCHEMA and CREATE_MODEL privileges on the schema.
# Change the catalog and schema here if necessary.
catalog = "main"
schema = "default"
model_name = "e5-small-v2"

# MLflow model name. The Model Registry uses this name for the model.
registered_model_name = f"{catalog}.{schema}.{model_name}"

# Compute input and output schema.
signature = mlflow.models.signature.infer_signature(sentences, embeddings)
print(signature)

model_info = mlflow.sentence_transformers.log_model(
  model,
  artifact_path="model",
  signature=signature,
  input_example=sentences,
  registered_model_name=registered_model_name)
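
The registered model name used above has three levels: catalog, schema, and model. As a minimal sketch, a helper can assemble that name and reject parts that would break the three-level form. `build_registered_model_name` is a hypothetical helper, and its checks cover only empty parts and embedded dots, not every naming rule the registry may enforce.

```python
def build_registered_model_name(catalog, schema, model_name):
    """Join catalog, schema, and model name into 'catalog.schema.model'."""
    parts = {"catalog": catalog, "schema": schema, "model_name": model_name}
    for label, value in parts.items():
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
        if "." in value:
            # A dot inside a part would produce more than three levels.
            raise ValueError(f"{label} must not contain '.': {value!r}")
    return f"{catalog}.{schema}.{model_name}"

full_name = build_registered_model_name("main", "default", "e5-small-v2")
print(full_name)
```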

inference_test = ["I enjoy pies of both apple and cherry.", "I prefer cookies."]

# Load the custom model by providing the URI for where the model was logged.
loaded_model_pyfunc = mlflow.pyfunc.load_model(model_info.model_uri)

# Perform a quick test to ensure that the loaded model generates the correct output.
embeddings_test = loaded_model_pyfunc.predict(inference_test)
embeddings_test

# Extract the version of the model you just registered.
mlflow_client = mlflow.MlflowClient()

def get_latest_model_version(model_name):
  client = mlflow_client
  model_version_infos = client.search_model_versions("name = '%s'" % model_name)
  return max([int(model_version_info.version) for model_version_info in model_version_infos])

model_version = get_latest_model_version(registered_model_name)
model_version
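
The helper above converts each version to `int` before taking the maximum. Assuming versions arrive as strings, as that conversion suggests, comparing them directly would order them lexicographically and pick the wrong one once a model reaches ten versions. The standalone sketch below uses a made-up list of version strings to show the difference.

```python
# Hypothetical version strings, as a registry search might return them.
versions = ["1", "2", "9", "10"]

# String comparison is lexicographic: "9" sorts after "10".
latest_as_text = max(versions)

# Converting to int first gives the numerically latest version.
latest_as_int = max(int(v) for v in versions)

print(latest_as_text, latest_as_int)
```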

Create a model serving endpoint

For more details, see Create foundation model serving endpoints.

Note: This example creates a small CPU endpoint that scales down to 0. This is for quick, small tests. For more realistic use cases, consider using GPU endpoints for faster embedding computation, and do not scale to 0 if you expect frequent queries, because Model Serving endpoints have some cold-start overhead.

endpoint_name = "e5-small-v2"  # Name of endpoint to create

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput

w = WorkspaceClient()

endpoint_config_dict = {
    "served_entities": [
        {
            "name": f'{registered_model_name.replace(".", "_")}_{model_version}',
            "entity_name": registered_model_name,
            "entity_version": str(model_version),
            "workload_type": "CPU",
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }
    ]
}

endpoint_config = EndpointCoreConfigInput.from_dict(endpoint_config_dict)

# The endpoint may take several minutes to get ready.
w.serving_endpoints.create_and_wait(name=endpoint_name, config=endpoint_config)
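
The configuration dictionary above can be wrapped in a small function so that a test endpoint and an always-on endpoint differ only in their arguments. This is a sketch under stated assumptions: `build_endpoint_config` is a hypothetical helper, its keys are the ones used in the example above, and suffixing the served entity name with the model version is one possible convention rather than a requirement.

```python
def build_endpoint_config(entity_name, entity_version, workload_type="CPU",
                          workload_size="Small", scale_to_zero=True):
    """Return a served-entities config dict for one registered model version."""
    # Dots are replaced with underscores when building the served entity name.
    served_name = f'{entity_name.replace(".", "_")}_{entity_version}'
    return {
        "served_entities": [
            {
                "name": served_name,
                "entity_name": entity_name,
                "entity_version": str(entity_version),
                "workload_type": workload_type,
                "workload_size": workload_size,
                "scale_to_zero_enabled": scale_to_zero,
            }
        ]
    }

# Quick-test variant: scales to zero when idle.
test_config = build_endpoint_config("main.default.e5-small-v2", 1)

# Variant for frequent queries: stays warm to avoid cold starts.
always_on_config = build_endpoint_config(
    "main.default.e5-small-v2", 1, scale_to_zero=False
)
print(test_config["served_entities"][0]["name"])
```

Either dictionary can then be passed to `EndpointCoreConfigInput.from_dict`, as in the cell above.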

Query the endpoint

The previous command, create_and_wait, waits until the endpoint is ready. You can also check the status of the serving endpoint in the Databricks UI.

For more information, see Query foundation models.

# Only run this command after the Model Serving endpoint is in the Ready state.
import time

start = time.time()

# If the endpoint is not yet ready, you might get a timeout error. If so, wait and then rerun the command.
endpoint_response = w.serving_endpoints.query(name=endpoint_name, dataframe_records=['Hello world', 'Good morning'])

end = time.time()

print(endpoint_response)
print(f'Time taken for querying endpoint in seconds: {end-start}')
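
Rather than rerunning the cell by hand after a timeout, the wait-and-retry advice can be automated. The sketch below is one way to do that with exponential backoff. `retry_with_backoff` is a hypothetical helper, it catches the generic `Exception` because the exact error type raised on timeout is not specified here, and `flaky_query` is a stand-in for the real endpoint call.

```python
import time

def retry_with_backoff(fn, max_attempts=5, initial_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call fn() until it succeeds, waiting longer after each failure."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            sleep(delay)
            delay *= factor

# Stand-in for the endpoint call: fails twice, then succeeds.
calls = {"count": 0}

def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("endpoint not ready")
    return {"predictions": [[0.1, 0.2]]}

# Record the requested waits instead of actually sleeping.
waits = []
result = retry_with_backoff(flaky_query, sleep=waits.append)
print(result, waits)
```

In the notebook, the same helper could wrap the query above, for example `retry_with_backoff(lambda: w.serving_endpoints.query(name=endpoint_name, dataframe_records=['Hello world', 'Good morning']))`.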

Example notebook

Register and serve an OSS embedding model


Last updated on 2026-04-25