APPLIES TO:
Azure CLI ml extension v2 (current)
Python SDK azure-ai-ml v2 (current)
In this article, you learn how to use Microsoft Fabric to access models deployed to Azure Machine Learning batch endpoints. This workflow also supports batch pipeline deployments from Fabric.
Important
This feature is currently in public preview. This preview version is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Prerequisites
- A Microsoft Fabric subscription or free Microsoft Fabric trial with a lakehouse created.
- An Azure subscription with the free or paid version of Azure Machine Learning.
- An Azure Machine Learning workspace with a model deployed to a batch endpoint that reads and writes data in an Azure Data Lake Storage Gen2 account. Fabric supports only hierarchical storage accounts like Azure Data Lake Storage Gen2. For more information, see Deploy models for scoring in batch endpoints.
- The heart-unlabeled-0.csv sample dataset downloaded to use for scoring.
Important
The identity that invokes the batch deployment can grant access to the storage account, but the compute that runs the deployment must also have permission to mount the storage account. For more information, see Access storage services.
Architecture
Azure Machine Learning can't directly access data stored in Fabric OneLake, but you can configure a OneLake shortcut and an Azure Machine Learning datastore to both access the same Azure Data Lake storage account. This workflow allows reading from and writing to the same underlying data without having to copy it.
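The two systems address the shared storage differently: the OneLake shortcut connects through the Azure Data Lake Storage Gen2 endpoint URL, while Azure Machine Learning reads and writes through a datastore URI. A minimal sketch of the two addressing schemes, where the account, datastore, and path names are placeholders:

```python
# Sketch: two URIs for the same underlying ADLS Gen2 data.
# The account, datastore, and path names below are placeholders.

def adls_endpoint(account: str) -> str:
    """Endpoint URL used when creating the OneLake shortcut connection."""
    return f"https://{account}.dfs.core.windows.net"

def datastore_uri(datastore: str, path: str) -> str:
    """azureml:// URI that Azure Machine Learning uses for the same data."""
    return f"azureml://datastores/{datastore}/paths/{path}"

print(adls_endpoint("mystorageacct"))
# https://mystorageacct.dfs.core.windows.net
print(datastore_uri("trusted_blob", "datasets/uci-heart-unlabeled-0"))
# azureml://datastores/trusted_blob/paths/datasets/uci-heart-unlabeled-0
```

Because both URIs resolve to the same container, predictions written by the batch endpoint appear under the lakehouse shortcut without a copy step.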
The following diagram shows the data architecture.
Configure data access
Create or identify a connection to the storage account that contains the batch endpoint data, so both Fabric OneLake and Azure Machine Learning can access the information. Fabric supports only hierarchical storage accounts like Azure Data Lake Gen2.
Create a OneLake shortcut to the storage account
- In Fabric, select your workspace from the left navigation pane.
- Open the lakehouse you want to use.
- In the Explorer pane, select the More options icon next to Files, and then select New shortcut.
- On the New shortcut screen, select the Azure Data Lake Storage Gen2 option.
- On the Connection settings screen, select New connection, and then enter the URL for your Azure Data Lake Gen2 storage account.
- In the Connection credentials section, provide the following information:
- Connection: Select Create new connection.
- Connection name: Keep the populated value.
- Authentication kind: Select Organizational account to use the credentials of the connected user via OAuth 2.0. If you're not signed in, select Sign in to sign in.
- Select Next.
- On the next screen, select the storage account folder or folders to point the shortcut to, if applicable, and then select Next.
- On the next screen, review the settings, and then select Create.
Create a datastore pointing to the storage account
Create an Azure Machine Learning datastore that points to the storage account. Azure Machine Learning batch endpoints can write predictions only to blob storage accounts, so you select Azure Blob Storage rather than Azure Data Lake Gen2 as the Datastore type for the batch endpoint. All Azure Data Lake storage accounts are also blob storage accounts.
- In your Azure Machine Learning workspace in Azure Machine Learning studio, select Data from the left navigation menu.
- On the Data page, select the Datastores tab, and then select Create.
- On the Create datastore screen, provide the following information:
- Datastore name: Enter a name for the datastore.
- Datastore type: Select Azure Blob Storage.
- Account selection method: Select Enter manually.
- URL: Enter the URL for your storage account and data container.
- Subscription ID: Select your Azure subscription.
- Resource group of the storage resource: Select the resource group of the storage account.
- Authentication type: Select Account key.
- Account key: Enter the access key of the storage account.
- Select Create.
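The studio form above maps onto the azure-ai-ml v2 datastore schema. A minimal sketch of the equivalent definition as a plain dictionary, assuming placeholder account, container, and key values:

```python
# Sketch of the datastore definition the studio form produces.
# The account, container, and key values are placeholders.

datastore_definition = {
    "type": "azure_blob",             # Azure Blob Storage, not ADLS Gen2
    "name": "trusted_blob",           # referenced later in azureml:// paths
    "account_name": "mystorageacct",  # placeholder storage account
    "container_name": "batch-data",   # placeholder data container
    "credentials": {
        "account_key": "<storage-account-access-key>",  # Account key auth
    },
}

# Batch endpoints write predictions only through blob datastores, so the
# type stays azure_blob even though the account is ADLS Gen2.
assert datastore_definition["type"] == "azure_blob"
```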
Upload sample dataset
In Fabric, upload the sample data for the batch endpoint to use as input.
- Go to the shortcut you created in the Fabric lakehouse.
- Select the More actions icon, and then select New subfolder to create a folder to store the sample dataset.
- In the new folder, select Get data and then select Upload files.
- Upload the sample dataset heart-unlabeled-0.csv.
The sample file is ready to be consumed. Note the path to the location where you saved it.
Create a Fabric-to-batch inferencing pipeline
Create a Fabric-to-batch inferencing pipeline in your existing Fabric workspace that invokes the batch endpoint.
- On the Home page of your Fabric workspace, select New item.
- On the New item page, select Pipeline.
- Name the pipeline and select Create.
- Select the Activities tab at the top of the pipeline designer page.
- Select the More options icon at the end of the tab, and then scroll down and select Azure Machine Learning.
Create and configure the batch deployment connection
To configure the connection to the Azure Machine Learning workspace, complete the following steps.
- On the lower designer pane, select the Settings tab.
- Next to Azure Machine Learning connection, select the dropdown arrow and then select Browse all to add a new connection.
- On the Choose a data source to get started screen, select Azure Machine Learning under New sources.
- On the Connect data source screen, under Connection settings, enter the Subscription ID, Resource group name, and Azure Machine Learning Workspace name where your endpoint is deployed.
- In the Connection credentials section, select Create new connection under Connection, and provide a connection name under Connection name.
- For Authentication kind, select Organizational account to use the credentials of the connected user, or Service principal to use a service principal.
Note
A service principal is recommended for production settings. For either choice, ensure that the identity associated with the connection has permission to call the batch endpoint you deployed.
- For an organizational account, sign in if necessary. For a service principal connection, provide the Tenant ID, Service principal client ID, and Service principal Key.
- Select Connect.
The new connection appears in the designer Settings tab, and Fabric automatically populates the available batch endpoints in the selected workspace.
- For Batch endpoint, select the batch endpoint you want to call. For this example, select a heart-classifier deployment endpoint.
- For Batch deployment, select a specific deployment if needed.
The Batch deployment section automatically populates with the available deployments under the endpoint. If you don't select a deployment, Fabric invokes the Default deployment under the endpoint, allowing the batch endpoint creator to decide which deployment to call. For most scenarios, keep this default behavior.
Configure inputs and outputs for the batch endpoint
Configure inputs and outputs for the batch endpoint. Inputs to batch endpoints supply data and parameters needed to run the process. Outputs provide the paths to place the batch data.
The Azure Machine Learning batch pipeline in Fabric supports both model deployments and pipeline deployments. The number and type of inputs you provide depend on the deployment type. This example uses a model deployment that requires exactly one input and produces one output.
For more information on batch endpoint inputs and outputs, see Understand inputs and outputs in batch endpoints.
Configure inputs
Configure the Job inputs section as follows:
Expand Job inputs, and select New to add a new input to your endpoint.
Name the input input_data. For your model deployment, you can use any name. For pipeline deployments, you must provide the exact name of the input that your model expects.
Select the caret next to the input to expand the Name and Value fields.
To indicate the type of input you're creating, enter JobInputType in the Name field.
To indicate that the input is a folder path, enter UriFolder in the Value field.
Note
You need to use the type of input that your deployment expects. Other supported values for this field are UriFile for a file path or Literal for any literal value like a string or integer.
To add another property for this input, select the plus sign next to the property.
To indicate the path to the data, enter Uri in the Name field.
Tip
If your input is of type Literal, enter Value in the Name field.
In the Value field, enter the path to the data, azureml://datastores/trusted_blob/paths/datasets/uci-heart-unlabeled-0.
This input path points to the storage account linked to both OneLake in Fabric and to Azure Machine Learning. You can also use a direct path to the storage account, such as https://<storage-account>.dfs.core.windows.net. The path leads to the CSV files that contain the expected input data for the model deployed to the batch endpoint.
If your endpoint requires more inputs, repeat the previous steps for each input.
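Taken together, the steps above describe the input as a handful of Name/Value pairs. A sketch of the resulting properties for this example's UriFolder input, plus what a hypothetical Literal input would look like:

```python
# Sketch: the Name/Value pairs configured for the input_data input.

uri_folder_input = {
    "JobInputType": "UriFolder",  # could also be UriFile or Literal
    "Uri": "azureml://datastores/trusted_blob/paths/datasets/uci-heart-unlabeled-0",
}

# A Literal input uses Value rather than Uri (hypothetical example):
literal_input = {
    "JobInputType": "Literal",
    "Value": "0.5",
}
```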
Configure outputs
Configure the Job outputs section as follows:
Expand the Job outputs section, and select New to add a new output to your endpoint.
Name the output output_data. For your model deployment, you can use any name. For pipeline deployments, you must provide the exact name of the output that your model generates.
Select the caret next to the output to expand the Name and Value fields.
To indicate the type of output you're creating, enter JobOutputType in the Name field.
To indicate that the output is a file path, enter UriFile in the Value field. The other supported value for this field is UriFolder, for a folder path. Literal isn't supported for outputs.
To add another property for this output, select the plus sign next to the property.
To indicate the path to the data, enter Uri in the Name field.
In the Value field, enter the path where the output should be placed: @concat('azureml://datastores/trusted_blob/paths/endpoints', pipeline().RunId, 'predictions.csv').

Azure Machine Learning batch endpoints support only datastore paths as outputs. Outputs must be unique to avoid conflicts, so you use a dynamic expression to construct the path.
If your endpoint returns more outputs, repeat the previous steps for each output.
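The dynamic expression above produces a different output path for each pipeline run. Its effect can be sketched in Python, with run_id standing in for pipeline().RunId:

```python
# Sketch of what the @concat(...) output expression evaluates to at run time.

def output_path(run_id: str) -> str:
    # pipeline().RunId is unique per run, so the concatenated path is too.
    return ("azureml://datastores/trusted_blob/paths/endpoints"
            + run_id
            + "predictions.csv")

# Distinct run IDs yield distinct datastore paths, avoiding conflicts.
assert output_path("run-a") != output_path("run-b")
```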
Optionally configure job settings
You can also optionally configure job settings by expanding the Job settings section, selecting New, and adding any of the following properties.
For model deployments
| Name | Value |
|---|---|
| MiniBatchSize | The size of the batch. |
| ComputeInstanceCount | The number of compute instances to request from the deployment. |
For pipeline deployments
| Name | Value |
|---|---|
| ContinueOnStepFailure | Whether the pipeline should continue processing the remaining nodes after one fails. |
| DefaultDatastore | The default datastore to use for outputs. |
| ForceRun | Whether the pipeline should force all the components to run even if the output can be inferred from a previous run. |
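Job settings follow the same Name/Value convention as inputs and outputs. A sketch with illustrative (not recommended) values for each deployment type:

```python
# Sketch: optional job settings as Name/Value pairs. Values are illustrative.

model_deployment_settings = {
    "MiniBatchSize": "10",          # size of each scoring batch
    "ComputeInstanceCount": "2",    # instances requested from the deployment
}

pipeline_deployment_settings = {
    "ContinueOnStepFailure": "True",     # keep processing nodes after a failure
    "DefaultDatastore": "trusted_blob",  # default datastore for outputs
    "ForceRun": "False",                 # force components to run vs. reuse prior outputs
}
```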
Once configured, you can test the pipeline.