How to use MedImageParse healthcare AI models for segmentation of medical images (classic)

Note

This document refers to the Microsoft Foundry (classic) portal.

🔍 View the Microsoft Foundry (new) documentation to learn about the new portal.

Important

The healthcare AI models are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.

MedImageParse and MedImageParse 3D are healthcare AI models for medical image segmentation using simple text prompts. In this article, you learn how to deploy these prompt-based segmentation models as online endpoints for real-time inference and issue basic calls to the API. The steps you take are:

  1. Deploy the model to a self-hosted managed compute.
  2. Grant permissions to the endpoint.
  3. Send test data to the model, receive results, and interpret them.

MedImageParse

MedImageParse unifies segmentation, detection, and recognition tasks through image parsing. You can segment medical images by using simple text prompts without manually specifying bounding boxes.

To learn more about these models, see Learn more about the models.

Prerequisites

  • An Azure subscription with a valid payment method. Free or trial Azure subscriptions don't work. If you don't have an Azure subscription, create a paid Azure account to begin.

  • A hub-based project. If you don't have one, create a hub-based project.

  • Azure role-based access controls (Azure RBAC) grant access to operations in Microsoft Foundry portal. To perform the steps in this article, your user account must be assigned the Azure AI Developer role on the resource group. Deploying models and invoking endpoints requires this role. For more information, see Role-based access control in Foundry portal.

  • Python 3.8 or later.

  • Install the required Python packages:

    pip install azure-ai-ml azure-identity
    
  • For MedImageParse, images must be resized to 1024x1024 pixels while preserving aspect ratio. Pad non-square images with black pixels. See the Generating Segmentation for a Variety of Imaging Modalities notebook for preprocessing code examples.
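The resize-and-pad requirement above comes down to a little geometry. The following is a minimal sketch, not the notebook's code: the helper name `fit_to_square` is hypothetical, and centering the image on the canvas is an assumption, because this article only says to pad with black pixels. It computes the scaled size and the offset at which the scaled image would be placed on a 1024x1024 canvas.

```python
def fit_to_square(width, height, side=1024):
    """Return (new_width, new_height, left, top) for fitting an image
    into a side x side canvas while preserving its aspect ratio.

    Hypothetical helper: centering the image is an assumption; the
    article only requires black padding around non-square images.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so that the longer edge becomes exactly `side` pixels.
    scale = side / max(width, height)
    new_width = min(side, max(1, round(width * scale)))
    new_height = min(side, max(1, round(height * scale)))
    # Split the remaining space evenly on both sides (the padding area).
    left = (side - new_width) // 2
    top = (side - new_height) // 2
    return new_width, new_height, left, top


print(fit_to_square(2048, 1024))  # (1024, 512, 0, 256)
print(fit_to_square(512, 512))    # (1024, 1024, 0, 0)
```

With an imaging library such as Pillow, these values would typically feed a resize to `(new_width, new_height)` followed by a paste at `(left, top)` onto a black 1024x1024 RGB canvas. Treat the referenced notebook as the authoritative preprocessing code.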

Sample notebooks

For complete working examples, see these interactive Python notebooks:

Deploy the model to a managed compute

Deployment to a self-hosted managed inference solution lets you customize and control all the details about how the model is served. The deployment process creates an online endpoint with a unique scoring URI and authentication keys. This endpoint lets you send inference requests to your model. You configure the compute resources (such as GPU-enabled VMs) and set deployment parameters like instance count and request timeout values.

To deploy the model programmatically or from its model card in Microsoft Foundry, see How to deploy and infer with a managed compute deployment. After deployment completes, note your endpoint name and deployment name for use in the inference code.

Send inference requests to the segmentation model

In this section, you consume the model and make basic calls to it.

Use REST API to consume the model

Use the model as a REST API, either by sending POST requests to the endpoint's scoring URI or by creating a client as follows:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate using Azure credentials
credential = DefaultAzureCredential()

# Create ML client from workspace configuration file (config.json)
# The config file is automatically created on Azure ML compute instances
ml_client_workspace = MLClient.from_config(credential)

This code authenticates your session and creates a workspace client that you use to invoke the deployed endpoint. The DefaultAzureCredential automatically uses available authentication methods in your environment (managed identity, Azure CLI, and environment variables).

Reference: MLClient, DefaultAzureCredential

In the deployment configuration, you select an authentication method. This example uses Azure Machine Learning token-based authentication. For more authentication options, see Set up authentication. The client is created from a configuration file that's created automatically for Azure Machine Learning virtual machines (VMs). Learn more in the MLClient.from_config API reference.
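If you run the code somewhere other than an Azure Machine Learning compute instance, the configuration file might not exist yet. The following sketch writes one by hand, assuming the usual three-field layout (`subscription_id`, `resource_group`, `workspace_name`) that `MLClient.from_config` reads. The helper name `write_workspace_config` and all values are placeholders, and using the project name as `workspace_name` for a hub-based project is an assumption to verify against your own setup.

```python
import json


def write_workspace_config(path, subscription_id, resource_group, workspace_name):
    """Write a workspace configuration file (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(path, "w") as config_file:
        json.dump(config, config_file, indent=2)
    return config


# Placeholder values: replace them with your own identifiers.
write_workspace_config(
    "config.json",
    subscription_id="<your-subscription-id>",
    resource_group="<your-resource-group>",
    workspace_name="<your-project-name>",
)
```

With this file in the working directory, the `MLClient.from_config(credential)` call shown earlier should be able to locate the workspace.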

Make basic calls to the model

After you deploy the model, use the following code to send data and retrieve segmentation masks.

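import base64
import json

# Replace with the endpoint and deployment names you noted after deployment
endpoint_name = "<your-endpoint-name>"
deployment_name = "<your-deployment-name>"

def read_image(image_path):
    # Return the raw bytes of the image file
    with open(image_path, "rb") as f:
        return f.read()

# Path to a PNG image already resized and padded to 1024x1024 pixels
sample_image = "sample_image.png"

# Build the request payload: one row holding the base64-encoded image
# and a text prompt with two sentences separated by "&"
data = {
    "input_data": {
        "columns": [ "image", "text" ],
        "index": [ 0 ],
        "data": [
            [
                base64.encodebytes(read_image(sample_image)).decode("utf-8"),
                "neoplastic cells in breast pathology & inflammatory cells"
            ]
        ]
    }
}

# Write the request payload to a JSON file
request_file_name = "sample_request_data.json"
with open(request_file_name, "w") as request_file:
    json.dump(data, request_file)

# Invoke the deployed endpoint by using the workspace client created earlier
response = ml_client_workspace.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=deployment_name,
    request_file=request_file_name,
)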

The response contains the segmentation masks as base64-encoded NumPy arrays. See the Response example section for details on decoding and interpreting the results.
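The same payload can also be sent without the SDK, as a plain HTTPS POST to the endpoint's scoring URI. The sketch below only builds the request by using the standard library. The URI and credential are placeholders, the helper name `build_scoring_request` is hypothetical, and the sketch assumes the usual managed online endpoint conventions: a bearer key or token in the `Authorization` header and an optional `azureml-model-deployment` header to target one deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key_or_token, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key_or_token}",
    }
    if deployment_name:
        # Assumption: routes the call to one specific deployment.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values: copy the real scoring URI and key from your endpoint.
request = build_scoring_request(
    "https://example-endpoint.example-region.inference.ml.azure.com/score",
    "<your-key-or-token>",
    {
        "input_data": {
            "columns": ["image", "text"],
            "index": [0],
            "data": [["<base64-encoded PNG>", "liver & pancreas"]],
        }
    },
)
print(request.get_method())  # POST

# To send the request:
# with urllib.request.urlopen(request) as resp:
#     result = json.loads(resp.read())
```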

Reference for REST API

MedImageParse and MedImageParse 3D models assume a simple single-turn interaction where one request produces one response.

Request schema

The request payload is a JSON-formatted string containing the following parameters:

Key | Type | Required/Default | Description
input_data | [object] | Y | An object containing the input data payload

The input_data object contains the following fields:

Key | Type | Required/Default | Allowed values | Description
columns | list[string] | Y | "image", "text" | A list of strings that maps the data to the inputs passed to the model.
index | list[integer] | Y | 0 - 256 | Row indices of the inputs passed to the model, one per item in data (for example, [0] for a single input). You're limited by how much data you can pass in a single POST request, which depends on the size of your images. Therefore, it's reasonable to keep the number of inputs in the dozens.
data | list[list[string]] | Y | "" | The list of items you pass to the model, one for each entry in the index parameter. Each item is a list of two strings, in the order defined by the columns parameter. The text string contains the prompt text. The image string is the image bytes encoded by using base64 and decoded as a utf-8 string.

NOTE: You should resize the image to 1024x1024 pixels before submitting it to the model, preserving the aspect ratio. Empty space should be padded with black pixels. See the Generating Segmentation for a Variety of Imaging Modalities sample notebook for an example of resizing and padding code.
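To illustrate how `columns`, `index`, and `data` line up when several images go into one request, here is a minimal sketch. The helper name `build_payload` is hypothetical, the image bytes are stand-ins rather than real PNGs, and numbering the rows from 0 is an assumption based on the single-row examples in this article. The sketch also reports the serialized size, because the request body size is what limits the batch.

```python
import base64
import json


def build_payload(items):
    """Build a request payload from (png_bytes, prompt) pairs."""
    rows = [
        [base64.encodebytes(png_bytes).decode("utf-8"), prompt]
        for png_bytes, prompt in items
    ]
    return {
        "input_data": {
            "columns": ["image", "text"],     # order of the values in each row
            "index": list(range(len(rows))),  # one row index per input
            "data": rows,
        }
    }


# Stand-in bytes for illustration only; real inputs are 1024x1024 PNG files.
items = [
    (b"first image bytes", "liver & pancreas"),
    (b"second image bytes", "tumor core"),
]
payload = build_payload(items)
body = json.dumps(payload)
print(payload["input_data"]["index"])  # [0, 1]
print(len(body.encode("utf-8")), "bytes in the request body")
```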

The input text is a string containing multiple sentences separated by the special character &. For example: tumor core & enhancing tumor & non-enhancing tumor. In this case, there are three sentences, so the output consists of three images with segmentation masks.
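As a small illustration of the prompt format, the sketch below joins a list of target descriptions with `&` and counts how many masks to expect back. Both helper names are hypothetical, and rejecting targets that already contain `&` is a design choice of this sketch, not a documented rule.

```python
def build_prompt(targets):
    """Join target descriptions into one '&'-separated prompt string."""
    cleaned = [target.strip() for target in targets if target.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty target is required")
    if any("&" in target for target in cleaned):
        # '&' is the separator, so it can't appear inside a single target.
        raise ValueError("targets must not contain '&'")
    return " & ".join(cleaned)


def expected_mask_count(prompt):
    """Count the sentences in a prompt, which is one mask per sentence."""
    return len([part for part in prompt.split("&") if part.strip()])


prompt = build_prompt(["tumor core", "enhancing tumor", "non-enhancing tumor"])
print(prompt)                       # tumor core & enhancing tumor & non-enhancing tumor
print(expected_mask_count(prompt))  # 3
```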

Request example

Requesting segmentation of all cells in a pathology image

{
  "input_data": {
    "columns": [
      "image",
      "text"
    ],
    "index":[0],
    "data": [
      ["iVBORw0KGgoAAAANSUhEUgAAAAIAAAACCAYAAABytg0kAAAAAXNSR0IArs4c6QAAAARnQU1BAACx\njwv8YQUAAAAJcEhZcwAAFiUAABYlAUlSJPAAAAAbSURBVBhXY/gUoPS/fhfDfwaGJe///9/J8B8A\nVGwJ5VDvPeYAAAAASUVORK5CYII=\n",
      "neoplastic & inflammatory cells "]
    ]
  }
}

Response schema

The response payload is a list of JSON-formatted strings, each corresponding to a submitted image. Each string contains a segmentation_object.

The segmentation_object contains the following fields:

Key | Type | Description
image_features | segmentation_mask | An object representing the segmentation masks for a given image
text_features | list[string] | List of strings, one for each submitted text string, classifying each segmentation mask into one of 16 biomedical segmentation categories: liver, lung, kidney, pancreas, heart anatomies, brain anatomies, eye anatomies, vessel, other organ, tumor, infection, other lesion, fluid disturbance, other abnormality, histology structure, other

The segmentation_mask contains the following fields:

Key | Type | Description
data | string | A base64-encoded NumPy array containing the one-hot encoded segmentation mask. The array can include multiple instances of objects. After base64-decoding, use np.frombuffer to deserialize it. The array is three-dimensional: it holds one 1024x1024 mask (matching the input image dimensions) for each input sentence provided. See the provided sample notebooks for decoding and usage examples.
shape | list[int] | A list representing the shape of the array (typically [NUM_PROMPTS, 1024, 1024])
dtype | string | An instance of the NumPy dtype class serialized to a string. Describes the data packing in the data array.
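The three fields can be combined into a NumPy array roughly as follows. This is a sketch under stated assumptions rather than the notebooks' code: the helper name `decode_segmentation_mask` is hypothetical, and it assumes `image_features` arrives either as a dictionary or as a string holding one (JSON, or a Python-style literal with single quotes). The demonstration uses a small synthetic mask instead of a real response.

```python
import ast
import base64
import json

import numpy as np


def decode_segmentation_mask(image_features):
    """Rebuild the mask array from its data, shape, and dtype fields."""
    if isinstance(image_features, str):
        try:
            meta = json.loads(image_features)
        except json.JSONDecodeError:
            # Fall back to a Python-style literal with single quotes.
            meta = ast.literal_eval(image_features)
    else:
        meta = image_features
    raw = base64.b64decode(meta["data"])
    flat = np.frombuffer(raw, dtype=np.dtype(meta["dtype"]))
    return flat.reshape(meta["shape"])


# Synthetic stand-in for a response: two 4x4 masks instead of 1024x1024.
fake = np.zeros((2, 4, 4), dtype=np.uint8)
fake[1, 1:3, 1:3] = 1
encoded = json.dumps({
    "data": base64.b64encode(fake.tobytes()).decode("ascii"),
    "shape": list(fake.shape),
    "dtype": str(fake.dtype),
})

masks = decode_segmentation_mask(encoded)
print(masks.shape)          # (2, 4, 4)
print(int(masks[1].sum()))  # 4
```

Arrays returned by `np.frombuffer` are read-only, so call `.copy()` on the result before modifying a mask in place.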

Response example

Response to a simple inference requesting segmentation of two objects

[
  {
    "image_features": "{ 
    'data': '4oCwUE5HDQoa...',
    'shape': [2, 1024, 1024], 
    'dtype': 'uint8'}",
    "text_features": ['liver', 'pancreas']
  }
]

Supported input formats

The deployed model API supports images encoded in PNG format. For optimal results, we recommend using uncompressed or lossless PNGs with RGB images.

As described in the API specification, the model only accepts images at a resolution of 1024x1024 pixels. Resize images that have other dimensions, and pad images that have a non-square aspect ratio.
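Before sending a file, you can check that it's a PNG of the expected size without an imaging library. The sketch below reads the width and height from the PNG header, which starts with an 8-byte signature followed by the IHDR chunk. The helper names `png_dimensions` and `is_model_ready` are hypothetical, and the demonstration uses a hand-built header rather than a real image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(png_bytes):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(png_bytes) < 24 or png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if png_bytes[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # Width and height are big-endian unsigned 32-bit integers.
    return struct.unpack(">II", png_bytes[16:24])


def is_model_ready(png_bytes, side=1024):
    """Check that the PNG is exactly side x side pixels."""
    return png_dimensions(png_bytes) == (side, side)


# Hand-built header for illustration: signature, chunk length, type, size.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1024, 1024)
print(png_dimensions(header))  # (1024, 1024)
print(is_model_ready(header))  # True
```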

For techniques and sample code useful for submitting images of various sizes stored using various biomedical imaging formats, see the Generating Segmentation for a Variety of Imaging Modalities notebook.

Learn more about the models

Biomedical image analysis is crucial for discovery in fields like cell biology, pathology, and radiology. Traditionally, tasks such as segmentation, detection, and recognition of relevant objects are addressed separately, which can limit the overall effectiveness of image analysis. However, MedImageParse unifies these tasks through image parsing by jointly conducting segmentation, detection, and recognition across numerous object types and imaging modalities. By using the interdependencies among these subtasks—such as the semantic labels of segmented objects—the model enhances accuracy and enables novel applications. For example, it lets users segment all relevant objects in an image by using a simple text prompt. This approach eliminates the need to manually specify bounding boxes for each object.

The following image shows the conceptual architecture of the MedImageParse model where an image embedding model is augmented with a task adaptation layer to produce segmentation masks and textual descriptions.

Screenshot of an animated diagram showing a medical image entering the MedImageParse model, flowing through a task adaptation layer, and outputting multiple segmentation masks with corresponding text labels.

The model learns to produce the segmentation masks and textual descriptions from standard segmentation datasets alone, augmented with natural-language labels or descriptions that are harmonized with established biomedical object ontologies. This approach improves individual task performance and offers an all-in-one tool for biomedical image analysis, paving the way for more efficient and accurate image-based biomedical discovery.