Solution ideas
This article describes a solution idea. Your cloud architect can use this guidance to help visualize the major components for a typical implementation of this architecture. Use this article as a starting point to design a well-architected solution that aligns with your workload's specific requirements.
By using Azure services such as Azure AI Content Understanding and Azure Functions, you can add image classification and metadata extraction to a web or mobile application without managing servers or training your own models. This solution idea targets image classification and tagging. If you have other AI needs, see the broader Foundry Tools and Microsoft Foundry catalogs.
Architecture
Download a Visio file of this solution idea.
Data flow
This scenario covers the back-end components of a web or mobile application. Data flows through the scenario as follows:
- New files (image uploads) added to Blob Storage trigger an event in Azure Event Grid. The upload is orchestrated by a web or mobile application, or images are uploaded directly to Blob Storage.
- Event Grid sends a notification that triggers an Azure function.
- The function calls Content Understanding to analyze the newly uploaded image against a defined analyzer schema. Content Understanding accesses the image through a time-limited SAS URL, or equivalent temporary access token, that the function passes in the request and scopes to least-privilege read access for only the target blob.
- The function persists the structured output that Content Understanding returns, along with image metadata, in Azure Cosmos DB for NoSQL.
- The web or mobile front end consumes the results. This data flow returns the classification output and metadata. It doesn't return the original image bytes.
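The steps above can be sketched as a small, dependency-free handler. This is a minimal illustration, not the article's implementation. It assumes the event arrives in the Event Grid event schema (eventType, data.url, data.contentType, data.contentLength). The callables make_read_url, analyze, and save are hypothetical stand-ins for SAS generation, the Content Understanding call, and the Azure Cosmos DB write, and the record field names are illustrative.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional
from urllib.parse import unquote, urlparse

BLOB_CREATED = "Microsoft.Storage.BlobCreated"


def parse_blob_created(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return blob details for a BlobCreated event, or None for any other event."""
    if event.get("eventType") != BLOB_CREATED:
        return None
    data = event["data"]
    # The blob URL path has the form /<container>/<blob name>.
    path = unquote(urlparse(data["url"]).path).lstrip("/")
    container, _, blob_name = path.partition("/")
    return {
        "url": data["url"],
        "container": container,
        "blob_name": blob_name,
        "content_type": data.get("contentType"),
        "size_bytes": data.get("contentLength"),
    }


def handle_event(
    event: Dict[str, Any],
    make_read_url: Callable[[str, str], str],    # hypothetical: returns a short-lived read URL
    analyze: Callable[[str], Dict[str, Any]],    # hypothetical: wraps the analyzer call
    save: Callable[[Dict[str, Any]], None],      # hypothetical: writes to the database
) -> Optional[Dict[str, Any]]:
    """Analyze a newly uploaded image and persist the structured result."""
    blob = parse_blob_created(event)
    if blob is None:
        return None  # Ignore events this function doesn't handle.
    read_url = make_read_url(blob["container"], blob["blob_name"])
    record = {
        "id": event["id"],
        "blobUrl": blob["url"],
        "contentType": blob["content_type"],
        "sizeBytes": blob["size_bytes"],
        "analysis": analyze(read_url),
        "processedAt": datetime.now(timezone.utc).isoformat(),
    }
    save(record)
    return record
```

Passing the three dependencies in as callables keeps the orchestration testable without network access. In a real function app, they would wrap the Azure SDK or REST calls.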
Components
Content Understanding is a Microsoft Foundry capability that uses generative AI to extract user-defined structured output from documents, images, video, and audio. In this architecture, it analyzes each uploaded image against an analyzer schema that defines the categories, attributes, and labels you want returned (for example, product type, color, defect class). The output is JSON that maps directly to your application's data model.
Azure Functions is a serverless compute platform. In this architecture, Azure Functions provides the back-end API and the event-processing layer for uploaded images. The function orchestrates the workflow. It calls Content Understanding, processes the response, and writes the result to the database. This architecture uses the Flex Consumption plan to support virtual network integration, instance memory choice, and fast scaling.
Azure Event Grid is a managed event-routing service that uses a publish-subscribe model. In this architecture, an Event Grid system topic on the storage account emits a Microsoft.Storage.BlobCreated event when a new image is uploaded and delivers it to the function.
Azure Blob Storage is an object store for unstructured data. In this architecture, it stores all uploaded images and any static assets that the web application serves. Blob Storage is the source of truth for incoming images.
Azure Cosmos DB for NoSQL is a managed NoSQL database. In this architecture, it stores the metadata for each image, including the structured output that Content Understanding returns.
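To illustrate how analyzer output might map to an application data model, as described for Content Understanding above, the following sketch validates a set of returned fields against the labels the application expects. It assumes the fields have already been flattened into a plain dictionary. The actual response shape from the service isn't shown here, and the field names and label sets are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical label sets. In practice they would mirror the analyzer schema.
ALLOWED_LABELS = {
    "productType": {"shoe", "shirt", "bag"},
    "color": {"red", "blue", "black", "white"},
}


@dataclass(frozen=True)
class ImageMetadata:
    """Application-side view of the metadata stored for one image."""
    image_id: str
    product_type: str
    color: str
    tags: List[str]


def to_metadata(image_id: str, fields: Dict[str, Any]) -> ImageMetadata:
    """Validate flattened analyzer fields and convert them to the data model."""
    for name, allowed in ALLOWED_LABELS.items():
        value = fields.get(name)
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not an allowed label")
    # Normalize free-form tags: trim, lowercase, drop blanks and duplicates.
    tags = {str(t).strip().lower() for t in fields.get("tags", []) if str(t).strip()}
    return ImageMetadata(
        image_id=image_id,
        product_type=fields["productType"],
        color=fields["color"],
        tags=sorted(tags),
    )
```

Rejecting unexpected labels before the write keeps malformed output out of the database, at the cost of needing a retry or review path for rejected items.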
Alternatives
Azure Machine Learning AutoML for Images trains custom image classification and object detection models from your labeled data using classic machine learning techniques. Choose AutoML when you have a labeled dataset and need a deterministic, deployable model for narrow domains (for example, manufacturing defect detection or medical imaging) where generative approaches don't fit. AutoML is the path that Microsoft recommends for customers migrating from Custom Vision when they want to keep a classic ML model.
Microsoft Foundry vision-enabled models let you call or fine-tune multimodal models (GPT-4.1, GPT-4o, and Phi-4 multimodal) directly. Choose this path when you need fine-grained control over the prompt and model, want to fine-tune on your own data, or need visual question answering and image-grounded chat instead of structured extraction.
Azure AI Search indexes the metadata so that users can query and filter images by tag, caption, or other attributes. The AI enrichment skillset can call vision and generative AI services and write the results directly to a search index without a separate function.
Azure Logic Apps is a fit when you don't need real-time reaction to uploads. A workflow that runs on a recurrence or sliding-window trigger can poll for new blobs and call Content Understanding in batch.
Azure Document Intelligence extracts images that are embedded in documents through the layout model, so you can run downstream classification on those embedded figures. Use custom classification models when input files contain multiple document types and you need to identify each one before further processing.
Scenario details
This scenario applies to businesses that process images at scale and want to attach structured metadata such as tags, captions, or category labels to each image without training and operating their own models.
Typical applications include classifying images on a fashion site, analyzing photos for insurance claims, and extracting context from game screenshots. Building this in-house traditionally requires expertise in computer vision, training data, and model lifecycle management. The architecture in this article replaces that work with managed Azure services.
Potential use cases
This solution applies to retail, e-commerce, gaming, finance, and insurance. Common use cases include:
- Tagging images on a retail or fashion site. Sellers upload product photos. Content Understanding returns the tags, captions, and attributes that you define in the analyzer schema. The platform uses them to autofill listing fields, drive visual search, and reduce manual tagging effort.
- Categorizing products in an e-commerce catalog. A Content Understanding analyzer assigns category and subcategory metadata (for example, footwear to running shoe) and visual attributes such as color and material. Buyers get more accurate search and filtering, and sellers spend less time correcting categories.
- Classifying telemetry from game screenshots. Streaming platforms misclassify a stream when a creator forgets to update the title after switching games. A function that classifies periodic screenshots can detect the change and update the stream metadata. For narrow domains where generative classification underperforms, use AutoML for Images to train a deterministic classifier.
- Routing insurance claim photos. Content Understanding identifies vehicle damage, natural-disaster damage, or property type from claim photos. The metadata routes the claim to the correct adjuster queue and shortens triage time.
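As one illustration of the claim-routing use case, the following sketch maps extracted metadata to a queue name. The field names damageType and confidence, the queue names, and the confidence threshold are all hypothetical. Whether a confidence value is available depends on how the analyzer is configured.

```python
from typing import Any, Dict

# Hypothetical mapping from an extracted label to an adjuster queue.
ROUTES = {
    "vehicle": "auto-claims",
    "natural_disaster": "catastrophe-claims",
    "property": "property-claims",
}
FALLBACK_QUEUE = "manual-review"


def route_claim(analysis: Dict[str, Any], min_confidence: float = 0.7) -> str:
    """Pick a queue for a claim, falling back to manual review when unsure."""
    if analysis.get("confidence", 0.0) < min_confidence:
        return FALLBACK_QUEUE
    return ROUTES.get(analysis.get("damageType"), FALLBACK_QUEUE)
```

Sending low-confidence or unrecognized results to a manual queue keeps a misclassified photo from silently reaching the wrong adjuster.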
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, a set of guiding tenets that you can use to improve the quality of a workload.
Security
Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.
- Use managed identities for the function app to authenticate to Blob Storage, Azure Cosmos DB, and the Microsoft Foundry resource that hosts Content Understanding. Avoid storing connection strings or API keys in app settings.
- Restrict the Foundry resource and Cosmos DB to private endpoints and disable public network access when the workload runs inside a virtual network. The Flex Consumption plan supports virtual network integration.
- Validate uploaded images before invoking the vision service. Enforce content-type and size limits at the upload boundary, scan for malware, and store uploads in a container that public users can't read directly.
- Use this architecture only for images that you consider appropriate to process in a cloud solution. It doesn't support local or offline image processing.
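The content-type and size checks mentioned above might look like the following sketch. The 10 MiB limit and the JPEG and PNG allowlist are assumptions chosen for illustration. The check on leading bytes is an extra guard against a mislabeled upload and doesn't replace malware scanning.

```python
from typing import Optional

MAX_BYTES = 10 * 1024 * 1024  # Assumed limit. Tune it to the workload.

# Leading bytes ("magic numbers") for the allowed image formats.
SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
}


def validate_upload(content_type: str, payload: bytes) -> Optional[str]:
    """Return None when the upload is acceptable, otherwise a rejection reason."""
    signatures = SIGNATURES.get(content_type.lower())
    if signatures is None:
        return "unsupported content type"
    if not payload:
        return "empty file"
    if len(payload) > MAX_BYTES:
        return "file too large"
    if not payload.startswith(signatures):
        return "content does not match declared type"
    return None
```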
Cost Optimization
Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.
- Limit the analyzer schema in Content Understanding to the fields that the application actually consumes. Each additional field increases token usage and per-call cost. Review Microsoft Foundry pricing for the current rates.
- For Azure Functions, use the Flex Consumption plan for spiky event-driven workloads. It scales to zero and bills per second on active instances.
- For Azure Cosmos DB, evaluate serverless or autoscale throughput when traffic is uneven. Serverless suits low-traffic and dev/test workloads; autoscale suits production with variable load.
Operational Excellence
Operational Excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Design review checklist for Operational Excellence.
- Send Azure Functions, Event Grid, and Microsoft Foundry diagnostics to a shared Log Analytics workspace and use Application Insights for distributed tracing across the upload-to-result flow.
- Configure an Event Grid dead-letter destination so that events the function can't process land in a separate blob container for replay.
- Version Content Understanding analyzer schemas as code and deploy them through the same pipeline that deploys the function. Treat schema changes as breaking changes for downstream consumers.
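One way a deployment pipeline could surface analyzer schema changes for review is to diff the field definitions between two versions. The sketch below assumes each schema is represented as a plain dictionary that maps a field name to its definition. That representation and the function name are hypothetical.

```python
from typing import Any, Dict, List

FieldDefs = Dict[str, Dict[str, Any]]  # field name -> definition (type, labels, ...)


def schema_changes(old: FieldDefs, new: FieldDefs) -> List[str]:
    """List every field that was added, removed, or redefined between versions."""
    changes = []
    for name in old.keys() - new.keys():
        changes.append(f"removed: {name}")
    for name in new.keys() - old.keys():
        changes.append(f"added: {name}")
    for name in old.keys() & new.keys():
        if old[name] != new[name]:
            changes.append(f"changed: {name}")
    return sorted(changes)
```

A pipeline step could fail the build when this list is non-empty and the schema version hasn't been incremented, so that downstream consumers are notified before the change ships.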
Contributors
Microsoft maintains this article. The following contributors wrote this article.
Principal authors:
- Ananya Ghosh Chowdhury | Principal Cloud Solution Architect
Other contributors:
- Delyn Choong | Senior Cloud Solutions Architect – Data & AI
- Abhishek Singh | Tech Support Engineer
Next steps
- What is Content Understanding?
- Microsoft Foundry models overview
- Azure Vision Image Analysis migration options
- AI enrichment in Azure AI Search
- Introduction to Azure Functions
- Azure Functions Flex Consumption plan
- What is Azure Event Grid?
- Introduction to Azure Blob Storage
- Welcome to Azure Cosmos DB
For guided learning paths, see:
- Develop a vision-enabled generative AI application
- Train custom image classification models with AutoML