Content Understanding analyzers define how to process and extract insights from your content. They ensure uniform processing and output structure across all your content, so you get reliable and predictable results. For common use cases, you can use the prebuilt analyzers. This guide shows how you can customize these analyzers to better fit your needs.
This guide shows you how to use the Content Understanding REST API to create a custom analyzer that extracts structured data from your content.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, create one for free.
- A Microsoft Foundry resource created in a supported region.
- The portal lists this resource under Foundry > Foundry.
- Set up default model deployments for your Content Understanding resource. By setting defaults, you create a connection to the Microsoft Foundry models you use for Content Understanding requests. Choose one of the following methods:
Go to the Content Understanding settings page.
Select the + Add resource button in the upper left.
Select the Foundry resource that you want to use and select Next > Save.
Make sure that the Enable autodeployment for required models if no defaults are available checkbox is selected. This selection ensures your resource is fully set up with the required GPT-4.1, GPT-4.1-mini, and text-embedding-3-large models. Different prebuilt analyzers require different models.
- cURL installed for your dev environment.
Define an analyzer schema
To create a custom analyzer, define a field schema that describes the structured data you want to extract. In the following example, you create an analyzer based on the prebuilt document analyzer for processing a receipt.
Create a JSON file named receipt.json with the following content:
{
"description": "Sample receipt analyzer",
"baseAnalyzerId": "prebuilt-document",
"models": {
"completion": "gpt-4.1",
"embedding": "text-embedding-3-large"
},
"config": {
"returnDetails": true,
"enableFormula": false,
"estimateFieldSourceAndConfidence": true,
"tableFormat": "html"
},
"fieldSchema": {
"fields": {
"VendorName": {
"type": "string",
"method": "extract",
"description": "Vendor issuing the receipt"
},
"Items": {
"type": "array",
"method": "extract",
"items": {
"type": "object",
"properties": {
"Description": {
"type": "string",
"method": "extract",
"description": "Description of the item"
},
"Amount": {
"type": "number",
"method": "extract",
"description": "Amount of the item"
}
}
}
}
}
}
}
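Before creating the analyzer, you can sanity-check the schema locally. A minimal sketch in plain Python (no SDK required); the inline string mirrors the receipt.json content above, trimmed to the parts the check needs:

```python
import json

# Mirrors the receipt.json schema defined above, trimmed for brevity.
receipt_schema = json.loads("""
{
  "description": "Sample receipt analyzer",
  "baseAnalyzerId": "prebuilt-document",
  "models": {"completion": "gpt-4.1", "embedding": "text-embedding-3-large"},
  "fieldSchema": {
    "fields": {
      "VendorName": {"type": "string", "method": "extract"},
      "Items": {"type": "array", "method": "extract"}
    }
  }
}
""")

# List the top-level fields the analyzer will extract.
fields = list(receipt_schema["fieldSchema"]["fields"])
print(fields)  # ['VendorName', 'Items']
```

A parse failure here surfaces a malformed schema before you spend a round trip on the PUT request.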
If you have various types of documents to process but want to categorize and analyze only the receipts, create an analyzer that categorizes the document first and then routes it to the receipt analyzer you created earlier. Use the following schema.
Create a JSON file named categorize.json with the following content:
{
"baseAnalyzerId": "prebuilt-document",
// Use the base analyzer to invoke the document-specific capabilities.
// Specify the models the analyzer should use: one of the supported completion models and, if needed, one of the supported embedding models. The specific deployment used during analysis is set on the resource or provided in the analyze request.
"models": {
"completion": "gpt-4.1"
},
"config": {
// Enable splitting of the input into segments. Set this property to false if you only expect a single document within the input file. When enableSegment is false, the whole content is classified into one of the categories.
"enableSegment": false,
"contentCategories": {
// Category name.
"receipt": {
// Description to help with classification and splitting.
"description": "Any images or documents of receipts",
// Define the analyzer that any content classified as a receipt should be routed to
"analyzerId": "receipt"
},
"invoice": {
"description": "Any images or documents of invoices",
"analyzerId": "prebuilt-invoice"
},
"policeReport": {
"description": "A police or law enforcement report detailing the events that led to the loss."
// Don't perform analysis for this category.
}
},
// Omit the original content object and return only content objects from additional analysis.
"omitContent": true
}
// You can use fieldSchema here to define fields that are needed from the entire input content.
}
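The // comments in this sample are for illustration only; standard JSON doesn't allow comments, so remove them before sending the file to the service. The routing that the contentCategories block defines can be pictured as a small lookup table (route is a hypothetical helper for illustration, not part of any SDK):

```python
# Routing implied by categorize.json: category name -> analyzerId.
# A category without an analyzerId (policeReport) is classified but
# receives no further analysis.
routes = {
    "receipt": "receipt",
    "invoice": "prebuilt-invoice",
    "policeReport": None,
}

def route(category: str):
    """Return the analyzer a classified segment would be routed to."""
    return routes.get(category)

print(route("receipt"))       # receipt
print(route("policeReport"))  # None
```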
Create an analyzer
PUT request
Create a receipt analyzer first, and then create the categorize analyzer.
curl -i -X PUT "{endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d @receipt.json
PUT response
The 201 Created response includes an Operation-Location header with a URL that you can use to track the status of this asynchronous analyzer creation operation.
201 Created
Operation-Location: {endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-11-01
When the operation finishes, an HTTP GET on the operation location URL returns "status": "succeeded".
curl -i -X GET "{endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}"
Analyze a file
Submit the file
You can now use the custom analyzer you created to process files and extract the fields you defined in the schema.
Before running the cURL command, make the following changes to the HTTP request:
- Replace {endpoint} and {key} with the endpoint and key values from your Azure portal Foundry instance.
- Replace {analyzerId} with the name of the custom analyzer you created with the categorize.json file.
- Replace {fileUrl} with a publicly accessible URL of the file to analyze, such as a path to an Azure Storage Blob with a shared access signature (SAS), or the sample URL https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png.
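The substitutions above are plain string composition. A minimal sketch using placeholder values (the endpoint host shown is an example, not your actual resource):

```python
# Placeholder values; substitute your own resource endpoint and analyzer ID.
endpoint = "https://my-resource.services.ai.azure.com"
analyzer_id = "categorize"
api_version = "2025-11-01"

# Assemble the analyze request URL the cURL command below targets.
url = f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}:analyze?api-version={api_version}"
print(url)
```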
POST Request
This example uses the custom analyzer you created with the categorize.json file to analyze a receipt.
curl -i -X POST "{endpoint}/contentunderstanding/analyzers/{analyzerId}:analyze?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d '{
"inputs":[
{
"url": "https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png"
}
]
}'
POST Response
The 202 Accepted response includes the {resultId}, which you can use to track the status of this asynchronous operation.
{
"id": {resultId},
"status": "Running",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-11-01",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": []
}
}
Get analyze result
Use the Operation-Location from the POST response to get the result of the analysis.
GET request
curl -i -X GET "{endpoint}/contentunderstanding/analyzerResults/{resultId}?api-version=2025-11-01" \
-H "Ocp-Apim-Subscription-Key: {key}"
GET response
A 200 OK response includes a status field that shows the operation's progress.
- The status is Succeeded if the operation completes successfully.
- If the status is Running or NotStarted, call the API again manually or use a script. Wait at least one second between requests.
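The retry guidance above can be sketched as a simple polling loop. Here get_status is a stub standing in for the GET request; in a real script it would call the analyzerResults endpoint shown above:

```python
import time

# Stub standing in for the GET analyzerResults call; yields a fixed
# sequence of statuses to demonstrate the loop.
_responses = iter(["NotStarted", "Running", "Running", "Succeeded"])

def get_status() -> str:
    return next(_responses)

status = get_status()
while status in ("NotStarted", "Running"):
    time.sleep(1)  # wait at least one second between requests
    status = get_status()

print(status)  # Succeeded
```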
Sample response
{
"id": {resultId},
"status": "Succeeded",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-11-01",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": [
{
"path": "input1/segment1",
"category": "receipt",
"markdown": "Contoso\n\n123 Main Street\nRedmond, WA 98052\n\n987-654-3210\n\n6/10/2019 13:59\nSales Associate: Paul\n\n\n<table>\n<tr>\n<td>2 Surface Pro 6</td>\n<td>$1,998.00</td>\n</tr>\n<tr>\n<td>3 Surface Pen</td>\n<td>$299.97</td>\n</tr>\n</table> ...",
"fields": {
"VendorName": {
"type": "string",
"valueString": "Contoso",
"spans": [{"offset": 0,"length": 7}],
"confidence": 0.996,
"source": "D(1,774.0000,72.0000,974.0000,70.0000,974.0000,111.0000,774.0000,113.0000)"
},
"Items": {
"type": "array",
"valueArray": [
{
"type": "object",
"valueObject": {
"Description": {
"type": "string",
"valueString": "2 Surface Pro 6",
"spans": [ { "offset": 115, "length": 15}],
"confidence": 0.423,
"source": "D(1,704.0000,482.0000,875.0000,482.0000,875.0000,508.0000,704.0000,508.0000)"
},
"Amount": {
"type": "number",
"valueNumber": 1998,
"spans": [{ "offset": 140,"length": 9}
],
"confidence": 0.957,
"source": "D(1,952.0000,482.0000,1048.0000,482.0000,1048.0000,508.0000,952.0000,509.0000)"
}
}
}, ...
]
}
},
"kind": "document",
"startPageNumber": 1,
"endPageNumber": 1,
"unit": "pixel",
"pages": [
{
"pageNumber": 1,
"angle": -0.0944,
"width": 1743,
"height": 878
}
],
"analyzerId": "{analyzerId}",
"mimeType": "image/png"
}
]
},
"usage": {
"documentPages": 1,
"tokens": {
"contextualization": 1000
}
}
}
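Downstream code usually walks contents[].fields of this response. A sketch in plain Python, where the inline dict is a trimmed copy of the sample response above:

```python
# Trimmed copy of the sample analyzerResults response above.
result = {
    "status": "Succeeded",
    "result": {
        "contents": [
            {
                "category": "receipt",
                "fields": {
                    "VendorName": {
                        "type": "string",
                        "valueString": "Contoso",
                        "confidence": 0.996,
                    },
                    "Items": {
                        "type": "array",
                        "valueArray": [
                            {
                                "type": "object",
                                "valueObject": {
                                    "Description": {"valueString": "2 Surface Pro 6"},
                                    "Amount": {"valueNumber": 1998},
                                },
                            },
                        ],
                    },
                },
            }
        ]
    },
}

# Pull the vendor name and total the line-item amounts.
fields = result["result"]["contents"][0]["fields"]
vendor = fields["VendorName"]["valueString"]
total = sum(item["valueObject"]["Amount"]["valueNumber"]
            for item in fields["Items"]["valueArray"])
print(vendor, total)  # Contoso 1998
```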
Client library | Samples | SDK source
This guide shows you how to use the Content Understanding Python SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. See Models and deployments or this one-time configuration script for setup instructions.
- Python 3.9 or later.
Set up
Install the Content Understanding client library for Python with pip:
pip install azure-ai-contentunderstanding

Optionally, install the Azure Identity library for Microsoft Entra authentication:
pip install azure-identity
Set up environment variables
To authenticate with the Content Understanding service, set the environment variables with your own values before running the sample:
- CONTENTUNDERSTANDING_ENDPOINT - the endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - your Content Understanding API key (optional if using Microsoft Entra ID DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux / macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create the client
Import required libraries and models, and then create the client with your resource endpoint and credentials.
import os
import time
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.core.credentials import AzureKeyCredential
endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
key = os.environ["CONTENTUNDERSTANDING_KEY"]
client = ContentUnderstandingClient(
endpoint=endpoint,
credential=AzureKeyCredential(key),
)
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document base analyzer. It defines fields using three extraction methods: extract for literal text, generate for AI-generated fields or interpretations, and classify for categorization.
from azure.ai.contentunderstanding.models import (
ContentAnalyzer,
ContentAnalyzerConfig,
ContentFieldSchema,
ContentFieldDefinition,
ContentFieldType,
GenerationMethod,
)
# Generate a unique analyzer ID
analyzer_id = f"my_document_analyzer_{int(time.time())}"
# Define field schema with custom fields
field_schema = ContentFieldSchema(
name="company_schema",
description="Schema for extracting company information",
fields={
"company_name": ContentFieldDefinition(
type=ContentFieldType.STRING,
method=GenerationMethod.EXTRACT,
description="Name of the company",
estimate_source_and_confidence=True,
),
"total_amount": ContentFieldDefinition(
type=ContentFieldType.NUMBER,
method=GenerationMethod.EXTRACT,
description="Total amount on the document",
estimate_source_and_confidence=True,
),
"document_summary": ContentFieldDefinition(
type=ContentFieldType.STRING,
method=GenerationMethod.GENERATE,
description=(
"A brief summary of the document content"
),
),
"document_type": ContentFieldDefinition(
type=ContentFieldType.STRING,
method=GenerationMethod.CLASSIFY,
description="Type of document",
enum=[
"invoice", "receipt", "contract",
"report", "other",
],
),
},
)
# Create analyzer configuration
config = ContentAnalyzerConfig(
enable_formula=True,
enable_layout=True,
enable_ocr=True,
estimate_field_source_and_confidence=True,
return_details=True,
)
# Create the analyzer with field schema
analyzer = ContentAnalyzer(
base_analyzer_id="prebuilt-document",
description=(
"Custom analyzer for extracting company information"
),
config=config,
field_schema=field_schema,
models={
"completion": "gpt-4.1",
"embedding": "text-embedding-3-large",
}, # Required when using field_schema and prebuilt-document base analyzer
)
# Create the analyzer
poller = client.begin_create_analyzer(
analyzer_id=analyzer_id,
resource=analyzer,
)
result = poller.result() # Wait for creation to complete
# Get the full analyzer details after creation
result = client.get_analyzer(analyzer_id=analyzer_id)
print(f"Analyzer '{analyzer_id}' created successfully!")
if result.description:
print(f" Description: {result.description}")
if result.field_schema and result.field_schema.fields:
print(f" Fields ({len(result.field_schema.fields)}):")
for field_name, field_def in result.field_schema.fields.items():
method = field_def.method if field_def.method else "auto"
field_type = field_def.type if field_def.type else "unknown"
print(f" - {field_name}: {field_type} ({method})")
An example output looks like:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: ContentFieldType.STRING (GenerationMethod.EXTRACT)
- total_amount: ContentFieldType.NUMBER (GenerationMethod.EXTRACT)
- document_summary: ContentFieldType.STRING (GenerationMethod.GENERATE)
- document_type: ContentFieldType.STRING (GenerationMethod.CLASSIFY)
Tip
This code is based on the create analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer to categorize documents and use its results to route documents to prebuilt or custom analyzers you created. Here is an example of creating a custom analyzer for classification workflows.
import time
from azure.ai.contentunderstanding.models import (
ContentAnalyzer,
ContentAnalyzerConfig,
ContentCategoryDefinition,
)
# Generate a unique analyzer ID
analyzer_id = f"my_classifier_{int(time.time())}"
print(f"Creating classifier '{analyzer_id}'...")
# Define content categories for classification
categories = {
"Loan_Application": ContentCategoryDefinition(
description="Documents submitted by individuals or businesses to request funding, "
"typically including personal or business details, financial history, "
"loan amount, purpose, and supporting documentation."
),
"Invoice": ContentCategoryDefinition(
description="Billing documents issued by sellers or service providers to request "
"payment for goods or services, detailing items, prices, taxes, totals, "
"and payment terms."
),
"Bank_Statement": ContentCategoryDefinition(
description="Official statements issued by banks that summarize account activity "
"over a period, including deposits, withdrawals, fees, and balances."
),
}
# Create analyzer configuration
config = ContentAnalyzerConfig(
return_details=True,
enable_segment=True, # Enable automatic segmentation by category
content_categories=categories,
)
# Create the classifier analyzer
classifier = ContentAnalyzer(
base_analyzer_id="prebuilt-document",
description="Custom classifier for financial document categorization",
config=config,
models={"completion": "gpt-4.1"},
)
# Create the classifier
poller = client.begin_create_analyzer(
analyzer_id=analyzer_id,
resource=classifier,
)
result = poller.result() # Wait for creation to complete
# Get the full analyzer details after creation
result = client.get_analyzer(analyzer_id=analyzer_id)
print(f"Classifier '{analyzer_id}' created successfully!")
if result.description:
print(f" Description: {result.description}")
Tip
This code is based on the create classifier sample in the SDK repository.
Use the custom analyzer
After creating the analyzer, use it to analyze a document and extract the custom fields. Delete the analyzer when you no longer need it.
# --- Use the custom document analyzer ---
from azure.ai.contentunderstanding.models import AnalysisInput
print("\nAnalyzing document...")
document_url = (
"https://raw.githubusercontent.com/"
"Azure-Samples/"
"azure-ai-content-understanding-assets/"
"main/document/invoice.pdf"
)
poller = client.begin_analyze(
analyzer_id=analyzer_id,
inputs=[AnalysisInput(url=document_url)],
)
result = poller.result()
if result.contents and len(result.contents) > 0:
content = result.contents[0]
if content.fields:
company = content.fields.get("company_name")
if company:
print(f"Company Name: {company.value}")
if company.confidence:
print(
f" Confidence:"
f" {company.confidence:.2f}"
)
total = content.fields.get("total_amount")
if total:
print(f"Total Amount: {total.value}")
summary = content.fields.get(
"document_summary"
)
if summary:
print(f"Summary: {summary.value}")
doc_type = content.fields.get("document_type")
if doc_type:
print(f"Document Type: {doc_type.value}")
else:
print("No content returned from analysis.")
# --- Clean up ---
print(f"\nCleaning up: deleting analyzer '{analyzer_id}'...")
client.delete_analyzer(analyzer_id=analyzer_id)
print(f"Analyzer '{analyzer_id}' deleted successfully.")
An example output looks like:
Analyzing document...
Company Name: CONTOSO LTD.
Confidence: 0.81
Total Amount: 610.0
Summary: This document is an invoice from CONTOSO LTD. to Microsoft Corporation for consulting, document, and printing services provided during the service period. It details line items, subtotal, sales tax, total, previous unpaid balance, and the final amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
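Confidence scores like the 0.81 above can drive downstream review routing. A minimal sketch using plain dicts in place of SDK field objects; the 0.75 threshold is an arbitrary example, not a service recommendation:

```python
# Plain dicts standing in for SDK field results; threshold is illustrative.
REVIEW_THRESHOLD = 0.75

fields = {
    "company_name": {"value": "CONTOSO LTD.", "confidence": 0.81},
    "total_amount": {"value": 610.0, "confidence": 0.62},
}

# Route any field below the threshold to human review.
needs_review = [name for name, f in fields.items()
                if f["confidence"] < REVIEW_THRESHOLD]
print(needs_review)  # ['total_amount']
```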
Tip
Check out more examples of running analyzers at SDK samples.
Client library | Samples | SDK source
This guide shows you how to use the Content Understanding .NET SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. See Models and deployments or this one-time configuration script for setup instructions.
- The current version of .NET.
Set up
Create a new .NET console application:
dotnet new console -n CustomAnalyzerTutorial
cd CustomAnalyzerTutorial

Install the Content Understanding client library for .NET:
dotnet add package Azure.AI.ContentUnderstanding

Optionally, install the Azure Identity library for Microsoft Entra authentication:
dotnet add package Azure.Identity
Set up environment variables
To authenticate with the Content Understanding service, set the environment variables with your own values before running the sample:
- CONTENTUNDERSTANDING_ENDPOINT - the endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - your Content Understanding API key (optional if using Microsoft Entra ID DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux / macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create the client
using Azure;
using Azure.AI.ContentUnderstanding;
string endpoint = Environment.GetEnvironmentVariable(
"CONTENTUNDERSTANDING_ENDPOINT");
string key = Environment.GetEnvironmentVariable(
"CONTENTUNDERSTANDING_KEY");
var client = new ContentUnderstandingClient(
new Uri(endpoint),
new AzureKeyCredential(key)
);
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields using three extraction methods: extract for literal text, generate for AI-generated summaries, and classify for categorization.
string analyzerId =
$"my_document_analyzer_{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
var fieldSchema = new ContentFieldSchema(
new Dictionary<string, ContentFieldDefinition>
{
["company_name"] = new ContentFieldDefinition
{
Type = ContentFieldType.String,
Method = GenerationMethod.Extract,
Description = "Name of the company"
},
["total_amount"] = new ContentFieldDefinition
{
Type = ContentFieldType.Number,
Method = GenerationMethod.Extract,
Description =
"Total amount on the document"
},
["document_summary"] = new ContentFieldDefinition
{
Type = ContentFieldType.String,
Method = GenerationMethod.Generate,
Description =
"A brief summary of the document content"
},
["document_type"] = new ContentFieldDefinition
{
Type = ContentFieldType.String,
Method = GenerationMethod.Classify,
Description = "Type of document"
}
})
{
Name = "company_schema",
Description =
"Schema for extracting company information"
};
fieldSchema.Fields["document_type"].Enum.Add("invoice");
fieldSchema.Fields["document_type"].Enum.Add("receipt");
fieldSchema.Fields["document_type"].Enum.Add("contract");
fieldSchema.Fields["document_type"].Enum.Add("report");
fieldSchema.Fields["document_type"].Enum.Add("other");
var config = new ContentAnalyzerConfig
{
EnableFormula = true,
EnableLayout = true,
EnableOcr = true,
EstimateFieldSourceAndConfidence = true,
ShouldReturnDetails = true
};
var customAnalyzer = new ContentAnalyzer
{
BaseAnalyzerId = "prebuilt-document",
Description =
"Custom analyzer for extracting"
+ " company information",
Config = config,
FieldSchema = fieldSchema
};
customAnalyzer.Models["completion"] = "gpt-4.1";
customAnalyzer.Models["embedding"] =
"text-embedding-3-large"; // Required when using field_schema and prebuilt-document base analyzer
var operation = await client.CreateAnalyzerAsync(
WaitUntil.Completed,
analyzerId,
customAnalyzer);
ContentAnalyzer result = operation.Value;
Console.WriteLine(
$"Analyzer '{analyzerId}'"
+ " created successfully!");
// Get the full analyzer details after creation
var analyzerDetails =
await client.GetAnalyzerAsync(analyzerId);
result = analyzerDetails.Value;
if (result.Description != null)
{
Console.WriteLine(
$" Description: {result.Description}");
}
if (result.FieldSchema?.Fields != null)
{
Console.WriteLine(
$" Fields"
+ $" ({result.FieldSchema.Fields.Count}):");
foreach (var kvp
in result.FieldSchema.Fields)
{
var method =
kvp.Value.Method?.ToString()
?? "auto";
var fieldType =
kvp.Value.Type?.ToString()
?? "unknown";
Console.WriteLine(
$" - {kvp.Key}:"
+ $" {fieldType} ({method})");
}
}
An example output looks like:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: string (extract)
- total_amount: number (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the Create Analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer to categorize documents and use its results to route documents to prebuilt or custom analyzers you created. Here is an example of creating a custom analyzer for classification workflows.
// Generate a unique analyzer ID
string classifierId =
$"my_classifier_{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
Console.WriteLine(
$"Creating classifier '{classifierId}'...");
// Define content categories for classification
var classifierConfig = new ContentAnalyzerConfig
{
ShouldReturnDetails = true,
EnableSegment = true
};
classifierConfig.ContentCategories
.Add("Loan_Application",
new ContentCategoryDefinition
{
Description =
"Documents submitted by individuals"
+ " or businesses to request"
+ " funding, typically including"
+ " personal or business details,"
+ " financial history, loan amount,"
+ " purpose, and supporting"
+ " documentation."
});
classifierConfig.ContentCategories
.Add("Invoice",
new ContentCategoryDefinition
{
Description =
"Billing documents issued by"
+ " sellers or service providers"
+ " to request payment for goods"
+ " or services, detailing items,"
+ " prices, taxes, totals, and"
+ " payment terms."
});
classifierConfig.ContentCategories
.Add("Bank_Statement",
new ContentCategoryDefinition
{
Description =
"Official statements issued by"
+ " banks that summarize account"
+ " activity over a period,"
+ " including deposits,"
+ " withdrawals, fees,"
+ " and balances."
});
// Create the classifier analyzer
var classifierAnalyzer = new ContentAnalyzer
{
BaseAnalyzerId = "prebuilt-document",
Description =
"Custom classifier for financial"
+ " document categorization",
Config = classifierConfig
};
classifierAnalyzer.Models["completion"] =
"gpt-4.1";
var classifierOp =
await client.CreateAnalyzerAsync(
WaitUntil.Completed,
classifierId,
classifierAnalyzer);
// Get the full classifier details
var classifierDetails =
await client.GetAnalyzerAsync(classifierId);
var classifierResult =
classifierDetails.Value;
Console.WriteLine(
$"Classifier '{classifierId}'"
+ " created successfully!");
if (classifierResult.Description != null)
{
Console.WriteLine(
$" Description:"
+ $" {classifierResult.Description}");
}
Tip
This code is based on the Create Classifier Sample for classification workflows.
Use the custom analyzer
After creating the analyzer, use it to analyze a document and extract the custom fields. Delete the analyzer when you no longer need it.
var documentUrl = new Uri(
"https://raw.githubusercontent.com/"
+ "Azure-Samples/"
+ "azure-ai-content-understanding-assets/"
+ "main/document/invoice.pdf"
);
var analyzeOperation = await client.AnalyzeAsync(
WaitUntil.Completed,
analyzerId,
inputs: new[] {
new AnalysisInput { Uri = documentUrl }
});
var analyzeResult = analyzeOperation.Value;
if (analyzeResult.Contents?.FirstOrDefault()
is DocumentContent content)
{
if (content.Fields.TryGetValue(
"company_name", out var companyField))
{
var name =
companyField is ContentStringField sf
? sf.Value : null;
Console.WriteLine(
$"Company Name: "
+ $"{name ?? "(not found)"}");
Console.WriteLine(
" Confidence: "
+ (companyField.Confidence?
.ToString("F2") ?? "N/A"));
}
if (content.Fields.TryGetValue(
"total_amount", out var totalField))
{
var total =
totalField is ContentNumberField nf
? nf.Value : null;
Console.WriteLine(
$"Total Amount: {total}");
}
if (content.Fields.TryGetValue(
"document_summary", out var summaryField))
{
var summary =
summaryField is ContentStringField sf
? sf.Value : null;
Console.WriteLine(
$"Summary: "
+ $"{summary ?? "(not found)"}");
}
if (content.Fields.TryGetValue(
"document_type", out var typeField))
{
var docType =
typeField is ContentStringField sf
? sf.Value : null;
Console.WriteLine(
$"Document Type: "
+ $"{docType ?? "(not found)"}");
}
}
// --- Clean up ---
Console.WriteLine(
$"\nCleaning up: deleting analyzer"
+ $" '{analyzerId}'...");
await client.DeleteAnalyzerAsync(analyzerId);
Console.WriteLine(
$"Analyzer '{analyzerId}'"
+ " deleted successfully.");
An example output looks like:
Company Name: CONTOSO LTD.
Confidence: 0.88
Total Amount: 610
Summary: This document is an invoice from CONTOSO LTD. to MICROSOFT CORPORATION for consulting services, document fees, and printing fees, detailing service periods, billing and shipping addresses, itemized charges, and the total amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
Check out more examples of running analyzers at .NET SDK samples.
Client library | Samples | SDK source
This guide shows you how to use the Content Understanding Java SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. See Models and deployments or this one-time configuration script for setup instructions.
- Java Development Kit (JDK) version 8 or later.
- Apache Maven.
Set up
Create a new Maven project:
mvn archetype:generate -DgroupId=com.example \
  -DartifactId=custom-analyzer-tutorial \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false
cd custom-analyzer-tutorial

Add the Content Understanding dependency to the <dependencies> section of your pom.xml file:

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-ai-contentunderstanding</artifactId>
  <version>1.0.0</version>
</dependency>

Optionally, add the Azure Identity library for Microsoft Entra authentication:

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-identity</artifactId>
  <version>1.14.2</version>
</dependency>
Set up environment variables
To authenticate with the Content Understanding service, set the environment variables with your own values before running the sample:
- CONTENTUNDERSTANDING_ENDPOINT - the endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - your Content Understanding API key (optional if using Microsoft Entra ID DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux / macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create the client
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import com.azure.ai.contentunderstanding
.ContentUnderstandingClient;
import com.azure.ai.contentunderstanding
.ContentUnderstandingClientBuilder;
import com.azure.ai.contentunderstanding.models.*;
String endpoint =
System.getenv("CONTENTUNDERSTANDING_ENDPOINT");
String key =
System.getenv("CONTENTUNDERSTANDING_KEY");
ContentUnderstandingClient client =
new ContentUnderstandingClientBuilder()
.endpoint(endpoint)
.credential(new AzureKeyCredential(key))
.buildClient();
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields using three extraction methods: extract for literal text, generate for AI-generated summaries, and classify for categorization.
String analyzerId = "my_document_analyzer_" + System.currentTimeMillis();

Map<String, ContentFieldDefinition> fields = new HashMap<>();

ContentFieldDefinition companyNameDef = new ContentFieldDefinition();
companyNameDef.setType(ContentFieldType.STRING);
companyNameDef.setMethod(GenerationMethod.EXTRACT);
companyNameDef.setDescription("Name of the company");
fields.put("company_name", companyNameDef);

ContentFieldDefinition totalAmountDef = new ContentFieldDefinition();
totalAmountDef.setType(ContentFieldType.NUMBER);
totalAmountDef.setMethod(GenerationMethod.EXTRACT);
totalAmountDef.setDescription("Total amount on the document");
fields.put("total_amount", totalAmountDef);

ContentFieldDefinition summaryDef = new ContentFieldDefinition();
summaryDef.setType(ContentFieldType.STRING);
summaryDef.setMethod(GenerationMethod.GENERATE);
summaryDef.setDescription("A brief summary of the document content");
fields.put("document_summary", summaryDef);

ContentFieldDefinition documentTypeDef = new ContentFieldDefinition();
documentTypeDef.setType(ContentFieldType.STRING);
documentTypeDef.setMethod(GenerationMethod.CLASSIFY);
documentTypeDef.setDescription("Type of document");
documentTypeDef.setEnumProperty(
    Arrays.asList("invoice", "receipt", "contract", "report", "other"));
fields.put("document_type", documentTypeDef);

ContentFieldSchema fieldSchema = new ContentFieldSchema();
fieldSchema.setName("company_schema");
fieldSchema.setDescription("Schema for extracting company information");
fieldSchema.setFields(fields);

Map<String, String> models = new HashMap<>();
models.put("completion", "gpt-4.1");
// Required when using a field schema with the prebuilt-document base analyzer
models.put("embedding", "text-embedding-3-large");

ContentAnalyzer customAnalyzer = new ContentAnalyzer()
    .setBaseAnalyzerId("prebuilt-document")
    .setDescription("Custom analyzer for extracting company information")
    .setConfig(new ContentAnalyzerConfig()
        .setOcrEnabled(true)
        .setLayoutEnabled(true)
        .setFormulaEnabled(true)
        .setEstimateFieldSourceAndConfidence(true)
        .setReturnDetails(true))
    .setFieldSchema(fieldSchema)
    .setModels(models);

SyncPoller<ContentAnalyzerOperationStatus, ContentAnalyzer> operation =
    client.beginCreateAnalyzer(analyzerId, customAnalyzer, true);
ContentAnalyzer result = operation.getFinalResult();

System.out.println("Analyzer '" + analyzerId + "' created successfully!");
if (result.getDescription() != null) {
    System.out.println("  Description: " + result.getDescription());
}
if (result.getFieldSchema() != null && result.getFieldSchema().getFields() != null) {
    System.out.println("  Fields (" + result.getFieldSchema().getFields().size() + "):");
    result.getFieldSchema().getFields().forEach((fieldName, fieldDef) -> {
        String method = fieldDef.getMethod() != null
            ? fieldDef.getMethod().toString() : "auto";
        String type = fieldDef.getType() != null
            ? fieldDef.getType().toString() : "unknown";
        System.out.println("    - " + fieldName + ": " + type + " (" + method + ")");
    });
}
An example output looks like:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- total_amount: number (extract)
- company_name: string (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the Create Analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer that categorizes documents, then use its results to route each document to the prebuilt or custom analyzers you created. The following example creates a classifier for financial documents.
// Generate a unique analyzer ID
String classifierId = "my_classifier_" + System.currentTimeMillis();
System.out.println("Creating classifier '" + classifierId + "'...");

// Define content categories for classification
Map<String, ContentCategoryDefinition> categories = new HashMap<>();
categories.put("Loan_Application", new ContentCategoryDefinition()
    .setDescription("Documents submitted by individuals or businesses to request funding,"
        + " typically including personal or business details, financial history,"
        + " loan amount, purpose, and supporting documentation."));
categories.put("Invoice", new ContentCategoryDefinition()
    .setDescription("Billing documents issued by sellers or service providers to request"
        + " payment for goods or services, detailing items, prices, taxes, totals,"
        + " and payment terms."));
categories.put("Bank_Statement", new ContentCategoryDefinition()
    .setDescription("Official statements issued by banks that summarize account activity"
        + " over a period, including deposits, withdrawals, fees, and balances."));

// Create the classifier
Map<String, String> classifierModels = new HashMap<>();
classifierModels.put("completion", "gpt-4.1");

ContentAnalyzer classifier = new ContentAnalyzer()
    .setBaseAnalyzerId("prebuilt-document")
    .setDescription("Custom classifier for financial document categorization")
    .setConfig(new ContentAnalyzerConfig()
        .setReturnDetails(true)
        .setSegmentEnabled(true)
        .setContentCategories(categories))
    .setModels(classifierModels);

SyncPoller<ContentAnalyzerOperationStatus, ContentAnalyzer> classifierOp =
    client.beginCreateAnalyzer(classifierId, classifier, true);
classifierOp.getFinalResult();

// Get the full classifier details
ContentAnalyzer classifierResult = client.getAnalyzer(classifierId);
System.out.println("Classifier '" + classifierId + "' created successfully!");
if (classifierResult.getDescription() != null) {
    System.out.println("  Description: " + classifierResult.getDescription());
}
Tip
This code is based on the Create Classifier sample for classification workflows.
Use the custom analyzer
After creating the analyzer, use it to analyze a document and extract the custom fields. Delete the analyzer when you no longer need it.
String documentUrl = "https://raw.githubusercontent.com/Azure-Samples/"
    + "azure-ai-content-understanding-assets/main/document/invoice.pdf";

AnalysisInput input = new AnalysisInput();
input.setUrl(documentUrl);

SyncPoller<ContentAnalyzerAnalyzeOperationStatus, AnalysisResult> analyzeOperation =
    client.beginAnalyze(analyzerId, Arrays.asList(input));
AnalysisResult analyzeResult = analyzeOperation.getFinalResult();

if (analyzeResult.getContents() != null
    && !analyzeResult.getContents().isEmpty()
    && analyzeResult.getContents().get(0) instanceof DocumentContent) {
    DocumentContent content = (DocumentContent) analyzeResult.getContents().get(0);

    ContentField companyField = content.getFields() != null
        ? content.getFields().get("company_name") : null;
    if (companyField instanceof ContentStringField) {
        ContentStringField sf = (ContentStringField) companyField;
        System.out.println("Company Name: " + sf.getValue());
        System.out.println("  Confidence: " + companyField.getConfidence());
    }

    ContentField totalField = content.getFields() != null
        ? content.getFields().get("total_amount") : null;
    if (totalField != null) {
        System.out.println("Total Amount: " + totalField.getValue());
    }

    ContentField summaryField = content.getFields() != null
        ? content.getFields().get("document_summary") : null;
    if (summaryField instanceof ContentStringField) {
        ContentStringField sf = (ContentStringField) summaryField;
        System.out.println("Summary: " + sf.getValue());
    }

    ContentField typeField = content.getFields() != null
        ? content.getFields().get("document_type") : null;
    if (typeField instanceof ContentStringField) {
        ContentStringField sf = (ContentStringField) typeField;
        System.out.println("Document Type: " + sf.getValue());
    }
}

// --- Clean up ---
System.out.println("\nCleaning up: deleting analyzer '" + analyzerId + "'...");
client.deleteAnalyzer(analyzerId);
System.out.println("Analyzer '" + analyzerId + "' deleted successfully.");
An example output looks like:
Company Name: CONTOSO LTD.
Confidence: 0.781
Total Amount: 610.0
Summary: This document is an invoice from CONTOSO LTD. to Microsoft Corporation for consulting services, document fees, and printing fees, detailing service dates, itemized charges, taxes, and the total amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
Check out more examples of running analyzers at Java SDK samples.
Client library | Samples | SDK source
This guide shows you how to use the Content Understanding JavaScript SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. See Models and deployments or this one-time configuration script for setup instructions.
- Node.js LTS version.
Set up
Create a new Node.js project:
mkdir custom-analyzer-tutorial
cd custom-analyzer-tutorial
npm init -y

Install the Content Understanding client library:

npm install @azure/ai-content-understanding

Optionally, install the Azure Identity library for Microsoft Entra authentication:
npm install @azure/identity
Set up environment variables
To authenticate with the Content Understanding service, set the environment variables with your own values before running the sample:
- CONTENTUNDERSTANDING_ENDPOINT - the endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - your Content Understanding API key (optional if you authenticate with Microsoft Entra ID and DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux / macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
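If a variable is missing, the client constructor fails with an unhelpful error later on. The following sketch reads the configuration with a fail-fast check; the readConfig helper and its error message are illustrative, not part of the SDK:

```javascript
// Read required configuration and fail fast with a clear message.
// CONTENTUNDERSTANDING_KEY stays optional because you can authenticate
// with Microsoft Entra ID (DefaultAzureCredential) instead of a key.
function readConfig(env) {
  const endpoint = env.CONTENTUNDERSTANDING_ENDPOINT;
  if (!endpoint) {
    throw new Error("Set CONTENTUNDERSTANDING_ENDPOINT before running the sample.");
  }
  return { endpoint, key: env.CONTENTUNDERSTANDING_KEY };
}

console.log(readConfig({ CONTENTUNDERSTANDING_ENDPOINT: "https://example.invalid" }));
```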
Create the client
const { AzureKeyCredential } = require("@azure/core-auth");
const { ContentUnderstandingClient } = require("@azure/ai-content-understanding");

const endpoint = process.env["CONTENTUNDERSTANDING_ENDPOINT"];
const key = process.env["CONTENTUNDERSTANDING_KEY"];

const client = new ContentUnderstandingClient(endpoint, new AzureKeyCredential(key));
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields using three extraction methods: extract for literal text, generate for AI-generated summaries, and classify for categorization.
const analyzerId = `my_document_analyzer_${Math.floor(Date.now() / 1000)}`;

const analyzer = {
  baseAnalyzerId: "prebuilt-document",
  description: "Custom analyzer for extracting company information",
  config: {
    enableFormula: true,
    enableLayout: true,
    enableOcr: true,
    estimateFieldSourceAndConfidence: true,
    returnDetails: true,
  },
  fieldSchema: {
    name: "company_schema",
    description: "Schema for extracting company information",
    fields: {
      company_name: {
        type: "string",
        method: "extract",
        description: "Name of the company",
      },
      total_amount: {
        type: "number",
        method: "extract",
        description: "Total amount on the document",
      },
      document_summary: {
        type: "string",
        method: "generate",
        description: "A brief summary of the document content",
      },
      document_type: {
        type: "string",
        method: "classify",
        description: "Type of document",
        enum: ["invoice", "receipt", "contract", "report", "other"],
      },
    },
  },
  models: {
    completion: "gpt-4.1",
    // Required when using a field schema with the prebuilt-document base analyzer
    embedding: "text-embedding-3-large",
  },
};

const poller = client.createAnalyzer(analyzerId, analyzer);
await poller.pollUntilDone();

const result = await client.getAnalyzer(analyzerId);
console.log(`Analyzer '${analyzerId}' created successfully!`);
if (result.description) {
  console.log(`  Description: ${result.description}`);
}
if (result.fieldSchema?.fields) {
  const fields = result.fieldSchema.fields;
  console.log(`  Fields (${Object.keys(fields).length}):`);
  for (const [name, fieldDef] of Object.entries(fields)) {
    const method = fieldDef.method ?? "auto";
    const fieldType = fieldDef.type ?? "unknown";
    console.log(`    - ${name}: ${fieldType} (${method})`);
  }
}
An example output looks like:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: string (extract)
- total_amount: number (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the Create Analyzer sample in the SDK repository.
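Schema mistakes usually surface only when the service rejects the create request. A quick local sanity check can catch the common ones first. The following sketch is a hypothetical helper, not part of the SDK; it only enforces two conventions used in the sample above: every field declares a supported method, and classify fields carry an enum of categories.

```javascript
// Hypothetical local validator for a field schema object.
// It checks structural conventions only; the service remains the
// authority on what a valid schema is.
function validateFieldSchema(fieldSchema) {
  const errors = [];
  const supportedMethods = ["extract", "generate", "classify"];
  for (const [name, def] of Object.entries(fieldSchema.fields ?? {})) {
    if (!supportedMethods.includes(def.method)) {
      errors.push(`${name}: unsupported method '${def.method}'`);
    }
    if (def.method === "classify" && !Array.isArray(def.enum)) {
      errors.push(`${name}: classify fields need an enum of categories`);
    }
  }
  return errors;
}

const schema = {
  fields: {
    company_name: { type: "string", method: "extract" },
    document_type: { type: "string", method: "classify" }, // missing enum
  },
};
console.log(validateFieldSchema(schema)); // logs one error for document_type
```

Running this before client.createAnalyzer turns a service round trip into an immediate local failure.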
Optionally, you can create a classifier analyzer that categorizes documents, then use its results to route each document to the prebuilt or custom analyzers you created. The following example creates a classifier for financial documents.
const classifierId = `my_classifier_${Math.floor(Date.now() / 1000)}`;
console.log(`Creating classifier '${classifierId}'...`);

const classifierAnalyzer = {
  baseAnalyzerId: "prebuilt-document",
  description: "Custom classifier for financial document categorization",
  config: {
    returnDetails: true,
    enableSegment: true,
    contentCategories: {
      Loan_Application: {
        description:
          "Documents submitted by individuals or businesses to request funding," +
          " typically including personal or business details, financial history," +
          " loan amount, purpose, and supporting documentation.",
      },
      Invoice: {
        description:
          "Billing documents issued by sellers or service providers to request" +
          " payment for goods or services, detailing items, prices, taxes," +
          " totals, and payment terms.",
      },
      Bank_Statement: {
        description:
          "Official statements issued by banks that summarize account activity" +
          " over a period, including deposits, withdrawals, fees, and balances.",
      },
    },
  },
  models: {
    completion: "gpt-4.1",
  },
};

const classifierPoller = client.createAnalyzer(classifierId, classifierAnalyzer);
await classifierPoller.pollUntilDone();

const classifierResult = await client.getAnalyzer(classifierId);
console.log(`Classifier '${classifierId}' created successfully!`);
if (classifierResult.description) {
  console.log(`  Description: ${classifierResult.description}`);
}
Tip
This code is based on the Create Classifier sample for classification workflows.
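The routing step itself is plain application logic: map each classified category to the analyzer that should process it next. A minimal sketch, assuming a hypothetical routing table and classification results shaped like { category: ... } per segment (the field name and analyzer IDs are illustrative, not the SDK's contract):

```javascript
// Hypothetical routing table: classifier category -> analyzer to run next.
// Category names match the classifier defined above; the analyzer IDs are
// placeholders for analyzers you created separately.
const routes = {
  Loan_Application: "my_loan_analyzer",
  Invoice: "my_document_analyzer",
  Bank_Statement: "prebuilt-document",
};

// Pick the analyzer for one classified segment, with a safe fallback
// for categories the table doesn't know about.
function routeSegment(segment, fallback = "prebuilt-document") {
  return routes[segment.category] ?? fallback;
}

console.log(routeSegment({ category: "Invoice" }));  // → my_document_analyzer
console.log(routeSegment({ category: "Tax_Form" })); // → prebuilt-document
```

Each routed segment can then be submitted to client.analyze with the chosen analyzer ID.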
Use the custom analyzer
After creating the analyzer, use it to analyze a document and extract the custom fields. Delete the analyzer when you no longer need it.
const documentUrl =
  "https://raw.githubusercontent.com/Azure-Samples/" +
  "azure-ai-content-understanding-assets/main/document/invoice.pdf";

const analyzePoller = client.analyze(analyzerId, [{ url: documentUrl }]);
const analyzeResult = await analyzePoller.pollUntilDone();

if (analyzeResult.contents && analyzeResult.contents.length > 0) {
  const content = analyzeResult.contents[0];
  if (content.fields) {
    const company = content.fields["company_name"];
    if (company) {
      console.log(`Company Name: ${company.value}`);
      console.log(`  Confidence: ${company.confidence}`);
    }
    const total = content.fields["total_amount"];
    if (total) {
      console.log(`Total Amount: ${total.value}`);
    }
    const summary = content.fields["document_summary"];
    if (summary) {
      console.log(`Summary: ${summary.value}`);
    }
    const docType = content.fields["document_type"];
    if (docType) {
      console.log(`Document Type: ${docType.value}`);
    }
  }
}

// --- Clean up ---
console.log(`\nCleaning up: deleting analyzer '${analyzerId}'...`);
await client.deleteAnalyzer(analyzerId);
console.log(`Analyzer '${analyzerId}' deleted successfully.`);
An example output looks like:
Company Name: CONTOSO LTD.
Confidence: 0.739
Total Amount: 610
Summary: This document is an invoice from CONTOSO LTD. to Microsoft Corporation for consulting, document, and printing services provided during the service period. It details line items, subtotal, sales tax, total, previous unpaid balance, and the final amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
Check out more examples of running analyzers at JavaScript SDK samples.
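Each extracted field arrives as an object carrying a value plus metadata such as confidence. When handing results to downstream code, it is often convenient to flatten them into a plain record. A small sketch, assuming field objects expose value and confidence properties as in the sample above; the helper itself is illustrative, not part of the SDK:

```javascript
// Flatten field objects into { name: value } pairs, dropping fields
// whose confidence (when present) falls below a threshold.
function flattenFields(fields, minConfidence = 0.5) {
  const record = {};
  for (const [name, field] of Object.entries(fields)) {
    if (field.confidence !== undefined && field.confidence < minConfidence) continue;
    record[name] = field.value;
  }
  return record;
}

const sample = {
  company_name: { value: "CONTOSO LTD.", confidence: 0.78 },
  total_amount: { value: 610, confidence: 0.31 }, // below threshold, dropped
  document_type: { value: "invoice" },            // no confidence, kept
};
console.log(flattenFields(sample));
// → { company_name: "CONTOSO LTD.", document_type: "invoice" }
```

Fields without a confidence score (such as generated summaries) pass through unchanged, so tune the threshold only for extracted fields.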
Client library | Samples | SDK source
This guide shows you how to use the Content Understanding TypeScript SDK to create a custom analyzer that extracts structured data from your content. Custom analyzers support document, image, audio, and video content types.
Prerequisites
- An active Azure subscription. If you don't have an Azure account, create one for free.
- A Microsoft Foundry resource created in a supported region.
- Your resource endpoint and API key (found under Keys and Endpoint in the Azure portal).
- Model deployment defaults configured for your resource. See Models and deployments or this one-time configuration script for setup instructions.
- Node.js LTS version.
- TypeScript 5.x or later.
Set up
Create a new Node.js project:
mkdir custom-analyzer-tutorial
cd custom-analyzer-tutorial
npm init -y

Install TypeScript and the Content Understanding client library:

npm install typescript ts-node @azure/ai-content-understanding

Optionally, install the Azure Identity library for Microsoft Entra authentication:
npm install @azure/identity
Set up environment variables
To authenticate with the Content Understanding service, set the environment variables with your own values before running the sample:
- CONTENTUNDERSTANDING_ENDPOINT - the endpoint of your Content Understanding resource.
- CONTENTUNDERSTANDING_KEY - your Content Understanding API key (optional if you authenticate with Microsoft Entra ID and DefaultAzureCredential).
Windows
setx CONTENTUNDERSTANDING_ENDPOINT "your-endpoint"
setx CONTENTUNDERSTANDING_KEY "your-key"
Linux / macOS
export CONTENTUNDERSTANDING_ENDPOINT="your-endpoint"
export CONTENTUNDERSTANDING_KEY="your-key"
Create the client
import { AzureKeyCredential } from "@azure/core-auth";
import { ContentUnderstandingClient } from "@azure/ai-content-understanding";
import type {
  ContentAnalyzer,
  ContentAnalyzerConfig,
  ContentFieldSchema,
} from "@azure/ai-content-understanding";

const endpoint = process.env["CONTENTUNDERSTANDING_ENDPOINT"]!;
const key = process.env["CONTENTUNDERSTANDING_KEY"]!;

const client = new ContentUnderstandingClient(endpoint, new AzureKeyCredential(key));
Create a custom analyzer
The following example creates a custom document analyzer based on the prebuilt document analyzer. It defines fields using three extraction methods: extract for literal text, generate for AI-generated summaries, and classify for categorization.
const analyzerId = `my_document_analyzer_${Math.floor(Date.now() / 1000)}`;

const fieldSchema: ContentFieldSchema = {
  name: "company_schema",
  description: "Schema for extracting company information",
  fields: {
    company_name: {
      type: "string",
      method: "extract",
      description: "Name of the company",
    },
    total_amount: {
      type: "number",
      method: "extract",
      description: "Total amount on the document",
    },
    document_summary: {
      type: "string",
      method: "generate",
      description: "A brief summary of the document content",
    },
    document_type: {
      type: "string",
      method: "classify",
      description: "Type of document",
      enum: ["invoice", "receipt", "contract", "report", "other"],
    },
  },
};

const config: ContentAnalyzerConfig = {
  enableFormula: true,
  enableLayout: true,
  enableOcr: true,
  estimateFieldSourceAndConfidence: true,
  returnDetails: true,
};

const analyzer: ContentAnalyzer = {
  baseAnalyzerId: "prebuilt-document",
  description: "Custom analyzer for extracting company information",
  config,
  fieldSchema,
  models: {
    completion: "gpt-4.1",
    // Required when using a field schema with the prebuilt-document base analyzer
    embedding: "text-embedding-3-large",
  },
} as unknown as ContentAnalyzer;

const poller = client.createAnalyzer(analyzerId, analyzer);
await poller.pollUntilDone();

const result = await client.getAnalyzer(analyzerId);
console.log(`Analyzer '${analyzerId}' created successfully!`);
if (result.description) {
  console.log(`  Description: ${result.description}`);
}
if (result.fieldSchema?.fields) {
  const fields = result.fieldSchema.fields;
  console.log(`  Fields (${Object.keys(fields).length}):`);
  for (const [name, fieldDef] of Object.entries(fields)) {
    const method = fieldDef.method ?? "auto";
    const fieldType = fieldDef.type ?? "unknown";
    console.log(`    - ${name}: ${fieldType} (${method})`);
  }
}
An example output looks like:
Analyzer 'my_document_analyzer_ID' created successfully!
Description: Custom analyzer for extracting company information
Fields (4):
- company_name: string (extract)
- total_amount: number (extract)
- document_summary: string (generate)
- document_type: string (classify)
Tip
This code is based on the Create Analyzer sample in the SDK repository.
Optionally, you can create a classifier analyzer that categorizes documents, then use its results to route each document to the prebuilt or custom analyzers you created. The following example creates a classifier for financial documents.
const classifierId = `my_classifier_${Math.floor(Date.now() / 1000)}`;
console.log(`Creating classifier '${classifierId}'...`);

const classifierAnalyzer: ContentAnalyzer = {
  baseAnalyzerId: "prebuilt-document",
  description: "Custom classifier for financial document categorization",
  config: {
    returnDetails: true,
    enableSegment: true,
    contentCategories: {
      Loan_Application: {
        description:
          "Documents submitted by individuals or businesses to request funding," +
          " typically including personal or business details, financial history," +
          " loan amount, purpose, and supporting documentation.",
      },
      Invoice: {
        description:
          "Billing documents issued by sellers or service providers to request" +
          " payment for goods or services, detailing items, prices, taxes," +
          " totals, and payment terms.",
      },
      Bank_Statement: {
        description:
          "Official statements issued by banks that summarize account activity" +
          " over a period, including deposits, withdrawals, fees, and balances.",
      },
    },
  } as unknown as ContentAnalyzerConfig,
  models: {
    completion: "gpt-4.1",
  },
} as unknown as ContentAnalyzer;

const classifierPoller = client.createAnalyzer(classifierId, classifierAnalyzer);
await classifierPoller.pollUntilDone();

const classifierResult = await client.getAnalyzer(classifierId);
console.log(`Classifier '${classifierId}' created successfully!`);
if (classifierResult.description) {
  console.log(`  Description: ${classifierResult.description}`);
}
Tip
This code is based on the Create Classifier sample for classification workflows.
Use the custom analyzer
After creating the analyzer, use it to analyze a document and extract the custom fields. Delete the analyzer when you no longer need it.
const documentUrl =
  "https://raw.githubusercontent.com/Azure-Samples/" +
  "azure-ai-content-understanding-assets/main/document/invoice.pdf";

const analyzePoller = client.analyze(analyzerId, [{ url: documentUrl }]);
const analyzeResult = await analyzePoller.pollUntilDone();

if (analyzeResult.contents && analyzeResult.contents.length > 0) {
  const content = analyzeResult.contents[0];
  if (content.fields) {
    const company = content.fields["company_name"];
    if (company) {
      console.log(`Company Name: ${company.value}`);
      console.log(`  Confidence: ${company.confidence}`);
    }
    const total = content.fields["total_amount"];
    if (total) {
      console.log(`Total Amount: ${total.value}`);
    }
    const summary = content.fields["document_summary"];
    if (summary) {
      console.log(`Summary: ${summary.value}`);
    }
    const docType = content.fields["document_type"];
    if (docType) {
      console.log(`Document Type: ${docType.value}`);
    }
  }
}

// --- Clean up ---
console.log(`\nCleaning up: deleting analyzer '${analyzerId}'...`);
await client.deleteAnalyzer(analyzerId);
console.log(`Analyzer '${analyzerId}' deleted successfully.`);
An example output looks like:
Company Name: CONTOSO LTD.
Confidence: 0.818
Total Amount: 610
Summary: This document is an invoice from CONTOSO LTD. to MICROSOFT CORPORATION for consulting, document, and printing services provided during the service period 10/14/2019 - 11/14/2019. It details line items, subtotal, sales tax, total, previous unpaid balance, and the final amount due.
Document Type: invoice
Cleaning up: deleting analyzer 'my_document_analyzer_ID'...
Analyzer 'my_document_analyzer_ID' deleted successfully.
Tip
Check out more examples of running analyzers at TypeScript SDK samples.
Related content
- Review code samples: visual document search.
- Review code sample: analyzer templates.
- Explore more Python SDK samples
- Explore more .NET SDK samples
- Explore more Java SDK samples
- Explore more JavaScript SDK samples
- Explore more TypeScript SDK samples
- Try processing your document content using Content Understanding in Foundry.