Mistral doc AI 2505 gives null in the document annotation schema, but markdown is successful

Question

Vansh Tiwari 20 Reputation points

Hello, I am using the Mistral doc AI 2505 model in Azure AI Foundry for key-value pair extraction, where each PDF contains 5-6 forms. I am using the document_annotation method to call the model, but sometimes the document_annotation JSON object comes back null, even though the markdown that the model returns is completely fine.
Below is the code sample:

payload = {
    "model": model,
    "document": {"type": "document_url", "document_url": data_url_pdf},
    "document_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "document_annotation",
            "strict": False,
            "schema": {
                "title": "form",
                "type": "object",
                "additionalProperties": True,
                "properties": {
                    "flight_details": {
                        "type": "object",
                        "additionalProperties": True,
                        "properties": {
                            "flight_number": {
                                "type": "string",
                                "description": "Flight number exactly as printed in the document. Note that this is handwritten flight number consists of only alphabets and numbers. First it should be alphabet and then numbers."
                            },
                            "date_of_transport_raw": {
                                "type": "string",
                                "description": "Date exactly as it appears in MM/DD/YYYY format in the document.Note that this is handwritten date You have to return in MM/DD/YYYY format"
                            }
                        }
                    },
                    "patient_route": {
                        "type": "object",
                        "additionalProperties": True,
                        "properties": {
                            "pickup_airport_identifier": {
                                "type": "string",
                                "description": "Pickup airport identifier exactly as printed in the document.Note that this is handwritten pickup airport identifier consists of only alphabets"
                            },
                            "dropoff_airport_identifier": {
                                "type": "string",
                                "description": "Dropoff airport identifier exactly as printed in the document.Note that this is handwritten dropoff airport identifier consists of only alphabets"
                            }
                        }
                    },
                    "metadata": {
                        "type": "object",
                        "additionalProperties": True,
                        "properties": {
                            "language": {"type": "string"},
                            "page_count": {"type": "integer"}
                        }
                    }
                }
            }
        }
    },
    "include_image_base64": False,
}

Also, can someone explain how Azure AI Foundry scales up when there are concurrent requests?
Thank you for helping me out.

SRILAKSHMI C 15,030 Reputation points Microsoft External Staff Moderator

2026-03-02T06:05:16.37+00:00

Hi Vansh Tiwari,

Did you get a chance to review the above response? Please let me know if you have any further queries.

Thank you!
Vansh Tiwari 20 Reputation points

2026-03-02T13:17:29.2033333+00:00

Hi @SRILAKSHMI C Thank you so much for the help. I figured out that setting "strict":"True" and "additionalProperties": false made it work.

Thank you for your answer; I got to know more about this model's use case. Will definitely try to apply your suggestions.
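For readers who want to apply the same fix to a nested schema, here is a minimal sketch. It assumes "strict" is passed as a boolean (as in the original payload) and that "additionalProperties" should be false on every nested object, not only the root. The helper names tighten_schema and strict_annotation_format are hypothetical and are not part of any SDK.

```python
import copy


def tighten_schema(schema: dict) -> dict:
    """Return a copy of a JSON schema with additionalProperties=False on every object node."""
    tightened = copy.deepcopy(schema)

    def visit(node):
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            node["additionalProperties"] = False
        # Recurse into nested object properties and array item schemas.
        for child in node.get("properties", {}).values():
            visit(child)
        if isinstance(node.get("items"), dict):
            visit(node["items"])

    visit(tightened)
    return tightened


def strict_annotation_format(schema: dict, name: str = "document_annotation") -> dict:
    """Build the document_annotation_format block with strict mode enabled (assumed boolean)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": tighten_schema(schema)},
    }


# Shortened version of the schema from the question, for illustration only.
example_schema = {
    "title": "form",
    "type": "object",
    "additionalProperties": True,
    "properties": {
        "flight_details": {
            "type": "object",
            "additionalProperties": True,
            "properties": {"flight_number": {"type": "string"}},
        }
    },
}

annotation_format = strict_annotation_format(example_schema)
```

The result can be assigned to payload["document_annotation_format"]. The original schema dictionary is left unchanged because the helper works on a deep copy.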
SRILAKSHMI C 15,030 Reputation points Microsoft External Staff Moderator

2026-03-03T11:22:48.07+00:00

Hi Vansh Tiwari,

Glad to hear that setting "strict": "True" and "additionalProperties": false resolved the issue.

Since I’ve converted my earlier comment into an answer, could you please take a moment to mark it as Accepted? This helps others in the community with the same question find the solution more easily.

Thank you!

Answer accepted by question author


Answer 1

Hello Vansh Tiwari,

Welcome to Microsoft Q&A, and thank you for reaching out.

I understand that you're experiencing issues with the Mistral doc AI 2505 model where you're sometimes getting a null response for the document_annotation JSON object, even though the markdown output looks good.

This typically indicates that the document was processed successfully, but the structured output did not pass schema validation or could not be reliably generated.

Below is a consolidated and structured set of possible causes and recommendations.

1. Why document_annotation Can Return null

When using:

"document_annotation_format": {

the service validates the model’s structured output against your schema.

If the model:

Cannot confidently extract fields
Produces output that doesn't match the schema
Encounters conflicting structure (e.g., multiple forms)
Hits token limits or truncation
Sees low OCR confidence (especially handwritten text)

the structured output may fail validation and return:

document_annotation: null

even while the markdown output succeeds.
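To handle this case in client code, the response can be checked for a null annotation before it is used. The sketch below is a hedged illustration: it assumes the response is a dictionary with a "document_annotation" field (either a JSON string or an already-parsed object) and a "pages" list whose entries carry a "markdown" field. Please verify these field names against the response you actually receive; the function names are hypothetical.

```python
import json


def extract_annotation(response: dict):
    """Return the annotation as a dict, or None if it is missing, null, or unparseable."""
    raw = response.get("document_annotation")
    if raw is None:
        return None
    if isinstance(raw, dict):
        return raw
    try:
        parsed = json.loads(raw)  # assumed: the annotation may arrive as a JSON string
    except (TypeError, ValueError):
        return None
    return parsed if isinstance(parsed, dict) else None


def collect_markdown(response: dict) -> str:
    """Join the per-page markdown (assumed field names) so it can be used as a fallback."""
    return "\n\n".join(page.get("markdown", "") for page in response.get("pages", []))


def summarize(response: dict) -> dict:
    """Report whether the structured output is usable or a fallback is needed."""
    annotation = extract_annotation(response)
    return {
        "annotation": annotation,
        "needs_fallback": annotation is None,
        "markdown": collect_markdown(response),
    }
```

When needs_fallback is true, the request can be retried or the markdown can be routed to a separate extraction step, so a null annotation does not silently drop a document.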

Most Likely Causes in Your Case

1. Multiple Forms in a Single PDF

You mentioned 5–6 forms in one PDF.

Your root schema is:

"type": "object"

If the model detects multiple logical form instances, it may attempt to extract multiple entities. That conflicts with a single-object schema and can cause validation failure → null.

Recommended Fix

Change the root schema to:

"type": "array",

with the current form object as its "items" schema.

If multiple forms are expected, the schema must reflect that.

You can also test by splitting the PDF into single-form documents and comparing results.
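A sketch of a multi-form schema is shown below. Some structured-output implementations only accept an object at the root, and it is an assumption here that this one may behave the same way. For that reason the sketch keeps an object at the root and places the array under a "forms" property instead of using a bare root array. The property name "forms" and the helper name are hypothetical.

```python
def wrap_as_form_list(form_schema: dict) -> dict:
    """Wrap a single-form schema in an object that holds an array of forms."""
    return {
        "title": "forms",
        "type": "object",
        "additionalProperties": False,
        "required": ["forms"],
        "properties": {
            "forms": {
                "type": "array",
                "description": "One entry per form found in the PDF, in page order.",
                "items": form_schema,
            }
        },
    }


# Shortened single-form schema, for illustration only.
single_form = {
    "type": "object",
    "additionalProperties": False,
    "properties": {"flight_number": {"type": "string"}},
}

multi_form_schema = wrap_as_form_list(single_form)
```

With this shape, each of the 5–6 forms becomes one element of the "forms" array rather than competing for a single root object.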

2. Schema Strictness & Required Fields

Your schema:

Uses "additionalProperties": true
Does not define "required" fields
Uses strict handwritten formatting instructions

If handwriting confidence is low, the model may decline to populate structured fields rather than guess incorrectly.

Suggestions

Add minimal "required" fields where appropriate.

Slightly relax instructions. For example:

Instead of: “You have to return in MM/DD/YYYY format”
Use: “Return in MM/DD/YYYY format if clearly readable”

This reduces the risk of validation failure.
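If the formatting rules are relaxed in the schema descriptions, they can still be enforced in client code after extraction. The following is a minimal sketch using only the Python standard library. The accepted date layouts and the letters-then-digits flight number rule are assumptions drawn from the field descriptions in the question, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Letters first, then digits, as described for the handwritten flight number.
FLIGHT_NUMBER = re.compile(r"[A-Za-z]+[0-9]+")


def normalize_date(raw):
    """Return the date as MM/DD/YYYY, or None if it cannot be parsed."""
    if not raw:
        return None
    text = raw.strip()
    for fmt in ("%m/%d/%Y", "%m-%d-%Y"):  # assumed layouts; extend as needed
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None


def is_valid_flight_number(raw) -> bool:
    """Check that the value is one or more letters followed by one or more digits."""
    return bool(raw) and FLIGHT_NUMBER.fullmatch(raw.strip()) is not None
```

Values that fail these checks can be flagged for manual review instead of causing the whole annotation to be rejected.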

3. PDF Structure or Formatting Issues

Ensure that:

The PDF contains machine-readable or high-quality scanned text.
Forms are consistently structured.
The key-value layout is predictable.

Inconsistent layouts or poor scans can cause structured extraction failure while markdown still appears coherent.

Testing with simpler or smaller documents is a good way to isolate this.

4. Token Limits / Large Documents

With 5–6 forms in one file:

Token usage increases significantly.
Structured output may get truncated.
Truncation → invalid JSON → null.

Recommended Test

Split large PDFs into individual forms.
Compare structured output reliability.

This often improves consistency dramatically.
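One way to run this test is sketched below. It assumes every form occupies the same fixed number of pages, which may not hold for your documents, and it uses the third-party pypdf package (an assumption; any PDF library would do). The function names and the output file naming are hypothetical.

```python
def page_ranges(total_pages: int, pages_per_form: int):
    """Return (start, stop) page index pairs, one per form, with stop exclusive."""
    if pages_per_form < 1:
        raise ValueError("pages_per_form must be at least 1")
    return [
        (start, min(start + pages_per_form, total_pages))
        for start in range(0, total_pages, pages_per_form)
    ]


def split_pdf(path: str, pages_per_form: int, out_prefix: str):
    """Write one PDF per form and return the list of output paths."""
    # Imported here so page_ranges can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(path)
    outputs = []
    ranges = page_ranges(len(reader.pages), pages_per_form)
    for index, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        out_path = f"{out_prefix}_{index}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        outputs.append(out_path)
    return outputs
```

Each output file can then be sent as its own request, and the null rate can be compared with the combined PDF.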

5. API Version

Ensure you are using API version 4.0 GA

Older API versions can produce inconsistent behavior.

6. Logging & Error Inspection

Check the response metadata, the Azure portal logs, and any validation or truncation warnings.

If possible, log the raw model response before schema validation to see whether the model is producing partial JSON.

About Concurrent Requests & Scaling

Azure AI Foundry scales automatically within your allocated quota.

Key points

Scaling is limited by your TPM/RPM quota.

High concurrency beyond quota → 429 throttling.

It does not auto-increase quota during bursts.

Large document processing consumes more tokens per request.

If you expect high concurrency:

Monitor TPM usage.

Implement exponential backoff retry logic.

Consider splitting large documents.

Request higher quota if needed.

Scaling is quota-driven, not dynamic elasticity.
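The retry recommendation can be sketched as follows. This is a generic illustration, not an Azure SDK API: the send callable is a hypothetical wrapper around your HTTP call that returns a (status_code, body) tuple, and the delay values are arbitrary starting points.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns HTTP 429."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the delay on each attempt, cap it, and add a little jitter
        # so concurrent clients do not all retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay + random.uniform(0, delay * 0.1))
    return status, body
```

If the last attempt is still throttled, the 429 response is returned so the caller can decide whether to queue the document or fail. If the service returns a Retry-After header, honoring that value is usually preferable to a computed delay.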

Recommended Action

Test with a single-form PDF.

Change root schema to an array if multiple forms exist.

Add minimal required fields.

Relax overly strict formatting requirements.

Confirm API version 4.0 GA.

Monitor token usage and concurrency.

Most Probable Root Cause

Based on your description, the highest probability causes are:

Schema expecting a single object while multiple forms exist

Token/truncation issue with large multi-form PDFs

Schema validation failure due to handwritten uncertainty

Please refer to the Azure AI Foundry documentation.

I hope this helps. Do let me know if you have any further queries.

Thank you!
