Mistral doc AI 2505 returns null for the document annotation schema, but markdown is successful

Vansh Tiwari 20 Reputation points
2026-02-09T13:18:39.44+00:00

Hello, I am using Azure AI Foundry's Mistral doc AI 2505 model for key-value pair extraction, where each PDF contains 5-6 forms. I am using the document_annotation method to call the model, but sometimes the document_annotation JSON object comes back null, even though the markdown that the model returns is completely fine.
Below is the code sample:

payload = {
        "model": model,
        "document": {"type": "document_url", "document_url": data_url_pdf},
        "document_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "document_annotation",
            "strict": False,
            "schema": {
                "title": "form",
                "type": "object",
                "additionalProperties": True,
                "properties": {                   

                    "flight_details": {
                        "type": "object",
                        "additionalProperties": True,
                        "properties": {
                            "flight_number": {
                                "type": "string",
                                "description": "Flight number exactly as printed in the document. Note that this is handwritten flight number consists of only alphabets and numbers. First it should be alphabet and then numbers."
                            },
                            "date_of_transport_raw": {
                                "type": "string",
                                "description": "Date exactly as it appears in MM/DD/YYYY format in the document.Note that this is handwritten date You have to return in MM/DD/YYYY format"
                            }
                        }
                    },

                    "patient_route": {
                        "type": "object",
                        "additionalProperties": True,
                        "properties": {
                            "pickup_airport_identifier": {
                                "type": "string",
                                "description": "Pickup airport identifier exactly as printed in the document.Note that this is handwritten pickup airport identifier consists of only alphabets"
                            },
                            "dropoff_airport_identifier": {
                                "type": "string",
                                "description": "Dropoff airport identifier exactly as printed in the document.Note that this is handwritten dropoff airport identifier consists of only alphabets"
                            }
                        }
                    },

                    "metadata": {
                        "type": "object",
                        "additionalProperties": True,
                        "properties": {
                            "language": { "type": "string" },
                            "page_count": { "type": "integer" }
                        }
                    }
                }
            }
        }
    },
        "include_image_base64": False,
}

Also, can someone explain how Azure AI Foundry scales up when there are concurrent requests?
Thank you for helping me out.

Azure AI services

A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Answer accepted by question author
  1. SRILAKSHMI C 15,030 Reputation points Microsoft External Staff Moderator
    2026-02-27T12:10:38.63+00:00

    Hello Vansh Tiwari,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    I understand that you're experiencing issues with the Mistral doc AI 2505 model where you're sometimes getting a null response for the document_annotation JSON object, even though the markdown output looks good.

    This typically indicates that the document was processed successfully, but the structured output did not pass schema validation or could not be reliably generated.

    Below is a consolidated and structured set of possible causes and recommendations.

    Why document_annotation Can Return null

    When the request includes:

    "document_annotation_format": {

    the service validates the model's structured output against your schema.

    If the model:

    • cannot confidently extract fields,
    • produces output that doesn't match the schema,
    • encounters conflicting structure (e.g., multiple forms),
    • hits token limits or truncation, or
    • sees low OCR confidence (especially on handwritten text),

    the structured output may fail validation, and the response contains:

    document_annotation: null

    even though the markdown output succeeds.
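
    A minimal sketch of handling this case on the client side. It assumes the response is a JSON object with a `pages` list (each page carrying a `markdown` string) and a `document_annotation` field that is either null or a JSON-encoded string. Those field names are my understanding of the Mistral OCR response shape and should be verified against what your deployment actually returns; `extract_annotation` is a hypothetical helper name.

```python
import json
from typing import Any, Optional, Tuple


def extract_annotation(response: dict) -> Tuple[Optional[Any], str]:
    """Return (annotation, markdown) from an OCR response dict.

    Assumed shape (verify against your deployment):
      {"pages": [{"markdown": "..."}], "document_annotation": "<json>" or None}
    """
    # Join the per-page markdown so it is always available as a fallback.
    pages = response.get("pages") or []
    markdown = "\n\n".join(p.get("markdown") or "" for p in pages)

    raw = response.get("document_annotation")
    if raw is None:
        return None, markdown  # structured output is missing
    if isinstance(raw, (dict, list)):
        return raw, markdown  # already decoded by the HTTP client
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return None, markdown  # truncated or otherwise invalid JSON
    return parsed, markdown
```

    When the first element is None, the caller can retry the request or fall back to parsing the markdown, instead of treating the whole call as failed.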

    Most Likely Causes in Your Case

    1. Multiple Forms in a Single PDF

    You mentioned 5–6 forms in one PDF.

    Your root schema is:

    "type": "object"
    

    If the model detects multiple logical form instances, it may attempt to extract multiple entities. That conflicts with a single-object schema and can cause validation failure → null.

    Recommended Fix

    Change the root schema to:

    "type": "array",
    

    If multiple forms are expected, the schema must reflect that.

    You can also test by splitting the PDF into single-form documents and comparing results.
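
    As a rough sketch, the existing single-form schema can be wrapped rather than rewritten by hand. `wrap_for_multiple_forms` and the `forms` property name are hypothetical. Whether the service accepts an array at the root is not something I can confirm here; some structured-output APIs only accept an object at the root, so the helper also offers a variant that keeps an object root and nests the array inside it.

```python
import copy


def wrap_for_multiple_forms(form_schema: dict, root_array: bool = True) -> dict:
    """Build a schema that allows several form instances per document."""
    item = copy.deepcopy(form_schema)  # leave the caller's schema untouched
    item.pop("title", None)  # the title moves to the new root

    if root_array:
        # Variant suggested above: the root itself is an array of forms.
        return {"title": "forms", "type": "array", "items": item}

    # Alternative: keep an object root and nest the array under a
    # hypothetical "forms" property.
    return {
        "title": "forms",
        "type": "object",
        "properties": {"forms": {"type": "array", "items": item}},
        "required": ["forms"],
    }
```

    The result would replace the value of "schema" inside "json_schema" in your payload.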

    2. Schema Strictness & Required Fields

    Your schema:

    • Uses "additionalProperties": true
    • Does not define "required" fields
    • Uses strict handwritten formatting instructions

    If handwriting confidence is low, the model may decline to populate structured fields rather than guess incorrectly.

    Suggestions

    Add minimal "required" fields where appropriate.

    Slightly relax instructions like:

    • Instead of: “You have to return in MM/DD/YYYY format”
    • Use: “Return in MM/DD/YYYY format if clearly readable”

    This reduces validation failure risk.
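
    One way to apply both suggestions is to keep the schema lenient and enforce the formats in your own code after the call. The sketch below reuses the field names from your payload; the choice of `flight_number` as the only required field, the description wording, and the `check_flight_details` helper are illustrative assumptions, not a confirmed fix.

```python
import re
from datetime import datetime

# Schema fragment with a minimal "required" list and a relaxed description.
flight_details = {
    "type": "object",
    "additionalProperties": True,
    "properties": {
        "flight_number": {
            "type": "string",
            "description": "Handwritten flight number: letters followed by digits.",
        },
        "date_of_transport_raw": {
            "type": "string",
            "description": "Handwritten date. Return in MM/DD/YYYY format if clearly readable.",
        },
    },
    "required": ["flight_number"],
}

FLIGHT_RE = re.compile(r"[A-Za-z]+[0-9]+")


def check_flight_details(values: dict) -> dict:
    """Return {field: bool} showing which extracted values pass the format rules."""
    flight = str(values.get("flight_number") or "")
    date = str(values.get("date_of_transport_raw") or "")
    try:
        datetime.strptime(date, "%m/%d/%Y")  # rejects anything but MM/DD/YYYY
        date_ok = True
    except ValueError:
        date_ok = False
    return {
        "flight_number": FLIGHT_RE.fullmatch(flight) is not None,
        "date_of_transport_raw": date_ok,
    }
```

    Fields that fail the check can be flagged for manual review rather than causing the whole annotation to be lost.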

    3. PDF Structure or Formatting Issues

    Ensure that:

    • The PDF contains machine-readable or high-quality scanned text.
    • Forms are consistently structured.
    • The key-value layout is predictable.

    Inconsistent layouts or poor scans can cause structured extraction failure while markdown still appears coherent.

    Testing with simpler or smaller documents is a good way to isolate this.

    4. Token Limits / Large Documents

    With 5–6 forms in one file:

    • Token usage increases significantly.
    • Structured output may get truncated.
    • Truncation → invalid JSON → null.

    Recommended Test

    • Split large PDFs into individual forms.
    • Compare structured output reliability.

    This often improves consistency dramatically.
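
    If every form occupies a fixed number of pages, the split can be sketched without touching the PDF itself by sending one request per page group. This assumes the request accepts a `pages` field holding zero-based page indices, which is my understanding of Mistral's OCR API; whether your Azure AI Foundry deployment honors it needs to be verified. If it does not, split the file with a PDF library instead. `page_groups` and `per_form_payloads` are hypothetical helper names.

```python
from typing import List


def page_groups(page_count: int, pages_per_form: int) -> List[List[int]]:
    """Split zero-based page indices into consecutive groups, one per form."""
    if page_count < 0 or pages_per_form < 1:
        raise ValueError("page_count must be >= 0 and pages_per_form >= 1")
    return [
        list(range(start, min(start + pages_per_form, page_count)))
        for start in range(0, page_count, pages_per_form)
    ]


def per_form_payloads(base_payload: dict, groups: List[List[int]]) -> List[dict]:
    """Copy the base payload once per group, adding the assumed "pages" field."""
    return [{**base_payload, "pages": group} for group in groups]
```

    For example, a 6-page PDF with one form per page gives six payloads, each of which can be sent and checked for a null document_annotation on its own.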

    5. API Version

    Ensure you are using API version 4.0 GA.

    Older API versions can produce inconsistent behavior.

    6. Logging & Error Inspection

    Check the response metadata, the Azure portal logs, and any validation or truncation warnings.

    If possible, log the raw model response before schema validation to see whether the model is producing partial JSON.
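
    A small logging sketch for this, limited to what the client can actually see in the response. It assumes the same response fields as your call appears to return (`pages`, `document_annotation`) plus a `usage_info` object; treat those names as assumptions and adjust them to the real payload. `summarize_response` is a hypothetical helper.

```python
import json
import logging

logger = logging.getLogger("doc_ai")


def summarize_response(response: dict) -> dict:
    """Log and return a compact summary of one OCR response."""
    pages = response.get("pages") or []
    raw = response.get("document_annotation")

    if raw is None:
        state = "null"
    elif isinstance(raw, str):
        try:
            json.loads(raw)
            state = "valid"
        except json.JSONDecodeError:
            state = "invalid_json"  # e.g. output cut off mid-object
    else:
        state = "valid"  # already a decoded object

    summary = {
        "page_count": len(pages),
        "markdown_chars": sum(len(p.get("markdown") or "") for p in pages),
        "annotation_state": state,
        "usage_info": response.get("usage_info"),
    }
    logger.info("OCR response summary: %s", json.dumps(summary, default=str))
    return summary
```

    Comparing these summaries between calls that succeed and calls that return null (page count, markdown length) can show whether the failures correlate with document size.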

    About Concurrent Requests & Scaling

    Azure AI Foundry scales automatically within your allocated quota.

    Key points

    • Scaling is limited by your TPM/RPM quota.
    • High concurrency beyond quota → 429 throttling.
    • Quota is not increased automatically during bursts.
    • Large document processing consumes more tokens per request.

    If you expect high concurrency:

    • Monitor TPM usage.
    • Implement exponential backoff retry logic.
    • Consider splitting large documents.
    • Request higher quota if needed.

    Scaling is quota-driven, not dynamic elasticity.
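
    A minimal sketch of the exponential backoff retry mentioned above. It is independent of any HTTP library: `send` is a hypothetical callable you supply that performs one request and returns a `(status_code, body)` pair. The default delays and attempt count are arbitrary starting values, not recommended settings.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it stops returning 429, waiting longer each time."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return body  # other statuses are left for the caller to handle
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Exponential delay (1s, 2s, 4s, ...) capped at max_delay, plus
        # random jitter so parallel workers do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

    Combined with a fixed-size worker pool, this keeps the number of in-flight requests bounded while throttled calls wait and retry.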

    Recommended Action

    • Test with a single-form PDF.
    • Change root schema to an array if multiple forms exist.
    • Add minimal required fields.
    • Relax overly strict formatting requirements.
    • Confirm API version 4.0 GA.
    • Monitor token usage and concurrency.

    Most Probable Root Cause

    Based on your description, the highest probability causes are:

    • Schema expecting a single object while multiple forms exist
    • Token/truncation issue with large multi-form PDFs
    • Schema validation failure due to handwritten uncertainty

    Please refer to the Azure AI Foundry documentation.

    I hope this helps. Do let me know if you have any further queries.

    Thank you!
