Hello Vansh Tiwari,
Welcome to Microsoft Q&A, and thank you for reaching out.
I understand that you're experiencing issues with the Mistral doc AI 2505 model where you're sometimes getting a null response for the document_annotation JSON object, even though the markdown output looks good.
This typically indicates that the document was processed successfully, but the structured output did not pass schema validation or could not be reliably generated.
Below is a consolidated and structured set of possible causes and recommendations.
1. Why document_annotation Can Return null
When using:
"document_annotation_format": {
The service validates the model’s structured output against your schema.
If the model:
- Cannot confidently extract fields
- Produces output that doesn't match the schema
- Encounters conflicting structure (e.g., multiple forms)
- Hits token limits or truncation
- Sees low OCR confidence (especially handwritten text)
then the structured output may fail validation and the response contains:
document_annotation: null
This can happen even while the markdown output succeeds.
Most Likely Causes in Your Case
1. Multiple Forms in a Single PDF
You mentioned 5–6 forms in one PDF.
Your root schema is:
"type": "object"
If the model detects multiple logical form instances, it may attempt to extract multiple entities. That conflicts with a single-object schema and can cause validation failure → null.
Recommended Fix
Change the root schema to:
"type": "array",
If multiple forms are expected, the schema must reflect that.
You can also test by splitting the PDF into single-form documents and comparing results.
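As a rough sketch of the schema change above, the snippet below wraps an existing single-form schema in an array schema. The field names (`applicant_name`, `date_signed`) and the helper `wrap_schema_for_multiple_forms` are hypothetical placeholders, and it is an assumption that the service accepts an array at the root; if it does not, the same `items` schema could be placed under an array property of a root object instead.

```python
import copy
import json


def wrap_schema_for_multiple_forms(form_schema: dict, min_forms: int = 1) -> dict:
    """Return an array schema whose items are the original single-form schema."""
    return {
        "type": "array",
        "minItems": min_forms,
        # Deep copy so later edits to the array schema do not mutate the original.
        "items": copy.deepcopy(form_schema),
    }


# Hypothetical single-form schema; replace the properties with your own fields.
single_form_schema = {
    "type": "object",
    "properties": {
        "applicant_name": {"type": "string"},
        "date_signed": {"type": "string"},
    },
    "additionalProperties": True,
}

multi_form_schema = wrap_schema_for_multiple_forms(single_form_schema)
print(json.dumps(multi_form_schema, indent=2))
```

Keeping the single-form schema unchanged and wrapping it makes it easy to send the same document with both variants and compare which one returns a non-null annotation.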
2. Schema Strictness & Required Fields
Your schema:
- Uses "additionalProperties": true
- Does not define "required" fields
- Uses strict formatting instructions for handwritten fields
If handwriting confidence is low, the model may decline to populate structured fields rather than guess incorrectly.
Suggestions
Add minimal "required" fields where appropriate.
Slightly relax instructions like:
- Instead of: “You have to return in MM/DD/YYYY format”
- Use: “Return in MM/DD/YYYY format if clearly readable”
This reduces validation failure risk.
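The two suggestions above could look roughly like the following. This is a sketch, not your actual schema: the field names and the helper `with_required_fields` are hypothetical, and allowing `null` through a `["string", "null"]` type union is an assumption about what the service's schema validation supports.

```python
import copy


def with_required_fields(schema: dict, required: list[str]) -> dict:
    """Return a copy of `schema` with a "required" list, checking each name is declared."""
    declared = schema.get("properties", {})
    missing = [name for name in required if name not in declared]
    if missing:
        raise ValueError(f"required fields not declared in properties: {missing}")
    updated = copy.deepcopy(schema)
    updated["required"] = list(required)
    return updated


# Hypothetical form schema with a relaxed instruction on the handwritten date field.
form_schema = {
    "type": "object",
    "properties": {
        "applicant_name": {
            "type": "string",
            "description": "Full name as written on the form.",
        },
        "date_signed": {
            # Assumption: the service accepts a nullable string here.
            "type": ["string", "null"],
            "description": "Return in MM/DD/YYYY format if clearly readable; otherwise null.",
        },
    },
    "additionalProperties": True,
}

# Require only the fields that are almost always legible.
form_schema = with_required_fields(form_schema, ["applicant_name"])
```

Marking only reliably legible fields as required keeps the schema meaningful without forcing the model to guess at uncertain handwriting.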
3. PDF Structure or Formatting Issues
Ensure that:
- The PDF contains machine-readable or high-quality scanned text.
- Forms are consistently structured.
- Key-value layout is predictable.
Inconsistent layouts or poor scans can cause structured extraction failure while markdown still appears coherent.
Testing with simpler or smaller documents is a good way to isolate this.
4. Token Limits / Large Documents
With 5–6 forms in one file:
- Token usage increases significantly.
- Structured output may get truncated.
- Truncation → invalid JSON → null.
Recommended Test
- Split large PDFs into individual forms.
- Compare structured output reliability.
This often improves consistency dramatically.
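One way to run this test is sketched below. It assumes every form occupies the same number of pages (`pages_per_form`), which may not hold for your documents, and it uses the third-party `pypdf` package for the actual splitting; the function names and output file naming are illustrative only.

```python
def form_page_ranges(total_pages: int, pages_per_form: int) -> list[tuple[int, int]]:
    """Return half-open (start, stop) page index ranges, one per form."""
    if total_pages < 0 or pages_per_form < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_form >= 1")
    return [
        (start, min(start + pages_per_form, total_pages))
        for start in range(0, total_pages, pages_per_form)
    ]


def split_pdf(path: str, pages_per_form: int, out_prefix: str = "form") -> list[str]:
    """Write one PDF per form and return the output paths (requires `pip install pypdf`)."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; third-party dependency

    reader = PdfReader(path)
    outputs = []
    ranges = form_page_ranges(len(reader.pages), pages_per_form)
    for index, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        out_path = f"{out_prefix}_{index}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        outputs.append(out_path)
    return outputs
```

Each output file can then be submitted separately with the single-object schema, and the results compared against the multi-form run.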
5. API Version
Ensure you are using API version 4.0 GA.
Older API versions can produce inconsistent behavior.
6. Logging & Error Inspection
Check the response metadata, the Azure portal logs, and any validation or truncation warnings.
If possible, log the raw model response before schema validation to see whether the model is producing partial JSON.
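A minimal logging helper along these lines is sketched below. It assumes the response has already been decoded into a Python dict and that it carries `document_annotation`, `pages`, and `usage_info` keys; those key names are assumptions about the response shape and should be checked against what your deployment actually returns.

```python
import json
import logging

logger = logging.getLogger("doc_annotation")


def parse_annotation(response: dict):
    """Return the parsed annotation, or None after logging why it is unavailable."""
    raw = response.get("document_annotation")
    if raw is None:
        pages = response.get("pages") or []
        logger.warning(
            "document_annotation is null; %d page(s) of markdown returned; usage=%s",
            len(pages),
            response.get("usage_info"),
        )
        return None
    if not isinstance(raw, str):
        # Already a decoded object; return it unchanged.
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Partial or truncated JSON ends up here; keep a prefix for inspection.
        logger.error("annotation is not valid JSON (%s); prefix: %r", exc, raw[:500])
        return None
```

Logging the page count and usage alongside each null result makes it easier to see whether nulls correlate with large, multi-form inputs.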
About Concurrent Requests & Scaling
Azure AI Foundry scales automatically within your allocated quota.
Key points:
- Scaling is limited by your TPM/RPM quota.
- High concurrency beyond quota → 429 throttling.
- Quota is not increased automatically during bursts.
- Large document processing consumes more tokens per request.
If you expect high concurrency:
- Monitor TPM usage.
- Implement exponential backoff retry logic.
- Consider splitting large documents.
- Request higher quota if needed.
Scaling is driven by quota, not by dynamic elasticity.
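The exponential backoff mentioned above can be sketched as follows. `ThrottledError` is a hypothetical stand-in for however your HTTP client surfaces a 429 response, and `send` is any zero-argument callable that performs the request; the delay values are illustrative defaults, not recommended settings.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (too many requests) response."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `send()` and retry on ThrottledError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller handle it
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

If the 429 response includes a Retry-After header, honoring that value instead of the computed delay is usually the better choice.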
Recommended Actions
- Test with a single-form PDF.
- Change root schema to an array if multiple forms exist.
- Add minimal required fields.
- Relax overly strict formatting requirements.
- Confirm API version 4.0 GA.
- Monitor token usage and concurrency.
Most Probable Root Cause
Based on your description, the most likely causes are:
- Schema expecting a single object while multiple forms exist
- Token/truncation issue with large multi-form PDFs
- Schema validation failure due to handwritten uncertainty
Please refer to the Azure AI Foundry documentation.
I hope this helps. Do let me know if you have any further queries.
Thank you!