
Azure OpenAI Regional (Standard) quotas

hb
2026-03-12T10:24:58.9866667+00:00

Hello,

I have a question about a point that I believe is missing from the Azure OpenAI quotas documentation, regarding Standard deployments.

I am currently using Azure OpenAI models in an application, but I would like to switch from my current Data Zone Standard / Global Standard deployments to a Regional (Standard) deployment. The region I'm interested in is Switzerland.

  • From the Standard deployment model availability table, I see that I can have access to gpt-4.1, 2025-04-14 in Switzerland North.
  • From the Quota Tier Reference table, I know that I'm currently Tier 4 in my subscription as I can see in my Azure interface that I have 6M TPM quota on GPT-4.1 in Data Zone Standard.
  • However, this quota table gives no insight into the quotas I could access if I swapped to a Regional deployment of GPT-4.1. The only "Standard" deployment in the list is GPT-4.1-mini, which has a higher TPM limit than the same model in Data Zone Standard, which seems illogical.
  • (When I look in my Azure interface / Foundry, I see that I currently have only 50K TPM for the GPT-4.1 Standard deployment in every region. Likewise, the GPT-4.1-mini quota, for which the docs say I should have 75M TPM, only shows 200K TPM in every region, which is inconsistent.)

Is there a similar list of the TPM & RPM limits associated with Standard / Regional deployments?
Does it depend on your tier?
I know there is a system of quota increase requests; do I have to go through this form? Even then, I would like to know what quotas I'm eligible for, in order to 1) fill out the form properly and 2) swap to another solution if the quotas are insufficient in the first place.

Thank you very much


1 answer

  1. Q&A Assist
    2026-03-12T10:25:43.5066667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    For Azure OpenAI, quota is always scoped per subscription, per region, and per model or deployment type. The public quota tier table only documents default quotas for some deployment types and doesn’t list a separate, fixed TPM/RPM matrix for every Standard (regional) deployment.

    Key points for the scenario:

    1. No separate public table for Standard/regional TPM & RPM
      The quotas and limits reference describes global rules (for example, max resources, deployments, training jobs, etc.) and explains that TPM/RPM are defined per region, per subscription, per model or deployment type, but it does not publish a full per‑region TPM/RPM table for Standard deployments such as gpt-4.1 in Switzerland North.
    2. What is seen in the portal is the authoritative value
      In the Azure AI Foundry quota blade (ManagementQuota), the Quota allocation column shows, for each model/deployment type and region, how much TPM is currently approved for that subscription and region. The 50K TPM for gpt‑4.1 Standard and 200K TPM for gpt‑4.1‑mini Standard are the actual allocations for that subscription today, even if they differ from examples or tier values in the documentation.
    3. Quotas are tiered and capacity‑dependent
      TPM/RPM limits are tied to:
      • The subscription’s quota tier, and
      • Available regional capacity for that model/deployment type.
      The documentation notes that TPM and RPM limits are defined per region and that you can view quota availability by region for a subscription in the Foundry/AI Foundry portal or via the capacity API. This means two subscriptions at the same tier can still see different numbers in different regions depending on capacity.
    4. How to see what is available before committing to a move
      To understand what quota is available for gpt‑4.1 Standard in Switzerland North for the current subscription:
      • Use the Quota page in Azure AI Foundry to see current allocation and remaining capacity for that model/region.
      • Optionally, call the model capacities API (model-capacities/list) with subscriptionId, model_name and model_version to see capacity across all regions and deployment types for that subscription.
    5. Requesting higher Standard/regional quota
      If the current 50K TPM is insufficient, the only supported way to increase it is to submit a quota increase request:
      • From the Quota page, use the Request quota action, which opens the quota increase request form.
      • Or go directly to the quota increase request form referenced in the documentation.
      Requests are processed in order of receipt, and priority is given to subscriptions that are already consuming their existing allocation. The documentation does not publish hard “maximums” per tier for each Standard region/model; what is ultimately granted depends on tier and regional capacity.
    6. What to put in the request form
      Because there is no public per‑tier maximum table for Standard/regional deployments, the recommended approach is:
      • Specify the exact model and deployment type (for example, gpt‑4.1, Standard) and region (Switzerland North).
      • Specify the TPM/RPM needed based on workload estimates.
      • If capacity is insufficient, the request may be partially approved or denied; in that case, consider distributing load across multiple supported regions or using Data Zone/Global deployments where higher quota is already available.
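    The capacity lookup described in point 4 can be sketched in Python. The ARM endpoint and the `modelCapacities` list operation exist in the Microsoft.CognitiveServices resource provider, but the exact `api-version` value used here is an assumption; verify it against the current Azure REST API reference before relying on it.

    ```python
    # Sketch: query the Cognitive Services modelCapacities API to see where
    # capacity exists for a given model/version across regions and deployment
    # types. The api-version is an assumption; check the Azure REST reference.
    import json
    import urllib.parse
    import urllib.request


    def build_capacities_url(subscription_id: str, model_name: str, model_version: str) -> str:
        """Build the ARM request URL for the modelCapacities list operation."""
        base = (
            "https://management.azure.com/subscriptions/"
            f"{subscription_id}/providers/Microsoft.CognitiveServices/modelCapacities"
        )
        params = urllib.parse.urlencode({
            "api-version": "2024-10-01",      # assumed; use the latest GA version
            "modelFormat": "OpenAI",
            "modelName": model_name,          # e.g. "gpt-4.1"
            "modelVersion": model_version,    # e.g. "2025-04-14"
        })
        return f"{base}?{params}"


    def list_model_capacities(url: str, bearer_token: str) -> dict:
        """Call the API with an ARM access token and return the parsed JSON body."""
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)


    # Placeholder subscription ID for illustration only.
    url = build_capacities_url("00000000-0000-0000-0000-000000000000", "gpt-4.1", "2025-04-14")
    print(url)
    ```

    With a token obtained from `az account get-access-token --query accessToken -o tsv`, passing `url` to `list_model_capacities` would return the per-region capacity entries for that subscription, which you can filter for Switzerland North before submitting a quota request.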

    In summary, Standard/regional quotas do depend on tier and regional capacity, but there is no separate public table listing their TPM/RPM. The values shown in the Quota blade (or returned by the capacity API) are the authoritative numbers, and any increase beyond those requires a quota increase request.



