For Azure OpenAI, quota is always scoped per subscription, per region, and per model or deployment type. The public quota tier table only documents default quotas for some deployment types and doesn’t list a separate, fixed TPM/RPM matrix for every Standard (regional) deployment.
Key points for the scenario:
- No separate public table for Standard/regional TPM & RPM
The quotas and limits reference describes global rules (for example, maximum resources, deployments, and training jobs) and explains that TPM/RPM are defined per region, per subscription, per model or deployment type, but it does not publish a full per‑region TPM/RPM table for Standard deployments such as gpt-4.1 in Switzerland North.
- What is seen in the portal is the authoritative value
In the Azure AI Foundry quota blade (Management → Quota), the Quota allocation column shows, for each model/deployment type and region, how much TPM is currently approved for that subscription and region. The 50K TPM for gpt‑4.1 Standard and 200K TPM for gpt‑4.1‑mini Standard are the actual allocations for that subscription today, even if they differ from examples or tier values in the documentation.
- Quotas are tiered and capacity‑dependent
TPM/RPM limits are tied to:
  - The subscription’s quota tier, and
  - Available regional capacity for that model/deployment type.
- How to see what is available before committing to a move
To understand what quota is available for gpt‑4.1 Standard in Switzerland North for the current subscription:
  - Use the Quota page in Azure AI Foundry to see current allocation and remaining capacity for that model/region.
  - Optionally, call the model capacities API (model-capacities/list) with subscriptionId, model_name, and model_version to see capacity across all regions and deployment types for that subscription.
- Requesting higher Standard/regional quota
If the current 50K TPM is insufficient, the only supported way to increase it is to submit a quota increase request:
  - From the Quota page, use the Request quota action, which opens the quota increase request form.
  - Or go directly to the quota increase request form referenced in the documentation.
- What to put in the request form
Because there is no public per‑tier maximum table for Standard/regional deployments, the recommended approach is:
  - Specify the exact model and deployment type (for example, gpt‑4.1, Standard) and region (Switzerland North).
  - Specify the TPM/RPM needed based on workload estimates.
  - If capacity is insufficient, the request may be partially approved or denied; in that case, consider distributing load across multiple supported regions or using Data Zone/Global deployments where higher quota is already available.
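The model capacities lookup mentioned above can be scripted against the ARM `Microsoft.CognitiveServices/modelCapacities` list operation. A minimal sketch, assuming the `api-version` shown (check the current REST reference) and a placeholder subscription ID:

```python
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"

def build_model_capacities_url(subscription_id: str,
                               model_name: str,
                               model_version: str,
                               api_version: str = "2024-10-01") -> str:
    """Return the GET URL that lists capacity per region and deployment type."""
    query = urlencode({
        "api-version": api_version,   # assumption: verify against current docs
        "modelFormat": "OpenAI",
        "modelName": model_name,
        "modelVersion": model_version,
    })
    return (f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
            f"/providers/Microsoft.CognitiveServices/modelCapacities?{query}")

# Placeholder subscription ID; model version is illustrative.
url = build_model_capacities_url(
    "00000000-0000-0000-0000-000000000000", "gpt-4.1", "2025-04-14")
# Send with an ARM bearer token, for example:
#   requests.get(url, headers={"Authorization": f"Bearer {token}"})
```

The response enumerates regions and deployment types with their available capacity, which makes it easy to compare Switzerland North against alternatives before deciding where to deploy.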
In summary, Standard/regional quotas do depend on tier and regional capacity, but there is no separate public table listing their TPM/RPM. The values shown in the Quota blade (or returned by the capacity API) are the authoritative numbers, and any increase beyond those requires a quota increase request.
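As a practical aid when filling in the request form, the TPM/RPM figures can be derived from simple workload arithmetic. A back-of-the-envelope sketch (the traffic numbers and the 30% headroom factor are illustrative assumptions, not values from the documentation):

```python
import math

def estimate_tpm_rpm(requests_per_minute: float,
                     avg_prompt_tokens: float,
                     avg_completion_tokens: float,
                     headroom: float = 1.3) -> tuple[int, int]:
    """Return (TPM, RPM) to request, padded for traffic spikes."""
    tokens_per_request = avg_prompt_tokens + avg_completion_tokens
    tpm = math.ceil(requests_per_minute * tokens_per_request * headroom)
    rpm = math.ceil(requests_per_minute * headroom)
    return tpm, rpm

# Example workload: 40 requests/min, ~1,200 prompt + ~300 completion tokens each.
tpm, rpm = estimate_tpm_rpm(40, 1200, 300)
# 40 * 1,500 tokens * 1.3 headroom = 78,000 TPM -- above the current 50K
# allocation, so this workload would justify a quota increase request
# (or splitting traffic across a second region).
```

Grounding the requested numbers in this kind of estimate makes the quota request easier to approve than an arbitrary round figure.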