A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
To summarize the case, you have seen slowness (800 milli seconds-1.3 second) recently on GPT 4.1 mini deployment in Azure OpenAI side (earlier it used to stay in 500 milli second).
Occasional service degradation on certain zones happens due to multiple individual spikes from different customers, product group normally load balance and increase capacity in those region.
Other best practice suggested to customer to load balance between multiple regions, Increase Max TPM, Reduce prompt size.
If you can help us share the details in private message, we can investigate and escalate to product group.
Thank you.