VMSS still failing after Feb 2 incident — VMSS control‑plane operations broken, only instance APIs work

Question

Mike Jones 20 Microsoft Employee

I have a Virtual Machine Scale Set running in East US 2 that was impacted by the Feb 2 service‑management outage. During the outage, the VMSS failed to download extensions from Microsoft‑managed storage. Autoscaling attempted to scale out, and I manually deleted several instances while the outage was active (I did not know at the time that a platform incident was occurring).

After the outage was marked “mitigated,” three of my VMSS recovered normally. One VMSS did not recover and remains in a broken state.

Mike Jones 20 Reputation points Microsoft Employee

2026-02-03T20:27:23.8366667+00:00
I’ve confirmed this is not an instance‑level issue. Instance view succeeds and all VMs report ProvisioningState/succeeded.

However, all VMSS‑level operations fail with 500 InternalOperationError, including vmss show, scale, and resource tag (model write). This looks like VMSS model corruption after the Feb 2 outage, likely caused by the instance deletions that happened during the extension‑storage failure.

Correlation IDs:

GET failure: Correlation ID removed for privacy

PATCH (tag) failure: Correlation ID removed for privacy

Requesting backend repair of the VMSS model so control‑plane operations succeed again.
Mike Jones 20 Reputation points Microsoft Employee

2026-02-03T21:04:26.55+00:00

Just to clarify the support flow: this subscription uses internal Microsoft support, where Severity B issues are expected to be posted here first, and then Microsoft Q&A/PCS escalates to the appropriate engineering team (CSS/Compute RP) when backend action is required. I’m following that standard internal process with this request.

Answer accepted by question author

Answer 1

Hello Mike,

This is not an instance-level issue and not a configuration problem in your VMSS.

Your VM Scale Set is in a partially corrupted control‑plane state due to the February 2, 2026 Azure service‑management incident. During that incident, Azure confirmed a platform issue where VM and VMSS management operations failed because access to Microsoft‑managed extension storage was disrupted. As a result, VM lifecycle and scale‑set model updates were intermittently failing in affected regions, including East US 2.

In your case:

- Instance‑level operations succeed (instance view shows ProvisioningState/succeeded)
- Control‑plane operations fail with 500 InternalOperationError:
  - vmss show
  - scale
  - resource tag write

This indicates that the VMSS model persisted in an inconsistent state after instance deletions occurred while the platform outage was active.
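
The symptom pattern above (instance reads succeed, scale-set reads fail) could be checked with a small script. This is a minimal sketch, not an official diagnostic: it assumes the Azure CLI is installed and logged in, uses `az vmss list-instances` as a stand-in for an instance-level read and `az vmss show` as the scale-set read, and treats any non-zero exit code as a failure, so it cannot tell a 500 InternalOperationError apart from, say, an authentication error. The function names and labels are illustrative.

```python
import subprocess
from typing import Callable, Dict, List

# A runner takes a command (argument list) and returns its exit code.
Runner = Callable[[List[str]], int]


def probe_commands(resource_group: str, vmss_name: str) -> Dict[str, List[str]]:
    """Build two read-only Azure CLI probes for one scale set."""
    target = ["--resource-group", resource_group, "--name", vmss_name]
    return {
        # Instance-level read: lists the VM instances in the scale set.
        "instance": ["az", "vmss", "list-instances", *target],
        # Scale-set-level read: fetches the VMSS model itself.
        "control_plane": ["az", "vmss", "show", *target],
    }


def run_cli(command: List[str]) -> int:
    """Run a command and return its exit code (non-zero means the call failed)."""
    return subprocess.run(command, capture_output=True, text=True).returncode


def classify(resource_group: str, vmss_name: str, runner: Runner = run_cli) -> str:
    """Classify the failure pattern from the exit codes of the two probes."""
    commands = probe_commands(resource_group, vmss_name)
    ok = {name: runner(cmd) == 0 for name, cmd in commands.items()}
    if ok["instance"] and not ok["control_plane"]:
        return "control-plane-only failure"
    if ok["instance"] and ok["control_plane"]:
        return "healthy"
    return "instance-level or wider failure"
```

Calling `classify("<resource-group>", "<vmss-name>")` with your own names runs both probes. Both commands are reads, so the check does not modify the scale set.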

Microsoft confirms that manual instance deletes during a VMSS control‑plane outage can leave the scale‑set model unrecoverable without backend repair, even after the incident is marked mitigated.

There is no customer‑side fix for this condition.

The only supported resolutions are:

- Microsoft backend repair of the VMSS model, or
- Recreation of the VMSS from a clean configuration

Microsoft requires a support‑initiated backend action to restore control‑plane consistency when this condition occurs. [learn.microsoft.com]

If an Azure support case can be opened, provide:

- VMSS name
- Region
- Correlation IDs from failing operations
- Confirmation that the VMSS was modified during the Feb 2 incident window

Microsoft engineering teams use this to perform a model reconciliation or forced repair.
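
The details listed above can be gathered into one block of text before opening the case. The sketch below is only an illustration: the class name, field names, and output layout are assumptions, not a format that Azure support requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SupportCaseDetails:
    """Hypothetical container for the four items requested for the support case."""
    vmss_name: str
    region: str
    correlation_ids: List[str] = field(default_factory=list)
    modified_during_incident: bool = False

    def missing_fields(self) -> List[str]:
        """Return the names of required items that are still empty."""
        missing = []
        if not self.vmss_name.strip():
            missing.append("vmss_name")
        if not self.region.strip():
            missing.append("region")
        if not self.correlation_ids:
            missing.append("correlation_ids")
        return missing

    def render(self) -> str:
        """Format the details as plain text that can be pasted into a case."""
        lines = [
            f"VMSS name: {self.vmss_name}",
            f"Region: {self.region}",
            "Correlation IDs from failing operations:",
        ]
        lines += [f"  - {cid}" for cid in self.correlation_ids]
        answer = "yes" if self.modified_during_incident else "no"
        lines.append(f"Modified during the Feb 2 incident window: {answer}")
        return "\n".join(lines)
```

Checking `missing_fields()` before calling `render()` helps catch a case opened without correlation IDs, which engineering needs to trace the failing requests.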

Summary:

Your VMSS is healthy at the instance level but broken at the control plane due to a known Azure platform incident. This is a Microsoft‑side condition, and recovery requires either backend intervention (manual patching of the VMSS model) or VMSS re‑creation. Customer‑initiated retries, redeploys, or scaling changes cannot fix this state.

Thanks,
Manish Deshpande.
