Change the MTU for the cluster network

This document details the supported process for enabling Jumbo MTU (Maximum Transmission Unit) on Azure Red Hat OpenShift (ARO) clusters. Enabling Jumbo MTU in ARO is strictly limited to intra-cluster network traffic, specifically covering pod-to-pod, pod-to-service, and node-to-node communication that utilizes the OVN overlay network. It is important to note that this configuration does not impact outbound or external network traffic, which continues to adhere to standard Azure networking MTU limits.

Overview

Changing the MTU may be beneficial for workloads that generate large volumes of east–west traffic within the cluster. Common use cases include high-throughput data processing pipelines, distributed databases, large-scale logging and monitoring systems, AI/ML training workloads, and storage-intensive applications. These are the workloads where reducing packet fragmentation can significantly improve throughput and CPU efficiency. Customers should consider enabling Jumbo MTU when network performance within the cluster is a known bottleneck or when running workloads that benefit from larger packet sizes.

Azure Red Hat OpenShift supports increasing the cluster network MTU and machine-level NIC MTU when the underlying Azure VM hardware uses the Microsoft Azure Network Adapter (MANA) driver. While Azure exposes a maximum NIC MTU of 9,000 bytes, the maximum configurable cluster network MTU is 8900 bytes, which reserves 100 bytes for OVN overlay overhead.
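The relationship between the NIC MTU and the cluster network MTU is simple subtraction. A minimal Python sketch, using the 100-byte overhead budget cited above (the constant and function names are illustrative, not part of any OpenShift API):

```python
# Overlay MTU = machine (NIC) MTU minus the OVN overlay encapsulation overhead.
# The 100-byte budget is the figure cited in this article.
OVN_OVERLAY_OVERHEAD = 100

def overlay_mtu(machine_mtu: int) -> int:
    """Largest cluster network MTU that fits inside the given NIC MTU."""
    return machine_mtu - OVN_OVERLAY_OVERHEAD

print(overlay_mtu(9000))  # 8900, the jumbo-frame target used in this article
print(overlay_mtu(1500))  # 1400, the default OVN cluster network MTU
```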

Prerequisites

Jumbo MTU can only be enabled when every node in the cluster, including all control-plane and worker nodes, is running on Azure Virtual Machines (VM) that support the Microsoft Azure Network Adapter (MANA) driver.

  • An ARO cluster running OpenShift 4.19 or higher.
  • NICs with Accelerated Networking enabled.
  • All cluster nodes must use VMs that support the MANA driver. Supported types include the Dv6 series (for example, D4as_v6, D8s_v6, and D16s_v6).

Note

If the cluster has a mix of VMs where some support MANA and some do not, delay the MTU migration until all nodes support MANA. Otherwise, the cluster continues to use the default MTU of 1500, and the mismatch may lead to fragmentation or connectivity issues for traffic between nodes with different MTU capabilities.

Validate the cluster nodes have the MANA driver

Validate that the MANA driver is in use by running the following commands on all nodes in the cluster (control plane and worker):

  1. Get the node names in your cluster using this command.

    oc get nodes -o name
    
  2. For each node, check which driver the enP* interface uses.

    As the interface name may vary, first get the interface name:

    oc debug node/<NODE_NAME> -- chroot /host ip link | grep enP
    

    You see an output like:

    3: enP30xxxxx: <BROADCAST,MULTICAST,...,UP,LOWER_UP> mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    altname enP30xxxxxxx
    

    Use the interface name returned (like "enP30xxxxx") in the following command:

    oc debug node/<NODE_NAME> -- /bin/sh -c 'chroot /host ethtool -i <INTERFACE_NAME> | grep driver'
    

    You should see an output that shows that the MANA driver is in use.

Important

If the driver is mlx4, mlx5, hv_netvsc, or anything other than MANA, the node does not support changing the MTU to 9,000.
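To check every node in one pass rather than one at a time, the per-node commands above can be wrapped in a loop. A minimal sketch, assuming a logged-in `oc` session; the `driver_of` helper is illustrative and resolves the bound kernel driver from sysfs (for MANA NICs this is typically `mana`):

```shell
#!/bin/sh
# Illustrative helper: print the kernel driver bound to a sysfs device directory.
driver_of() {
  basename "$(readlink -f "$1/driver")"
}

# Requires a logged-in `oc` session; guarded so the sketch is inert without one.
if command -v oc >/dev/null 2>&1; then
  for node in $(oc get nodes -o name); do
    echo "== ${node} =="
    # Same sysfs lookup as driver_of, run on the node via a debug pod.
    oc debug "${node}" -- chroot /host /bin/sh -c '
      for d in /sys/class/net/enP*/device; do
        ifc=$(basename "$(dirname "$d")")
        drv=$(basename "$(readlink -f "$d/driver")")
        printf "%s: %s\n" "$ifc" "$drv"
      done' 2>/dev/null
  done
fi
```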

Change the Maximum Transmission Unit (MTU)

Before changing the MTU, ensure that the cluster and all operators are healthy. Also ensure that all machine config pools are in a stable, fully updated, and healthy state. Plan for rolling reboots, as MTU changes trigger multiple MachineConfigPool rollouts.

  1. To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.

    oc patch Network.operator.openshift.io cluster --type=merge --patch \
    '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 8900 }, "machine": { "to": 9000 } } } } }'
    
  2. Monitor the rollout status by running the following command.

    oc get machineconfigpool
    

    Wait for all MachineConfigPool groups (master and worker) to reach a stable state, indicated by the following status values: UPDATED=true, UPDATING=false, DEGRADED=false. This can take some time to complete, depending on the size of the cluster.

    It should look similar to the following output.

    NAME     CONFIG                  UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-xxxxx   True      False      False      3              3                   3                     0                      2d4h
    worker   rendered-worker-xxxxx   True      False      False      3              3                   3                     0                      2d4h
    
  3. Verify MachineConfig rollout and MTU migration injection.

    After initiating the MTU migration in Step 1, verify that the Machine Config Operator (MCO) has successfully rendered and applied the updated MachineConfig to all nodes. Confirm that each node has transitioned to the expected rendered MachineConfig and that the configuration state is stable by running:

    oc describe node | egrep "hostname|machineconfig"
    

    An example of the expected output is shown:

    kubernetes.io/hostname=master-0
    [...]
    machineconfiguration.openshift.io/currentConfig: rendered-master-xxxx
    machineconfiguration.openshift.io/desiredConfig: rendered-master-xxxx
    [...]
    machineconfiguration.openshift.io/state: Done
    

    Ensure that the value of machineconfiguration.openshift.io/state is Done and that the value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.

  4. Verify the presence of the MTU migration script.

    During MTU migration, the Cluster Network Operator (CNO) injects a temporary systemd unit into the rendered MachineConfig. This unit runs the mtu-migration.sh script, which safely orchestrates the MTU transition across nodes and prevents network disruption during rolling reboots.

    To validate, inspect the MachineConfig referenced in the previous step (for example, rendered-master-xxxx or rendered-worker-xxxx).

    oc get machineconfig <CONFIG_NAME> -o yaml | grep mtu-migration.sh
    

    Where <CONFIG_NAME> specifies the name of the machine config from the machineconfiguration.openshift.io/currentConfig field. The expected output should include the following entry: "ExecStart=/usr/local/bin/mtu-migration.sh".

    Note

    The migration script is present only in the rendered MachineConfig generated by the MCO, not in user-created MachineConfigs. Always verify the specific rendered-* MachineConfig shown on the node.

  5. Apply the new hardware MTU value.

    After verifying that all previous steps are successful, create the following two MachineConfig files and apply them to the cluster.

    1. Create the master MachineConfig (99-master-mtu.yaml) file

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: master
        name: 99-master-mtu
      spec:
        config:
          ignition:
            version: 3.5.0
          storage:
            files:
              - contents:
                  compression: ""
                  source: data:,%5Bconnection%5D%0Amatch-device%3Dinterface-name%3Aeth0%0Aethernet.mtu%3D9000
                mode: 420
                path: /etc/NetworkManager/conf.d/99-eth0-mtu.conf
      
    2. Apply the MachineConfig by running the following command:

      oc create -f 99-master-mtu.yaml
      
    3. Create the worker MachineConfig (99-worker-mtu.yaml) file.

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 99-worker-mtu
      spec:
        config:
          ignition:
            version: 3.5.0
          storage:
            files:
              - contents:
                  compression: ""
                  source: data:,%5Bconnection%5D%0Amatch-device%3Dinterface-name%3Aeth0%0Aethernet.mtu%3D9000
                mode: 420
                path: /etc/NetworkManager/conf.d/99-eth0-mtu.conf
      
    4. Apply the MachineConfig by running the following command:

      oc create -f 99-worker-mtu.yaml
      
    5. Monitor the rollout status by running the following command.

      oc get machineconfigpool
      

      Wait for all MachineConfigPool groups (master and worker) to reach a stable state, indicated by the following status values: UPDATED=true, UPDATING=false, DEGRADED=false. This can take some time to complete, depending on the size of the cluster.
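The `source:` field in both MachineConfig files above is a URL-encoded `data:` URI. Decoding it (here with Python's standard library) shows the NetworkManager drop-in that gets written to each node; the same encoding step, in reverse, is how you would build a variant for a different interface name or MTU:

```python
from urllib.parse import quote, unquote

# The `source:` value from the MachineConfigs above, without the "data:," prefix.
encoded = "%5Bconnection%5D%0Amatch-device%3Dinterface-name%3Aeth0%0Aethernet.mtu%3D9000"

decoded = unquote(encoded)
print(decoded)
# [connection]
# match-device=interface-name:eth0
# ethernet.mtu=9000

# Round-trip: re-encoding the decoded text reproduces the original payload.
assert quote(decoded, safe="") == encoded
```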

  6. Reverify MachineConfig rollout.

    Verify that the Machine Config Operator (MCO) has successfully rendered and applied the updated MachineConfig to all nodes. Confirm that each node has transitioned to the expected rendered MachineConfig and that the configuration state is stable by running:

    oc describe node | egrep "hostname|machineconfig"
    

    An example of the expected output is shown:

    kubernetes.io/hostname=master-0
    [...]
    machineconfiguration.openshift.io/currentConfig: rendered-master-xxxx
    machineconfiguration.openshift.io/desiredConfig: rendered-master-xxxx
    [...]
    machineconfiguration.openshift.io/state: Done
    

    Ensure that the value of machineconfiguration.openshift.io/state is Done and that the value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.

  7. Finalize the MTU migration.

    Clear the migration specification and apply the final MTU value by running the following command:

    oc patch Network.operator.openshift.io cluster --type=merge --patch \
    '{"spec": { "migration": null, "defaultNetwork": { "ovnKubernetesConfig": { "mtu": 8900 } } }}'
    

    Monitor the final rollout by checking the MachineConfigPool status, which should show UPDATED=true, UPDATING=false, DEGRADED=false.

    oc get machineconfigpool
    

Verify the change is completed

  1. To verify that the Jumbo MTU configuration is successfully applied across the ARO cluster, begin by checking the cluster-wide network MTU. Inspect the network configuration; the expected value is "Cluster Network MTU: 8900", which confirms that the OVN overlay network is updated.

    oc describe network.config cluster | grep "Cluster Network MTU"
    
  2. Verify the MTU at the node level by examining the primary network interface within a debug session. The interface should report an MTU of 9,000. OpenShift configures the overlay MTU conservatively (8900) to accommodate platform and overlay encapsulation and avoid packet fragmentation.

    oc debug node/<NODE_NAME> -- chroot /host ip -d link show eth0
    
  3. For end-to-end pod connectivity validation, use a payload size of 8,872 bytes. This value is calculated to ensure the resulting packet size is exactly 8900 bytes (8,872 payload + 8 ICMP header + 20 IP header). This should succeed without fragmentation, indicating that the Jumbo MTU is consistently applied throughout the data path.

    You may select any pod to ping. For example:

    oc get pods -n openshift-monitoring -o wide
    

    You see output similar to the following:

    NAME                                                     READY   STATUS    RESTARTS   AGE     IP            NODE                                NOMINATED NODE   READINESS GATES
    ...
    metrics-server-xxxxxxxxxx-xxxxx                          1/1     Running   0          2d16h   10.129.2.13   myarocluster-worker-westus2-xxxxx   <none>           <none>
    metrics-server-xxxxxxxxxx-xxxxx                          1/1     Running   0          2d16h   10.131.0.7    myarocluster-worker-westus2-xxxxx   <none>           <none>
    monitoring-plugin-xxxxxxxxxx-xxxxx                       1/1     Running   0          2d16h   10.129.2.8    myarocluster-worker-westus2-xxxxx   <none>           <none>
    monitoring-plugin-xxxxxxxxxx-xxxxx                       1/1     Running   0          2d16h   10.131.0.16   myarocluster-worker-westus2-xxxxx   <none>           <none>
    node-exporter-xxxxx                                      2/2     Running   10         3d      10.0.0.10     myarocluster-master-0               <none>           <none>
    ....
    

    This example selects the metrics-server pod.

    Send an ICMP packet with a payload size of 8,872 bytes using the following command.

    oc debug node/<NODE_NAME> -- chroot /host ping -M do -s 8872 <POD_IP>
    

    You should see a result similar to the following:

    PING 10.129.2.13 (10.129.2.13) 8872(8900) bytes of data.
    8880 bytes from 10.129.2.13: icmp_seq=1 ttl=62 time=5.26 ms
    8880 bytes from 10.129.2.13: icmp_seq=2 ttl=62 time=0.408 ms
    8880 bytes from 10.129.2.13: icmp_seq=3 ttl=62 time=0.198 ms
    
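The payload arithmetic in the ping test above generalizes to any MTU. A minimal Python sketch (the function name is illustrative):

```python
ICMP_HEADER = 8   # bytes: ICMP echo header
IPV4_HEADER = 20  # bytes: IPv4 header without options

def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` payload that fits in a single packet at the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_ping_payload(8900))  # 8872, the value used in the test above
print(max_ping_payload(1400))  # 1372, for the default 1400-byte overlay MTU
```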

Troubleshooting

  1. MachineConfigPool is stuck in an UPDATING state

    If the MachineConfigPool becomes stuck in the UPDATING state during the MTU migration process, you can begin troubleshooting by reviewing the Machine Config Operator (MCO) logs. This is done by running the command oc logs -n openshift-machine-config-operator deploy/machine-config-operator, which provides insight into what may be preventing the rollout from completing. Common issues that lead to an MCO stall include incorrectly formatted or improperly indented YAML within the applied MachineConfig, conflicts caused by multiple MachineConfigs attempting to modify the same files or settings, or nodes failing to reboot automatically after receiving updated configurations. Reviewing and correcting these issues typically allows the MachineConfigPool to resume progress and complete the update successfully.

  2. MTU not updated on node NIC

    If the MTU does not appear to be applied correctly at the node level, begin troubleshooting by first identifying the physical network interface on the node (for example, eth0 or enp*), as interface names may vary across environments. Verify the NetworkManager configuration file that sets the interface MTU by reviewing /etc/NetworkManager/conf.d/99-eth0-mtu.conf to ensure the expected MTU value is present. Next, confirm the MTU applied to the physical interface by inspecting its link settings rather than an OVS bridge such as br-ex. If the physical interface still reports an MTU of 1500, further investigation is required. In such cases, verify that the NIC driver in use is MANA, ensure that the underlying Azure VM size supports Jumbo MTU capabilities, and check whether the MachineConfigPool is still progressing by confirming whether UPDATING=true, which may indicate that the rollout has not yet completed.

  3. Missing MANA driver

    Each NIC should list MANA as the active driver. If it is missing, the node’s VM instance type may not support the Microsoft Azure Network Adapter (MANA). In such cases, you must resize the node to a Dv6-series or another MANA-supported Azure VM size. After resizing, the node may need to be rebuilt for the change to take effect, which is typically done by draining and deleting the node, allowing ARO to automatically recreate it with the correct VM configuration.

  4. OVN pods showing MTU mismatch

    If OVN pods report an MTU mismatch, you can begin troubleshooting by reviewing the OVN node logs using the command oc logs -n openshift-ovn-kubernetes ds/ovnkube-node --all-containers=true | grep mtu. This helps identify any discrepancies between the MTU values applied at the OVN layer and those configured through the MachineConfigs. If mismatches appear in the logs, ensure that the Cluster Network Operator (CNO) MTU settings align with the values defined in the MachineConfig files, as inconsistencies between these components can prevent the MTU migration from completing correctly.

  5. Connectivity issues after migration

    Connectivity issues after the MTU migration typically indicate that MTU values were not fully propagated across the cluster. To diagnose this, first verify that each node reports an MTU of 9,000, reflecting the expected hardware-level MTU on Azure. Next, ensure that the OVN overlay network is configured with an MTU of 8900, which aligns with the cluster-wide network settings. It is also critical to confirm that the Azure NIC is using the MANA driver, as non-MANA drivers do not support Jumbo MTU. If these values appear correct yet connectivity problems persist, you can further validate the end-to-end path MTU by running tracepath <destination>, which helps identify where packet fragmentation or MTU drops may be occurring in the network path.

Next steps