Upgrading Comware IRF stacks with no downtime (kind of)

Linking multiple switches together has many advantages, including a single logical device to manage. At face value this holds for upgrades too: you perform the operation once and every member is upgraded automatically. In fact, the IRF system makes it impossible for the members to run different versions.

However, the upgrade process can take a long time with no traffic passing. For a nine-member IRF this can be over 20 minutes. Not everyone has the luxury of users who can wait that long, even out of hours. In a datacentre even 20 seconds is enough to cause chaos in some setups. There is a solution, which I’ll get to below, but first it is important to understand the different upgrade methods.

Reading the official literature you’ll see there are two distinct upgrade methods. The first, traditional, method involves rebooting the system after placing the new code on all the members and setting it to make use of this new version. The basic steps are:

  1. Copy the .ipe file to the flash of a member
  2. Run the boot-loader file command. This extracts the code to all members and sets the boot loader to boot from the new files upon next reboot.
  3. Reboot entire system
  4. Wait many minutes
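
As a rough sketch, the traditional upgrade looks like the following. This is Comware 7 syntax from memory; the TFTP server address and filename are placeholders, and the exact syntax varies slightly between Comware releases, so check against your command reference.

```
<irf> tftp 192.0.2.10 get switch-new.ipe          # copy the .ipe to the master's flash
<irf> boot-loader file flash:/switch-new.ipe all main   # extract to every member, set as main boot image
<irf> display boot-loader                          # confirm all members point at the new files
<irf> reboot                                       # whole-system reboot: expect many minutes of downtime
```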

The second documented method is called ISSU (In-Service Software Upgrade). This method promises nirvana in that it upgrades the software while traffic is still flowing through all the members. This sounds great, and it is a shame it never works. Even within its many caveats (e.g. only between minor versions, incompatible with certain features…) it is known to break, leaving the system in an uncertain state. I recommend never touching ISSU on Comware or any other vendor. I used to work for one of the other big players in the market, and it was well known internally that the challenges of ISSU meant it was never to be recommended.

Reading this far, it seems HPE gives the datacentre Comware customer a choice between uncertainty and unacceptably long downtime when upgrading stacks. There is a third way which helps in some scenarios and which, in my world, has dictated the design of the DC switching fabric. A two-member IRF can be upgraded with only a very small amount of downtime to the system as a whole. This means that LACP-connected devices (with at least one link to each of the two Comware members) will see less than 5 seconds of switching downtime. In most cases a continuous ping to a connected server will lose only a single ping.

Upload the .ipe file to the master and use the boot-loader file command to extract it to both members and point the boot loader at the new files. This can be done in advance as it is only invoked upon boot.
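
Before touching anything it is worth confirming the stack state and that both members will boot the new files. On Comware 7 something like the following does the job (display commands from memory):

```
<irf> display irf                # member IDs, roles and priorities
<irf> display irf configuration  # which physical ports form the IRF links
<irf> display boot-loader        # both slots should list the new files as the main boot image
```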

Reboot member 1 (this MUST be the lowest-numbered member, regardless of IRF priorities or other factors). At this point member 2 becomes master and the LACP connections continue to serve connected devices.
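
Assuming member 1 is the lowest-numbered member, the targeted reboot is a single command (Comware 7 syntax):

```
<irf> reboot slot 1    # only member 1 reloads; member 2 takes over as master
```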

During the reboot of member 1, log in to the stack again (you will have been kicked out as mastership moved to member 2). Find the IRF link physical interfaces and administratively down them with the shutdown command. DO NOT SAVE.
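
A minimal sketch of that step, assuming the IRF physical ports on member 2 are Ten-GigabitEthernet 2/0/49 and 2/0/50 (hypothetical port numbers; check display irf configuration for yours):

```
<irf> system-view
[irf] interface range Ten-GigabitEthernet 2/0/49 to Ten-GigabitEthernet 2/0/50
[irf-if-range] shutdown
[irf-if-range] quit
# Crucially, do NOT save the running configuration at this point.
```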

After 5-9 minutes member 1 finishes loading. It can’t connect to member 2 because the links are down, so it assumes the master role as it sees no other member. Member 2 continues to be master because it too sees no other member. However, MAD packets (typically via the uplinks) allow both members to detect that another master is online. The MAD process does its job of putting every member of the stack except the lowest-numbered one into the MAD fault state. In our case this makes member 2 shut down every interface. Without intervention this state persists forever, with member 1 serving traffic while member 2 sulks with no flashing lights.
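
You can watch this happen. On Comware 7 the MAD state is visible with (command from memory):

```
<irf> display mad verbose    # shows the MAD detection mode and which member has been put into the fault state
```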

If you have console access, the process is completed by simply rebooting member 2 without saving. If you save before the reboot, the IRF links remain shut down after the reboot and the MAD process continues. If the reboot occurs without a save, member 2 reloads, re-forms the IRF and everything is back to normal.

Note that at almost no point during this process was there no active member to switch packets; the only exception is a brief moment as member 1 comes back after its reboot, when you will detect a tiny outage.

If you don’t have console access to the devices, then when you admin-down the IRF interfaces on member 2 you can also issue a scheduled reboot command with a delay of, say, 9 minutes. This means that after SSH connectivity is lost as the MAD process does its thing, the second reboot still occurs. Doing it from the console reduces the time member 2 is out of service but has no effect on LACP connectivity. If you have a mix of LACP and singly connected devices then it is desirable to reboot member 2 as soon as member 1 is back from the dead.
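
The scheduled-reboot syntax differs between versions; as I recall, Comware 7 uses scheduler reboot delay while Comware 5 uses schedule reboot delay, so verify against your release’s command reference. A sketch:

```
<irf> scheduler reboot delay 9    # reboot member we are logged into in 9 minutes (Comware 7; Comware 5: "schedule reboot delay 9")
```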

It is worth reiterating the restrictions of this method:

  • Only works for two-member IRF stacks (you can fudge it for more if you accept that all but member 1 will go offline after the first reboot).
  • Only helps LACP-connected devices. Any device connected with just one physical interface will see up to ten minutes of downtime.
  • If the system does more than layer-2 switching then the downtime during each switchover is longer. For example, any OSPF adjacencies are broken and re-established twice as each master takes over. I’ve recorded HP 5930 stacks running OSPF, VRRP and BGP taking about ten seconds during each failover before every type of traffic is flowing normally again.
  • This method works for any upgrade, not just minor version upgrades, so long as the old and new versions are compatible for a single box upgrade.
  • MAD needs to be fully operational. If it isn’t, you’ll need to be extremely quick to manually reboot member 2 after member 1 comes back. Not recommended; besides, if your MAD setup isn’t operational, your IRF is at risk during normal operation anyway.
