Upgrading Aruba CX VSX Pairs

Aruba AOS-CX includes a technology, VSX (Virtual Switching Extension), for logically linking a pair of switches so that they exchange specific types of information and state. A typical use case is to have two physical switches act as a single LACP partner (MC-LAG). Configuration and other information can be kept in sync automatically if desired.

When using this technology the devices should be running the same version of firmware (I am not certain how much of a mismatch is tolerated). Even where it is permitted, there isn’t really a good reason to have them running different versions. It is possible to upgrade them independently: log on to each box and perform the standalone method (e.g. using the web interface to upload code, followed by a reboot). However, Aruba provide a mechanism to upgrade both boxes automatically with a single command. This article describes what happens, and contains a warning.

The Process

Issue the following on the primary VSX member:

vsx update-software tftp://filestore9.company.com/firmware/CX/ArubaOS-CX_8325_10_10_1000.swi

The switch replies with the following:

This command will download new software to the secondary image of both VSX primary and
secondary systems, then reboot them in sequence. The VSX secondary will reboot first,
followed by primary.
Continue (y/n)? y
Do you want to save the current configuration (y/n)? y
The running configuration is saved to the startup configuration.

VSX Primary Software Update Status     : Image download started
VSX Secondary Software Update Status   : Image download started
VSX ISL Status                         : Up
Progress [######....................................................................................]

As soon as the switch shows the progress bar, the primary and secondary download the image simultaneously. Expect this to take many minutes. It can be useful to have the secondary display log messages for feedback. Here is some example output, newest entry first (the oldest entry is at the bottom).

2022-09-06T17:07:22.057838+01:00 router2b vsx-syncd[5758]: Event|7603|LOG_INFO|AMM|-|Configuration-persistence : Configuration saved to startup-configuration on secondary VSX device.
2022-09-06T17:07:20.287681+01:00 router2b hpe-config[2726774]: Event|6801|LOG_INFO|AMM|-|Copying configs from: running-config to: startup-config
2022-09-06T17:07:20.036668+01:00 router2b vsx-swupdated[1746]: Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from none to image_download_started.
2022-09-06T17:07:20.018778+01:00 router2b vsx-syncd[5758]: Event|7603|LOG_INFO|AMM|-|Configuration-persistence : Configuration saved to startup-configuration on primary VSX device.

One small note is that the state change you see in the logs isn’t shown by the show command:

router2b# show vsx status
VSX Operational State
---------------------
  ISL channel             : In-Sync
  ISL mgmt channel        : operational
  Config Sync Status      : In-Sync
  NAE                     : peer_reachable
  HTTPS Server            : peer_reachable

Attribute           Local               Peer
------------        --------            --------
ISL link            lag256              lag256
ISL version         2                   2

Since it takes so very long to download via TFTP I found it useful to run tcpdump on the server, so that I at least had the reassurance that packets were being sent to and fro.

As soon as the download is complete the secondary starts shutting down gracefully (e.g. LACP graceful shutdown) and then reboots, with the boot image automatically pointing to the new one. The example below is from the primary, with the CLI command issued at 1709. That gives a download of about 13 minutes followed by about 3 minutes of secondary shutdown. The secondary was alive again at 1727.

2022-09-06T17:22:47.946809+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from image_download_started to image_download_complete.
2022-09-06T17:22:56.271178+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from image_download_complete to control_plane_shutdown_initiated.
2022-09-06T17:22:56.271558+0100 lacpd[1675] <INFO> Event|1324|LOG_INFO|AMM|1/1|LACP Graceful Shut is initiated
2022-09-06T17:22:56.273077+0100 hpe-routing[9371] <INFO> Event|2909|LOG_INFO|AMM|1/1|10.192.0.103: User reset request. vrf-name: default
2022-09-06T17:22:56.287282+0100 hpe-routing[9371] <INFO> Event|2902|LOG_INFO|AMM|1/1|10.192.0.103: Peer down. error-code: Cease, error-sub-code: Peer De-configured. vrf-name: default
2022-09-06T17:22:56.292055+0100 hpe-routing[9371] <INFO> Event|2901|LOG_INFO|AMM|1/1|10.192.0.103: Peer up. vrf-name: default
2022-09-06T17:22:56.338176+0100 hpe-vsxd[1670] <INFO> Event|7012|LOG_INFO|AMM|1/1|VSX 9 state local down, remote up
2022-09-06T17:22:56.340856+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ALFNCD
2022-09-06T17:22:56.343713+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ALFN
2022-09-06T17:22:56.584847+0100 lacpd[1675] <INFO> Event|1325|LOG_INFO|AMM|1/1|LACP Graceful Shut is completed
2022-09-06T17:24:18.584925+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ASFO
2022-09-06T17:24:48.586420+0100 lacpd[1675] <INFO> Event|1309|LOG_INFO|AMM|1/1|Partner is detected for interface 1/1/46 LAG 9 : 32768,2c:23:3a:e8:71:b6. Actor state: ALFO, partner state ALFN
2022-09-06T17:24:48.586743+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ALFN
2022-09-06T17:24:56.276332+0100 hpe-routing[9371] <INFO> Event|2909|LOG_INFO|AMM|1/1|10.192.0.103: User reset request. vrf-name: default
2022-09-06T17:24:56.278336+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from control_plane_shutdown_initiated to control_plane_shutdown_completed.
2022-09-06T17:24:56.289401+0100 hpe-routing[9371] <INFO> Event|2902|LOG_INFO|AMM|1/1|10.192.0.103: Peer down. error-code: Cease, error-sub-code: Peer De-configured. vrf-name: default
2022-09-06T17:24:56.293782+0100 hpe-routing[9371] <INFO> Event|2901|LOG_INFO|AMM|1/1|10.192.0.103: Peer up. vrf-name: default
2022-09-06T17:24:59.587821+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from control_plane_shutdown_completed to reboot_started.
2022-09-06T17:25:04.686230+0100 intfd[1676] <INFO> Event|404|LOG_INFO|AMM|1/1|Link status for interface 1/1/46 is down - Updating software
2022-09-06T17:25:34.673221+0100 vsx-swupdated[1746] <INFO> Event|7017|LOG_INFO|AMM|1/1|Rebooting the VSX Secondary device with newly updated secondary image.
2022-09-06T17:25:34.675716+0100 hpe-mgmtmd[2055] <INFO> Event|706|LOG_INFO|AMM|1/1|Initiating system reboot
Sep  6 17:25:34 hpe-mgmtmd[2731852]: RebootLibPh1: Reboot reason: VSX software update
2022-09-06T17:25:34.683351+0100 hpe-routing[9371] <INFO> Event|2402|LOG_INFO|AMM|1/1|Interface IP addr 10.192.0.57( area ID 0.0.0.0) changed from Loopback to Down, input: IF_INTERFACE_DOWN
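The timings quoted above can be pulled out of the log rather than read by eye. The following is a minimal Python sketch, not an Aruba tool: it assumes log lines shaped like the vsx-swupdated entries shown here, and the name state_durations is hypothetical. It reports how long the secondary spent in each state.

```python
import re
from datetime import datetime

# Matches the vsx-swupdated state-change lines shown above, e.g.
# "<timestamp> ... VSX secondary state changed from A to B."
STATE_RE = re.compile(
    r"^(?P<ts>\S+)\s.*VSX secondary state changed "
    r"from (?P<old>\w+) to (?P<new>\w+)\."
)


def state_durations(lines):
    """Return [(state, seconds)] for each state the secondary passed through.

    Lines may be in any order (the CLI can list newest first); they are
    sorted by timestamp before the gaps are measured. Non-matching lines
    are ignored.
    """
    events = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if m:
            # %z accepts both "+0100" and "+01:00" (Python 3.7+).
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, m["new"]))
    events.sort(key=lambda e: e[0])

    durations = []
    # Time in a state = gap between entering it and the next transition.
    for (entered, state), (left, _) in zip(events, events[1:]):
        durations.append((state, (left - entered).total_seconds()))
    return durations


if __name__ == "__main__":
    # Three of the transitions from the log above.
    prefix = " vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|"
    sample = [
        "2022-09-06T17:22:56.271178+0100" + prefix
        + "VSX secondary state changed from image_download_complete"
        " to control_plane_shutdown_initiated.",
        "2022-09-06T17:24:56.278336+0100" + prefix
        + "VSX secondary state changed from control_plane_shutdown_initiated"
        " to control_plane_shutdown_completed.",
        "2022-09-06T17:24:59.587821+0100" + prefix
        + "VSX secondary state changed from control_plane_shutdown_completed"
        " to reboot_started.",
    ]
    for state, secs in state_durations(sample):
        print(f"{state}: {secs:.0f}s")
```

Run against the lines above, this shows the control plane shutdown taking about two minutes, with the reboot starting roughly three seconds after it completes.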




After the secondary returns there is a pause of about 4 minutes, after which the primary reboots:

Sep  6 17:31:06 hpe-mgmtmd[4011680]: RebootLibPh1: Reboot reason: VSX software update

A constant ping to loopback0 on each member (a unique IP on each) lost 53 seconds of replies in total. Loss of service for traffic through the device would be under 5 seconds and depends on external factors such as OSPF re-convergence and routing timers.
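A loss figure like this can be derived mechanically from a ping record. As a rough sketch, assuming one probe per second and a list of booleans (True for a reply), with the hypothetical helper name summarize_loss, the total loss and the longest single outage could be computed like this:

```python
def summarize_loss(replies, interval=1.0):
    """Return (total_lost_seconds, longest_outage_seconds).

    replies:  iterable of booleans, one per probe, True if a reply arrived.
    interval: assumed number of seconds between probes.
    """
    total = longest = run = 0
    for ok in replies:
        if ok:
            run = 0          # a reply ends the current outage
        else:
            run += 1         # extend the current outage
            total += 1
            longest = max(longest, run)
    return total * interval, longest * interval
```

The longest outage is usually the more useful number, since it separates one long gap during the reboot from scattered single drops.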

Therefore, when this process goes well, a single command can provide a seamless upgrade within 25 minutes.

Warnings

One thing to note is that while you can perform standalone upgrades via SFTP (CLI) and HTTPS (web), only TFTP is supported for the VSX upgrade process (as of 10.10). Firstly, TFTP isn’t encrypted and the server doesn’t authenticate itself, which should raise some security eyebrows.

The second issue is that the file transfer takes a very long time. Testing against two TFTP servers on a LAN with sub-millisecond latency, the transfer takes about an hour. Upgrading from 10.5 to 10.6 always times out using TFTP, and I was forced to use the web interface to get the code onto the box. Even then the process times out unless you select the 45-minute timeout option.

On Comware you also get a slow transfer, which I think was because of the file system. On CX, however, a transfer from a USB stick is very quick (a few seconds at most), which suggests the file system is not the bottleneck here.

If using the VSX upgrade method, check back after 20 minutes to see that it is progressing. If there is no error message on the primary CLI, leave it alone for up to an hour. If you get timeouts, consider the web or SFTP methods and reboot manually.
