Aruba AOS-CX contains a technology (VSX) for logically linking a pair of switches so that they exchange specific types of information and state. A typical use case is to have two physical switches act as a single LACP partner (MC-LAG). Configuration and other information can be kept in sync automatically if desired.
When using this technology the devices need to be running the same (or at least similar) firmware versions. There isn’t really a good reason to run different versions even if it is permitted. It is possible to upgrade them independently: log on to each box and use the standalone method (e.g. upload the code via the web interface, then reboot). However, Aruba provides a mechanism to upgrade both boxes automatically with a single command. This article describes what happens, and contains a warning.
The Process
Issue the following on the primary VSX member:
vsx update-software tftp://filestore9.company.com/firmware/CX/ArubaOS-CX_8325_10_10_1000.swi
The switch replies with the following:
This command will download new software to the secondary image of both VSX primary and secondary systems, then reboot them in sequence. The VSX secondary will reboot first, followed by primary.
Continue (y/n)? y
Do you want to save the current configuration (y/n)? y
The running configuration is saved to the startup configuration.
VSX Primary Software Update Status : Image download started
VSX Secondary Software Update Status : Image download started
VSX ISL Status : Up
Progress [######....................................................................................]
Once the progress bar appears, the primary and secondary download the image simultaneously. Expect this to take many minutes. It can be useful to have the secondary display log messages for feedback. Here is some example output, with the oldest entry at the bottom.
2022-09-06T17:07:22.057838+01:00 router2b vsx-syncd[5758]: Event|7603|LOG_INFO|AMM|-|Configuration-persistence : Configuration saved to startup-configuration on secondary VSX device.
2022-09-06T17:07:20.287681+01:00 router2b hpe-config[2726774]: Event|6801|LOG_INFO|AMM|-|Copying configs from: running-config to: startup-config
2022-09-06T17:07:20.036668+01:00 router2b vsx-swupdated[1746]: Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from none to image_download_started.
2022-09-06T17:07:20.018778+01:00 router2b vsx-syncd[5758]: Event|7603|LOG_INFO|AMM|-|Configuration-persistence : Configuration saved to startup-configuration on primary VSX device.
One small note: the state change you see in the logs isn’t shown by the show command:
router2b# show vsx status
VSX Operational State
---------------------
  ISL channel        : In-Sync
  ISL mgmt channel   : operational
  Config Sync Status : In-Sync
  NAE                : peer_reachable
  HTTPS Server       : peer_reachable

Attribute      Local      Peer
------------   --------   --------
ISL link       lag256     lag256
ISL version    2          2
Since it takes so very long to download via TFTP I found it useful to run tcpdump on the server, so that I at least had the reassurance that packets were being sent to and fro.
As soon as the download is complete, the secondary starts shutting down gracefully (e.g. LACP graceful shutdown) and then reboots, with the boot image automatically pointing to the new one. The example below is from the primary, with the CLI command issued at 1709. That gives a download of about 13 minutes followed by about 3 minutes of secondary shutdown. The secondary was alive again at 1727.
2022-09-06T17:22:47.946809+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from image_download_started to image_download_complete.
2022-09-06T17:22:56.271178+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from image_download_complete to control_plane_shutdown_initiated.
2022-09-06T17:22:56.271558+0100 lacpd[1675] <INFO> Event|1324|LOG_INFO|AMM|1/1|LACP Graceful Shut is initiated
2022-09-06T17:22:56.273077+0100 hpe-routing[9371] <INFO> Event|2909|LOG_INFO|AMM|1/1|10.192.0.103: User reset request. vrf-name: default
2022-09-06T17:22:56.287282+0100 hpe-routing[9371] <INFO> Event|2902|LOG_INFO|AMM|1/1|10.192.0.103: Peer down. error-code: Cease, error-sub-code: Peer De-configured. vrf-name: default
2022-09-06T17:22:56.292055+0100 hpe-routing[9371] <INFO> Event|2901|LOG_INFO|AMM|1/1|10.192.0.103: Peer up. vrf-name: default
2022-09-06T17:22:56.338176+0100 hpe-vsxd[1670] <INFO> Event|7012|LOG_INFO|AMM|1/1|VSX 9 state local down, remote up
2022-09-06T17:22:56.340856+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ALFNCD
2022-09-06T17:22:56.343713+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ALFN
2022-09-06T17:22:56.584847+0100 lacpd[1675] <INFO> Event|1325|LOG_INFO|AMM|1/1|LACP Graceful Shut is completed
2022-09-06T17:24:18.584925+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ASFO
2022-09-06T17:24:48.586420+0100 lacpd[1675] <INFO> Event|1309|LOG_INFO|AMM|1/1|Partner is detected for interface 1/1/46 LAG 9 : 32768,2c:23:3a:e8:71:b6. Actor state: ALFO, partner state ALFN
2022-09-06T17:24:48.586743+0100 lacpd[1675] <INFO> Event|1321|LOG_INFO|AMM|1/1|LAG 9 State change for interface 1/1/46: Actor state: ALFO, Partner state ALFN
2022-09-06T17:24:56.276332+0100 hpe-routing[9371] <INFO> Event|2909|LOG_INFO|AMM|1/1|10.192.0.103: User reset request. vrf-name: default
2022-09-06T17:24:56.278336+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from control_plane_shutdown_initiated to control_plane_shutdown_completed.
2022-09-06T17:24:56.289401+0100 hpe-routing[9371] <INFO> Event|2902|LOG_INFO|AMM|1/1|10.192.0.103: Peer down. error-code: Cease, error-sub-code: Peer De-configured. vrf-name: default
2022-09-06T17:24:56.293782+0100 hpe-routing[9371] <INFO> Event|2901|LOG_INFO|AMM|1/1|10.192.0.103: Peer up. vrf-name: default
2022-09-06T17:24:59.587821+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from control_plane_shutdown_completed to reboot_started.
2022-09-06T17:25:04.686230+0100 intfd[1676] <INFO> Event|404|LOG_INFO|AMM|1/1|Link status for interface 1/1/46 is down - Updating software
2022-09-06T17:25:34.673221+0100 vsx-swupdated[1746] <INFO> Event|7017|LOG_INFO|AMM|1/1|Rebooting the VSX Secondary device with newly updated secondary image.
2022-09-06T17:25:34.675716+0100 hpe-mgmtmd[2055] <INFO> Event|706|LOG_INFO|AMM|1/1|Initiating system reboot
Sep 6 17:25:34 hpe-mgmtmd[2731852]: RebootLibPh1: Reboot reason: VSX software update
2022-09-06T17:25:34.683351+0100 hpe-routing[9371] <INFO> Event|2402|LOG_INFO|AMM|1/1|Interface IP addr 10.192.0.57( area ID 0.0.0.0) changed from Loopback to Down, input: IF_INTERFACE_DOWN
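To see how long the secondary spends in each phase, the vsx-swupdated state-change events can be pulled out of the log and differenced. The following is a minimal Python sketch, not an Aruba tool: the function name phase_durations and the regular expression are illustrative, and the log text is just the four Event 7024 lines copied from the output above.

```python
import re
from datetime import datetime

# The four vsx-swupdated state-change lines from the log excerpt above.
LOG = """\
2022-09-06T17:22:47.946809+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from image_download_started to image_download_complete.
2022-09-06T17:22:56.271178+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from image_download_complete to control_plane_shutdown_initiated.
2022-09-06T17:24:56.278336+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from control_plane_shutdown_initiated to control_plane_shutdown_completed.
2022-09-06T17:24:59.587821+0100 vsx-swupdated[1746] <INFO> Event|7024|LOG_INFO|AMM|1/1|VSX secondary state changed from control_plane_shutdown_completed to reboot_started.
"""

# Illustrative pattern: timestamp at line start, then the state transition.
PATTERN = re.compile(
    r"^(?P<ts>\S+) .*VSX (?:primary|secondary) state changed "
    r"from (?P<old>\w+) to (?P<new>\w+)\.",
    re.MULTILINE,
)


def phase_durations(log_text):
    """Return [(state, seconds spent in that state)] from state-change lines."""
    events = [
        (datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z"), m["old"], m["new"])
        for m in PATTERN.finditer(log_text)
    ]
    events.sort(key=lambda event: event[0])
    # The state left at the later event was entered at the earlier event.
    return [
        (later[1], (later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


for state, seconds in phase_durations(LOG):
    print(f"{state}: {seconds:.1f} s")
```

On this excerpt, almost all of the time between download completion and reboot_started is the roughly two-minute gap between control_plane_shutdown_initiated and control_plane_shutdown_completed.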
After the secondary returns there is a pause of about 4 minutes, after which the primary reboots:
Sep 6 17:31:06 hpe-mgmtmd[4011680]: RebootLibPh1: Reboot reason: VSX software update
A constant ping to each member showed a total of 53 seconds of lost pings to loopback0 (a unique IP on each). Loss of service for traffic through the device would be under 5 seconds and depends on external factors such as OSPF re-convergence and routing timers.
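A total of lost pings is not the same thing as the longest single outage, so it can help to record both. The following Python sketch assumes one probe per second; the function name summarize_ping_loss and the sample list are made up for illustration and are not the measurements from this upgrade.

```python
def summarize_ping_loss(replies, interval_seconds=1.0):
    """Return (total seconds lost, longest continuous outage in seconds).

    replies is a sequence of booleans, one per probe, True if a reply arrived.
    """
    lost = 0
    longest_run = 0
    current_run = 0
    for ok in replies:
        if ok:
            current_run = 0
        else:
            lost += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
    return lost * interval_seconds, longest_run * interval_seconds


# Hypothetical sample: 5 replies, 3 losses, 2 replies, 1 loss, 4 replies.
sample = [True] * 5 + [False] * 3 + [True] * 2 + [False] + [True] * 4
total_lost, longest_outage = summarize_ping_loss(sample)
print(f"lost {total_lost:.0f} s in total, longest outage {longest_outage:.0f} s")
```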
Therefore, when this process goes well, a single command can provide a seamless upgrade within 25 minutes.

Warnings
One thing to note is that while you can perform standalone upgrades via SFTP (CLI) and HTTPS (web), only TFTP is supported for the VSX upgrade process (as of 10.10). Firstly, TFTP isn’t encrypted and the server doesn’t authenticate itself, which should raise some security eyebrows.
The second issue is that the file transfer takes a very long time. Testing against two TFTP servers on a LAN with sub-millisecond latency, the transfer takes about an hour. Upgrading from 10.5 to 10.6 always times out using TFTP. I was forced to use the web interface to get the code on the box. Even then the process times out unless you select the 45-minute timeout option.
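One plausible contributor to the slow transfer is the protocol itself. Classic TFTP (RFC 1350) sends one data block, waits for its acknowledgement, then sends the next, and the default block size is 512 bytes. The Python sketch below gives a rough lower bound on transfer time under that model. The image size and per-block times are assumptions chosen for illustration, and it is not confirmed here whether AOS-CX negotiates a larger block size.

```python
def tftp_block_count(size_bytes, block_size=512):
    """Number of data blocks in an RFC 1350 lock-step transfer.

    The transfer ends with a block shorter than block_size, so a file that is
    an exact multiple of block_size needs one extra, empty block.
    """
    return size_bytes // block_size + 1


def tftp_min_seconds(size_bytes, per_block_seconds, block_size=512):
    """Lower bound on transfer time: one block per send/acknowledge cycle."""
    return tftp_block_count(size_bytes, block_size) * per_block_seconds


# Hypothetical image size; substitute the real size of the .swi file.
IMAGE_BYTES = 500 * 1024 * 1024

# Hypothetical per-block times (network round trip plus processing at each end).
for per_block_ms in (0.5, 1.0, 3.5):
    minutes = tftp_min_seconds(IMAGE_BYTES, per_block_ms / 1000) / 60
    print(f"{per_block_ms} ms per block -> {minutes:.1f} min")
```

Under this model the transfer time scales with the number of blocks rather than with link bandwidth, so even a few milliseconds of per-block handling adds up over a large image.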
On Comware you also get a slow transfer, and I think that was because of the file system. However, on CX the transfer via a USB stick is very quick (a few seconds at most).
If using the VSX upgrade method, check back after 20 minutes to see that it is progressing. If there is no error message on the primary CLI, leave it alone for up to an hour. If you get timeouts, consider the web or SFTP methods and reboot manually.