Nexus 3K vPC redundancy tests

Introduction

Recently I got four Nexus 3064PQ switches (http://www.cisco.com/c/en/us/products/collateral/switches/nexus-3000-series-switches/white_paper_c11-661242.html) and wanted to perform some redundancy tests.

In particular, I wanted to find out what happens when the vPC peer-keepalive link goes down, when the vPC peer link goes down, or when one or more switches are reloaded.

Here is our setup:

  • Switch0 – Nexus 3K (software ver. 6.0(2)U2(2))
  • Switch1 – Nexus 3K (software ver. 5.0(3)U3(2a))
  • Switch2 – Nexus 3K (software ver. 5.0(3)U3(2a))
  • Switch3 – Nexus 3K (software ver. 5.0(3)U3(2a))
  • Switch4 – Catalyst 2960S
  • Switch5 – Catalyst 2960S

The picture below shows the setup:

In this setup, switches 4 and 5 are only used to test connectivity between them; each has its own SVI (Switch4: 1.1.1.1, Switch5: 1.1.1.2). To test redundancy, I was pinging Switch5 from Switch4 and vice versa.
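To reproduce the ping test, the Switch4 side could look roughly like the sketch below. The VLAN number (10, which shows up as the active VLAN on the vPCs later on), the /24 mask and the repeat count are assumptions. Switch5 would mirror it with 1.1.1.2 and ping 1.1.1.1.

! Switch4 (Catalyst 2960S) - VLAN 10 and the /24 mask are assumptions
interface Vlan10
 ip address 1.1.1.1 255.255.255.0
 no shutdown
!
! long-running ping towards Switch5, started from privileged EXEC mode
Switch4# ping 1.1.1.2 repeat 100000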

Configuration

Before we get to the configuration: I had always wondered whether two vPC peer devices need to run the same NX-OS version in order to form a vPC adjacency.

Switch0 runs NX-OS version 6.0(2)U2(2) and Switch1 runs 5.0(3)U3(2a):

Switch0# sh ver | i system
  system:    version 6.0(2)U2(2)

Switch1# sh ver | i system
  system:    version 5.0(3)U3(2a)

Switch2 and Switch3 run 5.0(3)U3(2a). It turns out that the vPC adjacency between Switch0 and Switch1 is established, but if I reload one of them during the redundancy tests, connectivity is lost. So even if the sh vpc command shows that everything is OK, it is not!

vPC domain

First, let's configure the vPC domain on all four switches.
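The vpc domain commands are only available once the vPC feature is enabled, and the peer-link configured later uses LACP (channel-group ... mode active), so that feature is needed as well. A minimal sketch for Switch0; the same two commands apply to Switch1, Switch2 and Switch3:

Switch0(config)# feature vpc
Switch0(config)# feature lacp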

Switch0# sh run vpc

vpc domain 8
role priority 10
system-priority 10
delay restore 120
peer-gateway
auto-recovery

Switch1# sh run vpc

vpc domain 8
role priority 10
system-priority 10
delay restore 120
peer-gateway
auto-recovery

Switch2# sh run vpc

vpc domain 9
role priority 10
system-priority 10
delay restore 120
peer-gateway
auto-recovery

Switch3# sh run vpc

vpc domain 9
role priority 10
system-priority 10
delay restore 120
peer-gateway
auto-recovery
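The auto-recovery timeout reported later in the sh vpc output (240 seconds) is, as far as I know, the default reload delay. If you prefer to have it explicit in the configuration, it can be set under the vPC domain. A sketch for domain 8 that simply keeps the default value (NX-OS accepts 240 to 3600 seconds):

Switch0(config)# vpc domain 8
Switch0(config-vpc-domain)# auto-recovery reload-delay 240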

vPC peer-keepalive

While configuring the vPC domain on all switches, you may have noticed that the peer-keepalive command is missing. Before adding it, I had to decide how to deal with it.

Should we use the mgmt0 interface, some other physical interface, an SVI, or a normal port put in routed mode (no switchport)? I did some testing using an SVI for peer-keepalive communication, and there are some caveats:

– The SVI should be excluded from the vPC domain configuration (dual-active exclude interface-vlan); otherwise it will be suspended if something goes wrong.

– If one vPC member switch stops working (reload, power outage, etc.) and comes back, the SVI will be up, yet the peer-keepalive will be down. I'm not sure what the reason might be, but I had to issue shut/no shut on the SVI to make it work again (version 5.0(3)U3(2a)). After I upgraded all switches to version 6.0(2)U2(2), the problem disappeared.

I decided to configure interface E1/13 on each switch in non-routed mode and use it for the peer-keepalive link.

Now, on each switch, we'll add the peer-keepalive destination ... source ... command under the vPC domain:

vPC domain 8

Switch0(config-vpc-domain)#peer-keepalive destination 10.0.0.2 source 10.0.0.1

Switch1(config-vpc-domain)#peer-keepalive destination 10.0.0.1 source 10.0.0.2

vPC domain 9

Switch2(config-vpc-domain)#peer-keepalive destination 10.0.1.2 source 10.0.1.1

Switch3(config-vpc-domain)#peer-keepalive destination 10.0.1.1 source 10.0.1.2
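If you go for a routed front-panel port instead (one of the options mentioned above), a common approach is to put it in its own VRF and reference that VRF in the peer-keepalive command. A sketch for Switch0, assuming a VRF named vpc-keepalive (hypothetical), a /30 mask, and that your platform and license support a non-management VRF. Switch1 would mirror it with the addresses swapped.

vrf context vpc-keepalive

interface Ethernet1/13
  no switchport
  ! vrf member clears any IP address on the port, so it comes before ip address
  vrf member vpc-keepalive
  ip address 10.0.0.1/30
  no shutdown

vpc domain 8
  peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf vpc-keepalive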

Note: if you use SVIs for the peer-keepalive link, don't forget to exclude the VLAN interface from the vPC configuration; otherwise vPC might not come up after a reload.

vPC domain 8

Switch0(config-vpc-domain)#dual-active exclude interface-vlan 8

Switch1(config-vpc-domain)#dual-active exclude interface-vlan 8

vPC domain 9

Switch2(config-vpc-domain)#dual-active exclude interface-vlan 9

Switch3(config-vpc-domain)#dual-active exclude interface-vlan 9
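For the SVI approach, a minimal sketch for Switch0 could look like this. It assumes VLAN 8 (the VLAN excluded above for domain 8), E1/13 as an access port in that VLAN and a /30 mask; Switch1 would use 10.0.0.2, and domain 9 would use VLAN 9 with the 10.0.1.x addresses. Keeping the keepalive VLAN off the peer-link trunk stops the keepalives from travelling over the peer link, which would defeat their purpose.

feature interface-vlan

vlan 8

interface Ethernet1/13
  switchport mode access
  switchport access vlan 8

interface Vlan8
  ip address 10.0.0.1/30
  no shutdown

interface port-channel1
  ! keep the keepalive VLAN (assumed VLAN 8) off the peer-link
  switchport trunk allowed vlan except 8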

vPC peer-link

Before the vPC peers can form an adjacency, we need to configure a port-channel as the vPC peer-link.

The configuration is the same on all four Nexus switches (Switch0, Switch1, Switch2 and Switch3):

interface port-channel1
  switchport mode trunk
  spanning-tree port type network
  no negotiate auto
  vpc peer-link

For the peer-link, we'll use two ports on each switch:

interface Ethernet1/47
  switchport mode trunk
  channel-group 1 mode active

interface Ethernet1/48
  switchport mode trunk
  channel-group 1 mode active
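Before looking at the overall vPC status, the individual building blocks can be checked separately with a few standard NX-OS show commands (output omitted here): the peer-link bundle, the keepalive state, and the global consistency parameters compared between the peers.

Switch0# show port-channel summary
Switch0# show vpc peer-keepalive
Switch0# show vpc consistency-parameters global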

vPC status

Now that the vPC peer-keepalive and the vPC peer-link are configured, let's check the vPC status:

Switch0# sh vpc

Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 8
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 2
Peer Gateway                      : Enabled
Peer gateway excluded VLANs     : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Enabled (timeout = 240 seconds)

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    up     1-2,10

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
10     Po10        up     success     success                    10
11     Po11        up     success     success                    10

As we can see, the vPC adjacency came up, even though Switch0 and Switch1 run different NX-OS versions.
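The output lists two vPCs, Po10 and Po11, whose configuration isn't shown in this post. A minimal sketch of one of them on Switch0 might look like the following. The member port Ethernet1/1 and trunk mode are assumptions; the same vpc number has to be used on Switch1 for the matching port-channel, and the downstream device bundles its uplinks to both peers into a single LACP port-channel.

interface port-channel10
  switchport mode trunk
  vpc 10

interface Ethernet1/1
  ! hypothetical member port towards the downstream device
  switchport mode trunk
  channel-group 10 mode active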

Note: after testing redundancy between switches 4 and 5, it looks like running different major NX-OS versions on the vPC peers does not work reliably for the vPCs towards the end hosts. So mixing NX-OS versions is not a good idea.

Redundancy tests

Now that each pair of N3K switches is configured for vPC, it's time to run the redundancy tests:

  1. Reload Switch0, Switch1, Switch2, Switch3 one at a time.
  2. Reload Switch0, Switch3 both at the same time
  3. vPC keepalive link between Switch0 and Switch1 is disconnected, vPC peer link is OK
  4. vPC keepalive link between Switch2 and Switch3 is disconnected, vPC peer link is OK
  5. vPC peer link between Switch0 and Switch1 is disconnected, vPC keepalive link is OK
  6. vPC peer link between Switch2 and Switch3 is disconnected, vPC keepalive link is OK
  7. both vPC keepalive and peer link between Switch0 and Switch1 are disconnected
  8. both vPC keepalive and peer link between Switch2 and Switch3 are disconnected
  9. Shutting down uplinks on Switch4 and Switch5

1. Reload Switch0, Switch1, Switch2, Switch3 one at a time

In this test we'll reload the switches one by one, waiting for each reloaded switch to become fully operational before moving on to the next one.

The test was successful, no connection loss between switches 4 and 5.

2. Reload Switch0, Switch3 both at the same time

In this test we'll reload one switch in each vPC pair simultaneously, which means Switch0 and Switch3.

[Figure: Redundancy test 2]

This test was also successful, with no connection loss between switches 4 and 5. Switches 1 and 2 have taken over the forwarding, as expected.

3. vPC keepalive link between Switch0 and Switch1 is disconnected, vPC peer link is OK

Now let’s see what happens if the peer keepalive link is down.

[Figure: Redundancy test 3]

Is the connectivity between switches 4 and 5 still there? Yes, it is: the peer-keepalive link carries no data traffic and is only there to resolve "split brain" situations when the peer link goes down. Of course, it is highly recommended to fix the peer-keepalive link quickly, to avoid ending up in a split-brain situation if the peer link fails as well.
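One way to reproduce this failure from the CLI, instead of pulling the cable, is to shut down the keepalive port (E1/13, as configured earlier) and then check the keepalive state:

Switch0(config)# interface Ethernet1/13
Switch0(config-if)# shutdown
Switch0(config-if)# end
Switch0# show vpc peer-keepalive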

4. vPC keepalive link between Switch2 and Switch3 is disconnected, vPC peer link is OK

The same behavior as in step 3.

[Figure: Redundancy test 4]

5. vPC peer link between Switch0 and Switch1 is disconnected, vPC keepalive link is OK

Now it gets interesting. When I shut down the peer link between Switch0 and Switch1, Switch1 (the vPC secondary, since Switch0 is primary) suspended its vPC ports. See the picture below:

[Figure: Redundancy test 5]

The connectivity between switches 4 and 5 is still OK, since the traffic goes through Switch0.
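To reproduce this scenario from the CLI, the peer-link port-channel can be shut down on one peer, and the state of the vPCs on the secondary (Switch1 here) can then be checked:

Switch0(config)# interface port-channel1
Switch0(config-if)# shutdown
Switch0(config-if)# end
Switch1# show vpc brief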

6. vPC peer link between Switch2 and Switch3 is disconnected, vPC keepalive link is OK

The same situation as in step 5.

[Figure: Redundancy test 6]

7. both vPC keepalive and peer link between Switch0 and Switch1 are disconnected

This is an even more interesting situation than the previous one.

[Figure: Redundancy test 7]

Because of the "auto-recovery" feature, both Switch0 and Switch1 decided not to suspend their vPC ports. Although switches 4 and 5 were able to re-establish connectivity, a short outage was inevitable.
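To see how each peer judges the situation while both links are down, the vPC role can be compared on the two switches, together with the keepalive state:

Switch0# show vpc role
Switch1# show vpc role
Switch1# show vpc peer-keepalive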

8. both vPC keepalive and peer link between Switch2 and Switch3 are disconnected

Here’s the same situation as in step 7.

[Figure: Redundancy test 8]

9. Shutting down uplinks on Switch4 and Switch5

This scenario is the most harmless one in this series of redundancy tests, but I wanted to test it anyway.

[Figure: Redundancy test 9]

I simply did "shut"/"no shut" on the port-channel member ports on Switch4 and Switch5; the result was as expected: no connectivity loss.

Conclusion

vPC behaved as expected, although I ran into connectivity issues in some of the scenarios above while testing with NX-OS version 5.0(3)U3(2a). As soon as I upgraded to 6.0(2)U2(2), the issues disappeared.

Even though STP doesn't need much attention in this setup, you should still define bridge priorities, so that the root bridge is chosen deliberately rather than by the lowest MAC address.
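A minimal sketch, assuming VLAN 10 carries the test traffic and that the domain 8 pair should hold the root role, with the vPC primary (Switch0) as root and the secondary (Switch1) as backup root. The priority values are illustrative; the Catalysts keep their default priority.

Switch0(config)# spanning-tree vlan 10 priority 4096
Switch1(config)# spanning-tree vlan 10 priority 8192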