In this guide, we are going to look at a less common, but still supported and deployed VXLAN EVPN model.

Let's say we have two (or more) Data Centres and we need to connect them at Layer 2, but we either don't want a dedicated Layer 2 link running between the DCs, or we have an existing Layer 3 connection between them that we want to re-use. This is where this topology and design comes in.

For a greenfield deployment, you would more commonly see two VXLAN EVPN fabrics connected with Multi-Site, which is a much more involved configuration. I do have a number of guides and videos on how to set up Multi-Site from scratch!

Topology and background information

This is the topology we are going to configure:

The topology showing all switches and their connections

It's worth noting here that this design is L2 only. You could set up anycast gateways and tenant VRFs on the VTEPs if you wanted, but the end goal here is simply to communicate between the DCs at Layer 2; the design assumes each DC already has its own L3 infrastructure. A common use case for this is something like vMotion or Live Migration, which requires a Layer 2 connection between sites.

DC1-ACC and DC2-ACC are standard Nexus switches; they do not have any VTEP configuration on them at all, just an L2 VLAN and a port-channel connecting them up to the DCI switches. The DCI switches act as the VTEPs in this topology. There are no spines, hence these sometimes being called 'Back to Back' VTEPs. Each DC's VTEPs are in a vPC domain. This post assumes that you have the vPC already stood up and working. If you don't know what vPC is or how to set it up, I do have some guides on here about it, make a start here. We do want to make sure we use peer-switch under the vpc domain too, as these switches are participating in Spanning Tree.
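For reference, the vPC domain on the DC1 pair would look something along these lines (domain ID 100 matches what we use later in this post; the peer-keepalive addressing and the peer-link port-channel number are just placeholders for illustration):

vpc domain 100
  peer-switch
  peer-gateway
  peer-keepalive destination 192.168.0.2 source 192.168.0.1 vrf management

interface port-channel1
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link

peer-switch lets the vPC pair present itself as a single Spanning Tree root, which is exactly what we want with the ACC switches hanging off a vPC.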

Underlay Routing and IP Addressing

For the underlay routing, we are going to use OSPF. This only applies to the DCI switches.

First off, we need loopbacks for all of the switches:

dc1-dci1: 10.0.0.1/32
dc1-dci2: 10.0.0.2/32
dc2-dci1: 10.0.0.3/32
dc2-dci2: 10.0.0.4/32

Let's enable OSPF and configure loopback0, remembering to change the IP for each VTEP:

feature ospf

interface Loopback0
  no shutdown
  ip address 10.0.0.x/32

router ospf UNDERLAY
  log-adjacency-changes

interface Loopback0
  ip router ospf UNDERLAY area 0.0.0.0

For the underlay interface addressing, we will use a pair of /30s between the DCs and ip unnumbered between the switches inside each DC, because it's nice and easy. Eth1/1 on each switch goes to the other DC and Eth1/2 is an L3 interconnect between the two switches within each DC.

dc1-dci1:

interface Ethernet1/1
  no switchport
  mtu 9216
  medium p2p
  ip address 10.99.99.1/30
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

interface Ethernet1/2
  no switchport
  mtu 9216
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

dc1-dci2:

interface Ethernet1/1
  no switchport
  mtu 9216
  medium p2p
  ip address 10.100.100.1/30
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

interface Ethernet1/2
  no switchport
  mtu 9216
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

dc2-dci1:

interface Ethernet1/1
  no switchport
  mtu 9216
  medium p2p
  ip address 10.99.99.2/30
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

interface Ethernet1/2
  no switchport
  mtu 9216
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

dc2-dci2:

interface Ethernet1/1
  no switchport
  mtu 9216
  medium p2p
  ip address 10.100.100.2/30
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown

interface Ethernet1/2
  no switchport
  mtu 9216
  medium p2p
  ip unnumbered loopback0
  ip router ospf UNDERLAY area 0.0.0.0
  no shutdown
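At this point each DCI switch should have OSPF adjacencies on both Eth1/1 and Eth1/2 and a /32 route to every other loopback0 address. A couple of quick sanity checks (I'll spare you the full output here):

dc1-dci1# show ip ospf neighbors
dc1-dci1# show ip route ospf-UNDERLAY

We should see two neighbors in FULL state and routes for the other three loopbacks before moving on.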

Multicast

In order to send BUM (Broadcast, Unknown Unicast and Multicast) traffic between the DCs, we can either use ingress replication or multicast. In this topology we are going to use multicast with Anycast RP, where each of the DCI VTEPs acts as an RP for resilience.

It's the exact same configuration on all of the DCI switches:

feature pim

ip pim rp-address 10.0.0.99 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
ip pim anycast-rp 10.0.0.99 10.0.0.1
ip pim anycast-rp 10.0.0.99 10.0.0.2
ip pim anycast-rp 10.0.0.99 10.0.0.3
ip pim anycast-rp 10.0.0.99 10.0.0.4

interface loopback1
  ip address 10.0.0.99/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface loopback0
  ip pim sparse-mode

interface Ethernet1/1-2
  ip pim sparse-mode

There is no configuration required on the ACC switches for Multicast.

We can verify the RP configuration on a VTEP:

dc2-dci1# show ip pim rp
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None

Anycast-RP 10.0.0.99 members:
  10.0.0.1  10.0.0.2  10.0.0.3*  10.0.0.4  

RP: 10.0.0.99*, (0), 
 uptime: 02:28:56   priority: 255, 
 RP-source: (local),  
 group ranges:
 224.0.0.0/4   

NVE Configuration

Now we need to set up the VTEP (NVE) interface on each of the DCI switches. First, we need to create another loopback (loopback2) to source it from.

Due to this being a vPC-enabled environment, loopback2 carries both a unique primary address on each switch and a shared secondary anycast VTEP address per vPC pair. The primary addresses are:

dc1-dci1: 10.0.1.1/32
dc1-dci2: 10.0.1.2/32
dc2-dci1: 10.0.1.3/32
dc2-dci2: 10.0.1.4/32

Each vPC pair then shares a common secondary IP address on the loopback2 interface, which becomes the anycast VTEP address:

vpc domain 100 (dc1): 10.0.1.101/32
vpc domain 200 (dc2): 10.0.1.102/32

For example on dc1-dci1:

interface loopback2
  ip address 10.0.1.1/32
  ip address 10.0.1.101/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

and dc1-dci2:

interface loopback2
  ip address 10.0.1.2/32
  ip address 10.0.1.101/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
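The DC2 pair follows the same pattern, using the primary addresses listed above and 10.0.1.102 as the shared secondary. For example, dc2-dci1:

interface loopback2
  ip address 10.0.1.3/32
  ip address 10.0.1.102/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

dc2-dci2 is identical apart from its primary address of 10.0.1.4/32.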

The NVE Interface configuration for all DCI switches:

feature fabric forwarding
feature vn-segment-vlan-based
feature nv overlay

interface nve1
  no shutdown
  host-reachability protocol bgp
  advertise virtual-rmac
  source-interface loopback2

advertise virtual-rmac is another vPC-specific command, used to advertise the virtual MAC of the vPC pair. It goes hand in hand with advertise-pip, which we will add under the BGP l2vpn evpn address family in the next section.

BGP Overlay Routing

For the overlay routing and the exchange of EVPN type-2 routes, we use BGP. There are no route reflectors in this topology, so we will have a full mesh of peerings, which is only three per switch, so it's not that bad to manage.

This is the configuration for dc1-dci1:

feature bgp

router bgp 64500
  log-neighbor-changes
  address-family ipv4 unicast
  address-family l2vpn evpn
    advertise-pip
  template peer VTEP
    remote-as 64500
    update-source loopback0
    address-family ipv4 unicast
      send-community extended
      soft-reconfiguration inbound
    address-family l2vpn evpn
      send-community
      send-community extended
  neighbor 10.0.0.2
    inherit peer VTEP
  neighbor 10.0.0.3
    inherit peer VTEP
  neighbor 10.0.0.4
    inherit peer VTEP

On the other DCI switches, we just need to swap the neighbors to complete the mesh. For example, dc2-dci1 peers with 10.0.0.1, 10.0.0.2 and 10.0.0.4.
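So on dc2-dci1 the neighbor statements would simply be (the template and address families are identical to dc1-dci1 above):

router bgp 64500
  neighbor 10.0.0.1
    inherit peer VTEP
  neighbor 10.0.0.2
    inherit peer VTEP
  neighbor 10.0.0.4
    inherit peer VTEP

Once the sessions come up, we can check the peerings: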

dc1-dci1# show bgp l2vpn evpn summary 
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 10.0.0.1, local AS number 64500
BGP table version is 22, L2VPN EVPN config peers 3, capable peers 3
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS    MsgRcvd    MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.2        4 64500        170        170        0    0    0 00:00:05 0         
10.0.0.3        4 64500        171        166        0    0    0 00:00:05 0         
10.0.0.4        4 64500        168        166        0    0    0 00:00:04 0         

Neighbor        T    AS PfxRcd     Type-2     Type-3     Type-4     Type-5    
10.0.0.2        I 64500 0          0          0          0          0         
10.0.0.3        I 64500 0          0          0          0          0         
10.0.0.4        I 64500 0          0          0          0          0         

L2VNI Configuration

The last thing we need to do is create the VLAN and its VNI mappings. Again, this config is just for the DCI switches:

vlan 100
  vn-segment 100100

interface nve1
  member vni 100100
    mcast-group 224.1.1.192

evpn
  vni 100100 l2
    rd auto
    route-target import auto
    route-target export auto
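As an aside, if you wanted to use ingress replication for BUM traffic instead of multicast, the member VNI would point at BGP rather than a multicast group, something along these lines (and the PIM/Anycast RP configuration earlier would then not be needed):

interface nve1
  member vni 100100
    ingress-replication protocol bgp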

Verification

And with that, the VTEPs are configured. From each VTEP pair we have port-channel2, which goes to the ACC switch in that DC:

interface port-channel2
  switchport mode trunk
  vpc 2

This is just a standard L2 trunk, as the ACC switches have no knowledge of VXLAN or any fabric services. They just have access ports connected to the end hosts:

interface Ethernet1/3
  switchport access vlan 100
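For completeness, the ACC side of the uplink is just the VLAN and a matching trunk port-channel, along the lines of the below (the member interface numbers are an assumption, adjust for your own cabling):

vlan 100

interface port-channel2
  switchport mode trunk

interface Ethernet1/1-2
  switchport mode trunk
  channel-group 2 mode active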

From the perspective of the hosts, let's see if they can ping one another:

Ping working between server in dc1 to dc2

Perfect. The first ping took a bit longer because ARP had to transit the DCI links, which we can see if we take a capture:

ARP request packet seen over the DCI

The IP destination of this packet was 224.1.1.192, which we know is the multicast group for the L2VNI.

And we see the reply come back as a direct unicast now that the VTEPs know about each other:

ARP reply packet seen over the DCI
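We can also tie the capture back to the switch itself; show nve vni on a DCI switch lists each VNI alongside its multicast group and state, so we would expect to see VNI 100100 mapped to 224.1.1.192 and in an Up state (I'll leave the full output out here as the format varies a little between NX-OS releases):

dc1-dci1# show nve vni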

We can also check the l2route table to confirm we have entries for both the local and remote endpoints:

dc1-dci1# show l2route evpn mac evi 100 detail 

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote 
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Asy):Asymmetric (Gw):Gateway
(Bh):Blackhole
(Pf):Permanently-Frozen, (Orp): Orphan

(PipOrp): Directly connected Orphan to PIP based vPC BGW 
(PipPeerOrp): Orphan connected to peer of PIP based vPC BGW 
Topology    Mac Address    Prod   Flags              Seq No     Next-Hops                              
----------- -------------- ------ ------------------- ---------- ---------------------------------------------------------
100         0050.0000.0700 BGP    Rcv                0          10.0.1.102 (Label: 100100)                               
            Route Resolution Type: Regular
            Forwarding State: Resolved (PeerID: 1)
            Sent To: L2FM
            SOO: 808333361      
            Encap: 1           

100         0050.0000.0900 Local  L,                 0          Po2                                                      
            Route Resolution Type: Regular
            Forwarding State: Resolved
            Sent To: BGP
            SOO: 808333361                                                         

That looks good! We don't see any IP addresses in this topology because we don't have any Layer 3 services on the VTEPs, such as anycast gateways. We also can't make use of ARP suppression here, as Cisco doesn't support it without Layer 3 services.

Last but not least, we can check multicast to make sure that looks good:

dc1-dci1# show ip pim neighbor 
PIM Neighbor Status for VRF "default"
Neighbor        Interface            Uptime    Expires   DR       Bidir-  BFD    ECMP Redirect
                                                         Priority Capable State     Capable
10.99.99.2      Ethernet1/1          02:51:38  00:01:22  1        yes     n/a     no
10.0.0.2        Ethernet1/2          02:50:45  00:01:43  1        yes     n/a     no
dc1-dci1# show ip mroute 
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.192/32), uptime: 02:53:41, nve ip pim 
  Incoming interface: loopback1, RPF nbr: 10.0.0.99
  Outgoing interface list: (count: 1)
    nve1, uptime: 02:53:41, nve

(10.0.1.101/32, 224.1.1.192/32), uptime: 02:53:41, nve mrib ip pim 
  Incoming interface: loopback2, RPF nbr: 10.0.1.101, internal
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 02:47:33, pim

(10.0.1.102/32, 224.1.1.192/32), uptime: 02:46:20, pim mrib ip 
  Incoming interface: Ethernet1/1, RPF nbr: 10.99.99.2, internal
  Outgoing interface list: (count: 1)
    nve1, uptime: 02:46:20, mrib

(*, 232.0.0.0/8), uptime: 02:53:42, pim ip 
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)

From that output, we can see that this VTEP has (S,G) entries for both its own anycast VTEP IP and the remote pair's anycast VTEP IP. The outgoing interface lists tell us where traffic matching each entry will be forwarded. We also see two PIM neighbors: the local vPC peer over Ethernet1/2 and the directly connected DCI switch in the other DC over Ethernet1/1.

Finally, it's worth mentioning failover. In a leaf and spine architecture, it's not common to have a dedicated Layer 3 link between the vPC switches. You may have one for the peer-keepalive, but not like we have in this topology with Eth1/2. Normally, when configuring vPC VXLAN, you need to set up a backup peering in the underlay across the peer-link using an SVI and nve infra-vlans. However, when you have a dedicated Layer 3 link like we do in this topology, that is not required, because we are not relying on an SVI over the peer-link to form the underlay neighborship; Eth1/2 is a direct connection, separate from the vPC setup. Because of this, we also don't require the layer3 peer-router command under the vPC domain.
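For contrast, that more traditional backup peering over the peer-link would look something along these lines (VLAN 999 and the addressing are purely illustrative, and the VLAN would also need allowing on the peer-link trunk):

system nve infra-vlans 999

vlan 999

interface Vlan999
  no shutdown
  mtu 9216
  ip address 10.250.250.1/30
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

vpc domain 100
  layer3 peer-router

None of that is needed in this design thanks to the dedicated Eth1/2 link.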

That's it for this setup, nice and easy!

Accompanying YouTube video:

Cisco VXLAN EVPN - Connect Legacy DCs together at L2

Full configurations:

