When you peer with a service provider using BGP, they will likely ask whether you want the full BGP table, a partial table or just a default route. Most devices at your company's edge will not be able to handle the full BGP table, as even the IPv4 table alone contains nearly 1M routes!
There is a great resource on Twitter where you can see the current values and how they fluctuate:
IPv4: BGP4-Table (@bgp4_table) / Twitter
IPv6: BGP6-Table (@bgp6_table) / Twitter
So, you can get the service provider to send you a partial table. This can be beneficial if you are multi-homed and you would like to send traffic to certain destinations via a specific route. We can see an example with the topology below:
We can see there is a multi-homed customer (CUSTA) with two routers connected to different PE routers. Let's say these devices cannot handle the full BGP table due to memory constraints. We can also see routers simulating Google and Microsoft connected to this service provider. If we just asked for a default route, all traffic would be sent to the PEs regardless of its destination. However, if we got a partial BGP table containing the specific prefixes the GOOGLE-1 router is advertising, we could do our own traffic engineering to make sure that traffic to those prefixes goes to PE2 via CE_CUSTA_2. That traffic then takes a shorter path than it would if it egressed from CE_CUSTA_1 and went to PE1:
CE_CUSTA_1 -> PE1 -> P2 -> P4 -> PE2 -> GOOGLE-1
CE_CUSTA_2 -> PE2 -> GOOGLE-1
So, let's see how we can configure the topology to give the customer a partial BGP table by tagging the routes with BGP communities.
First, we need to understand the topology.
The Service Provider Topology
We start on P1. This is a BGP route reflector for IPv4. We can see its configuration below:
P1#sh run | sect ospf
router ospf 1
router-id 1.1.1.1
network 1.1.1.1 0.0.0.0 area 0
network 80.1.0.0 0.0.255.255 area 0
mpls ldp autoconfig
P1#sh run | sect bgp
router bgp 65991
bgp router-id 1.1.1.1
bgp log-neighbor-changes
neighbor 5.5.5.5 remote-as 65991
neighbor 5.5.5.5 update-source Loopback0
neighbor 6.6.6.6 remote-as 65991
neighbor 6.6.6.6 update-source Loopback0
neighbor 7.7.7.7 remote-as 65991
neighbor 7.7.7.7 update-source Loopback0
!
address-family ipv4
neighbor 5.5.5.5 activate
neighbor 5.5.5.5 send-community both
neighbor 5.5.5.5 route-reflector-client
neighbor 6.6.6.6 activate
neighbor 6.6.6.6 send-community both
neighbor 6.6.6.6 route-reflector-client
neighbor 7.7.7.7 activate
neighbor 7.7.7.7 send-community both
neighbor 7.7.7.7 route-reflector-client
exit-address-family
P1#sh ip int br | exc administratively
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 80.1.10.1 YES NVRAM up up
GigabitEthernet1/0 80.1.11.1 YES NVRAM up up
Loopback0 1.1.1.1 YES NVRAM up up
We are also using OSPF as the IGP within the Service Provider network. MPLS is also enabled globally across the network on all but the CE routers.
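To make the OSPF 'network' statements above a little more concrete, here is a minimal Python sketch of how a network/wildcard pair selects interfaces. It only models the matching logic, it is not how IOS is implemented, and the function and variable names are my own for illustration. The addresses are taken from the P1 output above.

```python
import ipaddress

def wildcard_matches(network: str, wildcard: str, address: str) -> bool:
    """Return True if address falls inside an IOS-style network/wildcard pair."""
    net = int(ipaddress.IPv4Address(network))
    wild = int(ipaddress.IPv4Address(wildcard))
    addr = int(ipaddress.IPv4Address(address))
    care = ~wild & 0xFFFFFFFF  # bits set to 0 in the wildcard must match exactly
    return (addr & care) == (net & care)

# 'network' statements and interface addresses copied from the P1 output.
P1_NETWORK_STATEMENTS = [("1.1.1.1", "0.0.0.0"), ("80.1.0.0", "0.0.255.255")]
P1_INTERFACES = {
    "GigabitEthernet0/0": "80.1.10.1",
    "GigabitEthernet1/0": "80.1.11.1",
    "Loopback0": "1.1.1.1",
}

def interfaces_in_ospf(statements, interfaces):
    """List the interfaces whose address matches at least one network statement."""
    return sorted(
        name
        for name, ip in interfaces.items()
        if any(wildcard_matches(net, wild, ip) for net, wild in statements)
    )

print(interfaces_in_ospf(P1_NETWORK_STATEMENTS, P1_INTERFACES))
# ['GigabitEthernet0/0', 'GigabitEthernet1/0', 'Loopback0']
```

The same check explains the tighter statements on the PEs: a wildcard of 0.0.0.7 only covers eight addresses, so 80.1.17.0 0.0.0.7 matches 80.1.17.1 but not 80.1.17.8.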
Next we look at P2, P3 and P4. None of these routers are running BGP, as this service provider is using a BGP-free core. They are, however, running OSPF. All connected interfaces and the Loopback0 interfaces are in area 0.
Let's now look at the three Provider Edge routers. Each is running BGP with a peering to P1 and to the customers connected to that PE.
PE1:
PE1#sh run | sect ospf
router ospf 1
router-id 5.5.5.5
passive-interface default
no passive-interface GigabitEthernet1/0
network 5.5.5.5 0.0.0.0 area 0
network 80.1.14.0 0.0.0.3 area 0
network 80.1.17.0 0.0.0.7 area 0
mpls ldp autoconfig
PE1#sh run | sect bgp
router bgp 65991
bgp router-id 5.5.5.5
bgp log-neighbor-changes
no bgp default ipv4-unicast
! P1 iBGP Peering
neighbor 1.1.1.1 remote-as 65991
neighbor 1.1.1.1 update-source Loopback0
! CE_CUSTA_1 eBGP Peering
neighbor 80.1.17.2 remote-as 65001
!
address-family ipv4
neighbor 1.1.1.1 activate
neighbor 1.1.1.1 send-community both
neighbor 1.1.1.1 next-hop-self
neighbor 80.1.17.2 activate
neighbor 80.1.17.2 default-originate
exit-address-family
PE1#sh ip int br | exc administratively
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 80.1.17.1 YES NVRAM up up
GigabitEthernet1/0 80.1.14.2 YES NVRAM up up
Loopback0 5.5.5.5 YES NVRAM up up
Loopback1 unassigned YES unset up up
PE2:
PE2#sh run | sect ospf
router ospf 1
router-id 6.6.6.6
passive-interface default
no passive-interface GigabitEthernet1/0
network 6.6.6.6 0.0.0.0 area 0
network 80.1.16.0 0.0.0.3 area 0
network 80.1.19.0 0.0.0.7 area 0
network 80.1.20.0 0.0.0.7 area 0
mpls ldp autoconfig
PE2#sh run | sect bgp
router bgp 65991
bgp router-id 6.6.6.6
bgp log-neighbor-changes
no bgp default ipv4-unicast
! P1 iBGP Peering
neighbor 1.1.1.1 remote-as 65991
neighbor 1.1.1.1 update-source Loopback0
! CE_CUSTA_2 eBGP Peering
neighbor 80.1.19.2 remote-as 65001
! GOOGLE-1 eBGP Peering
neighbor 80.1.20.2 remote-as 65003
!
address-family ipv4
neighbor 1.1.1.1 activate
neighbor 1.1.1.1 send-community both
neighbor 1.1.1.1 next-hop-self
neighbor 80.1.19.2 activate
neighbor 80.1.19.2 default-originate
neighbor 80.1.20.2 activate
neighbor 80.1.20.2 default-originate
exit-address-family
PE2#sh ip int br | exc administratively
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 80.1.19.1 YES NVRAM up up
GigabitEthernet1/0 80.1.16.2 YES NVRAM up up
GigabitEthernet2/0 80.1.20.1 YES NVRAM up up
Loopback0 6.6.6.6 YES NVRAM up up
Loopback1 unassigned YES unset up up
PE3:
PE3#sh run | sect ospf
router ospf 1
router-id 7.7.7.7
passive-interface default
no passive-interface GigabitEthernet0/0
network 7.7.7.7 0.0.0.0 area 0
network 80.1.15.0 0.0.0.3 area 0
network 80.1.18.0 0.0.0.7 area 0
mpls ldp autoconfig
PE3#sh run | sect bgp
router bgp 65991
bgp router-id 7.7.7.7
bgp log-neighbor-changes
no bgp default ipv4-unicast
! P1 iBGP Peering
neighbor 1.1.1.1 remote-as 65991
neighbor 1.1.1.1 update-source Loopback0
! MICROSOFT-1 eBGP Peering
neighbor 80.1.18.2 remote-as 65004
!
address-family ipv4
neighbor 1.1.1.1 activate
neighbor 1.1.1.1 next-hop-self
neighbor 80.1.18.2 activate
neighbor 80.1.18.2 default-originate
exit-address-family
PE3#sh ip int br | exc administratively
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 80.1.15.2 YES NVRAM up up
GigabitEthernet1/0 80.1.18.1 YES NVRAM up up
GigabitEthernet2/0 unassigned YES NVRAM up up
Loopback0 7.7.7.7 YES NVRAM up up
Something else to note on all the PE devices is that 'next-hop-self' is turned on for the peering with P1, meaning that the PE routers swap out the next hop from the CE router IP address to the PE loopback.
The CE routers are configured to peer with the PE devices. They are running NAT between their Loopback0 interface and G0/0. The purpose of the loopbacks will become apparent later on in the guide.
Now, let's check what is being advertised by the CE devices. Let's have a look on the route reflector:
P1#sh bgp ipv4 unicast
BGP table version is 8, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*>i 8.8.4.0/24 6.6.6.6 0 100 0 65003 i
*>i 8.8.8.0/24 6.6.6.6 0 100 0 65003 i
* i 90.1.0.0/16 6.6.6.6 0 100 0 65001 i
*>i 5.5.5.5 0 100 0 65001 i
*>i 100.1.1.0/24 7.7.7.7 0 100 0 65004 i
*>i 101.1.1.0/24 7.7.7.7 0 100 0 65004 i
The routes are being advertised as per the below:
8.8.4.0/24 - GOOGLE-1
8.8.8.0/24 - GOOGLE-1
90.1.0.0/16 - CE_CUSTA_1 and CE_CUSTA_2
100.1.1.0/24 - MICROSOFT-1
101.1.1.0/24 - MICROSOFT-1
So, let's say that we only want the routes from GOOGLE to be in our partial BGP table. How would we achieve this?
We can either tag the routes with a community value to mark them as being from Google, or use prefix lists to match the exact routes we would like to send to the customer.
In this guide, I will show tagging the routes with a BGP community.
Let's first check that we have connectivity to the GOOGLE router from both Customer A routers.
CE_CUSTA_1#ping 8.8.8.8 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 64/68/76 ms
CE_CUSTA_1#ping 8.8.4.4 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 64/72/80 ms
CE_CUSTA_2#ping 8.8.8.8 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 10.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/40/76 ms
CE_CUSTA_2#ping 8.8.4.4 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 10.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/21/24 ms
As we can see, both routers have connectivity from their Loopback0 interfaces. In the current configuration, all the CE routers know about all the prefixes within BGP because we aren't doing any filtering.
Let's first create a route-map to set the BGP community on routes:
PE1(config)#route-map RM_TAG_GOOGLE permit 10
PE1(config-route-map)#set community 65003:1
PE1(config-route-map)#exit
PE2(config)#route-map RM_TAG_GOOGLE permit 10
PE2(config-route-map)#set community 65003:1
PE2(config-route-map)#exit
The route-maps don't include any match statements because we want to match all the routes coming from GOOGLE-1. Each route-map sets the community value on every route that passes through it.
Now we need to apply the route-map to the BGP neighbor for GOOGLE-1 on PE2:
PE2(config)#router bgp 65991
PE2(config-router)#address-family ipv4 unicast
PE2(config-router-af)#neighbor 80.1.20.2 route-map RM_TAG_GOOGLE in
We are setting this in the inbound direction to catch the routes coming from the GOOGLE-1 router.
After clearing the BGP sessions to speed up the route updates, we can look at the routes in the BGP table on PE2:
PE2#sh bgp ipv4 unicast
BGP table version is 16, local router ID is 6.6.6.6
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
0.0.0.0 0.0.0.0 0 i
*> 8.8.4.0/24 80.1.20.2 0 0 65003 i
*> 8.8.8.0/24 80.1.20.2 0 0 65003 i
* i 90.1.0.0/16 5.5.5.5 0 100 0 65001 i
*> 80.1.19.2 0 0 65001 i
*>i 100.1.1.0/24 7.7.7.7 0 100 0 65004 i
*>i 101.1.1.0/24 7.7.7.7 0 100 0 65004 i
We can't see anything different here, so let's dig into a GOOGLE prefix:
PE2#sh bgp ipv4 unicast 8.8.8.0/24
BGP routing table entry for 8.8.8.0/24, version 16
Paths: (1 available, best #1, table default)
Advertised to update-groups:
2 3 6
Refresh Epoch 1
65003
80.1.20.2 from 80.1.20.2 (10.3.3.3)
Origin IGP, metric 0, localpref 100, valid, external, best
Community: 4260036609
rx pathid: 0, tx pathid: 0x0
From the above output, we can see that the 8.8.8.0/24 route from GOOGLE has been tagged with the community value. Notice that it is displayed differently from the value we configured (65003:1): by default, IOS shows the community as a single decimal number. To change this behavior, you can enter the command below in global configuration mode:
ip bgp-community new-format
We can now see that entering that command has changed the output:
PE2#sh bgp ipv4 unicast 8.8.8.0/24
BGP routing table entry for 8.8.8.0/24, version 16
Paths: (1 available, best #1, table default)
Advertised to update-groups:
2 3 6
Refresh Epoch 1
65003
80.1.20.2 from 80.1.20.2 (10.3.3.3)
Origin IGP, metric 0, localpref 100, valid, external, best
Community: 65003:1
rx pathid: 0, tx pathid: 0x0
Now that we have tagged the routes on PE2, we need to create another route-map to catch the routes tagged with that community value and manipulate the metric (the MED). We are setting a metric so that the CE devices prefer the route coming from PE2:
Note: The lower the metric, the more preferred the route.
PE1(config)#ip community-list standard GOOGLE permit 65003:1
PE1(config)#route-map RM_IMPORT_GOOGLE permit 10
PE1(config-route-map)#match community GOOGLE
PE1(config-route-map)#set metric 100
PE1(config-route-map)#exit
PE2(config)#ip community-list standard GOOGLE permit 65003:1
PE2(config)#route-map RM_IMPORT_GOOGLE permit 10
PE2(config-route-map)#match community GOOGLE
PE2(config-route-map)#set metric 50
PE2(config-route-map)#exit
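These route-maps match on the community value, and all other routes are dropped by the implicit deny at the end of the route-map. For them to take effect, each one has to be applied in the outbound direction to the PE's CE neighbor. Note that we don't need to explicitly permit the default route here; the default generated by 'default-originate' is still sent to the CEs.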
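If the route-map logic is hard to picture, here is a rough Python model of RM_IMPORT_GOOGLE. It is a sketch, not an emulation of IOS: the Route class and function names are hypothetical, and I am assuming only the two GOOGLE-1 prefixes carry the 65003:1 community, as tagged on PE2 earlier. The prefixes come from the BGP table on P1.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

GOOGLE_COMMUNITY = "65003:1"

@dataclass(frozen=True)
class Route:
    prefix: str
    communities: frozenset = frozenset()
    metric: int = 0

def rm_import_google(route: Route, metric: int) -> Optional[Route]:
    """Sequence 10: match community GOOGLE, then set metric."""
    if GOOGLE_COMMUNITY in route.communities:
        return replace(route, metric=metric)
    return None  # no sequence matched, so the implicit deny drops the route

def advertise(table: List[Route], metric: int) -> List[Tuple[str, int]]:
    """Run every route through the route-map and keep what is permitted."""
    sent = (rm_import_google(route, metric) for route in table)
    return [(r.prefix, r.metric) for r in sent if r is not None]

# Prefixes from the P1 table; only the GOOGLE-1 routes are tagged (assumption).
TABLE = [
    Route("8.8.4.0/24", frozenset({GOOGLE_COMMUNITY})),
    Route("8.8.8.0/24", frozenset({GOOGLE_COMMUNITY})),
    Route("90.1.0.0/16"),
    Route("100.1.1.0/24"),
    Route("101.1.1.0/24"),
]

print(advertise(TABLE, 100))  # PE1: [('8.8.4.0/24', 100), ('8.8.8.0/24', 100)]
print(advertise(TABLE, 50))   # PE2: [('8.8.4.0/24', 50), ('8.8.8.0/24', 50)]
```

The Microsoft and customer prefixes never make it out because nothing in the route-map permits them.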
Let's give BGP a couple of minutes to converge and then check the BGP table on both CE routers.
CE_CUSTA_1:
CE_CUSTA_1#sh bgp ipv4 unicast
BGP table version is 17, local router ID is 80.1.17.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* i 0.0.0.0 10.1.19.2 0 100 0 65991 i
*> 80.1.17.1 0 65991 i
* 8.8.4.0/24 80.1.17.1 100 0 65991 65003 i
*>i 10.1.19.2 50 100 0 65991 65003 i
* 8.8.8.0/24 80.1.17.1 100 0 65991 65003 i
*>i 10.1.19.2 50 100 0 65991 65003 i
*> 90.1.0.0/16 0.0.0.0 0 32768 i
* i 10.1.19.2 0 100 0 i
Now we see something different on CE_CUSTA_1. We don't see any of the Microsoft prefixes, as these have been filtered out, and we also see metrics on the Google prefixes. Looking specifically at the Google prefixes, we can see that the iBGP routes advertised by CE_CUSTA_2 (metric 50) have been chosen as best over the eBGP routes from PE1 (metric 100). We can tell this by looking at the 'Next Hop'.
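Why does the iBGP path win here when BGP normally prefers eBGP over iBGP? The MED is compared before that step. Below is a simplified Python sketch of the decision for 8.8.8.0/24 on CE_CUSTA_1. It only models a handful of the best-path steps (weight, local preference, AS path length, MED, eBGP over iBGP) and assumes both paths come from the same neighboring AS so the MEDs are comparable. The class and function names are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: Tuple[int, ...]
    med: int
    is_ebgp: bool
    weight: int = 0
    local_pref: int = 100  # default local preference

def preference_key(path: Path):
    """Lower tuple wins; the order mirrors a simplified best-path algorithm."""
    return (
        -path.weight,             # highest weight first
        -path.local_pref,         # then highest local preference
        len(path.as_path),        # then shortest AS path
        path.med,                 # then lowest MED
        0 if path.is_ebgp else 1, # then eBGP over iBGP
    )

def best_path(paths: List[Path]) -> Path:
    return min(paths, key=preference_key)

# The two paths CE_CUSTA_1 holds for 8.8.8.0/24, values from its BGP table.
via_pe1 = Path("80.1.17.1", (65991, 65003), med=100, is_ebgp=True)
via_ce2 = Path("10.1.19.2", (65991, 65003), med=50, is_ebgp=False)

print(best_path([via_pe1, via_ce2]).next_hop)  # 10.1.19.2

# With equal MEDs the tie falls through to eBGP over iBGP instead.
equal_med = Path("80.1.17.1", (65991, 65003), med=50, is_ebgp=True)
print(best_path([equal_med, via_ce2]).next_hop)  # 80.1.17.1
```

So without the different metrics set on the PEs, CE_CUSTA_1 would have kept using its own eBGP path via PE1.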
Let's also have a quick look on CE_CUSTA_2:
CE_CUSTA_2#sh bgp ipv4 unicast
BGP table version is 38, local router ID is 80.1.19.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0 80.1.19.1 0 65991 i
* i 80.1.17.1 0 100 0 65991 i
*> 8.8.4.0/24 80.1.19.1 50 0 65991 65003 i
*> 8.8.8.0/24 80.1.19.1 50 0 65991 65003 i
* i 90.1.0.0/16 10.1.19.1 0 100 0 i
*> 0.0.0.0 0 32768 i
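On CE_CUSTA_2, we can see far fewer paths. This is because BGP only advertises the best path. Once CE_CUSTA_1 selected the iBGP paths learned from CE_CUSTA_2 as best for 8.8.4.0/24 and 8.8.8.0/24, it withdrew its own advertisements for those prefixes towards CE_CUSTA_2.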
Now we can test connectivity once again to make sure that we haven't broken anything in the path:
CE_CUSTA_1#ping 8.8.8.8 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/47/72 ms
CE_CUSTA_1#ping 8.8.4.4 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 48/64/80 ms
CE_CUSTA_1#trace 8.8.8.8 source loop0
Type escape sequence to abort.
Tracing the route to 8.8.8.8
VRF info: (vrf in name/id, vrf out name/id)
1 10.1.19.2 [AS 65991] 20 msec 20 msec 20 msec
2 80.1.19.1 [AS 65991] 40 msec 24 msec 40 msec
3 80.1.20.2 [AS 65991] 60 msec 48 msec 64 msec
CE_CUSTA_1#trace 8.8.4.4 source loop0
Type escape sequence to abort.
Tracing the route to 8.8.4.4
VRF info: (vrf in name/id, vrf out name/id)
1 10.1.19.2 [AS 65991] 16 msec 4 msec 40 msec
2 80.1.19.1 [AS 65991] 32 msec 28 msec 32 msec
3 80.1.20.2 [AS 65991] 48 msec 48 msec 40 msec
CE_CUSTA_2#ping 8.8.8.8 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 10.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/48/100 ms
CE_CUSTA_2#ping 8.8.4.4 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 10.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/30/52 ms
CE_CUSTA_2#trace 8.8.8.8 source loop0
Type escape sequence to abort.
Tracing the route to 8.8.8.8
VRF info: (vrf in name/id, vrf out name/id)
1 80.1.19.1 [AS 65991] 8 msec 4 msec 32 msec
2 80.1.20.2 [AS 65991] 36 msec 32 msec 28 msec
CE_CUSTA_2#trace 8.8.4.4 source loop0
Type escape sequence to abort.
Tracing the route to 8.8.4.4
VRF info: (vrf in name/id, vrf out name/id)
1 80.1.19.1 [AS 65991] 44 msec 20 msec 20 msec
2 80.1.20.2 [AS 65991] 32 msec 32 msec 32 msec
We still have full connectivity and as we can see by the traceroute output on CE_CUSTA_1, the traffic to the Google prefixes is going via CE_CUSTA_2 and then to PE2. This is exactly what we wanted.
Let's also check that CE_CUSTA_1 can still access the Microsoft prefixes using the default route and verify the path:
CE_CUSTA_1#ping 101.1.1.1 source loop0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 101.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 64/92/156 ms
CE_CUSTA_1#trace 101.1.1.1 source loop0
Type escape sequence to abort.
Tracing the route to 101.1.1.1
VRF info: (vrf in name/id, vrf out name/id)
1 80.1.17.1 [AS 65991] 20 msec 24 msec 20 msec
2 80.1.14.1 [AS 65991] [MPLS: Label 16 Exp 0] 72 msec 72 msec 60 msec
3 80.1.12.2 [AS 65991] [MPLS: Label 16 Exp 0] 92 msec 80 msec 84 msec
4 80.1.13.1 [AS 65991] [MPLS: Label 18 Exp 0] 68 msec 52 msec 40 msec
5 80.1.15.2 [AS 65991] 72 msec 92 msec 72 msec
6 80.1.18.2 [AS 65991] 104 msec 92 msec 80 msec
We still have connectivity to the Microsoft router loopback and can see that the path is going out of CE_CUSTA_1 to PE1 as we would expect.
We have now configured what we set out to achieve: sending Customer A a partial BGP table so they can select the best path to a specific destination.
If you don't want to use the MED on the PE devices to alter the behaviour on the CE routers, you could instead configure a route-map on the CE side and set a local preference, which is then advertised into iBGP between the CEs. I configure this in a YouTube video I have posted: Using BGP MED, next-hop-self and local pref Attributes to manipulate traffic flow - YouTube. It also shows how to alter the return traffic so that we don't cause asymmetric routing.