Three approaches to achieve ECMP with Route Reflection

Modern and well-built networks are path redundant. This redundancy not only brings higher fault tolerance but also provides better traffic distribution, as those redundant paths can be used to share the load across the network. Simply put, we get equal-cost multiple paths.

Moreover, in medium/large networks we will probably have route reflectors to distribute routes within the routing domain.

By default, route reflection and ECMP are not great friends.

Let’s consider this reference topology:

Network 100/8 is reachable through both R1 and R2. That route is advertised, via iBGP, to Route Reflectors that reflect it to R3.

Our final goal is to have ECMP at R3. We want traffic destined to 100/8 to be equally shared among R3-R1 and R3-R2 links.

We configure all the network elements with a minimal basic configuration:

  • OSPF among routers (lo0 passive)
  • iBGP sessions between clients and RRs
  • RRs’ BGP configuration only includes the cluster setting (0.0.0.1 on RR1, 0.0.0.2 on RR2)

For simplicity, 100/8 is configured as a discard static route on both R1 and R2 and advertised to the RRs.
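As a reference, the baseline configuration on a route reflector could look something like the following. This is only a sketch: interface names, the local address and the neighbor list are illustrative and depend on the actual topology.

```
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0
set protocols ospf area 0.0.0.0 interface lo0.0 passive
set protocols bgp group rr type internal
set protocols bgp group rr local-address 11.11.11.11
set protocols bgp group rr cluster 0.0.0.1
set protocols bgp group rr neighbor 1.1.1.1
set protocols bgp group rr neighbor 2.2.2.2
set protocols bgp group rr neighbor 3.3.3.3
```

On R1 and R2, the discard route is simply “set routing-options static route 100.0.0.0/8 discard”, exported towards the RRs.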

RRs receive a copy of 100/8 from both R1 and R2:

root@rr1_re# run show route protocol bgp

inet.0: 24 destinations, 25 routes (24 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.0.0.0/8        *[BGP/170] 00:01:23, localpref 100, from 1.1.1.1
                      AS path: I, validation-state: unverified
                    > to 192.168.14.0 via ge-0/0/0.0
                    [BGP/170] 00:01:41, localpref 100, from 2.2.2.2
                      AS path: I, validation-state: unverified
                    > to 192.168.24.0 via ge-0/0/1.0

They only choose one: in this case, the first one, as it comes from the peer with the lowest address (1.1.1.1):

root@rr1_re# run show route advertising-protocol bgp 3.3.3.3 extensive

inet.0: 24 destinations, 25 routes (24 active, 0 holddown, 0 hidden)
* 100.0.0.0/8 (2 entries, 1 announced)
 BGP group rr type Internal
     Nexthop: 1.1.1.1
     Localpref: 100
     AS path: [100] I
     Cluster ID: 0.0.0.1
     Originator ID: 1.1.1.1

root@rr2_re# run show route advertising-protocol bgp 3.3.3.3 extensive

inet.0: 24 destinations, 25 routes (24 active, 0 holddown, 0 hidden)
* 100.0.0.0/8 (2 entries, 1 announced)
 BGP group rr type Internal
     Nexthop: 1.1.1.1
     Localpref: 100
     AS path: [100] I
     Cluster ID: 0.0.0.2
     Originator ID: 1.1.1.1

R3 receives two copies, one from each RR, but they both point to the same next-hop (1.1.1.1):

root@r3_re# run show route protocol bgp

inet.0: 25 destinations, 26 routes (25 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.0.0.0/8        *[BGP/170] 00:03:50, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                    [BGP/170] 00:03:31, localpref 100, from 22.22.22.22
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0

As a result, we do not have ECMP; we lost it!

Adding multipath and a load-balancing policy on the RRs does not help. It brings ECMP on the RRs themselves (which is useless, as the RRs are not part of the forwarding path) but does not lead to multiple next-hops being advertised to R3.
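For reference, that attempt on the RRs would look something like this (the policy name is illustrative):

```
set protocols bgp group rr multipath
set policy-options policy-statement pfe-lb then load-balance per-packet
set routing-options forwarding-table export pfe-lb
```

Despite its name, “load-balance per-packet” actually results in per-flow load balancing on modern Junos platforms.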

What can we do?
Here, I’m going to show three different approaches.

The first one is to leverage a BGP feature called add-path.

Add Path requires configuration on both peers of a session.

On RRs we add:

root@rr1_re# show | compare rollback 1
[edit protocols bgp group rr]
+     family inet {
+         unicast {
+             add-path {
+                 send {
+                     path-count 4;
+                 }
+             }
+         }
+     }

where we basically tell BGP to advertise up to 4 paths for a given route.

On R1, R2 and R3 we add:

root@r3_re# show | compare rollback 1
[edit protocols bgp group rr]
+     family inet {
+         unicast {
+             add-path {
+                 receive;
+             }
+         }
+     }

this tells Junos to accept multiple paths.

Now the RRs announce the route with multiple next-hops:

root@rr1_re# run show route advertising-protocol bgp 3.3.3.3

inet.0: 24 destinations, 25 routes (24 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 100.0.0.0/8             1.1.1.1                      100        I
                          2.2.2.2                      100        I

Let’s check on R3:

root@r3_re# run show route protocol bgp

inet.0: 25 destinations, 27 routes (25 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.0.0.0/8        *[BGP/170] 00:02:41, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                    [BGP/170] 00:02:37, localpref 100, from 22.22.22.22
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                    [BGP/170] 00:02:41, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.23.0 via ge-0/0/1.0

[edit]
root@r3_re# run show route forwarding-table destination 100.0.0.0/8
Routing table: default.inet
Internet:
Enabled protocols: Bridging,
Destination        Type RtRef Next hop           Type Index    NhRef Netif
100.0.0.0/8        user     0                    indr  1048575     2
                              192.168.13.0       ucst      513     6 ge-0/0/0.0

Not there yet!

On R3, we have load balancing in the forwarding table, but we still miss multipath on BGP:

root@r3_re# set protocols bgp group rr multipath
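The load balancing mentioned above, already in place on R3, is typically achieved with a forwarding-table export policy like this one (the policy name is illustrative); without it, even with multipath, only one next-hop would be installed in the forwarding table:

```
set policy-options policy-statement pfe-lb then load-balance per-packet
set routing-options forwarding-table export pfe-lb
```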

Now it works:

root@r3_re# run show route receive-protocol bgp 11.11.11.11

inet.0: 25 destinations, 26 routes (25 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 100.0.0.0/8             1.1.1.1                      100        I
                          2.2.2.2                      100        I

inet6.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

[edit]
root@r3_re# run show route receive-protocol bgp 22.22.22.22

inet.0: 25 destinations, 28 routes (25 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  100.0.0.0/8             1.1.1.1                      100        I
                          2.2.2.2                      100        I

inet6.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

root@r3_re# run show route table inet.0 protocol bgp

inet.0: 25 destinations, 28 routes (25 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.0.0.0/8        *[BGP/170] 00:02:06, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                      to 192.168.23.0 via ge-0/0/1.0
                    [BGP/170] 00:00:50, localpref 100, from 22.22.22.22
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                    [BGP/170] 00:05:13, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.23.0 via ge-0/0/1.0
                    [BGP/170] 00:00:50, localpref 100, from 22.22.22.22
                      AS path: I, validation-state: unverified
                    > to 192.168.23.0 via ge-0/0/1.0

root@r3_re# run show route forwarding-table destination 100.0.0.0/8 table default
Routing table: default.inet
Internet:
Enabled protocols: Bridging,
Destination        Type RtRef Next hop           Type Index    NhRef Netif
100.0.0.0/8        user     0                    ulst  1048580     2
                                                 indr  1048579     2
                              192.168.23.0       ucst      514     6 ge-0/0/1.0
                                                 indr  1048575     2
                              192.168.13.0       ucst      513     6 ge-0/0/0.0

That’s it! ECMP via BGP!

For this approach to work, we need both peers to support Add Path.

It might happen that this is not the case. If so, we have to get a bit creative!

The second approach is to use MED so that each RR announces a different next-hop. As a consequence, RR clients will receive multiple BGP routes and will build ECMP locally.

Let’s see how to do this.

On R1, R2 and R3 we configure these policies:

set policy-options policy-statement med1000 then metric 1000
set policy-options policy-statement med1000 then accept
set policy-options policy-statement med2000 then metric 2000
set policy-options policy-statement med2000 then accept

The first policy sets MED to 1000, while the second one sets MED to 2000.

The idea behind this approach is that RR1 sees R1 as the best next-hop while RR2 sees R2 as the best next-hop. This way, they will advertise routes with different next-hops, unlike before, where both RRs picked the same next-hop (lowest peer address, 1.1.1.1).

Next, we configure export policies towards RRs.

On R1:

set protocols bgp group rr neighbor 11.11.11.11 export med1000
set protocols bgp group rr neighbor 22.22.22.22 export med2000

On R2:

set protocols bgp group rr neighbor 11.11.11.11 export med2000
set protocols bgp group rr neighbor 22.22.22.22 export med1000

Please notice:

  • med1000 towards RR1 on R1 and towards RR2 on R2
  • med2000 towards RR2 on R1 and towards RR1 on R2

That “policy inversion” does the trick!

  • RR1 chooses 100/8 copy from R1 (MED 1000)
  • RR2 chooses 100/8 copy from R2 (MED 1000)

Thanks to this trick, R3 still receives two copies but, this time, the next-hops are different. R3 applies the best-path selection algorithm, understands they are equal-cost multipaths and, as multipath is still configured, installs ECMP routes.

As said, R3 gets routes with different next-hops:

root@r3_re# run show route receive-protocol bgp 11.11.11.11 100/8

inet.0: 25 destinations, 26 routes (25 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 100.0.0.0/8             1.1.1.1              1000    100        I

[edit]
root@r3_re# run show route receive-protocol bgp 22.22.22.22 100/8

inet.0: 25 destinations, 26 routes (25 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  100.0.0.0/8             2.2.2.2              1000    100        I

And we end up with ECMP without Add Path:

root@r3_re# run show bgp neighbor 11.11.11.11 | match AddPath
  Peer does not support Addpath

[edit]
root@r3_re# run show bgp neighbor 22.22.22.22 | match AddPath
  Peer does not support Addpath

Summing up:

root@r3_re# run show route protocol bgp

inet.0: 25 destinations, 26 routes (25 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.0.0.0/8        *[BGP/170] 00:03:24, MED 1000, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                      to 192.168.23.0 via ge-0/0/1.0
                    [BGP/170] 00:03:24, MED 1000, localpref 100, from 22.22.22.22
                      AS path: I, validation-state: unverified
                    > to 192.168.23.0 via ge-0/0/1.0

root@r3_re# run show route forwarding-table table default destination 100/8
Routing table: default.inet
Internet:
Enabled protocols: Bridging,
Destination        Type RtRef Next hop           Type Index    NhRef Netif
100.0.0.0/8        user     0                    ulst  1048580     2
                                                 indr  1048579     2
                              192.168.23.0       ucst      514     6 ge-0/0/1.0
                                                 indr  1048575     2
                              192.168.13.0       ucst      513     6 ge-0/0/0.0

Are we done? Not yet. We still have one more approach.

The last way to achieve ECMP with route reflection is to use an anycast IP.

On R1 and R2, we configure a discard static route that will be used as anycast IP:

set routing-options static route 1.2.3.4/32 discard

We create a virtual router into which we copy the routes that remote routers must reach via ECMP paths (in our case, 100/8):

set routing-instances anycast-check instance-type virtual-router
set routing-instances anycast-check routing-options instance-import anycast-import
set policy-options policy-statement anycast-import term static from instance master
set policy-options policy-statement anycast-import term static from protocol static
set policy-options policy-statement anycast-import term static from prefix-list-filter to-rr exact
set policy-options policy-statement anycast-import term static then accept
set policy-options policy-statement anycast-import then reject
set policy-options prefix-list to-rr 100.0.0.0/8

Here, we used “from protocol static” as we emulated the “end route” with a local static route. Of course, based on the specific scenario, we have to adjust the policy accordingly (e.g. from protocol ospf, match a community, etc.).

We export the anycast route into OSPF:

set policy-options policy-statement exp-ospf term anycast from protocol static
set policy-options policy-statement exp-ospf term anycast from route-filter 1.2.3.4/32 exact
set policy-options policy-statement exp-ospf term anycast from condition anycast-check
set policy-options policy-statement exp-ospf term anycast then accept
set policy-options condition anycast-check if-route-exists 100.0.0.0/8 table anycast-check.inet.0
set protocols ospf export exp-ospf

As both R1 and R2 advertise anycast route into OSPF, R3 will have an ECMP route to the anycast route.

Last, we modify the export policy towards the RRs on R1 and R2 so as to set the next-hop to the anycast IP (1.2.3.4):

set policy-options policy-statement to-rr term ok from protocol static
set policy-options policy-statement to-rr term ok from prefix-list-filter to-rr exact
set policy-options policy-statement to-rr term ok then next-hop 1.2.3.4
set policy-options policy-statement to-rr term ok then accept
set policy-options policy-statement to-rr then reject

On R3, 1.2.3.4 is reachable via 2 ECMP paths (OSPF route):

root@r3_re# run show route 1.2.3.4 exact

inet.0: 26 destinations, 27 routes (26 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.2.3.4/32         *[OSPF/150] 00:01:17, metric 0, tag 0
                    > to 192.168.13.0 via ge-0/0/0.0
                      to 192.168.23.0 via ge-0/0/1.0

R3 receives route 100/8 from both RRs with next-hop 1.2.3.4:

root@r3_re# run show route protocol bgp extensive | match Proto
                Protocol next hop: 1.2.3.4
                        Protocol next hop: 1.2.3.4 Metric: 0
                Protocol next hop: 1.2.3.4
                        Protocol next hop: 1.2.3.4 Metric: 0

As a result, BGP route next-hop is resolved via the OSPF next-hop. As the OSPF next-hop is an ECMP one, the BGP route will leverage that ECMP next-hop as well:

root@r3_re# run show route protocol bgp

inet.0: 26 destinations, 27 routes (26 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.0.0.0/8        *[BGP/170] 00:00:22, localpref 100, from 11.11.11.11
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                      to 192.168.23.0 via ge-0/0/1.0
                    [BGP/170] 00:00:22, localpref 100, from 22.22.22.22
                      AS path: I, validation-state: unverified
                    > to 192.168.13.0 via ge-0/0/0.0
                      to 192.168.23.0 via ge-0/0/1.0

[edit]
root@r3_re# run show route forwarding-table table default destination 1.2.3.4
Routing table: default.inet
Internet:
Enabled protocols: Bridging,
Destination        Type RtRef Next hop           Type Index    NhRef Netif
1.2.3.4/32         user     0                    ulst  1048575     3
                              192.168.13.0       ucst      513     6 ge-0/0/0.0
                              192.168.23.0       ucst      514     6 ge-0/0/1.0

With this third approach, multipath in R3’s BGP configuration is no longer needed (as we do not receive multiple routes from the RRs; we receive one whose next-hop is locally resolved to an ECMP next-hop):

root@r3_re# delete protocols bgp group rr multipath

root@r3_re# run show route forwarding-table table default destination 1.2.3.4
Routing table: default.inet
Internet:
Enabled protocols: Bridging,
Destination        Type RtRef Next hop           Type Index    NhRef Netif
1.2.3.4/32         user     0                    ulst  1048575     3
                              192.168.13.0       ucst      513     6 ge-0/0/0.0
                              192.168.23.0       ucst      514     6 ge-0/0/1.0

[edit]
root@r3_re# run show route forwarding-table table default destination 100.0.0.0
Routing table: default.inet
Internet:
Enabled protocols: Bridging,
Destination        Type RtRef Next hop           Type Index    NhRef Netif
100.0.0.0/8        user     0                    indr  1048579     2
                                                 ulst  1048575     3
                              192.168.13.0       ucst      513     6 ge-0/0/0.0
                              192.168.23.0       ucst      514     6 ge-0/0/1.0

Before calling it a day, let’s spend a few words on the condition we added to the OSPF export policy:

set policy-options policy-statement exp-ospf term anycast from condition anycast-check
set policy-options condition anycast-check if-route-exists 100.0.0.0/8 table anycast-check.inet.0

Without it, 1.2.3.4/32 would always be advertised into OSPF.

However, it might happen that, for example, R1 does not have a route for 100/8. In that case, any traffic destined to 100/8 that R3 sent to R1 (legitimate, as R1 is one of the ECMP next-hops) would get lost.

With the condition, instead, we advertise 1.2.3.4 into OSPF if and only if we have a route for 100/8. As a result, we “advertise ourselves as a potential next-hop” for 100/8 only if we know we can reach 100/8.

And that’s it!

Ciao
IoSonoUmberto
