RSVP lsps getting their own bandwidth

RSVP is typically associated with Traffic Engineering. Via CSPF we can control how lsps are built by imposing constraints such as link colors or bandwidth.

When we specify a given bandwidth, CSPF looks for paths where that specified bandwidth is available.

This mechanism is pretty useful when we know that an lsp will carry a certain amount of traffic and we want to be sure that the tunnel is built where that capacity is free.

Anyhow, traffic is not constant but changes over time. This means that we might build an lsp with bw 100m that, for the whole afternoon, carries 20m only.

Is this an issue? Somewhat yes…
When CSPF creates an lsp with a bandwidth requirement, that bw is statically reserved on the rsvp interfaces traversed by the lsp. It might happen that a new lsp cannot be built because those 100m leave too little free bandwidth…even if those 100m are not actually used.
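
Here is a minimal Python sketch of the problem. It is not Junos code: the class, the lsp names and the 50m request are made up for illustration. The point is that admission looks only at what is reserved, never at what is actually flowing:

```python
# Hypothetical model of RSVP admission on one interface (not Junos code).
class RsvpInterface:
    def __init__(self, reservable_mbps):
        self.reservable_mbps = reservable_mbps
        self.reserved = {}  # lsp name -> reserved Mbps

    def available(self):
        """Bandwidth that can still be reserved on this interface."""
        return self.reservable_mbps - sum(self.reserved.values())

    def admit(self, lsp, bw_mbps):
        """Reserve bw for the lsp if enough unreserved bw is left."""
        if bw_mbps > self.available():
            return False
        self.reserved[lsp] = bw_mbps
        return True

link = RsvpInterface(reservable_mbps=100)
first = link.admit("lsp-a", 100)   # reserves 100m, even if it carries only 20m
second = link.admit("lsp-b", 50)   # rejected: nothing left to reserve
print(first, second, link.available())  # True False 0
```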

To overcome this, Junos provides a feature called auto-bandwidth.

The principle is fairly easy:

  • junos periodically collects statistics from auto-bw-enabled lsps
  • if measured bw shows a “relevant change”, CSPF is run to compute the new path with the new required bw

Let’s understand better how it works!

  • Junos collects lsp statistics every X seconds (configurable, interval)
  • Junos, if needed, adjusts an auto-bw-enabled lsp every Y seconds (configurable, adjust interval)
  • normally, Y is larger than X
  • an adjust interval includes several samples
  • every sample represents the avg bw of the lsp measured over the last X seconds (interval)
  • every Y seconds (adjust interval) Junos compares the max avg bw among the samples with the current lsp bw and acts accordingly
  • this means re-signalling the lsp with a lower or higher bandwidth

To make an example:

  • an lsp is currently configured with BW 50m
  • lsp statistics are taken every 60 seconds (interval)
  • adjust interval is 300 seconds
  • this means that 5 samples are taken every adjust interval
  • let’s imagine samples have values: 70m, 80m, 95m, 85m, 85m
  • max avg bw is 95m
  • bandwidth increase is enough to trigger lsp re-signal with bw constraint set to 95m
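
The arithmetic of this example can be written down in a few lines of Python. The variable names are mine, the numbers are the ones listed above:

```python
# Values taken from the example above; all rates in Mbps.
stats_interval_s = 60
adjust_interval_s = 300
samples_mbps = [70, 80, 95, 85, 85]

# How many statistics samples fit into one adjust interval.
samples_per_adjust = adjust_interval_s // stats_interval_s
# The value auto-bw compares against the current lsp bw.
max_avg_bw = max(samples_mbps)

print(samples_per_adjust)  # 5
print(max_avg_bw)          # 95
```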

We spoke about bw “relevant” change or “enough change”…but what does this mean?
That is a configurable value we will look at later. If not explicitly set, default values apply.
In general, this adjust threshold is a percentage (e.g. 10%) of the current lsp bw. If the difference between the current lsp bw and the max avg bw of the last adjust interval is bigger than that threshold, the lsp is re-signaled.
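
As a rough sketch, the decision can be modelled like this. The function name is hypothetical, and I am assuming that the percentage is relative to the currently signalled bw and that the comparison is a strict "bigger than"; this is an illustration, not the actual Junos logic:

```python
def needs_resignal(current_bw, max_avg_bw, threshold_pct):
    """Return True when the change exceeds threshold_pct percent of current_bw.

    Assumption: the threshold is relative to the currently signalled bw
    and applies to both increases and decreases.
    """
    change = abs(max_avg_bw - current_bw)
    return change > current_bw * threshold_pct / 100

print(needs_resignal(50, 95, 10))  # True: 45m change vs 5m threshold
print(needs_resignal(50, 53, 10))  # False: 3m change vs 5m threshold
```

Note that with a current bw of 0 the threshold is 0 as well, so any non-zero traffic triggers a re-signal.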

Enough words….time to look at a real example.

We will make use of this simple topology:

We are going to define a lsp between r1a and r4a.

At the RSVP level, interface ge-0/0/0 on r1a is assigned bw 100m. This static allocation tells junos that interface ge-0/0/0 (regardless of its actual bw, it might be 1G/10G etc…) can reserve at most 100m when building rsvp lsps.

Moreover, some servers running iperf are connected to lsp ingress and egress routers. We will use those servers to generate traffic through the lsp.

Let’s dive into the configuration.

On r1a we need to specify rsvp bw for ge-0/0/0:

set protocols rsvp interface ge-0/0/0.0 bandwidth 30m
set protocols rsvp interface ge-0/0/1.0

Next, we enable mpls statistics:

set protocols mpls statistics file mplstat
set protocols mpls statistics interval 60
set protocols mpls statistics auto-bandwidth

We turn on mpls statistics for auto-bandwidth. Data is collected every 60 seconds. Measurements are written to a file called mplstat (stored in /var/log).

Next, we define a simple lsp to r4a (100.4.4.4):

set protocols mpls label-switched-path r4a-abw to 100.4.4.4
set protocols mpls label-switched-path r4a-abw auto-bandwidth

With the above configuration, Junos will implement auto-bw with default values and logic.

Server address (behind r4a) is reachable via a BGP route that goes through our lsp:

root@r1a# run show route protocol bgp

inet.0: 21 destinations, 21 routes (21 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.64.2.0/24      *[BGP/170] 00:43:49, localpref 100, from 100.4.4.4
                      AS path: I, validation-state: unverified
                    >  to 192.168.12.1 via ge-0/0/0.0, label-switched-path r4a-abw

The lsp has ge-0/0/0 interface (the one with bw 100m) as next-hop.

Next, we started iperf client with about 200m bw.

We check the logs (I omitted some and left just the relevant ones) and see this:

Sep  1 03:50:46.704074 Adjust Autobw: LSP r4a-abw (id 2) curr adj bw 0bps updated with 0bps

Sep  1 03:51:19.568674 This is the first non-zero sample that has arrived and hence ignored
Sep  1 03:51:49.568394 Update curr max avg bw 0bps of LSP r4a-abw with new bw 148.038Mbps
Sep  1 03:52:19.571363 Update curr max avg bw 148.038Mbps of LSP r4a-abw with new bw 163.177Mbps
Sep  1 03:52:49.571536 Update curr max avg bw 163.177Mbps of LSP r4a-abw with new bw 164.456Mbps
Sep  1 03:54:19.575581 Update curr max avg bw 164.456Mbps of LSP r4a-abw with new bw 171.672Mbps
Sep  1 03:55:46.705156 Adjust Autobw: LSP r4a-abw (id 2) curr adj bw 0bps updated with 171.672Mbps
Sep  1 03:55:46.705250 mpls LSP r4a-abw Autobw change 171.672Mbps >= threshold 0bps
Sep  1 03:55:46.705263 mpls LSP r4a-abw Autobw change 171.672Mbps >= threshold absolute bw 0bps

Sep  1 03:55:46.705306 mpls LSP r4a-abw either current traffic(171.672Mbps) or signaled bandwidth (0bps) is greater than adjust threshold BW (0bps) and hence re-signal
Sep  1 03:55:46.705414 Change in TED since last CSPF run, new CSPF  needed for path r4a-abw(primary ) upto-date? 0
Sep  1 03:55:46.705446 CSPF adding path r4a-abw(primary ) to CSPF queue 2
Sep  1 03:55:46.705464 CSPF creating CSPF job
Sep  1 03:55:46.705562
Sep  1 03:55:46.705584 CSPF for path r4a-abw(primary ), begin at r1a.00 , starting
Sep  1 03:55:46.705641  bandwidth: CT0=171.672Mbps ; setup priority: 7; random
Sep  1 03:55:46.705706 CSPF final destination 100.4.4.4
Sep  1 03:55:46.705739 CSPF starting from r1a.00 (100.1.1.1) to 100.4.4.4, hoplimit 254
Sep  1 03:55:46.705765  constraint bandwidth: CT0=171.672Mbps
Sep  1 03:55:46.705926 CSPF ERO for r4a-abw(primary ) (2 hops)
Sep  1 03:55:46.705941  node 192.168.13.1/32
Sep  1 03:55:46.705948  node 192.168.34.1/32
Sep  1 03:55:46.706738 CSPF for path r4a-abw(primary ) done!
Sep  1 03:55:46.733915 RPD_MPLS_LSP_CHANGE: MPLS LSP r4a-abw change on primary() Route  192.168.13.1(Label=22) 192.168.34.1(Label=3) lsp bandwidth 171672496 bps
Sep  1 03:55:46.734261 Autobw Success: LSP r4a-abw ()  (old id 2 new id 3) update prev active bw 0 bps with 171672496 bps
Sep  1 03:55:46.734293 RPD_MPLS_PATH_BANDWIDTH_CHANGE: MPLS path  (lsp r4a-abw) bandwidth changed, path bandwidth 171672496 bps
Sep  1 03:55:47.657097 Restored Cross Connect for lsp r4a-abw, path
Sep  1 03:55:47.657130 LSP r4a-abw path  set metric info: te: 20, igp: 0, min delay: 20, max delay: 0, avg delay: 33554430
Sep  1 03:55:49.568236 r4a-abw      (LSP ID 3, Tunnel ID 34492)        11400 pkt       16192332 Byte  11400 pps 16192332 Bps Util 75.46% Reserved Bw 21459062 Bps

What do we see?
Several log lines show the current max avg bw being updated. This happens each time the mpls statistics interval expires and the collected data is processed.
When the adjust interval expired, Junos found that the max avg bw was higher than the lsp bw and triggered a re-signal.
As the new signalled bw was around 170m, interface ge-0/0/0 could no longer accommodate the lsp, so CSPF had to build the path through ge-0/0/1.
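
To make the CSPF step more concrete, here is a small Python sketch of "prune, then shortest path". The topology is a guess based on the addresses in the logs: the node names r2a and r3a, the metrics and every bandwidth value except the 100m on ge-0/0/0 are assumptions.

```python
import heapq

# Hypothetical TED: (node_a, node_b) -> (metric, reservable Mbps).
# r2a/r3a and all values except the 100m on r1a's ge-0/0/0 are assumptions.
LINKS = {
    ("r1a", "r2a"): (10, 100),    # ge-0/0/0 on r1a
    ("r2a", "r4a"): (10, 1000),
    ("r1a", "r3a"): (10, 1000),   # ge-0/0/1 on r1a
    ("r3a", "r4a"): (10, 1000),
}

def cspf(src, dst, bw_mbps):
    """Dijkstra over the links that can still reserve bw_mbps."""
    graph = {}
    for (a, b), (metric, free) in LINKS.items():
        if free >= bw_mbps:                  # prune links without enough bw
            graph.setdefault(a, []).append((b, metric))
            graph.setdefault(b, []).append((a, metric))
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + metric, nxt, path + [nxt]))
    return None  # no path satisfies the constraint

print(cspf("r1a", "r4a", 171.672))  # ['r1a', 'r3a', 'r4a']
print(cspf("r1a", "r4a", 5000))     # None
```

With a 171.672m constraint the link behind ge-0/0/0 is pruned before the shortest path is computed, so the only candidate left goes through ge-0/0/1.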

This is auto-bandwidth in action!

Similar info can be seen looking at lsp details:

root@r1a# run show mpls lsp extensive
Ingress LSP: 1 sessions

100.4.4.4
  From: 100.1.1.1, State: Up, ActiveRoute: 0, LSPname: r4a-abw, LSPid: 3
  ActivePath:  (primary)
  LSPtype: Static Configured, Penultimate hop popping
  LoadBalance: Random
  Follow destination IGP metric
  Autobandwidth
  AdjustTimer: 300 secs
  Max AvgBW util: 183.83Mbps, Bandwidth Adjustment in 177 second(s).
  Overflow limit: 0, Overflow sample count: 3
  Underflow limit: 0, Underflow sample count: 0, Underflow Max AvgBW: 0bps
  Encoding type: Packet, Switching type: Packet, GPID: IPv4
  LSP Self-ping Status : Enabled
 *Primary                    State: Up
    Priorities: 7 0
    Bandwidth: 164.269Mbps
    SmartOptimizeTimer: 180
    Flap Count: 0
    MBB Count: 1
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 20)
 192.168.13.1 S 192.168.34.1 S
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.13.1(Label=18) 192.168.34.1(Label=3)
   20 Sep  1 03:28:40.539 Make-before-break: Cleaned up old instance: Hold dead expiry
   19 Sep  1 03:27:24.276 Make-before-break: Switched to new instance
   18 Sep  1 03:27:24.274 Self-ping ended successfully
   17 Sep  1 03:27:23.502 Up
   16 Sep  1 03:27:23.502 Automatic Autobw adjustment succeeded: BW changes from 0 bps to 164269456 bps
   15 Sep  1 03:27:23.502 Self-ping started
   14 Sep  1 03:27:23.502 Self-ping enqueued
   13 Sep  1 03:27:23.502 Record Route:  192.168.13.1(Label=18) 192.168.34.1(Label=3)
   12 Sep  1 03:27:23.472 LSP-ID: 2 created
   11 Sep  1 03:27:23.472 Originate make-before-break call
   10 Sep  1 03:27:23.472 CSPF: computation result accepted  192.168.13.1 192.168.34.1
    9 Sep  1 03:22:23.499 Selected as active path
    8 Sep  1 03:22:23.498 Self-ping ended successfully
    7 Sep  1 03:22:23.495 Up
    6 Sep  1 03:22:23.495 Self-ping started
    5 Sep  1 03:22:23.495 Self-ping enqueued
    4 Sep  1 03:22:23.495 Record Route:  192.168.12.1(Label=16) 192.168.24.1(Label=3)
    3 Sep  1 03:22:23.467 LSP-ID: 1 created
    2 Sep  1 03:22:23.467 Originate Call
    1 Sep  1 03:22:23.467 CSPF: computation result accepted  192.168.12.1 192.168.24.1

BGP route got updated:

root@r1a# run show route protocol bgp

inet.0: 21 destinations, 21 routes (21 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.64.2.0/24      *[BGP/170] 00:46:06, localpref 100, from 100.4.4.4
                      AS path: I, validation-state: unverified
                    >  to 192.168.13.1 via ge-0/0/1.0, label-switched-path r4a-abw

Now, let’s take a bit of control of how auto-bw is triggered.

We make our auto-bw config on the lsp more complex:

set protocols mpls label-switched-path r4a-abw auto-bandwidth adjust-interval 300
set protocols mpls label-switched-path r4a-abw auto-bandwidth adjust-threshold 50
set protocols mpls label-switched-path r4a-abw auto-bandwidth maximum-bandwidth 150m
set protocols mpls label-switched-path r4a-abw auto-bandwidth adjust-threshold-overflow-limit 3

First, we set adjust interval to 5 minutes (300 seconds).

Next, we set the adjust threshold to 50%. This means that re-signalling is triggered only if the difference between the current lsp bw and the new max avg bw is greater than 50% of the current lsp bw.

We also set a maximum bw (150m) that auto-bw can request when re-signalling the lsp.
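
A tiny sketch of the effect of the maximum, assuming the measured value is simply capped at the configured one (the function name is hypothetical):

```python
def bw_to_signal(max_avg_bw_mbps, maximum_bw_mbps=150):
    """Cap the measured max avg bw at the configured maximum-bandwidth."""
    return min(max_avg_bw_mbps, maximum_bw_mbps)

print(bw_to_signal(46.06))    # 46.06
print(bw_to_signal(171.672))  # 150
```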

To understand the last setting, let’s think about this sequence of samples (current lsp bw is 50): 100, 110, 105, 111, 100.
An adjust will be triggered. Anyhow, we have to wait 5 minutes to react to a trend that was visible well before.
By setting the overflow limit to 3, we tell junos to re-run cspf immediately if 3 consecutive overflow samples are found. In this case, cspf is run after 3 minutes (not 5).
Is this always a good idea? It depends…of course. After those 3 high samples, the next ones might be very low…so reacting fast does not necessarily reflect the actual bw situation. At the same time, waiting the full 5 minutes would lead to the same result, since auto-bw logic picks the max avg bw of the interval anyway.
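
The overflow counter can be sketched as follows. This is my own model, not Junos code: I am assuming that an overflow sample is one that exceeds the current bw by more than the adjust threshold and that any other sample resets the count. The second list of samples is invented to show the non-consecutive case.

```python
def overflow_trigger_index(samples, current_bw, threshold_pct, overflow_limit):
    """Return the 1-based sample number at which an early adjust fires, else None.

    Assumption: an overflow sample is one that exceeds current_bw by more than
    threshold_pct percent; any other sample resets the consecutive count.
    """
    overflow_level = current_bw * (1 + threshold_pct / 100)
    consecutive = 0
    for number, sample in enumerate(samples, start=1):
        consecutive = consecutive + 1 if sample > overflow_level else 0
        if consecutive == overflow_limit:
            return number
    return None

print(overflow_trigger_index([100, 110, 105, 111, 100], 50, 50, 3))  # 3
print(overflow_trigger_index([100, 60, 110, 60, 105], 50, 50, 3))    # None
```

With one sample per minute, a result of 3 means the early adjust happens after 3 minutes; None means junos waits for the regular adjust interval.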

This opens another discussion about how to improve auto-bw and that’s something junos has worked on and I will talk about it in the future.

For now, let’s deal with and understand auto-bw standard logic.

Let’s assume lsp bw is initially 0.

We push 10m from iperf client.

Logs show this:

Sep 13 00:52:59.507942 r4a-abw      (LSP ID 7, Tunnel ID 39265)        20695 pkt       15708244 Byte    344 pps   261804 Bps Reserved Bw        0 Bps
Sep 13 00:52:59.507994 This is the first non-zero sample that has arrived and hence ignored
Sep 13 00:52:59.508022 Update curr max avg bw 0bps of LSP r4a-abw with new bw 0bps
Sep 13 00:53:59.506767 r4a-abw      (LSP ID 7, Tunnel ID 39265)       124795 pkt       94720144 Byte   1735 pps  1316865 Bps Reserved Bw        0 Bps
Sep 13 00:53:59.506790 Update curr max avg bw 0bps of LSP r4a-abw with new bw 10.5349Mbps
Sep 13 00:54:59.507813 r4a-abw      (LSP ID 7, Tunnel ID 39265)       225437 pkt      171107422 Byte   1677 pps  1273121 Bps Reserved Bw        0 Bps
Sep 13 00:55:59.508715 r4a-abw      (LSP ID 7, Tunnel ID 39265)       329461 pkt      250061638 Byte   1733 pps  1315903 Bps Reserved Bw        0 Bps
Sep 13 00:56:59.507991 r4a-abw      (LSP ID 7, Tunnel ID 39265)       430056 pkt      326413976 Byte   1676 pps  1272538 Bps Reserved Bw        0 Bps
Sep 13 00:56:59.508114 Adjust Autobw: LSP r4a-abw (id 7) curr adj bw 0bps updated with 10.5349Mbps
Sep 13 00:56:59.508185 mpls LSP r4a-abw Autobw change 10.5349Mbps >= threshold 0bps
Sep 13 00:56:59.508199 mpls LSP r4a-abw Autobw change 10.5349Mbps >= threshold absolute bw 0bps
Sep 13 00:56:59.508209 mpls LSP r4a-abw either current traffic(10.5349Mbps) or signaled bandwidth (0bps) is greater than adjust threshold BW (0bps) and hence re-signal
Sep 13 00:56:59.508281 Change in TED since last CSPF run, new CSPF  needed for path r4a-abw(primary ) upto-date? 0
Sep 13 00:56:59.508312 CSPF adding path r4a-abw(primary ) to CSPF queue 2
Sep 13 00:56:59.508348 CSPF creating CSPF job
Sep 13 00:56:59.508430
Sep 13 00:56:59.508437 CSPF for path r4a-abw(primary ), begin at r1a.00 , starting
Sep 13 00:56:59.508488  bandwidth: CT0=10.5349Mbps ; setup priority: 7; random
Sep 13 00:56:59.508553 CSPF credibility 0
Sep 13 00:56:59.508560 CSPF final destination 100.4.4.4
Sep 13 00:56:59.508590 CSPF starting from r1a.00 (100.1.1.1) to 100.4.4.4, hoplimit 254
Sep 13 00:56:59.508611  constraint bandwidth: CT0=10.5349Mbps
Sep 13 00:56:59.508693 CSPF Reached target
Sep 13 00:56:59.508710 CSPF completed in 0.000089s
Sep 13 00:56:59.508762 CSPF ERO for r4a-abw(primary ) (2 hops)
Sep 13 00:56:59.508782  node 192.168.13.1/32
Sep 13 00:56:59.508789  node 192.168.34.1/32
Sep 13 00:56:59.509604 CSPF for path r4a-abw(primary ) done!
Sep 13 00:56:59.539518 RPD_MPLS_LSP_CHANGE: MPLS LSP r4a-abw change on primary() Route  192.168.13.1(Label=53) 192.168.34.1(Label=3) lsp bandwidth 10534921 bps
Sep 13 00:56:59.540061 Autobw Success: LSP r4a-abw ()  (old id 7 new id 8) update prev active bw 0 bps with 10534921 bps
Sep 13 00:56:59.540091 RPD_MPLS_PATH_BANDWIDTH_CHANGE: MPLS path  (lsp r4a-abw) bandwidth changed, path bandwidth 10534921 bps
Sep 13 00:56:59.755681 Restored Cross Connect for lsp r4a-abw, path
Sep 13 00:56:59.755716 LSP r4a-abw path  set metric info: te: 20, igp: 0, min delay: 20, max delay: 0, avg delay: 33554430

After 5 minutes auto-bw is triggered and the lsp bw is brought to 10.5m.
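
A note on units: the statistics lines report rates in Bps, which reads as bytes per second, since multiplying by 8 gives the Mbps figure shown in the "Update curr max avg bw" lines. A short Python sketch that pulls the rate out of one of the lines above and converts it (the regex is mine and only tested against this line format):

```python
import re

# One statistics line copied from the log above.
LINE = ("Sep 13 00:53:59.506767 r4a-abw      (LSP ID 7, Tunnel ID 39265)"
        "       124795 pkt       94720144 Byte   1735 pps  1316865 Bps"
        " Reserved Bw        0 Bps")

# The "<n> Bps" right after "<n> pps" is the measured rate;
# "Reserved Bw <n> Bps" comes later on the line and is not matched here.
match = re.search(r"(\d+) pps\s+(\d+) Bps", LINE)
rate_bytes_per_s = int(match.group(2))
rate_mbps = rate_bytes_per_s * 8 / 1e6

print(rate_bytes_per_s)     # 1316865
print(round(rate_mbps, 4))  # 10.5349
```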

Next, we increase iperf “gun” to 13m:

Sep 13 00:57:59.508913 r4a-abw      (LSP ID 8, Tunnel ID 39265)       112152 pkt       85123368 Byte   1900 pps  1442769 Bps Util 109.57% Reserved Bw  1316865 Bps
Sep 13 00:57:59.508932 LSP r4a-abw (id 8) ignore new bytes arrived
Sep 13 00:57:59.508951 Normalization occurred, sample bw 0bps on pvc r4a-abw will be ignored
Sep 13 00:58:59.505923 r4a-abw      (LSP ID 8, Tunnel ID 39265)       242910 pkt      184368690 Byte   2179 pps  1654088 Bps Util 125.61% Reserved Bw  1316865 Bps
Sep 13 00:58:59.505946 Update curr max avg bw 10.5349Mbps of LSP r4a-abw with new bw 13.2327Mbps
Sep 13 00:59:59.505864 r4a-abw      (LSP ID 8, Tunnel ID 39265)       373722 pkt      283654998 Byte   2180 pps  1654771 Bps Util 125.66% Reserved Bw  1316865 Bps
Sep 13 00:59:59.505896 Update curr max avg bw 13.2327Mbps of LSP r4a-abw with new bw 13.2382Mbps
Sep 13 01:00:59.513835 r4a-abw      (LSP ID 8, Tunnel ID 39265)       509026 pkt      386350734 Byte   2255 pps  1711595 Bps Util 129.98% Reserved Bw  1316865 Bps
Sep 13 01:00:59.513864 Update curr max avg bw 13.2382Mbps of LSP r4a-abw with new bw 13.6928Mbps
Sep 13 01:01:59.508608 Adjust Autobw: LSP r4a-abw (id 8) curr adj bw 10.5349Mbps updated with 13.6928Mbps
Sep 13 01:01:59.512883 r4a-abw      (LSP ID 8, Tunnel ID 39265)       639878 pkt      485667402 Byte   2180 pps  1655277 Bps Util 125.70% Reserved Bw  1316865 Bps
Sep 13 01:01:59.512906 Update curr max avg bw 13.6928Mbps of LSP r4a-abw with new bw 13.2422Mbps

In this case, when the adjust interval expires, auto-bw is not triggered. This is because no sample exceeded the current lsp bw by more than 50%: the max avg bw (13.69m) is only about 3.2m above the signalled 10.53m, while the threshold is about 5.27m.
If you look at the statistics logs, utilization never reaches 150%.
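
The same check can be done on the raw numbers. The rates below are copied from the statistics lines above; the variable names and the idea of expressing the 50% threshold as a 150% utilization level are my own way of looking at it:

```python
# Rates in bytes per second, copied from the statistics lines above.
reserved_bps = 1316865
samples_bps = [1442769, 1654088, 1654771, 1711595, 1655277]
threshold_pct = 50

# Utilization an upward adjust would need with a 50% threshold.
trigger_util = 100 + threshold_pct
utils = [100 * s / reserved_bps for s in samples_bps]

print(trigger_util)                          # 150
print(round(max(utils), 1))                  # 130.0
print(any(u > trigger_util for u in utils))  # False
```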

Last, we increase end to end traffic to 50m:

Sep 13 01:25:00.505667 r4a-abw      (LSP ID 12, Tunnel ID 39265)       616162 pkt      467666958 Byte   2181 pps  1655733 Bps Util 125.81% Reserved Bw  1316106 Bps
Sep 13 01:25:00.505691 Update curr max avg bw 13.7013Mbps of LSP r4a-abw with new bw 13.2459Mbps

Sep 13 01:26:00.505727 r4a-abw      (LSP ID 12, Tunnel ID 39265)      1071303 pkt      813125574 Byte   7585 pps  5757643 Bps Util 437.48% Reserved Bw  1316106 Bps
Sep 13 01:26:00.505758 Update curr max avg bw 13.2459Mbps of LSP r4a-abw with new bw 46.0611Mbps
Sep 13 01:27:00.508810 r4a-abw      (LSP ID 12, Tunnel ID 39265)      1459122 pkt     1107492692 Byte   6463 pps  4906118 Bps Util 372.78% Reserved Bw  1316106 Bps
Sep 13 01:28:00.505699 r4a-abw      (LSP ID 12, Tunnel ID 39265)      1796753 pkt     1363758286 Byte   5627 pps  4271093 Bps Util 324.53% Reserved Bw  1316106 Bps
Sep 13 01:28:00.505836 Adjust Autobw: LSP r4a-abw (id 12) curr adj bw 13.7013Mbps updated with 46.0611Mbps
Sep 13 01:28:00.505888 mpls LSP r4a-abw Autobw change 35.5323Mbps >= threshold 5.26442Mbps
Sep 13 01:28:00.505899 mpls LSP r4a-abw Autobw change 35.5323Mbps >= threshold absolute bw 0bps
Sep 13 01:28:00.505910 mpls LSP r4a-abw either current traffic(46.0611Mbps) or signaled bandwidth (10528848bps) is greater than adjust threshold BW (0bps) and hence re-signal
Sep 13 01:28:00.505970 Change in TED since last CSPF run, new CSPF  needed for path r4a-abw(primary ) upto-date? 0
Sep 13 01:28:00.506000 CSPF adding path r4a-abw(primary ) to CSPF queue 2
Sep 13 01:28:00.506017 CSPF creating CSPF job
Sep 13 01:28:00.506097
Sep 13 01:28:00.506104 CSPF for path r4a-abw(primary ), begin at r1a.00 , starting
Sep 13 01:28:00.506155  bandwidth: CT0=46.0612Mbps ; setup priority: 7; random
Sep 13 01:28:00.506209 CSPF credibility 0
Sep 13 01:28:00.506216 CSPF final destination 100.4.4.4
Sep 13 01:28:00.506237 CSPF starting from r1a.00 (100.1.1.1) to 100.4.4.4, hoplimit 254
Sep 13 01:28:00.506260  constraint bandwidth: CT0=46.0612Mbps
Sep 13 01:28:00.506330 CSPF Reached target
Sep 13 01:28:00.506351 CSPF completed in 0.000083s
Sep 13 01:28:00.506387 CSPF ERO for r4a-abw(primary ) (2 hops)
Sep 13 01:28:00.506405  node 192.168.13.1/32
Sep 13 01:28:00.506423  node 192.168.34.1/32
Sep 13 01:28:00.507180 CSPF for path r4a-abw(primary ) done!
Sep 13 01:28:00.535690 RPD_MPLS_LSP_CHANGE: MPLS LSP r4a-abw change on primary() Route  192.168.13.1(Label=56) 192.168.34.1(Label=3) lsp bandwidth 46061148 bps
Sep 13 01:28:00.536118 Autobw Success: LSP r4a-abw ()  (old id 12 new id 13) update prev active bw 10528848 bps with 46061148 bps
Sep 13 01:28:00.536152 RPD_MPLS_PATH_BANDWIDTH_CHANGE: MPLS path  (lsp r4a-abw) bandwidth changed, path bandwidth 46061148 bps
Sep 13 01:28:01.507031 Restored Cross Connect for lsp r4a-abw, path
Sep 13 01:28:01.507066 LSP r4a-abw path  set metric info: te: 20, igp: 0, min delay: 20, max delay: 0, avg delay: 33554430

As you can see, as expected, the lsp is re-signalled with the new bw (46m). Anyhow, it did not take 5 minutes but only 3. This is because we hit the overflow limit of 3: three consecutive samples (01:26, 01:27 and 01:28) exceeded the threshold.

The following image sums up these scenarios:

Scenarios 1, 2 and 4 are the ones we have just seen.

Scenario 3 simply shows that overflow limit only works if the “overflow samples” are consecutive. There, we have 3 “high” samples but they are not consecutive so junos waits the whole adjust interval.

That should be enough for today

Ciao
IoSonoUmberto
