“Working around” NSSA to have TI-LFA working

In the last post we saw that SR and TI-LFA work fine by default with OSPF NSSA areas.

However, things broke as soon as we added the no-summaries setting. As a result, our ABR was no longer able to compute the TI-LFA path.

Here, we are going to explore some workarounds.

The first one allows us to stick with our NSSA no-summaries area but does not provide very fast failover when a failure occurs.

The idea is to manually define a static SR LSP that follows our intended TI-LFA path:

set protocols source-packet-routing preference 15
set protocols source-packet-routing segment-list r1-bkp-sl inherit-label-nexthops
set protocols source-packet-routing segment-list r1-bkp-sl auto-translate
set protocols source-packet-routing segment-list r1-bkp-sl r2 ip-address 192.168.23.0
set protocols source-packet-routing segment-list r1-bkp-sl r4 ip-address 192.168.24.1
set protocols source-packet-routing segment-list r1-bkp-sl r1 ip-address 192.168.14.0
set protocols source-packet-routing preserve-nexthop-hierarchy
set protocols source-packet-routing source-routing-path r1-bkp to 1.1.1.1
set protocols source-packet-routing source-routing-path r1-bkp binding-sid 1000111
set protocols source-packet-routing source-routing-path r1-bkp primary r1-bkp-sl

Basically, we define a colorless LSP (installed into inet.3 directly) that follows the path R3-R2-R4-R1. IP addresses are given so that Junos automatically translates them into adjacency SIDs.
We also set preference 15 so that the L-OSPF route (preference 10) stays primary by default.
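To make the role of the preference value explicit, here is a minimal sketch of "lowest preference wins". It is a simplified model, not how Junos is implemented; the Route class and select_active function are hypothetical names, and the values are the ones used in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    protocol: str
    preference: int
    next_hop: str


def select_active(routes):
    """Return the route with the lowest preference value (lower wins)."""
    return min(routes, key=lambda r: r.preference)


# Two routes towards 1.1.1.1 in inet.3, with the values used in this post.
routes_to_r1 = [
    Route("L-OSPF", 10, "192.168.13.0"),
    Route("SPRING-TE", 15, "192.168.23.0"),
]

active = select_active(routes_to_r1)
print(active.protocol)

# If the L-OSPF route is withdrawn, the static SR LSP is the only one left.
fallback = select_active([r for r in routes_to_r1 if r.protocol != "L-OSPF"])
print(fallback.protocol)
```

With preference 15 the static LSP only takes over once the L-OSPF route is gone.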

This is the result:

root@r3# run show route 1.1.1.1 table inet.3

inet.3: 6 destinations, 7 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:06:03, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
                    [SPRING-TE/15] 00:00:08, metric 1, metric2 1
                    >  to 192.168.23.0 via ge-0/0/1.0, Push 77, Push 23(top)

[edit]
root@r3# run show route 1.1.1.1 table inet.3 active-path

inet.3: 6 destinations, 7 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:06:09, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0

We have a backup path!

However, this solution is not perfect for at least two reasons:

  • we “hard-coded” the backup path. True, it follows the TI-LFA path that was computed before, but what if R3-R2 failed along with the protected link (R3-R1)? In that case, our backup LSP would go down. In other words, we get a backup path but we lose the dynamic behavior of the IGP and TI-LFA. That said, both links failing simultaneously is a double-fault scenario, and we might accept traffic loss there (it depends on how many failures we want to cover).
  • the backup route is just another route towards that endpoint, not a real backup route. It is not pre-installed into the FIB with a higher weight. It becomes active when the L-OSPF route goes down, but the switchover is not instantaneous as it would be with a pre-installed backup next hop.
    Moreover, if seamless MPLS must be provided, we might need to make the binding SID (1000111) available and known to other nodes:

root@r3# run show route label 1000111

mpls.0: 28 destinations, 28 routes (28 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1000111            *[SPRING-TE/15] 00:05:34, metric 1, metric2 0
                    >  to 192.168.23.0 via ge-0/0/1.0, Swap 77, Push 23(top)

For these reasons, this solution might work but I would not recommend it in a real environment.

Do we have an alternative? Yes we do.

The real issue here is the NSSA itself. NSSA was designed many years ago, when networks were very different and, more importantly, devices were less powerful. NSSA brought some resource savings along with some limitations (not related to SR) like the one we have seen here.

Nowadays, we can safely migrate away from NSSA and its limitations and opt for regular areas. Migrating to regular areas might lead someone to shout “yes but…what about all the routes my LSDB will get? No-summaries was designed to reduce the size of the LSDB”. Here is what I think. As just said, NSSA came into the game in another “era”. Thinking about the size of the database and ways to limit flooding was relevant when device scalability was a concern. Today, we can safely say we simply have “more space”. As a result, saving space this way might seem a bit anachronistic, “out of time”. It is not that saving resources is bad; you can still do it, but is it really necessary, especially considering the limitations we bump into with things like NSSA?

Moreover, the features available to configure OSPF have progressed as well: today we can use standard areas and, at the same time, filter and control flooding.
This is the key concept behind the second solution, the recommended one.

First, on every router, we turn area 1 into a standard area:

delete protocols ospf area 0.0.0.1 nssa

On R1 (a non-ABR router in area 1) we check the database:

root@r1# run show ospf database | match Summ | count
Count: 34 lines

root@r1# run show ospf database | match Summ
Summary  3.3.3.3          3.3.3.3          0x80000001   114  0x22 0xa480  28
Summary  3.3.3.3          4.4.4.4          0x80000001   103  0x22 0x9a84  28
Summary  4.4.4.4          3.3.3.3          0x80000001   114  0x22 0x625a  28
Summary  4.4.4.4          4.4.4.4          0x80000001   103  0x22 0x58c4  28
Summary  5.5.5.5          3.3.3.3          0x80000001   114  0x22 0x52c9  28
Summary  5.5.5.5          4.4.4.4          0x80000001   103  0x22 0x34e3  28
Summary  6.6.6.6          3.3.3.3          0x80000001   114  0x22 0x24f3  28
Summary  6.6.6.6          4.4.4.4          0x80000001   103  0x22 0x60e   28
Summary  7.7.7.7          3.3.3.3          0x80000001   114  0x22 0xff13  28
Summary  7.7.7.7          4.4.4.4          0x80000001   103  0x22 0xe12d  28
Summary  8.8.8.8          3.3.3.3          0x80000001   114  0x22 0xd13d  28
Summary  8.8.8.8          4.4.4.4          0x80000001   103  0x22 0xb357  28
Summary  192.168.34.0     3.3.3.3          0x80000001   114  0x22 0xeb56  28
Summary  192.168.34.0     4.4.4.4          0x80000001   103  0x22 0xcd70  28
Summary  192.168.35.0     3.3.3.3          0x80000001   114  0x22 0xfea5  28
Summary  192.168.35.0     4.4.4.4          0x80000001   103  0x22 0xeab4  28
Summary  192.168.36.0     3.3.3.3          0x80000001   114  0x22 0xf3af  28
Summary  192.168.36.0     4.4.4.4          0x80000001   103  0x22 0xdfbe  28
Summary  192.168.45.0     3.3.3.3          0x80000001   114  0x22 0x7cb9  28
Summary  192.168.45.0     4.4.4.4          0x80000001   103  0x22 0x7224  28
Summary  192.168.46.0     3.3.3.3          0x80000001   114  0x22 0x71c3  28
Summary  192.168.46.0     4.4.4.4          0x80000001   103  0x22 0x672e  28
Summary  192.168.56.0     3.3.3.3          0x80000001   114  0x22 0x216d  28
Summary  192.168.56.0     4.4.4.4          0x80000001   103  0x22 0x387   28
Summary  192.168.57.0     3.3.3.3          0x80000001   114  0x22 0x1677  28
Summary  192.168.57.0     4.4.4.4          0x80000001   103  0x22 0xf791  28
Summary  192.168.58.0     3.3.3.3          0x80000001   114  0x22 0xb81   28
Summary  192.168.58.0     4.4.4.4          0x80000001   103  0x22 0xec9b  28
Summary  192.168.67.0     3.3.3.3          0x80000001   114  0x22 0xa7db  28
Summary  192.168.67.0     4.4.4.4          0x80000001   103  0x22 0x89f5  28
Summary  192.168.68.0     3.3.3.3          0x80000001   114  0x22 0x9ce5  28
Summary  192.168.68.0     4.4.4.4          0x80000001   103  0x22 0x7eff  28
Summary  192.168.78.0     3.3.3.3          0x80000001   114  0x22 0x1af9  28
Summary  192.168.78.0     4.4.4.4          0x80000001   103  0x22 0xfb14  28

We have 34 Summary LSAs. They are router loopbacks and p2p links. Right now, everything is flooded.

Next, on ABRs we add this:

set policy-options policy-statement lsa-filtering term ok from route-filter 0.0.0.0/0 prefix-length-range /32-/32
set policy-options policy-statement lsa-filtering term ok then accept
set policy-options policy-statement lsa-filtering then reject
set protocols ospf area 0.0.0.1 network-summary-export lsa-filtering

We use a feature called network-summary-export. Through a policy, we tell the ABR to advertise only certain Summary LSAs to the other routers within area 1. Specifically, only /32 addresses (router loopbacks) are accepted.
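The effect of the policy can be sketched in a few lines of Python. This is only a model of the /32 match, not of Junos policy evaluation; lsa_filtering here is a hypothetical function named after the policy, and the /31 mask on the p2p links is an assumption (the database output does not show masks).

```python
import ipaddress

# Prefixes advertised into area 1 as Summary LSAs before filtering.
# The /31 mask on the p2p links is an assumption.
loopbacks = [f"{n}.{n}.{n}.{n}/32" for n in range(3, 9)]
p2p_links = [
    f"192.168.{x}.0/31"
    for x in (34, 35, 36, 45, 46, 56, 57, 58, 67, 68, 78)
]
candidates = [ipaddress.ip_network(p) for p in loopbacks + p2p_links]


def lsa_filtering(prefix):
    """Mimic the policy: accept /32 prefixes only, reject everything else."""
    return prefix.prefixlen == 32


accepted = [p for p in candidates if lsa_filtering(p)]

abr_count = 2  # R3 and R4 each originate their own Summary LSA per prefix
print(len(candidates) * abr_count, "->", len(accepted) * abr_count)
```

Under these assumptions the count goes from 34 Summary LSAs to 12, which matches what R1 reports before and after the policy is applied.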

As a result, on R1:

root@r1# run show ospf database | match Summ | count
Count: 12 lines

[edit]
root@r1# run show ospf database | match Summ
Summary  3.3.3.3          3.3.3.3          0x80000001   229  0x22 0xa480  28
Summary  3.3.3.3          4.4.4.4          0x80000001   218  0x22 0x9a84  28
Summary  4.4.4.4          3.3.3.3          0x80000001   229  0x22 0x625a  28
Summary  4.4.4.4          4.4.4.4          0x80000001   218  0x22 0x58c4  28
Summary  5.5.5.5          3.3.3.3          0x80000001   229  0x22 0x52c9  28
Summary  5.5.5.5          4.4.4.4          0x80000001   218  0x22 0x34e3  28
Summary  6.6.6.6          3.3.3.3          0x80000001   229  0x22 0x24f3  28
Summary  6.6.6.6          4.4.4.4          0x80000001   218  0x22 0x60e   28
Summary  7.7.7.7          3.3.3.3          0x80000001   229  0x22 0xff13  28
Summary  7.7.7.7          4.4.4.4          0x80000001   218  0x22 0xe12d  28
Summary  8.8.8.8          3.3.3.3          0x80000001   229  0x22 0xd13d  28
Summary  8.8.8.8          4.4.4.4          0x80000001   218  0x22 0xb357  28

We reduced the database size but kept the Summary LSAs for router loopbacks.

This is fundamental because, unlike NSSA with no-summaries, R1 now has LSPs to other routers (not just R2):

root@r1# run show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[L-OSPF/10/5] 00:04:19, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1002
                       to 192.168.14.1 via ge-0/0/2.0, Push 1002
3.3.3.3/32         *[L-OSPF/10/5] 00:04:26, metric 1
                    >  to 192.168.13.1 via ge-0/0/1.0
4.4.4.4/32         *[L-OSPF/10/5] 00:04:19, metric 1
                    >  to 192.168.14.1 via ge-0/0/2.0
5.5.5.5/32         *[L-OSPF/10/5] 00:04:19, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1005
                       to 192.168.14.1 via ge-0/0/2.0, Push 1005
6.6.6.6/32         *[L-OSPF/10/5] 00:04:19, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1006
                       to 192.168.14.1 via ge-0/0/2.0, Push 1006
7.7.7.7/32         *[L-OSPF/10/5] 00:04:19, metric 3
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1007
                       to 192.168.14.1 via ge-0/0/2.0, Push 1007

As a consequence, magically, R3 is now able to compute the TI-LFA path (without needing the static SPRING-TE route):

root@r3# run show route 1.1.1.1 table inet.3

inet.3: 6 destinations, 7 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:05:55, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
                       to 192.168.23.0 via ge-0/0/1.0, Push 1001, Push 1004(top)
                    [SPRING-TE/15] 00:05:37, metric 1, metric2 1
                    >  to 192.168.23.0 via ge-0/0/1.0, Push 79, Push 27(top)

[edit]
root@r3# deactivate protocols source-packet-routing

[edit]
root@r3# commit
commit complete

[edit]
root@r3# run show route 1.1.1.1 table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:06:08, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
                       to 192.168.23.0 via ge-0/0/1.0, Push 1001, Push 1004(top)

[edit]
root@r3# run show route 1.1.1.1 table inet.3 extensive | match weigh
                Next hop: 192.168.13.0 via ge-0/0/0.0 weight 0x1, selected
                Next hop: 192.168.23.0 via ge-0/0/1.0 weight 0xf000

Here it is! Working TI-LFA and no more NSSA (but with the ability to control flooding).
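The two weights are what make this a real backup: both next hops are installed and the lowest weight is used while it is available. A minimal sketch of that selection follows; the pick function and the failed argument are hypothetical, and the weights are the ones shown in the output above.

```python
# Next hops and weights as shown in the "extensive" output above.
next_hops = [
    {"via": "ge-0/0/0.0", "gateway": "192.168.13.0", "weight": 0x1},
    {"via": "ge-0/0/1.0", "gateway": "192.168.23.0", "weight": 0xF000},
]


def pick(next_hops, failed=()):
    """Return the usable next hop with the lowest weight, or None."""
    usable = [nh for nh in next_hops if nh["via"] not in failed]
    return min(usable, key=lambda nh: nh["weight"]) if usable else None


print(pick(next_hops)["via"])
print(pick(next_hops, failed={"ge-0/0/0.0"})["via"])
```

When ge-0/0/0.0 is marked as failed, the pre-installed 0xf000 entry is picked with no new route computation, which is the difference from the static LSP workaround.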

Maybe it is time to move away from NSSA 🙂

Ciao
IoSonoUmberto

TI-LFA and OSPF NSSA

One great advantage brought by segment routing is TI-LFA. As already described in previous posts, TI-LFA allows a node to pre-compute the post-convergence path and, leveraging the ability to define a label stack offered by SR, have it ready to use even while a destination-based network would still need time to converge after a failure.

As an example, consider this network:

R7 has no alternate (LFA) paths to R8 (R7-R8 is the protected link) as both R6 and R5 have R7 as their next-hop to reach R8. Loops would appear.

Once the network converges, R7 will compute the new path to R8 through R5. However, that path is not usable during network convergence because, as we have just said, R5 and R6 still have R7 as next-hop to R8 (causing loops). Segment Routing allows us to pre-install the post-convergence path (TI-LFA) by imposing a label stack (made of adjacency or node SIDs) that makes this backup route independent of the routing table contents during the failure (especially on nodes R5 and R6).

Having refreshed what TI-LFA is and why it is very useful, we can move on to today’s lab.

This is our reference topology (each node loopback is x.x.x.x where x is Rx):

We will focus on this portion of the network:

The network is split into 2 OSPF areas.

Unlike other times I dealt with SR, this time I used OSPF as IGP.

This is the minimal configuration needed to enable SR with OSPF:

set chassis network-services enhanced-ip
set protocols ospf backup-spf-options use-post-convergence-lfa maximum-labels 8
set protocols ospf backup-spf-options use-source-packet-routing
set protocols ospf traffic-engineering l3-unicast-topology
set protocols ospf traffic-engineering advertisement always
set protocols ospf source-packet-routing node-segment ipv4-index 3
set protocols ospf source-packet-routing srgb start-label 1000
set protocols ospf source-packet-routing srgb index-range 100

OSPF area and interface configuration is standard.

R3 (like R4) is an ABR:

set protocols ospf area 0.0.0.1 nssa
set protocols ospf area 0.0.0.1 interface ge-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.1 interface ge-0/0/0.0 post-convergence-lfa
set protocols ospf area 0.0.0.1 interface ge-0/0/1.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/2.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/2.0 metric 100
set protocols ospf area 0.0.0.0 interface ge-0/0/3.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/4.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface lo0.0 passive

Link protection (TI-LFA) is configured on interface ge-0/0/0 (towards R1).

We also configure ABRs to inject a 0/0 route:

set protocols ospf area 0.0.0.1 nssa default-lsa default-metric 1

Let’s check R1.

It has 0/0 route:

root@r1> show route table inet.0 protocol ospf 0/0 exact

inet.0: 38 destinations, 39 routes (38 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[OSPF/150/10] 00:02:12, metric 2, tag 0
                    >  to 192.168.13.1 via ge-0/0/1.0
                       to 192.168.14.1 via ge-0/0/2.0

We have labelled routes towards other nodes within the network:

root@r1> show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[L-OSPF/10/5] 00:03:27, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1002
                       to 192.168.14.1 via ge-0/0/2.0, Push 1002
3.3.3.3/32         *[L-OSPF/10/5] 00:03:51, metric 1
                    >  to 192.168.13.1 via ge-0/0/1.0
4.4.4.4/32         *[L-OSPF/10/5] 00:03:27, metric 1
                    >  to 192.168.14.1 via ge-0/0/2.0
5.5.5.5/32         *[L-OSPF/10/5] 00:03:27, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1005
                       to 192.168.14.1 via ge-0/0/2.0, Push 1005
6.6.6.6/32         *[L-OSPF/10/5] 00:03:27, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1006
                       to 192.168.14.1 via ge-0/0/2.0, Push 1006
7.7.7.7/32         *[L-OSPF/10/5] 00:03:27, metric 3
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1007
                       to 192.168.14.1 via ge-0/0/2.0, Push 1007

R3 (ABR) has the same visibility in inet.3:

root@r3# run show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:08:31, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
                       to 192.168.23.0 via ge-0/0/1.0, Push 1001, Push 1004(top)
2.2.2.2/32         *[L-OSPF/10/5] 00:09:00, metric 1
                    >  to 192.168.23.0 via ge-0/0/1.0
4.4.4.4/32         *[L-OSPF/10/5] 1w6d 20:13:06, metric 100
                    >  to 192.168.34.1 via ge-0/0/2.0
5.5.5.5/32         *[L-OSPF/10/5] 3w3d 23:35:57, metric 1
                    >  to 192.168.35.1 via ge-0/0/3.0
6.6.6.6/32         *[L-OSPF/10/5] 3w3d 23:34:43, metric 1
                    >  to 192.168.36.1 via ge-0/0/4.0
7.7.7.7/32         *[L-OSPF/10/5] 3w3d 21:20:06, metric 2
                    >  to 192.168.35.1 via ge-0/0/3.0, Push 1007
                       to 192.168.36.1 via ge-0/0/4.0, Push 1007

As you can see, we have all the endpoints. Moreover, the TI-LFA backup path is there for 1.1.1.1.

This is the TI-LFA path:

R3 has to go to R2 first, then R4 and finally R1. Total cost is 30, which is lower than going through R1-R2 or R3-R4 links (that cost 100 each).
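The reasoning can be reproduced with a small shortest-path computation: remove the protected link and run SPF again. This is only a sketch of the idea, not the TI-LFA algorithm itself. The per-link cost of 10 is an assumption chosen to match the total of 30 quoted above (the route outputs in this post show a metric of 1 on the same links), and the graph is flat, ignoring area boundaries.

```python
import heapq

# Assumed link costs: 10 on the "cheap" links, 100 on R1-R2 and R3-R4.
links = {
    ("R1", "R3"): 10, ("R1", "R4"): 10,
    ("R2", "R3"): 10, ("R2", "R4"): 10,
    ("R1", "R2"): 100, ("R3", "R4"): 100,
}


def shortest_path(links, src, dst):
    """Plain Dijkstra over undirected links; returns (cost, path) or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Post-convergence view: the protected link R3-R1 is gone.
protected = ("R1", "R3")
post_failure = {k: v for k, v in links.items() if k != protected}
cost, path = shortest_path(post_failure, "R3", "R1")
print(cost, path)
```

With these costs the post-convergence path is R3-R2-R4-R1 at cost 30, while the alternatives via the R1-R2 or R3-R4 links would cost 110.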

Let’s analyze the TI-LFA path:

root@r3# run show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:08:31, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
                       to 192.168.23.0 via ge-0/0/1.0, Push 1001, Push 1004(top)

Traffic is sent to R2, as 192.168.23.0 is the next hop on the link between R3 and R2.

On R2 label 1004 is processed:

root@r2> show route label 1004

mpls.0: 20 destinations, 20 routes (20 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1004               *[L-OSPF/10/5] 00:13:25, metric 1
                    >  to 192.168.24.1 via ge-0/0/2.0, Pop
1004(S=0)          *[L-OSPF/10/5] 00:13:25, metric 1
                    >  to 192.168.24.1 via ge-0/0/2.0, Pop

The label is popped and the packet is sent to R4 (192.168.24.1 is on the R2-R4 link).

Finally, on R4:

root@r4# run show route label 1001

mpls.0: 26 destinations, 26 routes (26 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1001               *[L-OSPF/10/5] 00:14:13, metric 1
                    >  to 192.168.14.0 via ge-0/0/0.0, Pop
1001(S=0)          *[L-OSPF/10/5] 00:14:13, metric 1
                    >  to 192.168.14.0 via ge-0/0/0.0, Pop

The label is popped and the packet is sent to R1 (192.168.14.0 is on the R1-R4 link).

The TI-LFA label stack forces packets to follow the intended path as desired!
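The hop-by-hop walk above can be replayed with a tiny simulation. The tables only contain the two label entries shown in the "show route label" outputs; the forward function and the data layout are hypothetical, just a way to follow the stack.

```python
# Label actions taken from the "show route label" outputs above.
# Each entry: incoming top label -> (action, next router).
mpls_tables = {
    "R2": {1004: ("pop", "R4")},
    "R4": {1001: ("pop", "R1")},
}


def forward(first_hop, stack, tables):
    """Follow a label stack (top label first); return the routers visited."""
    node, visited = first_hop, [first_hop]
    stack = list(stack)
    while stack:
        action, next_node = tables[node][stack[0]]
        if action == "pop":
            stack.pop(0)
        node = next_node
        visited.append(node)
    return visited


# R3 pushes 1001, then 1004 on top, and sends the packet to R2.
print(["R3"] + forward("R2", [1004, 1001], mpls_tables))
```

The printed path is R3, R2, R4, R1: each router only looks at the top label, so the path does not depend on what those routers currently think the best route to R1 is.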

As a result, having NSSA areas does not prevent backup paths from being found.

This is because NSSA only keeps external routes (type 5 LSAs) out of the area (external routes originated within the NSSA itself are still allowed, carried as type 7 LSAs). Other LSAs like Network and Summary are still there, giving OSPF all the information it needs to build backup paths.
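As a compact reminder of which LSA types each kind of area carries, here is a small lookup sketch. It reflects standard OSPF behavior in simplified form (opaque LSAs are left out); the dictionary and the area_carries function are hypothetical names.

```python
# 1 = Router, 2 = Network, 3 = Summary, 4 = ASBR Summary,
# 5 = AS-external, 7 = NSSA external.
ALLOWED_LSA_TYPES = {
    "regular": {1, 2, 3, 4, 5},
    "nssa": {1, 2, 3, 7},
    "nssa no-summaries": {1, 2, 7},
}


def area_carries(area_type, lsa_type):
    """Return True if the given area type carries that LSA type."""
    return lsa_type in ALLOWED_LSA_TYPES[area_type]


for area in ALLOWED_LSA_TYPES:
    print(area, "-> Summary LSAs:", area_carries(area, 3))
```

Only the no-summaries variant drops the Summary (type 3) LSAs, and those are the ones carrying the remote loopbacks.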

This can be easily seen by checking OSPF database on R3 (ABR):

root@r3# run show ospf database

    OSPF database, Area 0.0.0.0
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len
Router  *3.3.3.3          3.3.3.3          0x800002f7   536  0x22 0xb8ae 108
Router   4.4.4.4          4.4.4.4          0x800002f8   680  0x22 0xe94b 108
Router   5.5.5.5          5.5.5.5          0x800002d8  2758  0x22 0xa293 108
Router   6.6.6.6          6.6.6.6          0x800002d8  2632  0x22 0x69bf 108
Summary *1.1.1.1          3.3.3.3          0x800002e5   984  0x22 0x3b0a  28
Summary  1.1.1.1          4.4.4.4          0x800002e4   961  0x22 0x1f23  28
Summary *2.2.2.2          3.3.3.3          0x800002e5   984  0x22 0xd34   28
Summary  2.2.2.2          4.4.4.4          0x800002e2   961  0x22 0xf44b  28
Summary  7.7.7.7          5.5.5.5          0x800002d0   543  0x22 0x1426  28
Summary  7.7.7.7          6.6.6.6          0x800002d1   616  0x22 0xf341  28
Summary  8.8.8.8          5.5.5.5          0x800002d5  1543  0x22 0xdb55  28
Summary  8.8.8.8          6.6.6.6          0x800002d5  1703  0x22 0xbd6f  28
Summary *192.168.12.0     3.3.3.3          0x800002fa   984  0x22 0xee6c  28
Summary  192.168.12.0     4.4.4.4          0x800002e1   961  0x22 0x36d   28
Summary *192.168.13.0     3.3.3.3          0x800002fb   984  0x22 0xf5c7  28
Summary  192.168.13.0     4.4.4.4          0x800002e0   961  0x22 0x18bb  28
Summary *192.168.14.0     3.3.3.3          0x800002e1   984  0x22 0x29ac  28
Summary  192.168.14.0     4.4.4.4          0x800002e3   961  0x22 0xfcd3  28
Summary *192.168.23.0     3.3.3.3          0x800002f8   984  0x22 0x8d29  28
Summary  192.168.23.0     4.4.4.4          0x800002de   961  0x22 0xad1e  28
Summary *192.168.24.0     3.3.3.3          0x800002f5   984  0x22 0x9225  28
Summary  192.168.24.0     4.4.4.4          0x800002e2   961  0x22 0x9037  28
Summary  192.168.57.0     5.5.5.5          0x800002d0  1472  0x22 0x2a8a  28
Summary  192.168.57.0     6.6.6.6          0x800002d0   545  0x22 0x1699  28
Summary  192.168.58.0     5.5.5.5          0x800002d0  1401  0x22 0x1f94  28
Summary  192.168.58.0     6.6.6.6          0x800002cf   917  0x22 0xda2   28
Summary  192.168.67.0     5.5.5.5          0x800002d0   472  0x22 0xc5e3  28
Summary  192.168.67.0     6.6.6.6          0x800002d0  1632  0x22 0x9d09  28
Summary  192.168.68.0     5.5.5.5          0x800002cf   829  0x22 0xbcec  28
Summary  192.168.68.0     6.6.6.6          0x800002d0  1560  0x22 0x9213  28
Summary  192.168.78.0     5.5.5.5          0x800002d0   401  0x22 0x2e0d  28
Summary  192.168.78.0     6.6.6.6          0x800002d0   474  0x22 0x1027  28
OpaqArea*1.0.0.1          3.3.3.3          0x800002ed   601  0x22 0x58cf  28
OpaqArea 1.0.0.1          4.4.4.4          0x800002ee   800  0x22 0x5ac4  28
OpaqArea 1.0.0.1          5.5.5.5          0x800002d3   258  0x22 0x949d  28
OpaqArea 1.0.0.1          6.6.6.6          0x800002d3   189  0x22 0x9891  28
OpaqArea*1.0.0.3          3.3.3.3          0x80000177   142  0x22 0x2d04  92
OpaqArea 1.0.0.3          4.4.4.4          0x800002dd   322  0x22 0xe4e4  92
OpaqArea 1.0.0.3          5.5.5.5          0x800002d3    43  0x22 0xed41  92
OpaqArea 1.0.0.3          6.6.6.6          0x800002d3  2774  0x22 0x37c3  92
OpaqArea*1.0.0.4          3.3.3.3          0x80000177    17  0x22 0x4345  92
OpaqArea 1.0.0.4          4.4.4.4          0x800002d5   253  0x22 0x8c77  92
OpaqArea 1.0.0.4          5.5.5.5          0x800002d0   114  0x22 0x169a  92
OpaqArea 1.0.0.4          6.6.6.6          0x800002d2   261  0x22 0xe04c  92
OpaqArea*1.0.0.5          3.3.3.3          0x80000176  2876  0x22 0xe4a0  92
OpaqArea 1.0.0.5          4.4.4.4          0x800002d5   183  0x22 0x1bf0  92
OpaqArea 1.0.0.5          5.5.5.5          0x800002d2  2901  0x22 0x9d5c  92
OpaqArea 1.0.0.5          6.6.6.6          0x800002d0   118  0x22 0xc4e2  92
OpaqArea*4.0.0.0          3.3.3.3          0x800002ef   894  0x22 0x189a  44
OpaqArea 4.0.0.0          4.4.4.4          0x800002ef   461  0x22 0xf9b4  44
OpaqArea 4.0.0.0          5.5.5.5          0x800002d3   329  0x22 0x14b2  44
OpaqArea 4.0.0.0          6.6.6.6          0x800002d2  2917  0x22 0xf7cb  44
OpaqArea*7.0.0.1          3.3.3.3          0x80000303   914  0x22 0x590e  92
OpaqArea 7.0.0.1          4.4.4.4          0x80000307   911  0x22 0x70e9  92
OpaqArea 7.0.0.1          5.5.5.5          0x800002d7  2115  0x22 0x28c8  68
OpaqArea 7.0.0.1          6.6.6.6          0x800002d7  1489  0x22 0x5691  68
OpaqArea 8.0.0.1          5.5.5.5          0x800002d3  2615  0x22 0xbe2f  48
OpaqArea 8.0.0.1          6.6.6.6          0x800002d3  2846  0x22 0x2fb0  48
OpaqArea 8.0.0.2          6.6.6.6          0x800002d2   403  0x22 0x6c8f  48
OpaqArea 8.0.0.3          5.5.5.5          0x800002d3  2472  0x22 0xb218  48
OpaqArea 8.0.0.3          6.6.6.6          0x800002d0    47  0x22 0x7e6e  48
OpaqArea 8.0.0.4          5.5.5.5          0x800002d0  2543  0x22 0xbd20  48
OpaqArea*8.0.0.6          3.3.3.3          0x8000017e   984  0x22 0x57dd  48
OpaqArea 8.0.0.6          4.4.4.4          0x80000176   601  0x22 0x46f4  48
OpaqArea*8.0.0.7          3.3.3.3          0x8000017d   984  0x22 0xaf7f  48
OpaqArea 8.0.0.7          4.4.4.4          0x80000176   531  0x22 0x67bf  48
OpaqArea*8.0.0.8          3.3.3.3          0x80000179   984  0x22 0xe1e   48
OpaqArea 8.0.0.8          4.4.4.4          0x80000176   392  0x22 0xbd62  48

    OSPF database, Area 0.0.0.1
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len
Router   1.1.1.1          1.1.1.1          0x8000031e   961  0x20 0x48b1 108
Router   2.2.2.2          2.2.2.2          0x8000031e   961  0x20 0x7b4d 108
Router  *3.3.3.3          3.3.3.3          0x8000030f   984  0x20 0x30ae  72
Router   4.4.4.4          4.4.4.4          0x800002f8   962  0x20 0x8762  72
Summary *3.3.3.3          3.3.3.3          0x80000001   984  0x20 0xc264  28
Summary  3.3.3.3          4.4.4.4          0x80000001   962  0x20 0xb868  28
Summary *4.4.4.4          3.3.3.3          0x80000002   343  0x20 0x7e3f  28
Summary  4.4.4.4          4.4.4.4          0x80000001   962  0x20 0x76a8  28
Summary *5.5.5.5          3.3.3.3          0x80000002   279  0x20 0x6eae  28
Summary  5.5.5.5          4.4.4.4          0x80000001   962  0x20 0x52c7  28
Summary *6.6.6.6          3.3.3.3          0x80000002   210  0x20 0x40d8  28
Summary  6.6.6.6          4.4.4.4          0x80000001   962  0x20 0x24f1  28
Summary *7.7.7.7          3.3.3.3          0x80000002    85  0x20 0x1cf7  28
Summary  7.7.7.7          4.4.4.4          0x80000001   962  0x20 0xff11  28
Summary *8.8.8.8          3.3.3.3          0x80000001   984  0x20 0xef21  28
Summary  8.8.8.8          4.4.4.4          0x80000001   962  0x20 0xd13b  28
Summary *192.168.34.0     3.3.3.3          0x80000001   984  0x20 0xa3a   28
Summary  192.168.34.0     4.4.4.4          0x80000001   962  0x20 0xeb54  28
Summary *192.168.35.0     3.3.3.3          0x80000001   984  0x20 0x1d89  28
Summary  192.168.35.0     4.4.4.4          0x80000001   962  0x20 0x998   28
Summary *192.168.36.0     3.3.3.3          0x80000001   984  0x20 0x1293  28
Summary  192.168.36.0     4.4.4.4          0x80000001   962  0x20 0xfda2  28
Summary *192.168.45.0     3.3.3.3          0x80000001   984  0x20 0x9a9d  28
Summary  192.168.45.0     4.4.4.4          0x80000001   962  0x20 0x9008  28
Summary *192.168.46.0     3.3.3.3          0x80000001   984  0x20 0x8fa7  28
Summary  192.168.46.0     4.4.4.4          0x80000001   962  0x20 0x8512  28
Summary *192.168.56.0     3.3.3.3          0x80000001   984  0x20 0x3f51  28
Summary  192.168.56.0     4.4.4.4          0x80000001   962  0x20 0x216b  28
Summary *192.168.57.0     3.3.3.3          0x80000001   984  0x20 0x345b  28
Summary  192.168.57.0     4.4.4.4          0x80000001   962  0x20 0x1675  28
Summary *192.168.58.0     3.3.3.3          0x80000001   984  0x20 0x2965  28
Summary  192.168.58.0     4.4.4.4          0x80000001   962  0x20 0xb7f   28
Summary *192.168.67.0     3.3.3.3          0x80000001   984  0x20 0xc5bf  28
Summary  192.168.67.0     4.4.4.4          0x80000001   962  0x20 0xa7d9  28
Summary *192.168.68.0     3.3.3.3          0x80000001   984  0x20 0xbac9  28
Summary  192.168.68.0     4.4.4.4          0x80000001   962  0x20 0x9ce3  28
Summary *192.168.78.0     3.3.3.3          0x80000001   984  0x20 0x38dd  28
Summary  192.168.78.0     4.4.4.4          0x80000001   962  0x20 0x1af7  28
NSSA    *0.0.0.0          3.3.3.3          0x80000001   914  0x20 0x2ff6  36
NSSA     0.0.0.0          4.4.4.4          0x80000001   912  0x20 0x1111  36
OpaqArea 1.0.0.1          1.1.1.1          0x800002eb   701  0x20 0x72c9  28
OpaqArea 1.0.0.1          2.2.2.2          0x800002eb   673  0x20 0x76bd  28
OpaqArea*1.0.0.1          3.3.3.3          0x800002f9   984  0x20 0x5ebf  28
OpaqArea 1.0.0.1          4.4.4.4          0x800002e7   115  0x20 0x86a1  28
OpaqArea 1.0.0.3          1.1.1.1          0x800002d9   158  0x20 0x818a  92
OpaqArea 1.0.0.3          2.2.2.2          0x800002d8  2306  0x20 0xb02   92
OpaqArea*1.0.0.3          3.3.3.3          0x8000017d   984  0x20 0xa123  92
OpaqArea 1.0.0.3          4.4.4.4          0x800002e3    45  0x20 0x1f34  92
OpaqArea 1.0.0.4          1.1.1.1          0x8000019b   985  0x20 0x8222  92
OpaqArea 1.0.0.4          2.2.2.2          0x800002e2   985  0x20 0x4008  92
OpaqArea*1.0.0.4          3.3.3.3          0x8000017d   984  0x20 0x9519  92
OpaqArea 1.0.0.4          4.4.4.4          0x800002e1   962  0x20 0x82be  92
OpaqArea 1.0.0.5          1.1.1.1          0x800002dd   961  0x20 0xec68  92
OpaqArea 1.0.0.5          2.2.2.2          0x800002dd   961  0x20 0xa79b  92
OpaqArea 4.0.0.0          1.1.1.1          0x800002eb  2307  0x20 0x7a46  44
OpaqArea 4.0.0.0          2.2.2.2          0x800002eb   374  0x20 0x5c60  44
OpaqArea*4.0.0.0          3.3.3.3          0x800002fd   472  0x20 0x1a8c  44
OpaqArea 4.0.0.0          4.4.4.4          0x800002e8   962  0x20 0x2691  44
OpaqArea 7.0.0.1          1.1.1.1          0x800002d6  2607  0x20 0x348a  44
OpaqArea 7.0.0.1          2.2.2.2          0x800002d6  2606  0x20 0x6253  44
OpaqArea*7.0.0.1          3.3.3.3          0x80000003   914  0x20 0x676d 140
OpaqArea 7.0.0.1          4.4.4.4          0x80000003   912  0x20 0x4f81 140
OpaqArea 8.0.0.1          1.1.1.1          0x800002ed   429  0x20 0xa012  48
OpaqArea 8.0.0.1          2.2.2.2          0x800002ec    76  0x20 0x644e  48
OpaqArea*8.0.0.1          3.3.3.3          0x80000180   914  0x20 0xad0e  48
OpaqArea 8.0.0.1          4.4.4.4          0x8000017e   912  0x20 0x1920  48
OpaqArea*8.0.0.2          3.3.3.3          0x80000180   914  0x20 0x121a  48
OpaqArea 8.0.0.2          4.4.4.4          0x8000017e   912  0x20 0x920   48
OpaqArea 8.0.0.39         1.1.1.1          0x80000001   985  0x20 0x6312  48
OpaqArea 8.0.0.39         2.2.2.2          0x80000001   985  0x20 0xef77  48
OpaqArea 8.0.0.40         1.1.1.1          0x80000001   961  0x20 0xb9b4  48
OpaqArea 8.0.0.40         2.2.2.2          0x80000001   961  0x20 0x461a  48

Now, let’s configure NSSA on ABRs as no-summaries:

set protocols ospf area 0.0.0.1 nssa no-summaries

Let’s go to R1 and see what changed.

0/0 is still there:

root@r1> show route 0/0 exact

inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[OSPF/150/10] 00:00:18, metric 2, tag 0
                    >  to 192.168.13.1 via ge-0/0/1.0
                       to 192.168.14.1 via ge-0/0/2.0

Table inet.3 is different:

root@r1> show route table inet.3

inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2.2.2.2/32         *[L-OSPF/10/5] 00:01:01, metric 2
                    >  to 192.168.13.1 via ge-0/0/1.0, Push 1002
                       to 192.168.14.1 via ge-0/0/2.0, Push 1002

Only an LSP to 2.2.2.2 (the other non-ABR router within area 1) is available.

Let’s check R1’s OSPF database:

root@r1> show ospf database

    OSPF database, Area 0.0.0.1
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len
Router  *1.1.1.1          1.1.1.1          0x80000322   109  0x20 0x40b5 108
Router   2.2.2.2          2.2.2.2          0x80000322   110  0x20 0x7351 108
Router   3.3.3.3          3.3.3.3          0x80000311   114  0x20 0x2cb0  72
Router   4.4.4.4          4.4.4.4          0x800002fb   115  0x20 0x8165  72
NSSA     0.0.0.0          3.3.3.3          0x80000001   118  0x20 0x2ff6  36
NSSA     0.0.0.0          4.4.4.4          0x80000001   115  0x20 0x1111  36
OpaqArea*1.0.0.1          1.1.1.1          0x800002eb  1328  0x20 0x72c9  28
OpaqArea 1.0.0.1          2.2.2.2          0x800002eb  1301  0x20 0x76bd  28
OpaqArea 1.0.0.1          3.3.3.3          0x800002fa   118  0x20 0x5cc0  28
OpaqArea 1.0.0.1          4.4.4.4          0x800002e8   115  0x20 0x84a2  28
OpaqArea*1.0.0.3          1.1.1.1          0x800002d9   785  0x20 0x818a  92
OpaqArea 1.0.0.3          2.2.2.2          0x800002d9   106  0x20 0x903   92
OpaqArea 1.0.0.3          3.3.3.3          0x8000017e   114  0x20 0x9f24  92
OpaqArea 1.0.0.3          4.4.4.4          0x800002e4   115  0x20 0x1d35  92
OpaqArea*1.0.0.4          1.1.1.1          0x8000019c   117  0x20 0x8023  92
OpaqArea 1.0.0.4          2.2.2.2          0x800002e3   118  0x20 0x3e09  92
OpaqArea 1.0.0.4          3.3.3.3          0x8000017e   114  0x20 0x931a  92
OpaqArea 1.0.0.4          4.4.4.4          0x800002e3   115  0x20 0x7ec0  92
OpaqArea*1.0.0.5          1.1.1.1          0x800002de   114  0x20 0xea69  92
OpaqArea 1.0.0.5          2.2.2.2          0x800002de   115  0x20 0xa59c  92
OpaqArea*4.0.0.0          1.1.1.1          0x800002ec   481  0x20 0x7847  44
OpaqArea 4.0.0.0          2.2.2.2          0x800002eb  1002  0x20 0x5c60  44
OpaqArea 4.0.0.0          3.3.3.3          0x800002fe   119  0x20 0x188d  44
OpaqArea 4.0.0.0          4.4.4.4          0x800002ea   115  0x20 0x2293  44
OpaqArea*7.0.0.1          1.1.1.1          0x800002d7   233  0x20 0x328b  44
OpaqArea 7.0.0.1          2.2.2.2          0x800002d7   403  0x20 0x6054  44
OpaqArea*8.0.0.1          1.1.1.1          0x800002ed  1056  0x20 0xa012  48
OpaqArea 8.0.0.1          2.2.2.2          0x800002ec   703  0x20 0x644e  48
OpaqArea 8.0.0.1          3.3.3.3          0x80000181   114  0x20 0xa922  48
OpaqArea 8.0.0.1          4.4.4.4          0x8000017f   115  0x20 0x1534  48
OpaqArea 8.0.0.2          3.3.3.3          0x80000181   114  0x20 0xe2e   48
OpaqArea 8.0.0.2          4.4.4.4          0x8000017f   115  0x20 0x534   48
OpaqArea*8.0.0.41         1.1.1.1          0x80000001   117  0x20 0x4f24  48
OpaqArea 8.0.0.41         2.2.2.2          0x80000001   118  0x20 0xdb89  48
OpaqArea*8.0.0.42         1.1.1.1          0x80000001   114  0x20 0xa5c6  48
OpaqArea 8.0.0.42         2.2.2.2          0x80000001   115  0x20 0x322c  48

Router LSAs for all the routers within area 1 are there. However, no Summary LSAs are present. This is expected, as we explicitly configured no-summaries.

It seems that, without Summary LSAs, the corresponding LSPs are not created in inet.3.
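To make this kind of check repeatable, here is a minimal Python sketch that counts LSAs per type in the text output of show ospf database. The function name count_lsa_types and the parsing rules are my own assumptions, based only on the column layout shown above; the sample is an excerpt of R1's output.

```python
from collections import Counter

# Excerpt of the R1 output shown above (not the full database).
SAMPLE = """\
    OSPF database, Area 0.0.0.1
 Type       ID               Adv Rtr           Seq      Age  Opt  Cksum  Len
Router  *1.1.1.1          1.1.1.1          0x80000322   109  0x20 0x40b5 108
Router   2.2.2.2          2.2.2.2          0x80000322   110  0x20 0x7351 108
Router   3.3.3.3          3.3.3.3          0x80000311   114  0x20 0x2cb0  72
Router   4.4.4.4          4.4.4.4          0x800002fb   115  0x20 0x8165  72
NSSA     0.0.0.0          3.3.3.3          0x80000001   118  0x20 0x2ff6  36
NSSA     0.0.0.0          4.4.4.4          0x80000001   115  0x20 0x1111  36
OpaqArea*1.0.0.1          1.1.1.1          0x800002eb  1328  0x20 0x72c9  28
OpaqArea 1.0.0.1          2.2.2.2          0x800002eb  1301  0x20 0x76bd  28
"""


def count_lsa_types(output: str) -> Counter:
    """Count LSAs per type in `show ospf database` text output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the area banner and the column header.
        if len(fields) < 7 or fields[0] in ("OSPF", "Type"):
            continue
        # The '*' marking self-originated LSAs may be glued to the type
        # (e.g. "OpaqArea*1.0.0.1"), so keep only what precedes it.
        lsa_type = fields[0].split("*")[0]
        counts[lsa_type] += 1
    return counts


counts = count_lsa_types(SAMPLE)
print(dict(counts))
if counts["Summary"] == 0:
    print("no Summary LSAs in this area")
```

A Counter returns 0 for a missing key, so the check for Summary LSAs works even when that type never appears in the output.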

Let’s have a look at R3:

root@r3# run show route table inet.3

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:05:15, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
2.2.2.2/32         *[L-OSPF/10/5] 00:05:15, metric 1
                    >  to 192.168.23.0 via ge-0/0/1.0
4.4.4.4/32         *[L-OSPF/10/5] 1w6d 20:34:24, metric 100
                    >  to 192.168.34.1 via ge-0/0/2.0
5.5.5.5/32         *[L-OSPF/10/5] 3w3d 23:57:15, metric 1
                    >  to 192.168.35.1 via ge-0/0/3.0
6.6.6.6/32         *[L-OSPF/10/5] 3w3d 23:56:01, metric 1
                    >  to 192.168.36.1 via ge-0/0/4.0
7.7.7.7/32         *[L-OSPF/10/5] 3w3d 21:41:24, metric 2
                    >  to 192.168.35.1 via ge-0/0/3.0, Push 1007
                       to 192.168.36.1 via ge-0/0/4.0, Push 1007

[edit]
root@r3# run show ospf database | match 2.2.2.2
Summary *2.2.2.2          3.3.3.3          0x800002e6   332  0x22 0xb35   28
Summary  2.2.2.2          4.4.4.4          0x800002e3   333  0x22 0xf24c  28
Router   2.2.2.2          2.2.2.2          0x80000322   333  0x20 0x7351 108

root@r3# run show ospf database | match 1.1.1.1
Summary *1.1.1.1          3.3.3.3          0x800002e6   372  0x22 0x390b  28
Summary  1.1.1.1          4.4.4.4          0x800002e5   373  0x22 0x1d24  28
Router   1.1.1.1          1.1.1.1          0x80000322   373  0x20 0x40b5 108

Here, on R3, we have Summary LSAs for both R1 and R2. This means that R3 has an LSP to R1, but R1 does not have an LSP to R3.

Let’s check the TI-LFA path:

root@r3# run show route 1.1.1.1 exact

inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[OSPF/10/10] 00:07:17, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:07:17, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0

There is no TI-LFA path!

Apparently no-summaries broke the mechanism.

This happens even though R3 has all the information needed to build the TI-LFA path.
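The "single next hop" observation can also be checked with a small script. The following Python sketch parses show route text output and lists the next hops found under each route entry. The function name and regular expressions are assumptions of mine, built only on the output format shown above. Note that it merely counts next-hop lines: it does not tell an ECMP next hop from a backup one, which would need more detail than this output provides.

```python
import re

# Output of "show route 1.1.1.1 exact" on R3, as shown above.
SAMPLE = """\
inet.0: 37 destinations, 37 routes (37 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[OSPF/10/10] 00:07:17, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0

inet.3: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1.1.1.1/32         *[L-OSPF/10/5] 00:07:17, metric 1
                    >  to 192.168.13.0 via ge-0/0/0.0
"""

TABLE_RE = re.compile(r"^(\S+): \d+ destinations")
ENTRY_RE = re.compile(r"\[(?P<proto>[\w-]+)/\d+(?:/\d+)?\]")
NH_RE = re.compile(r"^\s*>?\s*to\s+(?P<nh>\S+)\s+via\s+(?P<ifl>\S+?),?(?:\s|$)")


def next_hops(output):
    """Map (table, protocol) to the list of (next hop, interface) pairs."""
    result = {}
    table = None
    key = None
    for line in output.splitlines():
        m_table = TABLE_RE.match(line)
        if m_table:
            table = m_table.group(1)
            continue
        m_entry = ENTRY_RE.search(line)
        if m_entry:
            # A new route entry starts: next-hop lines below belong to it.
            key = (table, m_entry.group("proto"))
            result[key] = []
            continue
        m_nh = NH_RE.match(line)
        if m_nh and key is not None:
            result[key].append((m_nh.group("nh"), m_nh.group("ifl")))
    return result


hops = next_hops(SAMPLE)
for (table, proto), entries in hops.items():
    print(table, proto, len(entries), "next hop(s)")
```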

TED has all the links along with associated Adj-SIDs:

root@r3# run show ted link topology-type l3-unicast
ID                         ->ID                          LocalPath LocalBW
1.1.1.1                      3.3.3.3                             0 0bps
1.1.1.1                      4.4.4.4                             0 0bps
1.1.1.1                      2.2.2.2                             0 0bps
2.2.2.2                      3.3.3.3                             0 0bps
2.2.2.2                      4.4.4.4                             0 0bps
2.2.2.2                      1.1.1.1                             0 0bps
3.3.3.3                      5.5.5.5                             0 0bps
3.3.3.3                      6.6.6.6                             0 0bps
3.3.3.3                      4.4.4.4                             0 0bps
3.3.3.3                      2.2.2.2                             0 0bps
3.3.3.3                      1.1.1.1                             0 0bps
4.4.4.4                      3.3.3.3                             0 0bps
4.4.4.4                      5.5.5.5                             0 0bps
4.4.4.4                      6.6.6.6                             0 0bps
4.4.4.4                      2.2.2.2                             0 0bps
4.4.4.4                      1.1.1.1                             0 0bps
5.5.5.5                      3.3.3.3                             0 0bps
5.5.5.5                      6.6.6.6                             0 0bps
5.5.5.5                      4.4.4.4                             0 0bps
6.6.6.6                      3.3.3.3                             0 0bps
6.6.6.6                      5.5.5.5                             0 0bps
6.6.6.6                      4.4.4.4                             0 0bps

[edit]
root@r3# run show ted link topology-type l3-unicast | match bps | count
Count: 22 lines

[edit]
root@r3# run show ted link topology-type l3-unicast detail | match adj | count
Count: 22 lines
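The same TED output can be inspected offline. Below is a minimal Python sketch that parses the link list, verifies that every link is present in both directions, and checks that each hop of a candidate path exists as a TED link. The helper names are hypothetical and the R3-R2-R4-R1 path is just the example path used here; the sample only includes the links among R1, R2, R3 and R4. The sketch looks at link presence only, not at Adj-SIDs.

```python
import re

# Links among R1..R4, taken from the TED output shown above.
SAMPLE = """\
ID                         ->ID                          LocalPath LocalBW
1.1.1.1                      3.3.3.3                             0 0bps
1.1.1.1                      4.4.4.4                             0 0bps
1.1.1.1                      2.2.2.2                             0 0bps
2.2.2.2                      3.3.3.3                             0 0bps
2.2.2.2                      4.4.4.4                             0 0bps
2.2.2.2                      1.1.1.1                             0 0bps
3.3.3.3                      4.4.4.4                             0 0bps
3.3.3.3                      2.2.2.2                             0 0bps
3.3.3.3                      1.1.1.1                             0 0bps
4.4.4.4                      3.3.3.3                             0 0bps
4.4.4.4                      2.2.2.2                             0 0bps
4.4.4.4                      1.1.1.1                             0 0bps
"""

LINK_RE = re.compile(
    r"^(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+\d+\s+\S+$"
)


def parse_ted_links(output):
    """Return the set of (local ID, remote ID) pairs found in the output."""
    links = set()
    for line in output.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            links.add((match.group(1), match.group(2)))
    return links


def one_way_links(links):
    """Links that have no entry in the reverse direction."""
    return {(a, b) for (a, b) in links if (b, a) not in links}


def path_in_ted(path, links):
    """True if every consecutive pair of routers in path is a TED link."""
    return all((a, b) in links for a, b in zip(path, path[1:]))


links = parse_ted_links(SAMPLE)
backup_path = ["3.3.3.3", "2.2.2.2", "4.4.4.4", "1.1.1.1"]
print(len(links), "links,", len(one_way_links(links)), "one-way")
print("path present in TED:", path_in_ted(backup_path, links))
```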

However, this is not enough, as TI-LFA with NSSA no-summaries is not supported.

Does this mean no backup path? Not really: we can think of some workarounds, and we will look in that direction next time.

Ciao
IoSonoUmberto

Travel back in time with Junos snapshots

Traveling back in time is not yet a reality…unless you look at a Junos device 🙂

Imagine something really bad happened to your router and you would love to go back to a functioning scenario. In that case, the way to go is relying on snapshots!

Snapshots are pretty much like a time machine. They take a “picture” of the router at a given moment and allow you to restore that exact moment when needed.

Including snapshots in your daily device management is fundamental!

Moreover, it is important to understand how snapshots work and which types of snapshots we have available…yep, there are different kinds of snapshots.

Let’s start from that. There are two types of snapshots:

  • non recovery snapshots
  • recovery snapshots

Non recovery snapshots are probably the ones most people are familiar with. They are stored within the junos volume (/dev/gpt/junos), the one where junos boots and runs.
When taken, non recovery snapshots reference the set of packages and configuration found when creating the snapshots.
It is possible to take multiple non recovery snapshots. We might see them as the equivalent of “VM snapshots” we have in ESXi or KVM.
We can instruct Junos to reboot and boot from one of these snapshots.

On the other hand, recovery snapshots are stored in a totally different volume: the OAM volume.
A recovery snapshot also references the set of packages and configuration found when it was taken.
However, there are some differences.
First, we can only have one recovery snapshot, not multiple ones.
Second, as already mentioned, it is stored in a different location.

This second aspect is key to understanding the difference between recovery and non-recovery snapshots.

Non recovery snapshots reside in the “normal” Junos volume, the router SSD. That is the volume the router will use by default to load junos and function.

The recovery snapshot, instead, resides on different media. We will not find it on the SSD but on a separate flash memory. Roughly speaking, the recovery snapshot is a disk dump of the junos volume on other media: the OAM volume.
This type of snapshot represents a sort of last resort in case something really bad happens. By really bad, we mean scenarios where the SSD gets damaged and Junos can no longer start. The SSD can get damaged in different ways: physically or logically. No matter the exact fault, upon that kind of failure, the router will mount the OAM volume and boot from it, using the recovery snapshot.

For this reason, it is important to keep the recovery snapshot updated. By that, I mean that after a release upgrade, we should also take a recovery snapshot so that it also uses the new release.

Keeping the recovery snapshot out of sync with the installed Junos release might be risky. Let’s assume the device comes to your lab with a recovery snapshot running release X. Then, you upgrade Junos to release Y but you do not create a new recovery snapshot. This means the recovery snapshot runs an older release. Let’s assume the new release Y allows you to use a new MPC card that was unsupported with release X. Now, a severe power outage causes your router to go down and, when powering up again, it boots from the OAM volume. As a result, the router will run Junos release X, which is unable to make the new MPC work properly. This means that all the interfaces of that card will be down, leading to massive network issues.
All of this could have been avoided simply by having the recovery snapshot aligned with the current release.

A non-recovery snapshot, instead, might be used to simply restore a previous scenario; no need to face failures like power outages, hardware failures and so on 🙂 For example, a release upgrade did not go well and we restore the system to a pre-upgrade situation by loading a non-recovery snapshot.

If you think about it, at least in my opinion, being sure to have meaningful recovery snapshots becomes fundamental!

Let’s see how to work with snapshots.

The following command shows all the available snapshots:

root@router> show system snapshot

Non-recovery snapshots:
Snapshot snap.20180911.122327:
Location: /packages/sets/snap.20180911.122327
Creation date: Sep 11 12:23:27 2018
Junos version: 16.1R6.7

Snapshot snap.20181115.152401:
Location: /packages/sets/snap.20181115.152401
Creation date: Nov 15 15:24:01 2018
Junos version: 16.1R7.7

Snapshot snap.20200615.141312:
Location: /packages/sets/snap.20200615.141312
Creation date: Jun 15 14:13:12 2020
Junos version: 16.1R7-S4.1

Snapshot snap.20200615.152129:
Location: /packages/sets/snap.20200615.152129
Creation date: Jun 15 15:21:29 2020
Junos version: 18.4R1-S7.1

Total non-recovery snapshots: 4

Recovery Snapshots:
Snapshots available on the OAM volume:
recovery.ufs
Date created: Mon Jun 15 14:17:47 CEST 2020
Junos version: 16.1R7-S4.1

Total recovery snapshots: 1

The output lists both non-recovery (we can have more than one) and recovery (we can only have one) snapshots.
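Since keeping the recovery snapshot aligned with the running release matters so much, the comparison can be automated. The Python sketch below parses the text output of show system snapshot and compares the recovery snapshot release with the running one. The function names are hypothetical, the parsing relies only on the output format shown above, and the running release is an assumption passed in by the caller (it is not part of this output). The sample is an excerpt of the output above.

```python
# Excerpt of the "show system snapshot" output shown above.
SAMPLE = """\
Non-recovery snapshots:
Snapshot snap.20200615.141312:
Location: /packages/sets/snap.20200615.141312
Creation date: Jun 15 14:13:12 2020
Junos version: 16.1R7-S4.1

Snapshot snap.20200615.152129:
Location: /packages/sets/snap.20200615.152129
Creation date: Jun 15 15:21:29 2020
Junos version: 18.4R1-S7.1

Total non-recovery snapshots: 2

Recovery Snapshots:
Snapshots available on the OAM volume:
recovery.ufs
Date created: Mon Jun 15 14:17:47 CEST 2020
Junos version: 16.1R7-S4.1

Total recovery snapshots: 1
"""


def parse_snapshots(output):
    """Return non-recovery snapshots (name -> release) and the recovery release."""
    result = {"non_recovery": {}, "recovery": None}
    section = None
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        lower = line.lower()
        if lower.startswith("non-recovery snapshots:"):
            section = "non_recovery"
        elif lower.startswith("recovery snapshots:"):
            section = "recovery"
        elif line.startswith("Snapshot ") and line.endswith(":"):
            # "Snapshot <name>:" opens a non-recovery snapshot block.
            current = line[len("Snapshot "):-1]
        elif line.startswith("Junos version:"):
            version = line.split(":", 1)[1].strip()
            if section == "recovery":
                result["recovery"] = version
            elif section == "non_recovery" and current:
                result["non_recovery"][current] = version
    return result


def recovery_in_sync(parsed, running_version):
    """True if the recovery snapshot uses the running release."""
    return parsed["recovery"] == running_version


parsed = parse_snapshots(SAMPLE)
# Assumption: the device currently runs 18.4R1-S7.1.
in_sync = recovery_in_sync(parsed, "18.4R1-S7.1")
print("recovery snapshot release:", parsed["recovery"])
print("in sync with running release:", in_sync)
```

With this sample, and under the assumption that the device runs 18.4R1-S7.1, the check reports a stale recovery snapshot, which is exactly the risky situation described earlier.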

It is possible to delete a non-recovery snapshot:

root@router> request system snapshot delete snap.20200615.141312
NOTICE: Snapshot 'snap.20200615.141312' deleted successfully

A key command is the one to create recovery snapshots. The suggestion is to create it on both routing engines (if you have a dual-re system):

root@router> request system snapshot recovery routing-engine both
re0:
--------------------------------------------------------------------------
Creating image ...
Compressing image ...
Image size is 2682MB
Recovery snapshot created successfully

re1:
--------------------------------------------------------------------------
Creating image ...
Compressing image ...
Image size is 2682MB
Recovery snapshot created successfully

If you need to load the recovery snapshot, simply run:

root@router> request system recover oam-volume

It might happen that snapshot creation fails with this error:

ERROR: The OAM volume is too small to store a snapshot

In this case, start a shell and check the following folder:

root@MX1-NAT44-RE0:/var/home/admin # cd /packages/sets/active/optional/
root@MX1-NAT44-RE0:/packages/sets/active/optional # ls -alth
total 12
drwxr-xr-x  3 root  wheel   512B Jun 15  2020 .
lrwxr-xr-x  1 root  wheel    73B Jun 15  2020 jpfe-wrlinux9 -> /packages/db/jpfe-wrlinux9-x86-32-20200513.174938_builder_junos_184_r1_s7
drwxr-xr-x  4 root  wheel   2.0K Jun 15  2020 ..
lrwxr-xr-x  1 root  wheel    71B Jun 15  2020 jpfe-MXSPC3 -> /packages/db/jpfe-MXSPC3-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    75B Jun 15  2020 junos-appidd-mx -> /packages/db/junos-appidd-mx-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    67B Jun 15  2020 jail-runtime -> /packages/db/jail-runtime-x86-32-20200430.3cd74ef_builder_stable_11
lrwxr-xr-x  1 root  wheel    40B Jun 15  2020 junos-install-mx-x86-64 -> /packages/db/junos-mx-x86-64-18.4R1-S7.1
lrwxr-xr-x  1 root  wheel    68B Jun 15  2020 sflow-mx -> /packages/db/sflow-mx-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    74B Jun 15  2020 junos-secintel -> /packages/db/junos-secintel-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    76B Jun 15  2020 junos-runtime-mx -> /packages/db/junos-runtime-mx-x86-32-20200513.174938_builder_junos_184_r1_s7
drwxr-xr-x  2 root  wheel   512B Jun 15  2020 boot
lrwxr-xr-x  1 root  wheel    77B Jun 15  2020 junos-net-mtx-prd -> /packages/db/junos-net-mtx-prd-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    76B Jun 15  2020 junos-modules-mx -> /packages/db/junos-modules-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    73B Jun 15  2020 junos-libs-mx -> /packages/db/junos-libs-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    82B Jun 15  2020 junos-libs-compat32-mx -> /packages/db/junos-libs-compat32-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    87B Jun 15  2020 junos-dp-crypto-support-mtx -> /packages/db/junos-dp-crypto-support-mtx-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    76B Jun 15  2020 junos-daemons-mx -> /packages/db/junos-daemons-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    36B Jun 15  2020 jsdn -> /packages/db/jsdn-x86-32-18.4R1-S7.1
lrwxr-xr-x  1 root  wheel    69B Jun 15  2020 jpfe-X960 -> /packages/db/jpfe-X960-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x  1 root  wheel    66B Jun 15  2020 jpfe-X -> /packages/db/jpfe-X-x86-32-20200513.174938_builder_junos_184_r1_s7

There, delete any files left over from old releases (e.g. packages from a 15/16 release).

Going back to loading snapshots, a non-recovery snapshot can be loaded in a similar way:

root@router> request system snapshot load <name>

As said before, the recovery snapshot is stored on different media: the OAM volume.

Let’s see how we can locate it.

First, we run a shell as root:

root@router> start shell user root
Password:
root@router:/var/home/admin #

Next, we mount the oam volume and look for the snapshot file:

root@router:/var/home/admin # mount /dev/gpt/oam /oam

root@router:/var/home/admin # ls -la /oam
total 36
drwxr-xr-x   9 root  wheel   512 Jan 22 15:21 .
drwxr-xr-x  23 root  wheel   512 Jun 15  2020 ..
drwxr-xr-x   4 root  wheel  1024 Jan 22 15:22 boot
dr-xr-xr-x   2 root  wheel   512 Sep 10  2018 dev
dr-xr-xr-x   2 root  wheel   512 Sep 10  2018 etc
drwxr-xr-x   2 root  wheel   512 Sep 10  2018 mnt
drwxr-xr-x   2 root  wheel   512 Jan 22 15:23 snapshot
drwxrwxrwt   2 root  wheel   512 Sep 10  2018 tmp
drwxr-xr-x   2 root  wheel   512 Sep 10  2018 var

root@router:/var/home/admin # ls -la /oam/snapshot/
total 2747692
drwxr-xr-x  2 root  wheel         512 Jan 22 15:23 .
drwxr-xr-x  9 root  wheel         512 Jan 22 15:21 ..
-rw-r--r--  1 root  wheel          12 Jan 22 15:23 VERSION
-rwxr-xr-x  1 root  wheel  2812899328 Jan 22 15:22 recovery.ufs.uzip

At the end, remember to unmount the oam volume:

root@router:/var/home/admin # umount /dev/gpt/oam

Finally, let’s think about how snapshots might be included in our maintenance/management procedures.

When upgrading the release we might follow these stages:

  • prepare new release packages
  • take non-recovery snapshot
  • take recovery snapshot
  • upgrade release
  • verify everything is working (if not you can load the previous non-recovery snapshot)
  • take non-recovery snapshot
  • take recovery snapshot

During normal operations and daily routines, we might think of:

  • taking recovery snapshots regularly (once a week, along with another tool backing up configuration)
  • taking snapshots upon any hardware change (e.g. new cards)
  • taking snapshots upon the introduction of new services
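The "once a week" rule can be checked from the creation date that show system snapshot reports for the recovery snapshot. Here is a minimal Python sketch; the function name, the seven-day threshold and the "current" date are assumptions made for the example. The timezone abbreviation (e.g. CEST) is simply dropped, so the result is approximate by a few hours at most.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=7)  # assumption: weekly recovery snapshots


def snapshot_age(date_created, now):
    """Age of a snapshot given a 'Date created' value such as
    'Mon Jun 15 14:17:47 CEST 2020'."""
    parts = date_created.split()
    # Drop the timezone abbreviation (fifth token) before parsing.
    naive = " ".join(parts[:4] + parts[5:])
    created = datetime.strptime(naive, "%a %b %d %H:%M:%S %Y")
    return now - created


# Assumption: we run the check on June 30th 2020 at noon.
now = datetime(2020, 6, 30, 12, 0, 0)
age = snapshot_age("Mon Jun 15 14:17:47 CEST 2020", now)
stale = age > MAX_AGE
print("recovery snapshot age:", age)
print("older than a week:", stale)
```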

The key concept behind all those considerations is “try to keep your snapshots as close as possible to the current situation of your router so that, upon failures, you can restore your device and have it in a state that is close to the target one”.

This is important for at least two reasons:

  • even after booting from the OAM volume, the device and its configured services should work
  • it will not require a lot of effort to bring the device to the desired status (this is easier if additional procedures like “regular configuration backup” are in place, as suggested above)

So, what now? Simple, take snapshots!

Ciao
IoSonoUmberto