Traveling back in time is not yet a reality…unless you look at a Junos device 🙂
Imagine something really bad happened to your router and you would love to go back to a functioning scenario. In that case, the way to go is relying on snapshots!
Snapshots are pretty much like a time machine. They take a “picture” of the router at a given moment and allow you to restore that exact moment when needed.
![](https://www.startalkradio.net/wp-content/uploads/2015/11/Stewies-Time-Machine_Family-Guy_Copyright-2015-FOX.jpg)
Including snapshots in your daily device management is fundamental!
Moreover, it is important to understand how snapshots work and which types of snapshots are available…yep, there are different kinds of snapshots.
Let’s start from that. There are two types of snapshots:
- non-recovery snapshots
- recovery snapshots
Non-recovery snapshots are probably the ones most people are familiar with. They are stored within the Junos volume (/dev/gpt/junos), the one Junos boots and runs from.
When taken, a non-recovery snapshot references the set of packages and the configuration found at creation time.
It is possible to take multiple non-recovery snapshots. We might see them as the equivalent of the “VM snapshots” we have in ESXi or KVM.
We can instruct Junos to reboot and boot from one of these snapshots.
On the other hand, recovery snapshots are stored in a totally different volume: the OAM volume.
A recovery snapshot also references the set of packages and the configuration found when it was taken.
However, there are some differences.
First, we can only have one recovery snapshot, not multiple ones.
Second, as already mentioned, it is stored in a different location.
This second aspect is key to understanding the difference between recovery and non-recovery snapshots.
Non-recovery snapshots reside in the “normal” Junos volume, on the router SSD. That is the volume the router uses by default to load Junos and operate.
The recovery snapshot, instead, resides on a different medium. We will not find it on the SSD but on a separate flash memory. Roughly speaking, the recovery snapshot is a disk dump of the Junos volume on another medium: the OAM volume.
This type of snapshot represents a sort of last resort in case something really bad happens. By really bad, we mean scenarios where the SSD gets damaged and Junos can no longer start. The damage can be physical or logical. No matter the exact fault, upon that kind of failure, the router will mount the OAM volume and boot from it, using the recovery snapshot.
For this reason, it is important to keep the recovery snapshot updated. By that, I mean that after a release upgrade, we should also take a new recovery snapshot so that it contains the new release.
Keeping the recovery snapshot out of sync with the installed Junos release might be risky. Let’s assume the device comes to your lab with a recovery snapshot containing release X. Then, you upgrade Junos to release Y but you do not create a new recovery snapshot. This means the recovery snapshot still holds the older release. Let’s also assume release Y allows you to use a new MPC card that was unsupported in release X. Now, a severe power outage brings your router down and, when powering up again, it boots from the OAM volume. As a result, the router will run Junos release X, which is unable to make the new MPC work properly. This means that all the interfaces of that card will be down, leading to massive network issues.
All of this could have been avoided simply by having the recovery snapshot aligned with the current release.
A non-recovery snapshot, instead, might be used to simply restore a previous scenario; there is no need to face power outages, hardware failures and so on 🙂 For example, if a release upgrade did not go well, we can restore the system to the pre-upgrade situation by loading a non-recovery snapshot.
If you think about it, at least in my opinion, making sure you have meaningful recovery snapshots becomes fundamental!
Let’s see how to work with snapshots.
The following command shows all the available snapshots:
root@router> show system snapshot
Non-recovery snapshots:
Snapshot snap.20180911.122327:
Location: /packages/sets/snap.20180911.122327
Creation date: Sep 11 12:23:27 2018
Junos version: 16.1R6.7
Snapshot snap.20181115.152401:
Location: /packages/sets/snap.20181115.152401
Creation date: Nov 15 15:24:01 2018
Junos version: 16.1R7.7
Snapshot snap.20200615.141312:
Location: /packages/sets/snap.20200615.141312
Creation date: Jun 15 14:13:12 2020
Junos version: 16.1R7-S4.1
Snapshot snap.20200615.152129:
Location: /packages/sets/snap.20200615.152129
Creation date: Jun 15 15:21:29 2020
Junos version: 18.4R1-S7.1
Total non-recovery snapshots: 4
Recovery Snapshots:
Snapshots available on the OAM volume:
recovery.ufs
Date created: Mon Jun 15 14:17:47 CEST 2020
Junos version: 16.1R7-S4.1
Total recovery snapshots: 1
The output lists both non-recovery (we can have more than one) and recovery (we can only have one) snapshots.
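If you collect this output with a script, a small parser can tell you whether the recovery snapshot is lagging behind. The following is only a sketch: the function names are mine, the sample text is a shortened copy of the output above, and I am assuming that the newest non-recovery snapshot reflects the release you currently care about.

```python
# Shortened copy of the "show system snapshot" output shown above.
SAMPLE = """\
Non-recovery snapshots:
Snapshot snap.20200615.141312:
Location: /packages/sets/snap.20200615.141312
Creation date: Jun 15 14:13:12 2020
Junos version: 16.1R7-S4.1
Snapshot snap.20200615.152129:
Location: /packages/sets/snap.20200615.152129
Creation date: Jun 15 15:21:29 2020
Junos version: 18.4R1-S7.1
Total non-recovery snapshots: 2
Recovery Snapshots:
Snapshots available on the OAM volume:
recovery.ufs
Date created: Mon Jun 15 14:17:47 CEST 2020
Junos version: 16.1R7-S4.1
Total recovery snapshots: 1
"""


def parse_snapshots(text):
    """Return ([(name, version), ...], recovery_version) from the CLI text."""
    non_recovery = []
    recovery_version = None
    in_recovery = False
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Recovery Snapshots"):
            in_recovery = True
        elif line.startswith("Snapshot ") and line.endswith(":"):
            current = line[len("Snapshot "):-1]
        elif line.startswith("Junos version:"):
            version = line.split(":", 1)[1].strip()
            if in_recovery:
                recovery_version = version
            elif current is not None:
                non_recovery.append((current, version))
                current = None
    return non_recovery, recovery_version


def recovery_in_sync(text):
    """True if the recovery snapshot has the same release as the newest
    non-recovery snapshot (assumption: that one mirrors the running release)."""
    non_recovery, recovery_version = parse_snapshots(text)
    if not non_recovery or recovery_version is None:
        return False
    # Names look like snap.YYYYMMDD.HHMMSS, so max() picks the newest one.
    _, newest_version = max(non_recovery)
    return newest_version == recovery_version


if __name__ == "__main__":
    print(parse_snapshots(SAMPLE))
    print("recovery in sync:", recovery_in_sync(SAMPLE))
```

With the sample above the check reports a mismatch, because the recovery snapshot still holds 16.1R7-S4.1 while the newest non-recovery snapshot holds 18.4R1-S7.1.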
It is possible to delete a non-recovery snapshot:
root@router> request system snapshot delete snap.20200615.141312
NOTICE: Snapshot 'snap.20200615.141312' deleted successfully
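Since non-recovery snapshots pile up over time, you might want a simple retention rule. Here is a minimal sketch that only builds the delete commands and leaves running them to you; the function name and the "keep the newest two" policy are my own assumptions, not something Junos enforces.

```python
def snapshots_to_delete(names, keep=2):
    """Return delete commands for all but the `keep` newest snapshots.

    Assumes names follow the snap.YYYYMMDD.HHMMSS pattern seen above,
    so plain string sorting is also chronological sorting.
    """
    if keep < 0:
        raise ValueError("keep must be zero or positive")
    ordered = sorted(names)
    stale = ordered[:-keep] if keep else ordered
    return [f"request system snapshot delete {name}" for name in stale]


if __name__ == "__main__":
    existing = [
        "snap.20200615.152129",
        "snap.20180911.122327",
        "snap.20200615.141312",
        "snap.20181115.152401",
    ]
    for command in snapshots_to_delete(existing, keep=2):
        print(command)
```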
A key command is the one that creates the recovery snapshot. I suggest creating it on both routing engines (if you have a dual-RE system):
root@router> request system snapshot recovery routing-engine both
re0:
--------------------------------------------------------------------------
Creating image ...
Compressing image ...
Image size is 2682MB
Recovery snapshot created successfully
re1:
--------------------------------------------------------------------------
Creating image ...
Compressing image ...
Image size is 2682MB
Recovery snapshot created successfully
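When this command is launched from an automation tool, it is worth checking that both routing engines really reported success. The sketch below parses output shaped like the one above; the function name is hypothetical, and the failing re1 section in the sample is made up by combining the layout above with the OAM error message discussed in this post.

```python
import re


def recovery_status_per_re(output):
    """Map each RE label (re0, re1, ...) to True if its section reports success."""
    status = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(re\d+):", line)
        if header:
            current = header.group(1)
            status[current] = False
        elif current and "Recovery snapshot created successfully" in line:
            status[current] = True
    return status


# Illustrative sample: re0 succeeds, re1 fails (the re1 part is hypothetical).
SAMPLE = """\
re0:
--------------------------------------------------------------------------
Creating image ...
Compressing image ...
Image size is 2682MB
Recovery snapshot created successfully
re1:
--------------------------------------------------------------------------
ERROR: The OAM volume is too small to store a snapshot
"""

if __name__ == "__main__":
    for re_name, ok in recovery_status_per_re(SAMPLE).items():
        print(re_name, "OK" if ok else "FAILED")
```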
If you need to load the recovery snapshot, simply run:
root@router> request system recover oam-volume
It might happen that snapshot creation fails with this error:
ERROR: The OAM volume is too small to store a snapshot
In this case, start a shell and check the following folder:
root@MX1-NAT44-RE0:/var/home/admin # cd /packages/sets/active/optional/
root@MX1-NAT44-RE0:/packages/sets/active/optional # ls -alth
total 12
drwxr-xr-x 3 root wheel 512B Jun 15 2020 .
lrwxr-xr-x 1 root wheel 73B Jun 15 2020 jpfe-wrlinux9 -> /packages/db/jpfe-wrlinux9-x86-32-20200513.174938_builder_junos_184_r1_s7
drwxr-xr-x 4 root wheel 2.0K Jun 15 2020 ..
lrwxr-xr-x 1 root wheel 71B Jun 15 2020 jpfe-MXSPC3 -> /packages/db/jpfe-MXSPC3-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 75B Jun 15 2020 junos-appidd-mx -> /packages/db/junos-appidd-mx-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 67B Jun 15 2020 jail-runtime -> /packages/db/jail-runtime-x86-32-20200430.3cd74ef_builder_stable_11
lrwxr-xr-x 1 root wheel 40B Jun 15 2020 junos-install-mx-x86-64 -> /packages/db/junos-mx-x86-64-18.4R1-S7.1
lrwxr-xr-x 1 root wheel 68B Jun 15 2020 sflow-mx -> /packages/db/sflow-mx-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 74B Jun 15 2020 junos-secintel -> /packages/db/junos-secintel-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 76B Jun 15 2020 junos-runtime-mx -> /packages/db/junos-runtime-mx-x86-32-20200513.174938_builder_junos_184_r1_s7
drwxr-xr-x 2 root wheel 512B Jun 15 2020 boot
lrwxr-xr-x 1 root wheel 77B Jun 15 2020 junos-net-mtx-prd -> /packages/db/junos-net-mtx-prd-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 76B Jun 15 2020 junos-modules-mx -> /packages/db/junos-modules-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 73B Jun 15 2020 junos-libs-mx -> /packages/db/junos-libs-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 82B Jun 15 2020 junos-libs-compat32-mx -> /packages/db/junos-libs-compat32-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 87B Jun 15 2020 junos-dp-crypto-support-mtx -> /packages/db/junos-dp-crypto-support-mtx-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 76B Jun 15 2020 junos-daemons-mx -> /packages/db/junos-daemons-mx-x86-64-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 36B Jun 15 2020 jsdn -> /packages/db/jsdn-x86-32-18.4R1-S7.1
lrwxr-xr-x 1 root wheel 69B Jun 15 2020 jpfe-X960 -> /packages/db/jpfe-X960-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 66B Jun 15 2020 jpfe-X -> /packages/db/jpfe-X-x86-32-20200513.174938_builder_junos_184_r1_s7
There, delete any files left over from old releases (e.g. packages from a 15 or 16 release).
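Spotting leftovers by eye in a long listing is error prone, so a small helper can pre-sort the entries for you. This is just a sketch under a few assumptions of mine: it reads a saved copy of the `ls` output, it treats the strings you pass in `current_tags` as the markers of the running release, and its regular expression only recognises the two naming styles visible in the listing above. The `old-example` line is hypothetical. The script deletes nothing; it only suggests what to review.

```python
import re

# Release markers as they appear in the listing above, e.g. "junos_184_r1_s7"
# or "18.4R1-S7.1". Other naming styles are reported as "unknown".
RELEASE_PATTERN = re.compile(r"junos_\d+_r\d+|\d+\.\d+R\d+")


def classify_links(listing, current_tags):
    """Sort symlink names into current / stale / unknown buckets."""
    result = {"current": [], "stale": [], "unknown": []}
    for line in listing.splitlines():
        if " -> " not in line:
            continue  # not a symlink line ('.', '..', directories, 'total')
        left, target = line.rsplit(" -> ", 1)
        name = left.split()[-1]
        if any(tag in target for tag in current_tags):
            result["current"].append(name)
        elif RELEASE_PATTERN.search(target):
            result["stale"].append(name)
        else:
            result["unknown"].append(name)
    return result


LISTING = """\
lrwxr-xr-x 1 root wheel 67B Jun 15 2020 jail-runtime -> /packages/db/jail-runtime-x86-32-20200430.3cd74ef_builder_stable_11
lrwxr-xr-x 1 root wheel 36B Jun 15 2020 jsdn -> /packages/db/jsdn-x86-32-18.4R1-S7.1
lrwxr-xr-x 1 root wheel 66B Jun 15 2020 jpfe-X -> /packages/db/jpfe-X-x86-32-20200513.174938_builder_junos_184_r1_s7
lrwxr-xr-x 1 root wheel 40B Sep 10 2018 old-example -> /packages/db/old-example-x86-32-16.1R6.7
"""

if __name__ == "__main__":
    print(classify_links(LISTING, ["184_r1_s7", "18.4R1-S7.1"]))
```

Entries that end up in the "unknown" bucket (like `jail-runtime` here) carry no recognisable release marker, so they need a manual look before touching anything.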
Going back to loading snapshots, a non-recovery snapshot can be loaded in a similar way:
root@router> request system snapshot load <name>
As mentioned before, the recovery snapshot is stored on a different medium: the OAM volume.
Let’s see how we can locate it.
First, we run a shell as root:
root@router> start shell user root
Password:
root@router:/var/home/admin #
Next, we mount the OAM volume and look for the snapshot file:
root@router:/var/home/admin # mount /dev/gpt/oam /oam
root@router:/var/home/admin # ls -la /oam
total 36
drwxr-xr-x 9 root wheel 512 Jan 22 15:21 .
drwxr-xr-x 23 root wheel 512 Jun 15 2020 ..
drwxr-xr-x 4 root wheel 1024 Jan 22 15:22 boot
dr-xr-xr-x 2 root wheel 512 Sep 10 2018 dev
dr-xr-xr-x 2 root wheel 512 Sep 10 2018 etc
drwxr-xr-x 2 root wheel 512 Sep 10 2018 mnt
drwxr-xr-x 2 root wheel 512 Jan 22 15:23 snapshot
drwxrwxrwt 2 root wheel 512 Sep 10 2018 tmp
drwxr-xr-x 2 root wheel 512 Sep 10 2018 var
root@router:/var/home/admin # ls -la /oam/snapshot/
total 2747692
drwxr-xr-x 2 root wheel 512 Jan 22 15:23 .
drwxr-xr-x 9 root wheel 512 Jan 22 15:21 ..
-rw-r--r-- 1 root wheel 12 Jan 22 15:23 VERSION
-rwxr-xr-x 1 root wheel 2812899328 Jan 22 15:22 recovery.ufs.uzip
At the end, remember to unmount the OAM volume:
root@router:/var/home/admin # umount /dev/gpt/oam
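As a quick sanity check, the size of recovery.ufs.uzip in the listing above can be compared with the “Image size” printed when a recovery snapshot is created. The conversion below is a tiny sketch; that Junos counts its “MB” as 1024 x 1024 bytes is my assumption, and the two outputs in this post may well come from different snapshot runs.

```python
def bytes_to_mib(size_bytes):
    """Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_bytes // (1024 * 1024)


if __name__ == "__main__":
    # Size of recovery.ufs.uzip taken from the ls output above.
    print(bytes_to_mib(2812899328))  # prints 2682
```

The result lines up with the “Image size is 2682MB” message seen earlier, which suggests the file on the OAM volume is the compressed image reported by the CLI.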
Finally, let’s think about how snapshots might be included in our maintenance/management procedures.
When upgrading the release, we might follow these stages:
- prepare new release packages
- take non-recovery snapshot
- take recovery snapshot
- upgrade release
- verify everything is working (if not, you can load the previous non-recovery snapshot)
- take non-recovery snapshot
- take recovery snapshot
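The stages above can be turned into a simple checklist generator. Take it as a sketch with several assumptions of mine: I use `request system snapshot` without arguments for the non-recovery snapshot and `request system software add` for the upgrade itself (neither is shown elsewhere in this post), the package path is a placeholder, and reboot or verification details are left to your own procedure.

```python
def upgrade_runbook(package_path):
    """Return the upgrade stages as (description, command) pairs.

    A command of None means the stage is manual.
    """
    take_non_recovery = "request system snapshot"  # assumed non-recovery form
    take_recovery = "request system snapshot recovery routing-engine both"
    return [
        ("prepare new release packages", None),
        ("take non-recovery snapshot", take_non_recovery),
        ("take recovery snapshot", take_recovery),
        ("upgrade release", f"request system software add {package_path}"),
        ("verify everything is working", None),
        ("take non-recovery snapshot", take_non_recovery),
        ("take recovery snapshot", take_recovery),
    ]


if __name__ == "__main__":
    # "/var/tmp/example.tgz" is a hypothetical package location.
    for number, (stage, command) in enumerate(upgrade_runbook("/var/tmp/example.tgz"), 1):
        print(f"{number}. {stage}: {command or '(manual step)'}")
```

If the verification stage fails, the way back is the `request system snapshot load <name>` command shown earlier, pointing at the snapshot taken in stage 2.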
During normal operations and daily routines, we might think of:
- taking recovery snapshots regularly (once a week, along with another tool backing up configuration)
- taking snapshots upon any hardware change (e.g. new cards)
- taking snapshots upon the introduction of new services
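To back the “once a week” idea with a check, the “Date created” field of the recovery snapshot can be compared with the current time. A minimal sketch follows; the function names and the 7-day threshold are my own choices, and the time zone abbreviation (CEST in the output above) is simply dropped, so the comparison is only as precise as your clocks and zones are aligned.

```python
from datetime import datetime, timedelta


def parse_date_created(value):
    """Parse a string like 'Mon Jun 15 14:17:47 CEST 2020', ignoring the zone."""
    parts = value.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected date format: {value!r}")
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


def recovery_is_stale(date_created, now, max_age=timedelta(days=7)):
    """True if the recovery snapshot is older than max_age at time `now`."""
    return now - parse_date_created(date_created) > max_age


if __name__ == "__main__":
    created = "Mon Jun 15 14:17:47 CEST 2020"
    print(recovery_is_stale(created, datetime(2020, 6, 20, 12, 0, 0)))
    print(recovery_is_stale(created, datetime(2020, 6, 23, 12, 0, 0)))
```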
The key concept behind all those considerations is “try to keep your snapshots as close as possible to the current situation of your router so that, upon failures, you can restore your device to a state that is close to the target one”.
This is important for at least two reasons:
- even after booting from the OAM volume, the device and its configured services should work
- it will not require a lot of effort to bring the device to the desired status (this is easier if additional procedures like “regular configuration backup” are in place, as suggested above)
So, what now? Simple, take snapshots!
Ciao
IoSonoUmberto