So, I screwed up pretty bad. I decided to upgrade the JunOS release on this Juniper SRX210 router to the one recommended by Juniper at the time I type this, 11.4R10.3. When it booted up after the install, it crashed during the boot process. Well, I could have spent the time kicking myself, but I am doing this upgrade off-hours and I did account for things going badly in my downtime estimate. Also, this router is part of a redundant router setup using the Virtual Router Redundancy Protocol (VRRP), so it being down will not affect production. In other words, this is more of an annoyance than a real issue. Since I have to deal with this anyway, how about we learn how to restore the OS on this Juniper router?
I tried a few ways and thought the easiest one was to use a USB drive. Of course, that will not work well if you are not physically close to said router (other things will also not work well in those circumstances, but that is another topic). Since I am, I am doing the USB install.
Procedure
- Get a USB drive. I know, this is a pretty obvious step but it is step 1. Ideally use a 1GB/2GB USB drive, formatted as FAT16/FAT32. Honestly I do not know how critical that is, but my experience with Cisco, which seems not to like the higher capacity ones, made me leery. On the plus side, you should be able to find those rather easily as people replace their old ones with newer, larger ones. If not, there are always the usual sources such as eBay or Amazon.
- Download the OS image you are going to use, say junos-srxsme-11.4R10.3-domestic.tgz, and copy it into the root of the USB drive. If you are smarter than me, you would have gone to the Juniper downloads site and got all the OS images you need, placing them in your file server. I wasn't, so I had to go to the SRX210 download page and fetch it.
- Have your trusty serial cable and connect it to the router's console port. The default setup is the time-honored 9600 8N1. If you changed it, make sure you wrote that down somewhere. I am lazy and I kinda like that setting.
- Connect USB drive to router.
- Reboot the router after you attach the USB drive to it. It needs to know the drive exists as it boots up. Otherwise, it will bark like this when you try to install from it:
    loader> install file:///junos-srxsme-11.4R10.3-domestic.tgz
    cannot open package (error 22)
    loader>
Now, if you boot with USB already connected to router, it will first say something like this:
    Running U-Boot CRC Test... OK.
    Flash: 4 MB
    USB: scanning bus for devices... 4 USB Device(s) found
    scanning bus for storage devices... 2 Storage Device(s) found
    Clearing DRAM........ done
    BIST check passed.
Some of you noticed the "2 Storage Device(s) found" message. It is talking about the internal one (probably where the OS should be) and the external USB drive.
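If you capture the console session to a log, the storage device check above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and the assumption that exactly one storage device is internal comes from the SRX210 output shown here, not from any Juniper documentation.

```python
import re

# Matches the U-Boot line that reports how many storage devices were found.
STORAGE_RE = re.compile(r"(\d+)\s+Storage Device\(s\) found")


def storage_device_count(boot_log: str):
    """Return the storage device count U-Boot reported, or None if absent."""
    match = STORAGE_RE.search(boot_log)
    return int(match.group(1)) if match else None


def usb_drive_detected(boot_log: str, internal_devices: int = 1) -> bool:
    """True when U-Boot saw more storage devices than the internal ones.

    internal_devices=1 is an assumption based on the output in this post.
    """
    count = storage_device_count(boot_log)
    return count is not None and count > internal_devices


# Sample text copied from the boot messages above.
sample = (
    "USB: scanning bus for devices... 4 USB Device(s) found\n"
    "scanning bus for storage devices... 2 Storage Device(s) found\n"
)
print(usb_drive_detected(sample))
```

If this reports that only the internal device was seen, there is little point in going on to the loader prompt: reseat the drive and reboot first.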
- Now, when you see
    POST Passed
    Press SPACE to abort autoboot in 1 seconds
Please keep your fingers in your pockets. If you press space here, you will end up in the => prompt (U-boot). If you wait you will then see
    Protected 1 sectors
    Loading /boot/defaults/loader.conf
    /kernel data=0xb0f9c0+0x134788 DA(some hot action happening here)
Have your space-bar finger on standby, for the next message will be
Hit [Enter] to boot immediately, or space bar for command prompt.
Press the space bar at that point and you will get the loader> prompt. Type the install command there, and it will start doing the install thingie:
    loader> install file:///junos-srxsme-11.4R10.3-domestic.tgz
    /kernel data=0xae82f0+0x12d2b8 syms=[0x4+0x88ce0+0x4+0xc6af6]
    Kernel entry at 0x801000d8 ...
    init regular console
    GDB: debug ports: uart
    GDB: current port: uart
    KDB: debugger backends: ddb gdb
    KDB: current backend: ddb
    Copyright (c) 1996-2013, Juniper Networks, Inc.
    All rights reserved.
    Copyright (c) 1992-2006 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
    JUNOS 11.4R10.3 #0: 2013-11-15 06:56:20 UTC
    [...]
- After a while (I got bored and went to make me some tea), you will see it recreate the ssh key pairs and then finally be ready for business (apologies for the bad cut-n-pasting but my terminal console was being cute):
| | | | .o .. | |.+o .o.o. |X . .. .. E | |oo .. | | .+ | |.-+ root@uranus% omplete Setting initial options: . Starting optface configuration: additional daemons: eventd. Additional rout;/boot/modules -> /bo; kld netpfe drv: ifpfed_dialer default_adtwork setup:. Starting final network daemons:. setting ldconfig. Initial rc.mips initialization:. Local package initializationup access . kern.securelevel: -1 -> 1 Creating JAIL MFS partitirade.uboot="0xBFC00000" boot.upgrade.loader="0xBFE00000" Boot mILE SYSTEM CLEAN; SKIPPING CHECKS clean, 78249 free (17 frags, ar 20 16:46:25 CDT 2014 uranus (ttyu0)
Note that it remembered the hostname for the router. I still went through the configs before letting it join the router cluster. But that is pretty much it! Router is back in business.
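The key-press timing in the procedure is the part that is easiest to get wrong, so here is a rough Python sketch of it as a decision function: given one line of console output, what (if anything) should be sent back. The function names are hypothetical, the prompt strings are the ones quoted in this post, and nothing here talks to a real serial port. How the response is actually sent, and which line ending the console expects, is left to whatever serial tool you use.

```python
from typing import Optional

IMAGE = "junos-srxsme-11.4R10.3-domestic.tgz"  # file name used in this post


def loader_install_command(image: str = IMAGE) -> str:
    """Build the loader command for an image in the root of the USB drive."""
    return f"install file:///{image}"


def console_response(line: str, image: str = IMAGE) -> Optional[str]:
    """Return what to send for one console line, or None to keep hands off."""
    if "Press SPACE to abort autoboot" in line:
        # Pressing space here lands in U-Boot (=>), not the loader.
        return None
    if "space bar for command prompt" in line:
        # This is the message we were waiting for.
        return " "
    if line.strip() == "loader>":
        # At the loader prompt: start the install (caller adds the line ending).
        return loader_install_command(image)
    return None


print(console_response("Hit [Enter] to boot immediately, or space bar for command prompt."))
print(console_response("loader> "))
```

Note that this sketch answers every bare loader> prompt with the install command, so a real script would want to stop after the first attempt instead of retrying forever when the install fails.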
Closing Thoughts
- The universe is Murphian; things will go wrong. Try not to stress about that.
- When you schedule downtime for upgrades, account for things going badly in your time estimates.
- The hardest thing to do is figuring out what can go wrong. But you could ask yourself "If this upgrade halts the server or just this service, what would be my backup plan?" and then see if you can answer that question.
- Next time I need to upgrade the OS in this or another router, I will have the firmware/OS on standby in a USB drive. I do not know about you, but I have found that when I am prepared everything works out perfectly.
- If you can afford it, redundancy is a wonderful thing.
- Always save your configs somewhere, well, safe. Having to recreate them from scratch is a bit of a drag.
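For the standby USB drive idea above, here is a minimal Python sketch of staging an image ahead of time: copy it to the root of the drive and check that the copy matches the original. The function names and the paths in the usage comment are hypothetical. It only compares the copy against your local file; if the download page lists a checksum for the image, comparing against that as well would be the more thorough check.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_image(image: Path, usb_root: Path) -> Path:
    """Copy the image to the root of the USB drive and verify the copy."""
    target = usb_root / image.name
    shutil.copyfile(image, target)
    if sha256_of(image) != sha256_of(target):
        raise IOError(f"copy of {image.name} does not match the original")
    return target


# Usage (hypothetical paths, adjust to where your drive is mounted):
#   stage_image(Path("junos-srxsme-11.4R10.3-domestic.tgz"), Path("/mnt/usb"))
```

The image goes in the root of the drive because the loader command used earlier, install file:///junos-srxsme-11.4R10.3-domestic.tgz, points at the top level.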