In late 2007 I took a position doing reverse engineering, mostly on embedded systems. RE was something I had wanted to get into professionally for some time, but I could never find a segue into it. Now that I’m in the thick of it, I’ve learned quite a bit from the experience. Reverse engineering takes a special breed. It takes a lot of patience to stare at a debugger or disassembler all day long. There are times I walk out of work with my eyes bloodshot from staring at BinDiff or IDA all day. This is the primary reason my blog has fallen off course. By the time I get home lately, my desire to sit behind the computer isn’t always there. I mean, I want to do it, but my brain tells me no! Here are a few items off the top of my head about reverse engineering embedded systems. Sometimes I’d rather take obfuscated malware than this stuff…
Learning
The ability to learn fast and get spun up on something such as a new architecture is essential for this work. Quite often, reverse engineering positions deal with malware, specifically malware on x86. Although malware can be seen on Mac and Linux, the majority of it is found on Intel-based Windows systems. In that world, solid knowledge of a single operating system and architecture will generally serve you well. When it comes to embedded systems, however, you are talking about dozens of operating systems across a handful of architectures. You need to be able to pick up the core concepts of a new operating system or architecture very quickly.
Architectures
For some reason the developers of the systems I have worked on can’t make up their minds. One device is x86, the next version is ARM, then they hop over to PPC for last year’s release, and this year’s device is x86 again. WTF! It gets confusing hopping back and forth between assembly languages. What makes it worse is when you have to find differences in certain features, such as protocol implementations or the way the device reads in data, between versions built for different architectures. That makes diffing a little bit trickier.
For embedded reversing your major architectures are x86, MIPS, ARM, and PPC. Despite what some of my amigos think, PowerPC is far from dying. I say this because it is the predominant architecture in the devices I’ve worked on.
Algorithms
Knowing your data structures and algorithms helps because you start to see patterns in the disassembly and can work out what’s going on a lot faster. Beyond that, just being able to identify structures in disassembly will often tie large portions of code together for you and make your life a lot easier.
Symbols
Every so often a vendor’s development team will screw up and forget to strip symbols from an updated firmware image they push out. When this happens you had better be on your game, because they will pull it from their site within minutes of realizing what they did. Most of the time firmware images do not contain symbols, which makes life a lot harder. When you have 40,000 to 60,000 unnamed functions, no imports, no exports… nothing, it makes life a bitch. Sometimes you can get around pretty well with just string references and figure out what’s going on. Any little bit helps, but sometimes it would just be so much easier to have symbols ):
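When there are no symbols, pulling the printable strings out of the raw image is a cheap first pass before you chase their references in the disassembler. Here is a minimal Python sketch of that idea, roughly what the Unix strings tool does. The function name extract_strings and the minimum length of 6 are my own choices for illustration, and it only looks for plain ASCII, so wide-character strings will be missed.

```python
import re


def extract_strings(blob, min_len=6):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(blob):
        yield match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    import sys

    # Usage: python strings_sketch.py firmware.bin
    with open(sys.argv[1], "rb") as handle:
        image = handle.read()
    for offset, text in extract_strings(image):
        print("0x%08x  %s" % (offset, text))
```

The offsets are file offsets, not load addresses, so you still have to work out the image base before a string lines up with its references in the disassembly.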
Slow Roll Your Analysis
In July of last year Cody Pierce wrote a blog post on DVLabs about cross references. One of the things he brought up was identifying common functions and clearing them out early on. As you go through a firmware image that has 60,000 functions, would you prefer to repeatedly see CALL loc_67499, or would you rather see CALL print_to_term? Instead of going straight to my objective, I run an IDAPython script that loops through every function and counts the number of xrefs to it. Then I start at the top of that list and work my way down, usually analyzing the first 20 or so functions because they are the most xref’d functions in the image. Later you realize the payoff, as you go through code and see named functions instead of CALL loc_addr/sub_addr names.
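The xref-counting idea can be sketched in a few lines of Python. This is not the script I actually use, just an illustration: rank_by_xrefs and ida_call_edges are names made up for this example. The ranking part is plain Python and runs anywhere. The IDA part assumes you are inside IDAPython, where idautils.Functions() and idautils.CodeRefsTo() are available.

```python
from collections import Counter


def rank_by_xrefs(call_edges, top=20):
    """Return the `top` most-referenced targets as (target, count) pairs.

    call_edges is any iterable of (referencing_address, function_address) tuples.
    """
    counts = Counter(target for _source, target in call_edges)
    return counts.most_common(top)


def ida_call_edges():
    """Yield (referencing_address, function_address) pairs; IDAPython only."""
    import idautils  # only importable inside IDA Pro

    for func_ea in idautils.Functions():
        # flow=0: ignore ordinary fall-through flow, keep calls and jumps.
        for ref_ea in idautils.CodeRefsTo(func_ea, 0):
            yield ref_ea, func_ea
```

Inside IDA you would run something like `for ea, n in rank_by_xrefs(ida_call_edges()): print(hex(ea), n)` and then start naming functions from the top of that list.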
Magic Numbers
Get intimately familiar with magic numbers! From ELF to compression formats, they will come in handy if you have them embedded in your brain. Most firmware images are compressed with some algorithm; in some cases you will see numerous compressed blocks. Being able to identify these numbers in a hex editor will save you a lot of time when trying to find what you are really looking for. Many times the first few segments are not compressed and consist of bootloader code and the decompression routine. The meat of what you are looking for is most likely compressed!
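Until the magic numbers are burned into your brain, a small script can do the spotting for you. The sketch below searches an image for a handful of well-known signatures and reports where each one turns up. The MAGICS table and the find_magics name are just for illustration, and the list is far from complete. Short signatures such as the bzip2 one will also produce false positives in random-looking data, so treat every hit as a lead to verify in the hex editor.

```python
# A few well-known signatures; extend the table with whatever your targets use.
MAGICS = {
    "ELF": b"\x7fELF",
    "gzip (deflate)": b"\x1f\x8b\x08",
    "bzip2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
    "U-Boot uImage": b"\x27\x05\x19\x56",
    "SquashFS (little-endian)": b"hsqs",
}


def find_magics(blob, magics=MAGICS):
    """Return a sorted list of (offset, name) for every signature hit in blob."""
    hits = []
    for name, magic in magics.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    import sys

    # Usage: python magics_sketch.py firmware.bin
    with open(sys.argv[1], "rb") as handle:
        image = handle.read()
    for offset, name in find_magics(image):
        print("0x%08x  %s" % (offset, name))
```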
Hex Editor
This is another thing you need to have a good relationship with. Unlike with PE and ELF files, you can’t just drop a raw firmware image into IDA Pro and get magical results. You will need to open the binary in a hex editor first and identify structures and code from the bootloader and decompression routine. Your hex editor will also come in very handy when mapping the general layout of the image. The next logical step in the hex editor is to remove the data that precedes your compressed code so you can decompress it.
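Cutting off the leading data and decompressing what is left can also be scripted once you know what you are looking at. The sketch below assumes the payload is an ordinary gzip stream, which is only one of many possibilities, since plenty of images use LZMA or a vendor-specific scheme. The carve_gzip name is made up for this example. It tries each gzip signature in turn and returns the first one that decompresses cleanly to the end of a stream.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"


def carve_gzip(blob, start=0):
    """Return (offset, data) for the first gzip stream that decompresses, else None."""
    pos = blob.find(GZIP_MAGIC, start)
    while pos != -1:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        try:
            data = decompressor.decompress(blob[pos:])
            # eof is True only if a complete stream was seen; any bytes after
            # it land in decompressor.unused_data instead of raising an error.
            if decompressor.eof:
                return pos, data
        except zlib.error:
            pass  # false positive on the signature, keep looking
        pos = blob.find(GZIP_MAGIC, pos + 1)
    return None
```

Whatever comes back is the raw decompressed code, so you still need to work out the load address and architecture yourself before loading it into IDA.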