This is the second post in a series meant to help improve your static reverse engineering skills. If you missed the first post, it’s available here.
There’s nothing new under the sun
With the exception of the blockchain, we haven’t had a new fundamental data structure in 30 years. The languages and tools we’re using have changed but how we construct software hasn’t. We still use arrays, vectors, lists, hash maps, queues, etc. It’s considered good practice to write the most boring and straightforward code you can -- it’s good for maintainability, for your junior teammates, and for your own future sanity, and often allows the compiler to produce better code.
Put yourself in the developer’s shoes
The best reverse engineer is also a very capable software developer. Why you may ask? Because almost no one reinvents the wheel. You should know how a developer should have solved a specific problem and then use your reverse engineering skills to confirm those assumptions.
Don’t let your hubris get in the way. You will be a much better reverse engineer by learning how to program in the language and discipline of your target software. Push back on anyone telling you otherwise -- those that disagree may be able to get the job done, but are they really effective?
As a novice, the absolute best way to get better at reverse engineering is to write code, envision the assembly created, and check the disassembly to verify your assumptions. Start by implementing the most common libc
functions – the ones that relate to user-supplied data or external data. We’re going to concern ourselves with functions like memcpy
, memcmp
, strcpy
, strlen
, and strcmp
. For each, write implementations and analyze their disassembly.
memcpy
is one of the libc
standard data copying functions. The POSIX standard specifies memcpy
shall:
The memcpy() function shall copy n bytes from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
With a function prototype of void *memcpy(void *restrict s1, const void *restrict s2, size_t n);
.
A straightforward memcpy
implementation could look like this:
void *memcpy(void *restrict s1, const void *restrict s2, size_t n)
{
char *dst = s1;
const char *src = s2;
while (n--) {
*dst++ = *src++;
}
return s1;
}
The disassembly graphs should be fairly similar, they’re both copying one byte at a time to a given length.
- In each function, we check if the number of bytes specified is zero and return early if so.
- We enter the bytewise copy loop
- Check to see if we’ve hit our specified limit
- Return the destination buffer pointer
The only difference (5) is in the strncpy
, it checks for null bytes and breaks out of the loop when it encounters one – indicating that it expects to be working with C strings.
Putting it together
After you get a handle on how the classic datastructures and their manipulation functions compile to native code, you'll be able to instantly recognize structure of the data and then more quickly determine its purpose.
Homework
- What would the graph for
strcpy
look like? - How does
strcpy
differ frommemcpy
andstrncpy
? - Take a shot at implementing
strchr
and comparing its graph to the others.