CS 485 Lab 2
Reverse engineering with gdb
In this lab, you will use gdb and other
tools to reverse-engineer a function (provided as object code only)
and rewrite it in C.
Due: 11:59:59 PM, 12 February, 2016.
Reverse engineering
If "engineering" is the process of taking a specification and turning it into a
design and then a product, then "reverse engineering" is the process of taking
a product and determining its design or specification. In the case of software,
one goal of reverse engineering is to take a program and, by both inspecting it
and observing its behavior, to determine what that program does. The term
"reverse engineering" may bring to mind industrial espionage (and a bad Ben Affleck
movie from 2003), but it has far wider applicability:
-
Analyzing a piece of malware to determine what it does, where
it originated, and how to stop it.
-
Deciphering an undocumented or poorly documented file format,
whether to convert legacy data or to provide compatibility with
an existing software package.
-
Customizing a piece of software (like your router's firmware)
that was never intended to be customized.
-
Reimplementing an ancient program when the source code has been
lost or is in a very low-level language.
Your goal in this exercise is to use gdb
to reverse engineer a piece of binary object code: trace a function
instruction-by-instruction, determine what it does,
and write C code that replicates its behavior.
Steps to Perform
YOU MUST PERFORM THIS EXERCISE ON YOUR VM.
SSH access will be sufficient for this assignment.
This is a collaborative assignment. You may work
in teams to trace the program, but must write up your answers yourself.
-
Prepare a plain text file in which you will answer ten questions
from this assignment.
-
You may edit this file either on your computer or on the
VM. It will probably be easier if you edit the file in
a separate window from where you are running commands.
- Q0:
Questions to answer in the text file appear in a box like this one.
Include the question number with your answer.
For question 0, list the names of classmates you worked
with on this assignment.
-
Download the source code:
wget http://www.cs.uky.edu/~neil/485/labs/2/cs485-lab2.tar.gz
-
Then extract the tar file:
tar xzvf cs485-lab2.tar.gz and
change into the directory it created.
-
You should have a Makefile, a header file, two object files, and
two C source files.
-
Look at main.c, and in particular
the call to the
mystery1
function. Look at the
function's prototype in mystery.h.
-
Build the program using make.
- Q1: What is the full
command executed by the Makefile to link together object
files and produce the mystery1
executable?
(You can copy-paste the line from make's output).
-
Run ./mystery1 to see the
program's result.
-
Run the program in a debugger: gdb ./mystery1
- Set a breakpoint on the
mystery1
function:
break mystery1
-
Tell gdb to start executing the program: run
-
You will find yourself back at the prompt with the program stopped
at the breakpoint. But, without source code for the function,
it will take a little more work to see what is happening.
- To see the assembly code for the function, disassemble
- To see the register contents, info reg
-
Q2:
Which registers hold the values of the three function parameters
a
, b
, and c
?
Remember that main
called mystery1(100, 200, 300)
-
Use the display command to show the current instruction and some
relevant register values while you are stepping through the
function's code:
display/i *$rip
display $rax
display $rdx
display $rsi
display $rdi
-
Single-step through the machine code with the
nexti command (abbreviated
ni), until you get to the
retq
instruction.
-
One of the instructions executed was a conditional
jump. Did it take the jump or not (look at the
target of the jump and the address of the next instruction)?
What was being compared in the preceding
cmp
instruction?
- Q3:
What could have been different about the inputs
to make the jump go the other way?
- Q4:
After the jump, the function executed a few arithmetic
instructions. Write an arithmetic expression for the
return value of this code path (the value stored in
%rax
at the end), in terms of the three
function parameters a
, b
, and
c
. For example (this is not the actual
answer):
a*a + b + c/2
-
Now use cont to continue executing
the program, then q to quit
gdb
-
Edit the call to
mystery1
in main.c
so that the conditional jump from Q3 will take the other path.
- Q5:
What arguments did you use for this call? You can copy-paste the
function call from
main.c
.
-
Now recompile the program (make) and
single-step through the function again (steps D.1–5).
- Q6:
This time there should have been two conditional jumps.
What would have made the second jump go the other way?
- Q7:
Write an arithmetic expression for the return value of
this code path. The expression will be similar, but not identical,
to the expression in Q4.
-
Now repeat step E, this time so that the second jump takes the
other path.
- Q8:
What arguments did you use for this call? You can copy-paste the
function call from
main.c
.
- Q9:
Write an arithmetic expression for the return value of
this third code path.
-
Using what you learned about the function's execution, including
the two comparisons and the three different expressions for the
return value, write a C or C++ version of
mystery1
.
Remember that it takes three long
arguments and
returns a long
.
- Q10:
Paste the C or C++ code for your function. It should have
a
if ... else if ... else
structure, with three
return statements.
-
Upload your text file to csportal,
under assignment "Lab 2".
(Not for credit):
If you have time when you finish, try your hand at reverse-engineering
mystery2
(which is called in main2.c
and implemented in mystery2.o). This function
takes an array and its length, and uses a loop to... do something.
In this function,
the registers %rdx
, %rax
, and %rcx
are used as local variables. Some things to consider:
- Where is the beginning and end of the loop? What causes the loop
to exit? How many times does it execute?
- Inside the loop, where did the values being compared by
cmp %rax, %rcx come from?
-
What happens if the jge is false
(that is, if the branch is not taken)? How would you write that
in C or C++?
-
It turns out that
mystery2
implements a commonly-used
operation. What would be a better name for it?