CS 485 Lab 2
Reverse engineering with gdb

In this lab, you will use gdb and other tools to reverse-engineer a function (provided as object code only) and rewrite it in C.

Due: 11:59:59 PM, 12 February, 2016.

Reverse engineering

If "engineering" is the process of taking a specification and turning it into a design and then a product, then "reverse engineering" is the process of taking a product and determining its design or specification. In the case of software, one goal of reverse engineering is to take a program and, by both inspecting it and observing its behavior, to determine what that program does. The term "reverse engineering" may bring to mind industrial espionage (and a bad Ben Affleck movie from 2003), but it has far wider applicability:

Your goal in this exercise is to use gdb to reverse engineer a piece of binary object code: trace a function instruction-by-instruction, determine what it does, and write C code that replicates its behavior.

Steps to Perform

YOU MUST PERFORM THIS EXERCISE ON YOUR VM. SSH access will be sufficient for this assignment.

This is a collaborative assignment. You may work in teams to trace the program, but must write up your answers yourself.

  1. Prepare a plain text file in which you will answer ten questions from this assignment.
    1. You may edit this file either on your computer or on the VM. It will probably be easier if you edit the file in a separate window from where you are running commands.
    2. Q0: Questions to answer in the text file appear in a box like this one. Include the question number with your answer.

      For question 0, list the names of classmates you worked with on this assignment.

  2. Download the source code: wget http://www.cs.uky.edu/~neil/485/labs/2/cs485-lab2.tar.gz
    1. Then extract the tar file: tar xzvf cs485-lab2.tar.gz and change into the directory it created.
    2. You should have a Makefile, a header file, two object files, and two C source files.
    3. Look at main.c, and in particular the call to the mystery1 function. Look at the function's prototype in mystery.h.
  3. Build the program using make.
    1. Q1: What is the full command executed by the Makefile to link together object files and produce the mystery1 executable?
      (You can copy-paste the line from make's output).
    2. Run ./mystery1 to see the program's result.
  4. Run the program in a debugger: gdb ./mystery1
    1. Set a breakpoint on the mystery1 function: break mystery1
    2. Tell gdb to start executing the program: run
    3. You will find yourself back at the prompt with the program stopped at the breakpoint. But, without source code for the function, it will take a little more work to see what is happening.
      1. To see the assembly code for the function, disassemble
      2. To see the register contents, info reg
      3. Q2: Which registers hold the values of the three function parameters a, b, and c? Remember that main called mystery1(100, 200, 300)
    4. Use the display command to show the current instruction and some relevant register values while you are stepping through the function's code:
          display/i *$rip
          display $rax
          display $rdx
          display $rsi
          display $rdi
      • If you make a typo, you can use display to find the number of a display, and undisplay number to disable it; or plain undisplay to disable all of them.
      • If you want to save typing later, you can create a ~/.gdbinit file and define a new command to run those four display commands:
            define mydisplay
              display/i *$rip
              display $rax
              display $rdx
              display $rsi
              display $rdi
            end
        Then you can run mydisplay whenever you want to enable those displays.
    5. Single-step through the machine code with the nexti command (abbreviated ni), until you get to the retq instruction.
      1. One of the instructions executed was a conditional jump. Did it take the jump or not (look at the target of the jump and the address of the next instruction)? What was being compared in the preceding cmp instruction?
      2. Q3: What could have been different about the inputs to make the jump go the other way?
      3. Q4: After the jump, the function executed a few arithmetic instructions. Write an arithmetic expression for the return value of this code path (the value stored in %rax at the end), in terms of the three function parameters a, b, and c. For example (this is not the actual answer):
        a*a + b + c/2
    6. Now use cont to continue executing the program, then q to quit gdb
  5. Edit the call to mystery1 in main.c so that the conditional jump from Q3 will take the other path.
    1. Q5: What arguments did you use for this call? You can copy-paste the function call from main.c.
    2. Now recompile the program (make) and single-step through the function again (steps D.1–5).
      1. Q6: This time there should have been two conditional jumps. What would have made the second jump go the other way?
      2. Q7: Write an arithmetic expression for the return value of this code path. The expression will be similar, but not identical, to the expression in Q4.
  6. Now repeat step E, this time so that the second jump takes the other path.
    1. Q8: What arguments did you use for this call? You can copy-paste the function call from main.c.
    2. Q9: Write an arithmetic expression for the return value of this third code path.
  7. Using what you learned about the function's execution, including the two comparisons and the three different expressions for the return value, write a C or C++ version of mystery1. Remember that it takes three long arguments and returns a long.
    1. Q10: Paste the C or C++ code for your function. It should have a if ... else if ... else structure, with three return statements.
  8. Upload your text file to csportal, under assignment "Lab 2".

(Not for credit): If you have time when you finish, try your hand at reverse-engineering mystery2 (which is called in main2.c and implemented in mystery2.o). This function takes an array and its length, and uses a loop to... do something. In this function, the registers %rdx, %rax, and %rcx are used as local variables. Some things to consider:

  1. Where is the beginning and end of the loop? What causes the loop to exit? How many times does it execute?
  2. Inside the loop, where did the values being compared by cmp %rax, %rcx come from?
  3. What happens if the jge is false (that is, if the branch is not taken)? How would you write that in C or C++?
  4. It turns out that mystery2 implements a commonly-used operation. What would be a better name for it?