In this lab, you will build a shared library that uses run-time interpositioning to log and to replace calls to a standard library function in existing programs, without modifying those programs.
Due: 11:59:59 PM, 11 March, 2016.
Learning outcomes: Demonstrate an understanding of how specific high-level language program constructs (shared libraries) are implemented. Demonstrate an understanding of overall computer system structure and operating system design.
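Your goal in this exercise is to develop a shared library that can be loaded into an existing program to do two things: log the program's calls to a standard library function, and replace calls to a standard library function with your own version.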
For reference, you may want to look over the last several slides from Lecture 10.
YOU MUST PERFORM THIS EXERCISE ON YOUR VM. SSH access will be sufficient for this assignment.
This is a collaborative assignment. You may work in teams (up to three people) to develop and test your library. Each member of the team should submit a copy of the assignment, but for this lab it is okay to submit the same file.
Create a file team.txt in your directory, containing the names of your team members.
Download the source code and Makefile for the first part of the lab:
wget http://www.cs.uky.edu/~neil/485/labs/3/trace-memcpy.c
wget http://www.cs.uky.edu/~neil/485/labs/3/Makefile
Read through the downloaded source file. It contains an implementation of a memcpy function that 1. finds the real memcpy function and saves a pointer to it; 2. prints a message to stderr with fprintf; then 3. calls the real memcpy function.
Read the man page for the memcpy function. What arguments does it take, what types are those arguments, and what does it return? Verify that the memcpy function in trace-memcpy.c matches the prototype shown in the man page.
The global variable real_memcpy is a function pointer. It is initialized to NULL, but on the very first call to our memcpy we make it point to the "real" version of memcpy. Verify that the type of real_memcpy does in fact match the return and parameter types of memcpy. Remember that, because it is a function pointer rather than a function, there will be an extra (*...) around the name of the variable.
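To make the (*...) syntax concrete, here is a minimal sketch that puts the man page prototype next to a matching function pointer. It is not the contents of trace-memcpy.c; the helper copy_via_pointer is a hypothetical name used only for illustration, and it fills in the pointer by plain assignment rather than with dlsym.

```c
#include <stddef.h>
#include <string.h>

/* Prototype from the man page, for comparison:
 *   void *memcpy(void *dest, const void *src, size_t n);
 */

/* Same return type and parameter types, with (*...) around the name. */
static void *(*real_memcpy)(void *dest, const void *src, size_t n) = NULL;

/* Hypothetical helper (not part of the lab code): points real_memcpy at
 * the C library's memcpy on first use, then calls through the pointer. */
static void *copy_via_pointer(void *dest, const void *src, size_t n)
{
    if (real_memcpy == NULL)
        real_memcpy = memcpy;
    return real_memcpy(dest, src, n);
}
```

Calling through the pointer looks exactly like calling a function, which is why the type of the pointer has to agree with the prototype.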
The function dlsym looks up a symbol from a shared library and returns a pointer to that object. According to man dlsym, what does the special value RTLD_NEXT mean? Why will this call find the real version of memcpy rather than the tracing version in trace-memcpy.c?
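As a rough illustration of the lookup, the sketch below wraps the dlsym call in a function. The names memcpy_fn and find_next_memcpy are hypothetical and do not come from trace-memcpy.c. It assumes glibc, where RTLD_NEXT is only visible when _GNU_SOURCE is defined before the first include; on older systems you may also need to link with -ldl.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical typedef for "pointer to a function shaped like memcpy". */
typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical helper: asks the dynamic linker for the next definition
 * of "memcpy" in the search order after the object containing this code.
 * Returns NULL if no later object defines the symbol. */
static memcpy_fn find_next_memcpy(void)
{
    return (memcpy_fn)dlsym(RTLD_NEXT, "memcpy");
}
```

Because the search starts after the object that makes the call, a lookup made from inside a preloaded library skips that library's own definition.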
Use make to compile a shared library from this code. What happens if you run ./libmemcpy.so directly?
The dynamic linker and loader is responsible for loading a program's executable code, global data, and shared libraries into memory, and performing any necessary relocations. On GNU/Linux, the dynamic linker is named ld-linux.so. Take a brief look at its man page.
[Figure: what happens when running ls normally.]
[Figure: what happens when we LD_PRELOAD our library.]
To set an environment variable like LD_PRELOAD for the execution of a single program, you can use the shell syntax VAR=value program.
Try that with a simple command like ls:
LD_PRELOAD=./libmemcpy.so ls
That should be all on one line; make sure you do not have spaces around the equal sign. How many times did the ls command call memcpy?
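If you are curious what the VAR=value program syntax amounts to, the sketch below approximates it in C: the variable is set only in a child process, which then executes the program. The function run_with_env is a hypothetical helper written for this illustration, not something the shell or the lab provides, and a real shell does more than this.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with name=value added to the
 * environment of that one program only. Returns the program's exit
 * status, or -1 if it could not be started or did not exit normally. */
static int run_with_env(const char *name, const char *value, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the variable is set here, so the parent never sees it. */
        if (setenv(name, value, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);   /* only reached if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this helper, passing "LD_PRELOAD", "./libmemcpy.so", and an argv of {"ls", NULL} would correspond roughly to the command line above.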
The program ltrace uses a different technique to trace all the library calls that a program makes. Try running ltrace on ls. It will produce a lot of output, so tell it to send the trace to a file:
ltrace -o trace.out ls
Keep this trace file; it will be part of your submission. Look through the file. Do you recognize any of the library functions being called?
Take a look at man readdir. You'll be coming back to this man page, particularly the synopsis, but also the definition of struct dirent.
readdir takes (a pointer to) a directory object (DIR *dirp) that was returned by the opendir function. It returns a pointer to a struct dirent ("directory entry"), which contains information about a file in the directory. Each time readdir is called, it returns information on the next file in the directory. After it has reached the last file, readdir returns NULL.
Usually readdir is used in a loop:
/* Somewhere in the code for ls... */
struct dirent *de;
DIR *dir = opendir("some/directory");
while ((de = readdir(dir)) != NULL) {
    /* do something with de->d_name etc. */
}
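For a version of that loop you can compile and run, here is a small self-contained sketch. The function count_entries is a hypothetical name chosen for this example; it prints each entry's name and returns how many entries it saw.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints the name of every entry in the directory
 * at path and returns the number of entries, or -1 if the directory
 * could not be opened. */
static int count_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {   /* NULL marks the end */
        printf("%s\n", de->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Note that the loop relies entirely on readdir returning NULL to know when to stop, which matters once you start replacing readdir.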
Copy trace-memcpy.c to a new file named hide-readdir.c. Modify the copy so that the function is named readdir, with the same parameters and return type as described in the man page. Also rename real_memcpy to real_readdir and change its definition to use those same parameter and return types.
Update all references to memcpy and real_memcpy to refer to readdir and real_readdir instead. You will also need to update the call to real_readdir so that it passes the correct arguments (the same ones your fake readdir received), and adjust the call to fprintf accordingly as well.
Remember to keep the (*...) around the name in the definition of real_readdir, because it is a function pointer, not a function!
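After these changes, the file might look roughly like the sketch below. This is an assumption about the overall shape, based on the three steps described for trace-memcpy.c; the message printed by fprintf is made up for this example, and your own file may differ in its details.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#endif
#include <dirent.h>
#include <dlfcn.h>
#include <stdio.h>

/* Function pointer with the same parameter and return types as readdir. */
static struct dirent *(*real_readdir)(DIR *dirp) = NULL;

/* Tracing version of readdir. */
struct dirent *readdir(DIR *dirp)
{
    /* 1. On the first call, find the real readdir and save a pointer. */
    if (real_readdir == NULL)
        real_readdir = (struct dirent *(*)(DIR *))dlsym(RTLD_NEXT, "readdir");

    /* 2. Print a trace message (the format here is an assumption). */
    fprintf(stderr, "readdir(%p)\n", (void *)dirp);

    /* 3. Call the real readdir with the argument we received. */
    return real_readdir(dirp);
}
```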
Add a rule to the Makefile to build a library named libhide.so from your hide-readdir.c. Also be sure to add libhide.so as a prerequisite of the all: rule.
Verify that your library builds correctly, and correct the errors if it does not.
Now run ls with your library preloaded:
LD_PRELOAD=./libhide.so ls
If everything worked correctly, ls should run normally, printing some trace messages. If it crashes, double-check that you are passing the correct arguments to real_readdir, and that your function and function pointer have the correct parameter and return types (according to the man page).
Now let's do something sneaky, and make readdir skip over certain files. Then, when a process that has our library preloaded tries to list the files in a directory, it won't see that particular file.
Each time readdir is called, it will return a struct dirent * (directory entry) with information on the next file in the directory. We'll use that information to decide whether to return the directory entry to the caller, or skip ahead to the next file.
At this point, your call to real_readdir looks something like:
return real_readdir(dirp);
Modify this to save the return value in a variable and return that:
struct dirent *entry = real_readdir(dirp);
return entry;
While you're at it, remove the call to fprintf.
Now, in between calling real_readdir and returning, we want to inspect the directory entry to see if the file is named "secret". If you look at the definition of struct dirent in man readdir, you will see that one of the members of the struct is the filename, as an array of characters.
If the filename that real_readdir returned was "secret", then we'll do something different instead of returning. However, we have to be careful: real_readdir might have returned a NULL pointer, so we have to check that first:
if (entry != NULL && strcmp(entry->d_name, "secret") == 0) ...
What to do if we did see the file "secret"? We could replace the filename with something else, but that would leave a suspicious imposter in the file's place. Instead, we'll just call real_readdir again on the same directory, to return the next file. Inside your if:
return real_readdir(dirp);
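Putting the pieces from this section together, the finished function might look something like the following sketch. The helper is_hidden is a hypothetical name introduced here to keep the test readable; the lab text writes the same condition directly in the if. Note that strcmp is declared in <string.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#endif
#include <dirent.h>
#include <dlfcn.h>
#include <string.h>   /* strcmp */

static struct dirent *(*real_readdir)(DIR *dirp) = NULL;

/* Hypothetical helper: true if entry is non-NULL and named "secret". */
static int is_hidden(const struct dirent *entry)
{
    return entry != NULL && strcmp(entry->d_name, "secret") == 0;
}

struct dirent *readdir(DIR *dirp)
{
    if (real_readdir == NULL)
        real_readdir = (struct dirent *(*)(DIR *))dlsym(RTLD_NEXT, "readdir");

    struct dirent *entry = real_readdir(dirp);

    /* Skip the hidden file by returning whatever comes after it,
     * which may be NULL if "secret" was the last entry. */
    if (is_hidden(entry))
        return real_readdir(dirp);

    return entry;
}
```

A single extra call is enough here because a directory cannot contain two entries with the same name.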
Now compile your library again with make, and run ls with your library preloaded. It should appear to run normally again.
Create a file named "secret" (with no extension) and run ls again, both with and without your library preloaded.
touch secret
LD_PRELOAD=./libhide.so ls
ls
If all went well, you should see the file when running ls normally, and not see it when running with the preload.
What might a library like this be useful for? How could you use these powers for good? What other library calls might it be interesting to interposition?