CS 115 Program 4 Steganography Spring 2014

NOTE: due dates have changed for Design and Source

Due Dates:
Individual Test Cases: Tuesday, April 8 midnight
Team Test Cases: Wednesday, April 9 during lab
Individual Design: Tuesday, April 22 midnight (changed from Tuesday, April 15)
Team Design: Wednesday, April 23 during lab (changed from Wednesday, April 16)
Individual Source: Sunday, April 27 midnight (changed from Sunday, April 20)

The educational goals of this program are to use the concepts of

Steganography is the art of concealing a message in a medium. That is a very general definition, and people have implemented it in many different ways. There is a list of resources at the bottom of the assignment if you want to read more. Today it is implemented on computers: messages are hidden in picture files, word processing documents, sound files, and just about every other kind of file. Steganography generally relies on "security by obscurity": the fact that there IS a message is not at all obvious, so no one searches for it. That is not the safest assumption, so messages are usually encrypted by secure methods before they are hidden. Then, even if someone knows there is a message in the file and extracts it, they still have a mishmash unless they can also decrypt it.

There are many algorithms for steganography. The algorithm that follows is similar to some real ones, but simpler. The steps are given first, then a worked example. Note that this is also a simplified version of the final algorithm you will implement, but it is the place to start.

As an example, here is a "before and after" capture of two image files. One of these pictures has a message embedded in it. Can you tell which one? Hint: look at the filenames.

Algorithm for inserting a hidden message into a file

  1. Obtain a file with many numbers in it. The more numbers the better, because more numbers let you insert a longer message. These numbers are assumed to be between 0 and 255.
  2. Obtain your message.
  3. Find the length of the message (number of characters in it). This will be the first value inserted into the file. For this algorithm, this length can be no greater than 999.
  4. Turn the length of the message into 3 separate digits, the hundreds, the tens and the units. (All of these will be digits from 0 to 9.)
  5. Insert the three digits into the first numbers in the file in the usual order, hundreds then tens and then ones.
  6. How to insert a digit into a number:
    1. Remove the ones digit from the number - this can be done with arithmetic operations.
    2. Add the digit which is to be inserted to the number. In other words, the inserted digit takes over the number's ones place.
  7. Now, repeat this process of insertion for each of the characters in the message. The characters have to be represented as numbers. Think ASCII code.
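
The digit-level steps above (steps 4 and 6) can be sketched in Python as follows. This is only a minimal sketch, not the required design; the function names `split_into_digits` and `insert_digit` are made up for illustration.

```python
def split_into_digits(value):
    """Split a value from 0 to 999 into its (hundreds, tens, ones) digits."""
    if not 0 <= value <= 999:
        raise ValueError("value must be between 0 and 999")
    return value // 100, (value // 10) % 10, value % 10


def insert_digit(number, digit):
    """Replace the ones digit of number with digit (a value from 0 to 9)."""
    # Integer division by 10, then multiplying back, drops the ones digit.
    return (number // 10) * 10 + digit


print(split_into_digits(5))   # (0, 0, 5)
print(split_into_digits(72))  # (0, 7, 2)
print(insert_digit(123, 0))   # 120
print(insert_digit(12, 5))    # 15
```

Note that only integer division, multiplication, remainder and addition are needed; no string handling is involved.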

Example of insertion of a message into a file:

  1. Given: the file of numbers is as follows:
    123 206 12 157 244 28 113 1 48 75 45 8 23 9 4 100 4 12 244 244 28 311
  2. Given: the message to insert is HELLO
    The length of the message is 5, so the first number to insert is 5.
  3. Break the length into 3 digits, which will be 0, 0 and 5.
  4. Look at the first number from the file, 123. Change it so it is 120. (That is, drop the ones digit.)
  5. Add the first digit of the number to be inserted, 0, to it. That gives 120.
  6. The next number is 206. Change it so it is 200. Now add 0 to it, giving 200.
  7. The next number is 12 (or 012). Change it so it is 10. Now add the third digit to be inserted, 5. That gives 15.
  8. Pause now to see what the file data looks like:
    120 200 15 157 244 28 113 1 48 75 45 8 23 9 4 100 4 12 244 244 28 311
    The numbers which have changed are the first three: 120, 200 and 15.
  9. The ASCII codes of HELLO are 72 69 76 76 79
  10. Repeat this process for each character's ASCII code of the message.
    1. After the H is inserted (remember it's 072)
      120 200 15 150 247 22 113 1 48 75 45 8 23 9 4 100 4 12 244 244 28 311
    2. After the E is inserted (it's 069)
      120 200 15 150 247 22 110 6 49 75 45 8 23 9 4 100 4 12 244 244 28 311
    3. After the L is inserted (it's 076)
      120 200 15 150 247 22 110 6 49 70 47 6 23 9 4 100 4 12 244 244 28 311
    4. After the second L is inserted (it's 076)
      120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 4 12 244 244 28 311
    5. After the O is inserted (it's 079)
      120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  11. Save this data to a file and you are done.

This method can be used to insert a message of any length up to 999 characters into the data in your file, as long as the file has enough numbers to hold it. Any ASCII character can be used, not just letters of the alphabet.
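
The whole insertion process from the example can be sketched as one Python function. This is a sketch under stated assumptions, not the required solution: the name `insert_message` is hypothetical, the numbers are assumed to already be in a list, and the error handling (raising `ValueError`) is just one possible choice.

```python
def insert_message(numbers, message):
    """Return a copy of numbers with message hidden in the ones digits."""
    if len(message) > 999:
        raise ValueError("message length can be no greater than 999")
    if not message.isascii():
        raise ValueError("only ASCII characters are supported")
    # The length goes in first, then the ASCII code of every character.
    values = [len(message)] + [ord(ch) for ch in message]
    if 3 * len(values) > len(numbers):
        raise ValueError("not enough numbers to hold the message")
    result = list(numbers)  # work on a copy, leave the original alone
    position = 0
    for value in values:
        # Hundreds digit first, then tens, then ones.
        for digit in (value // 100, (value // 10) % 10, value % 10):
            result[position] = (result[position] // 10) * 10 + digit
            position += 1
    return result


data = [123, 206, 12, 157, 244, 28, 113, 1, 48, 75, 45,
        8, 23, 9, 4, 100, 4, 12, 244, 244, 28, 311]
print(insert_message(data, "HELLO"))
```

Run on the numbers from the example, this should print the same final list as step 10.5 of the example.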

Algorithm for Extraction of a message from a file

Basically the insertion method is reversed.

  1. Get the data from a file.
  2. Find the ones' digit of each of the first 3 numbers in the file. Use those, respectively, as the hundreds place, the tens place and the ones place of the length of the message.
  3. Then repeat the extraction process for as many times as the length of the message tells you to.
    1. Get the ones' digits of the next three numbers from the file.
    2. Use those 3 digits as parts of a number, hundreds first, then tens, then ones.
    3. That number is the ASCII code for the character of the message. Save it somewhere convenient.
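
The extraction steps above can be sketched in Python like this. It is only an illustration: the names `extract_message` and `read_value` are hypothetical, and the numbers are assumed to already be in a list.

```python
def extract_message(numbers):
    """Recover the hidden message from the ones digits of numbers."""
    def read_value(start):
        # Combine three ones digits: hundreds first, then tens, then ones.
        # Unpacking fails with ValueError if fewer than 3 numbers remain.
        h, t, o = (n % 10 for n in numbers[start:start + 3])
        return h * 100 + t * 10 + o

    length = read_value(0)  # the first three numbers hold the length
    chars = []
    for i in range(1, length + 1):
        chars.append(chr(read_value(3 * i)))
    return "".join(chars)


coded = [120, 200, 15, 150, 247, 22, 110, 6, 49, 70, 47,
         6, 20, 7, 6, 100, 7, 19, 244, 244, 28, 311]
print(extract_message(coded))  # HELLO
```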

Example of extraction

  1. Assume this is the data from the file
    120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  2. Take the 3 ones' digits from the first three numbers: 0 and 0 and 5.
    120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  3. Those together are 005 (the message is 5 characters long)
  4. The next 3 numbers, 150, 247, 22, yield 0, 7 and 2. Together, those give 072 or 72. That is the ASCII of H. Save the H somewhere.
    120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  5. The next 3 numbers, 110, 6, 49, give 0, 6 and 9, which is 069 or 69. That is the ASCII code for E. Now the message is HE
    120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  6. The next 3 numbers, 70, 47, 6, give 0, 7 and 6, which is 076 or 76. That is the letter L. Now the message is HEL
    120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  7. The next 3 numbers give another 076 or another L, so the message is HELL
    120 200 15 150 247 22 110 6 49 70 47 6 20 7 6 100 7 19 244 244 28 311
  8. The next 3 numbers give 079, which is the ASCII for O. Message is HELLO

One note about this method: The information that was discarded from the numbers when the message was inserted (the original ones' digits) is not restored by the extraction method. That is not a major drawback of the method.

Now take it to the next level:

By the way, the actual algorithm you will implement for insertion and extraction will only use every third number from the file. In other words (for now), it will use the first, fourth, seventh, tenth, thirteenth, sixteenth, etc. numbers. The others are left unchanged.
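
One way to picture the "every third number" version is sketched below. This is a hedged sketch, not the required design: the names `STEP`, `insert_every_third` and `extract_every_third` are made up, and the positions used are list indexes 0, 3, 6, ... (the first, fourth, seventh, ... numbers).

```python
STEP = 3  # use the 1st, 4th, 7th, ... numbers (indexes 0, 3, 6, ...)


def insert_every_third(numbers, message):
    """Hide message in the ones digits of every third number."""
    values = [len(message)] + [ord(ch) for ch in message]
    digits = []
    for v in values:
        digits.extend((v // 100, (v // 10) % 10, v % 10))
    if (len(digits) - 1) * STEP >= len(numbers):
        raise ValueError("not enough numbers to hold the message")
    result = list(numbers)
    for k, digit in enumerate(digits):
        index = k * STEP  # the numbers in between are left unchanged
        result[index] = (result[index] // 10) * 10 + digit
    return result


def extract_every_third(numbers):
    """Read the message back from the ones digits of every third number."""
    ones = [n % 10 for n in numbers[::STEP]]
    length = ones[0] * 100 + ones[1] * 10 + ones[2]
    chars = []
    for i in range(1, length + 1):
        h, t, o = ones[3 * i:3 * i + 3]
        chars.append(chr(h * 100 + t * 10 + o))
    return "".join(chars)


coded = insert_every_third(list(range(100, 118)), "A")
print(coded)
print(extract_every_third(coded))  # A
```

The only change from the simple version is which positions are touched; the digit arithmetic stays the same.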

The Assignment - First Phase

Write a program which asks the user whether they want to insert a message or extract a message. If they say insert, the program asks for a filename and a message. The program will then produce another file named "secret" which will contain the message hidden as described above. If they say extract, the program asks for a filename. It then proceeds to extract the message hidden inside that file.

A Sample Run

Steganography Assistant
Do you want to insert a message or extract one (I/E)? I

Inserting a message

Filename? blocks.ppm
Message? This is a test

Done insertion
Thanks for your business
The program accepts the filename and the message and inserts the message into the data in the file. A new file is created with the name "coded-blocks.ppm". See the other example below for samples of what is in the original file and the coded file.

Another Sample Run

Steganography Assistant
Do you want to insert a message or extract one (I/E)? E

Extracting a hidden message

Filename? coded-blocks.ppm

The length of the message is 14
The message is This is a test
Thanks for your business
The program took in a file called coded-blocks.ppm and extracted from it "This is a test". To be helpful, the start of the file that contains the message is included here; the program does not actually output it to the screen. Note that this is a PPM file, so it has a header at the start. The image was 38 pixels wide and 36 pixels tall, with a maximum color value of 255.
P3 38 36 255 250 255 255 251 252 252 254 251 251 250 255 255 258 255 255 254 255 255 241 249 249 240 249 249 254 255 255 241 249 249 250 255 255 255 250 250 251 255 255 251 250 250 255 250 250 250 255 255 253 255 255 242 251 255 251 249 255 250 248 255 255 234 255 251 242 255 251 245 253 255 246 243 250 253 241 253 
Compare this to the original file, blocks.ppm, before the message was inserted. Note that the original file does have newlines in the file at the end of every row of the image. This is not required for our purposes. It is ok if your coded file does not have them, as long as the values are separated by whitespace of some kind.
P3
38 36
255
255 255 255 252 252 252 251 251 251 255 255 255 255 255 255 255 255 255 249 249 249 249 249 249 255 255 255
249 249 249 255 255 255 250 250 250 255 255 255 250 250 250 250 250 250 254 255 255 251 255 255
249 251 255 253 249 255 255 248 255 255 234 255 255 242 255 255 245 253 255 246 243 255 253 241

Another Sample Run

Steganography Assistant
Do you want to insert a message or extract one (I/E)? a
Please enter an I or an E
Do you want to insert a message or extract one (I/E)? k
Please enter an I or an E
Do you want to insert a message or extract one (I/E)? R
Please enter an I or an E
Do you want to insert a message or extract one (I/E)? i

Inserting a message

Filename? who
File will not open
Filename? nofile
File will not open
Filename?
File will not open
Filename? one.ppm
Message? Hello

Done insertion
Thanks for your business

This shows some of the data validation being done.
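
The validation shown in this run could be sketched with two small loops, as below. This is only one possible approach under stated assumptions: the function names `ask_choice` and `ask_for_text` are hypothetical, the `ask` parameter is a made-up stand-in for `input` so the functions can be tested without typing, and catching `OSError` with try/except is an assumption about how "File will not open" is detected.

```python
PROMPT = "Do you want to insert a message or extract one (I/E)? "


def ask_choice(ask=input):
    """Keep asking until the user enters I or E (upper or lower case)."""
    choice = ask(PROMPT).strip().upper()
    while choice not in ("I", "E"):
        print("Please enter an I or an E")
        choice = ask(PROMPT).strip().upper()
    return choice


def ask_for_text(ask=input):
    """Keep asking for a filename until the file opens.

    Returns the filename and the text read from the file.
    """
    while True:
        filename = ask("Filename? ")
        try:
            with open(filename) as infile:
                return filename, infile.read()
        except OSError:
            # Covers a missing file, an empty name, a directory, etc.
            print("File will not open")
```

In a real run you would simply call `ask_choice()` and `ask_for_text()` with no arguments, so that they read from the keyboard.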

Testing

READING: You should read a page on Testing Files. This will give you some ideas on how to test programs which use files.

Read the assignment carefully. Look at how the program is supposed to behave. You do not know what the code looks like yet - that is fine. Look at the example run given. Consider places where the program can have a bug.

NEW DO THIS FIRST!

Make a test plan for this program.
Save this doc file and fill in the table with test cases. You should have a total of 14 non-redundant cases. Put your name and the section at the top.

Submit this .doc file with your individual test cases at the link here.

Choose the menu choices of "TestCases" and "Program 4". This will be due on Tuesday April 8, midnight before the team development of test cases in lab on Wednesday April 9. Remember to bring this file with you to lab the next day to contribute to your team's effort.

Specifications

There are some specifications that your program needs to meet. These will affect the design and the implementation.

Your program must have and use the following functions (in addition to the main function):

Design


Decide on what steps you will need to perform to solve this problem. Write the steps in pseudocode (NOT Python!) in comments in a Python file. Save this Python file as "design3.py".

There should be at least 16 steps in the design. This does count each step inside the loops separately. You MUST state all the control structures you will use to solve this problem.

Your design must have separate headers for EACH function that you write, as usual, including pre- and post-conditions, etc.

Submit this .py file with the link here.

Choose the menu choices of "Design" and "Program 4". This is now due by Tuesday April 22, midnight (changed from Tuesday April 15). Also bring this file with you to lab on Wednesday April 23 (changed from Wednesday April 16) to contribute to your team's effort.

Assignment Phase 2

Read an introduction to the PPM format (Portable Pixmap).

Why? Because steganography is often used with image files or audio files. The modern formats for these kinds of files are fairly complicated to deal with (they are binary), but PPM is a format which can be read and manipulated pretty easily. The file is actually a text file! A PPM image file can be created with something like Notepad. So this is added to the assignment: the data files used for insertion and extraction will not just be runs of numbers; they will be actual images. The PPM format calls for a small header at the start of the file. This can be ignored when you read the file. The numbers after the header are triples representing the RGB values of each pixel. You will do your insertions into the R number of each RGB triple.
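
As a rough illustration of how little the PPM format adds, the sketch below separates the header from the pixel numbers and puts them back together. It assumes a plain-text (P3) file with a four-item header (P3, width, height, maximum color value) and no comment lines; the names `split_ppm` and `join_ppm` are hypothetical.

```python
def split_ppm(text):
    """Split the text of a plain (P3) PPM file into header and values."""
    tokens = text.split()  # splits on spaces, tabs and newlines alike
    if len(tokens) < 4 or tokens[0] != "P3":
        raise ValueError("not a plain-text PPM (P3) file")
    header = tokens[:4]                    # P3, width, height, max color
    values = [int(t) for t in tokens[4:]]  # R G B R G B ...
    return header, values


def join_ppm(header, values):
    """Rebuild the file text; any whitespace between values will do."""
    return " ".join(header) + "\n" + " ".join(str(v) for v in values) + "\n"


sample = "P3\n2 1\n255\n255 0 0 0 255 0\n"
header, values = split_ppm(sample)
print(header)       # ['P3', '2', '1', '255']
print(values[::3])  # the R numbers: [255, 0]
```

Because the R number is the first of each triple, the R numbers are exactly every third value after the header.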

This change should be doable with MINIMAL changes to the code you have already written!

More discussion about PPM

One nice thing about the header info of a ppm file is that it includes two numbers which indicate the number of columns of pixels in the picture and the number of rows (how wide it is and how tall it is). We don't need to deal with these numbers for display purposes, but you could use them to calculate how many pixels are in the picture and determine if the message will fit or not. If the message doesn't fit, you can cut the message off so it will fit, and tell the user that the message was truncated.
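
The "will it fit" calculation can be sketched as follows. This is an illustration under stated assumptions: each pixel carries one digit (in its R number), the length and every character each take three pixels, and the names `max_message_length` and `fit_message` are made up.

```python
def max_message_length(width, height):
    """Largest message (in characters) an image of this size can hold."""
    pixels = width * height
    # Three pixels per stored value; one value is used up by the length.
    capacity = pixels // 3 - 1
    # The length is stored in 3 digits, so it can never exceed 999.
    return max(0, min(capacity, 999))


def fit_message(message, width, height):
    """Return the (possibly shortened) message and whether it was cut."""
    limit = max_message_length(width, height)
    if len(message) > limit:
        return message[:limit], True
    return message, False


print(max_message_length(38, 36))   # 455
print(fit_message("Hello", 3, 3))   # ('He', True)
```

For the 38 by 36 image in the sample run, there are 1368 pixels, which is room for 456 three-digit values: the length plus 455 characters.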

An example with a ppm file

Implement the design

Individually write a Python program to implement your design. Start with a copy of the Python file you have that has the design in it (possibly updated with improvements you or your team came up with) and write your Python code between the commented lines of the design. Make sure you eliminate any syntax and semantics errors. Here is where test cases come in handy! Verify that the program does come out with the correct behaviors.

Submit your individual source code (.py file) with the link here.

Choose the menu choices of "Code" and "Program 4". This is now due by Sunday April 27, midnight (changed from Sunday April 20).

Please read the documentation standard on the class web page. As you can see from looking at the grading page, we will be looking to see how you meet these standards. Note particularly that we require a header comment!

Submissions:

  • Individual Test cases (.doc file) due Tuesday April 8 midnight
  • Individual Design (.py file) due Tuesday April 22 midnight (changed from Tuesday April 15)
  • Individual source code (.py file) due Sunday April 27 midnight (changed from Sunday April 20)