Deep copy

Python treats all data structures as objects, and this includes lists. A variable is just a name that refers to an object, so you have to be careful about how you refer to these objects. The "aliasing" problem can cause subtle bugs in your program.

Example: Suppose you enter these steps in the Python shell:

>>> mylist = [1, 2, 3]
>>> mylist
[1, 2, 3]
>>> newlist = mylist
>>> newlist
[1, 2, 3]
No surprises there, at least nothing obvious. But...
>>> newlist[1] = 7
>>> newlist
[1, 7, 3]
>>> mylist
[1, 7, 3]
BOTH lists seem to be changed! The reason for this behavior is the statement "newlist = mylist". It does NOT create another copy of the list known as mylist and call it newlist. Instead, it creates a new label "newlist" and makes it refer to the SAME list. You now have two variables which refer to the same object in memory. We say newlist is an "alias" of mylist. An alias is not a copy at all. It is sometimes loosely called a "shallow copy", but in Python that term properly means a NEW list whose elements are still shared with the original.
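
One way to check for an alias is the "is" operator, which asks whether two names refer to the very same object. The following is a minimal sketch of that check; the name same_object is made up for illustration.

```python
mylist = [1, 2, 3]
newlist = mylist            # no copy is made; both names refer to one list

# "is" tests identity (same object), not just equal contents
same_object = newlist is mylist
print(same_object)          # True

newlist[1] = 7              # change the list through one name...
print(mylist)               # [1, 7, 3]  ...and the other name sees it
```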

For many programs, this is not a problem. But if you actually NEED a different, separate copy of the data structure, you have a problem.

There are (at least) two ways to fix this problem. One is "brute force". Create another data structure that is empty and copy everything from the original data structure to the new one.

>>> mylist = [1,2,3]
>>> newlist = []
>>> for i in range(len(mylist)):
	newlist.append(mylist[i])
>>> newlist
[1, 2, 3]
>>> mylist
[1, 2, 3]
>>> newlist[1] = 8
>>> newlist
[1, 8, 3]
>>> mylist
[1, 2, 3]
This shows that there are two separate lists, mylist and newlist.

Another variation on this fix:

>>> mylist = [1,2,3]
>>> newlist = [0,0,0]
>>> for i in range(len(mylist)):
	newlist[i] = mylist[i]
>>> newlist
[1, 2, 3]
>>> mylist
[1, 2, 3]
>>> newlist[1] = 8
>>> newlist
[1, 8, 3]
>>> mylist
[1, 2, 3]
If you know how large your original data structure is, you can create another one the same size and copy over each individual element.

Copying element by element in a loop is a bit tedious and will take some time if the data structures are large. There is also the problem that if the elements of the data structure are not simple immutable items (like integers, floats, strings), then the copy of each element may have to be done in steps also. If the elements were lists themselves, for example, appending or assigning them would only create aliases of the inner lists, so each inner list would also have to be copied one piece at a time.
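
To illustrate the nested case, here is a small sketch using a list of lists. The names matrix, outer_copy and full_copy are made up for this example, and list(row) is used as one possible way to copy an inner list.

```python
matrix = [[1, 2], [3, 4]]

# Copy only the outer list, element by element
outer_copy = []
for row in matrix:
    outer_copy.append(row)        # appends a reference to the SAME inner list

outer_copy[0][0] = 99             # changes the shared inner list
print(matrix)                     # [[99, 2], [3, 4]]

# Copy each inner list as well, so nothing is shared
full_copy = []
for row in matrix:
    full_copy.append(list(row))   # list(row) builds a new inner list

full_copy[1][1] = -1
print(matrix)                     # [[99, 2], [3, 4]]  (not affected this time)
print(full_copy)                  # [[99, 2], [3, -1]]
```

The outer lists are separate in both cases. The difference is whether the inner lists are shared.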

An easier way to fix this problem is to use the function "deepcopy" from the standard library module "copy".

>>> mylist = [1,2,3]
>>> from copy import deepcopy
>>> newlist = deepcopy(mylist)
>>> newlist
[1, 2, 3]
>>> mylist
[1, 2, 3]
>>> mylist[0] = 5
>>> mylist
[5, 2, 3]
>>> newlist
[1, 2, 3]
Again, you can see that mylist and newlist are really separate lists. Changing one does not change the other.

"Deep copy" is an actual term used in object-oriented programming. It means to make a completely separate copy of a data structure, including copies of everything nested inside it.
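
The difference only shows up when the data structure contains other mutable objects. As a rough sketch, the copy module also provides a function "copy" that duplicates just the outer list, which can be compared with deepcopy. The variable names below are made up for illustration.

```python
from copy import copy, deepcopy

original = [[1, 2], [3, 4]]

shallow = copy(original)      # new outer list, inner lists are shared
deep = deepcopy(original)     # new outer list AND new inner lists

original[0][0] = 99           # change something nested inside the original

print(shallow)                # [[99, 2], [3, 4]]  shares the inner list
print(deep)                   # [[1, 2], [3, 4]]   fully separate
```

For a flat list of integers like [1, 2, 3], both functions give a list that behaves independently of the original.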

An analogy that might help: In Windows you can create a "shortcut" to a file. This shortcut is an icon you can double-click to open, operate on, or execute the actual file. The shortcut is NOT the file; it is just a pointer to the file. It takes up very little room, regardless of the size of the actual file. This is the same idea as an alias. Both the actual filename and the shortcut refer to the same place in storage. If the actual file is erased, the shortcut does not work any more. There is really only ONE copy of the data. In Windows you can also right-click on a file, choose Copy from the menu, and then choose Paste. This creates an actual, different copy of the file. This would be making a "deep copy" of the file. One copy of the file could be erased; the other copy would still be there.

More references on deepcopy: