
Python Standard Library Diving: copy

10.06.2022


One of the first things you learn in Python is that assignment binds a name to an existing object rather than copying that object in memory. So you can do fun stuff like this!

In [1]: a = [1, 2, 3]

In [2]: b = a

In [3]: b.append(10)

In [4]: b
Out[4]: [1, 2, 3, 10]

In [5]: a
Out[5]: [1, 2, 3, 10]


SUCH Fun! Lists are mutable, and in the statement b = a we assign a reference to the list in a to b, not a copy of the list. So when we edit the elements of b we're actually editing the single underlying list that both a and b refer to. This is a common stumbling block for beginner programmers.
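If you want to convince yourself that a and b really are one object, the is operator (or comparing id() values) will tell you. A minimal sketch, reusing the names from the session above; the extra variable names are just for illustration:

```python
a = [1, 2, 3]
b = a  # binds a second name to the same list; nothing is copied

same_object = a is b      # identity check: both names refer to one list
same_id = id(a) == id(b)  # the same check, spelled with object ids

b.append(10)              # mutate the list through b...

# ...and the change is visible through a as well.
print(same_object, same_id, a)  # True True [1, 2, 3, 10]
```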

There are, however, a number of ways to make a copy instead of just passing the reference (though I don't recommend doing this unless you really need to). The method differs from one type of object to another. For lists, for example, you can do the following.

In [6]: b = a[:]

In [7]: b.append(11)

In [8]: a
Out[8]: [1, 2, 3, 10]

In [9]: b
Out[9]: [1, 2, 3, 10, 11]
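To give a feel for how much the spelling varies between types, here is a rough sketch using only built-in lists, dicts and sets. Note that every one of these is a shallow copy, and the variable names are made up for illustration.

```python
a = [1, 2, 3]
list_copies = [a[:], list(a), a.copy()]  # three equivalent ways to copy a list

d = {"x": 1}
dict_copies = [dict(d), d.copy()]        # dicts can't be sliced, but have .copy()

s = {1, 2}
set_copy = s.copy()                      # sets provide .copy() too

# Every copy compares equal to its original...
all_equal = (
    all(c == a for c in list_copies)
    and all(c == d for c in dict_copies)
    and set_copy == s
)
# ...but is a distinct object in memory.
all_distinct = (
    all(c is not a for c in list_copies)
    and all(c is not d for c in dict_copies)
    and set_copy is not s
)
print(all_equal, all_distinct)  # True True
```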


But for more general purposes you can use copy!

Introducing copy

copy is one of those single-purpose modules that are common in the standard library, like getpass and glob. It has just two main functions.

copy.copy(x)

  • Return a shallow copy of x.

copy.deepcopy(x[, memo])

  • Return a deep copy of x.

Taken from the docs.

It's pretty self-explanatory: you can use either copy.copy() to perform a "shallow" copy or copy.deepcopy() to perform a "deep" copy.

The difference only matters for compound objects, such as a list of dictionaries. The "shallow" copy makes a new outer list, but does not make copies of the elements. The new list still holds references to the original elements, so if you alter one of those elements through the copy, the change shows up in the original too.

The "deep" copy recursively copies everything, so the mutable elements are wholly new objects that share nothing with the original. The IPython session below demonstrates all three behaviours: plain assignment, shallow copy and deep copy.

In [1]: import copy

In [2]: a = [{'a': 1}, {'b': 2}]

In [3]: b = a

In [4]: c = copy.copy(a)

In [5]: b.append({'c': 3})

In [6]: b
Out[6]: [{'a': 1}, {'b': 2}, {'c': 3}]

In [7]: a
Out[7]: [{'a': 1}, {'b': 2}, {'c': 3}]

In [8]: c
Out[8]: [{'a': 1}, {'b': 2}]

In [9]: c[0]['a'] = 10

In [10]: c
Out[10]: [{'a': 10}, {'b': 2}]

In [11]: a
Out[11]: [{'a': 10}, {'b': 2}, {'c': 3}]

In [12]: b
Out[12]: [{'a': 10}, {'b': 2}, {'c': 3}]

In [13]: d = copy.deepcopy(a)

In [14]: d[0]['a'] = 12

In [15]: d
Out[15]: [{'a': 12}, {'b': 2}, {'c': 3}]

In [16]: a
Out[16]: [{'a': 10}, {'b': 2}, {'c': 3}]
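The same distinction can be checked directly with identity tests rather than by mutating things and watching what changes. A small sketch, assuming the same starting list as the session above (shallow and deep are just illustrative names):

```python
import copy

a = [{'a': 1}, {'b': 2}]
shallow = copy.copy(a)
deep = copy.deepcopy(a)

# Both copies are new outer lists...
print(shallow is a, deep is a)              # False False
# ...but only the shallow copy still shares its elements with a.
print(shallow[0] is a[0], deep[0] is a[0])  # True False
```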


If you're interested in reading about this in more detail, I recommend checking out this article on Real Python.

Conclusion

copy can be very useful, and hopefully I've shown how best to use it. Most of the time, however, you should avoid it: only use copy if you absolutely have to. In my opinion it is generally better to refactor your program so that it doesn't rely on copying, because copy is a way of working around how Python is built, and reaching for it is a sign that you may not be thinking Pythonically. There are some cases where you do need it, but they are rare.
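As one possible illustration of that kind of refactor (the function names here are hypothetical, and this is a sketch rather than a rule), compare a function that mutates its argument with one that returns a new list. With the second style the caller never needs a defensive copy.

```python
def add_item_in_place(items, item):
    """Mutates the caller's list; callers who need the original must copy it first."""
    items.append(item)
    return items


def with_item(items, item):
    """Builds and returns a new list; the caller's list is left untouched."""
    return [*items, item]


original = [1, 2, 3]
extended = with_item(original, 10)
print(original, extended)  # [1, 2, 3] [1, 2, 3, 10]
```

The new list is still only shallow, so this helps most when the elements themselves are immutable or are never mutated.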