Getting started with classes in Python Part 4: Hashing & Mutability

james.derrick@ansys.com | 11.18.2024

In Part 3 we wrapped up the "basic" knowledge that you need to understand classes. This article and the ones that follow cover more 'intermediate' topics, starting with the (im)mutability of data classes.

Data classes aren't hashable by default

In Python it's important to know whether things are 'hashable' or not. A "hash" is a fixed-size (usually small) value computed from an object and used as a compact label for it. Hashes aren't guaranteed to be unique, but two objects that compare equal must always have the same hash. For that to hold, the parts of an object that feed into its hash mustn't change while the object is in use; otherwise the object could end up filed under a stale hash, and that would break a lot of stuff. This is why Python's built-in mutable containers (list, dict, set) can't be hashed, while immutable ones such as str, frozenset, and tuple (of hashable items) can. Data classes are mutable by default and compare by value, so Python makes them unhashable by default.

However, hashing is very useful, and keeping things immutable has other benefits as well. Hashes make lookups in collections much faster, and they are what lets a collection enforce uniqueness. Two very common objects in Python rely on hashing: sets and dictionaries. A set is a collection of unique hashable objects. It uses each element's hash to find it quickly, so you can never have duplicate values in a set, which has many uses. A dict extends this to what's known as a "hash map": each key is hashed, and that key maps onto whatever object you choose to point it at. All dictionary keys must be hashable, but the values don't have to be.

For example, take the Coordinate and ThreeDimensionalSpace classes from the previous parts.

from dataclasses import dataclass


@dataclass
class Coordinate:
    x: int
    y: int
    z: int


@dataclass
class ThreeDimensionalSpace:
    coordinates: set[Coordinate]

# Sets are written with curly braces {...}, unlike the square brackets of lists [...]
# This won't work: Coordinate is mutable and therefore not hashable,
# so building the set raises a TypeError (unhashable type: 'Coordinate')
my_coords = {Coordinate(i, j, k) for k in range(3) for j in range(3) for i in range(3)}

space = ThreeDimensionalSpace(my_coords)
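
To see the failure concretely, here is a minimal sketch that wraps the set construction in a try/except and prints the error. It also checks Coordinate.__hash__, which the dataclass decorator sets to None when eq is left at its default of True and frozen is not set. The variable names (error_message, coords_list) are only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Coordinate:
    x: int
    y: int
    z: int


# A default (mutable) data class has its __hash__ set to None
print(Coordinate.__hash__ is None)

error_message = None
try:
    my_coords = {Coordinate(0, 0, 0), Coordinate(0, 0, 1)}
except TypeError as error:
    # Putting an unhashable object in a set raises TypeError
    error_message = str(error)
    print(f"TypeError: {error_message}")

# A list is fine, because list items don't need to be hashable,
# but it will happily store the same point twice
coords_list = [Coordinate(0, 0, 0), Coordinate(0, 0, 0)]
print(len(coords_list))
```

The list version runs, but it gives up the uniqueness guarantee that motivated using a set in the first place.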

It makes sense for coordinates to be a set and not a list, because there should never be multiple coordinates referring to the same point in 3D space. That's an impossibility, so we reflect it in the class definition. However, as written, this won't work! Building the set raises a TypeError, because data classes are mutable and unhashable by default. To make it work we need to make Coordinate immutable. We can do this very simply, by passing an additional parameter to the dataclass decorator on Coordinate: frozen=True. With frozen=True (and the default eq=True), the decorator also generates a __hash__ method based on the fields.

from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    x: int
    y: int
    z: int


@dataclass
class ThreeDimensionalSpace:
    coordinates: set[Coordinate]


# This will now work!
my_coords = {Coordinate(i, j, k) for k in range(3) for j in range(3) for i in range(3)}
space = ThreeDimensionalSpace(my_coords)
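
As a short sketch of what frozen=True changes in practice, using only the Coordinate class from above: assigning to a field raises dataclasses.FrozenInstanceError, two instances with the same field values are equal and share a hash, and a set therefore keeps only one of them. The variable names are only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Coordinate:
    x: int
    y: int
    z: int


origin = Coordinate(0, 0, 0)

# Frozen instances reject attribute assignment
try:
    origin.x = 5
    was_blocked = False
except FrozenInstanceError:
    was_blocked = True
print(was_blocked)  # True

# Equal field values give equal objects and equal hashes
same_point = Coordinate(0, 0, 0)
print(origin == same_point)  # True
print(hash(origin) == hash(same_point))  # True

# A set therefore stores the point only once
points = {origin, same_point, Coordinate(1, 0, 0)}
print(len(points))  # 2

# Frozen instances also work as dictionary keys
labels = {origin: "origin"}
print(labels[Coordinate(0, 0, 0)])  # origin
```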

It works! Sets and dictionaries are very useful for maintaining uniqueness and great for searching for things. For example, we could now add a new method "is_2d_position_in_space" to ThreeDimensionalSpace that takes two values, an x and a y, and tells you whether that (x, y) position is present at any z in the 3D space. Such a method might look like this.

from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    x: int
    y: int
    z: int


@dataclass
class ThreeDimensionalSpace:
    coordinates: set[Coordinate]

    def is_2d_position_in_space(self, x: int, y: int) -> bool:
        # get a unique set of all the z coordinates because they're all going to be repeated a LOT.
        all_z_coords = {c.z for c in self.coordinates}
        # Go over each one. Does the x, y coordinate exist at that z?
        for z in all_z_coords:
            coord = Coordinate(x, y, z)
            # if coordinate exists at this z return True and exit (it only has to be True once in this case)
            if coord in self.coordinates:
                return True
        # If the coordinate is never found, it can't exist in the space, so return False
        return False

my_coords = {Coordinate(i, j, k) for k in range(3) for j in range(3) for i in range(3)}
space = ThreeDimensionalSpace(my_coords)
print(space.is_2d_position_in_space(2, 2))  # True: (2, 2, z) exists for z = 0, 1, 2
print(space.is_2d_position_in_space(1, 3))  # False: y only runs from 0 to 2

This is pretty useful. Note that there is no restriction on the space being cubic. The method only checks set membership, so the same code would work just as well for a spherical, cylindrical, or even toroidal region of space. All you would need to change is which coordinates are stored in the set.
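
As a sketch of the non-cubic case, the example below fills the space with the integer points inside a sphere of radius 2 centred on the origin. The radius and the way the points are generated are assumptions made for illustration. The method is written with any() for brevity, but it performs the same membership check as the version above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    x: int
    y: int
    z: int


@dataclass
class ThreeDimensionalSpace:
    coordinates: set[Coordinate]

    def is_2d_position_in_space(self, x: int, y: int) -> bool:
        # Same check as before, written with any()
        all_z_coords = {c.z for c in self.coordinates}
        return any(Coordinate(x, y, z) in self.coordinates for z in all_z_coords)


# Illustrative assumption: integer points within radius 2 of the origin
radius = 2
span = range(-radius, radius + 1)
sphere_coords = {
    Coordinate(i, j, k)
    for k in span
    for j in span
    for i in span
    if i * i + j * j + k * k <= radius * radius
}
sphere = ThreeDimensionalSpace(sphere_coords)

print(sphere.is_2d_position_in_space(0, 0))  # True: (0, 0, 0) is inside
print(sphere.is_2d_position_in_space(2, 0))  # True: (2, 0, 0) is on the surface
print(sphere.is_2d_position_in_space(2, 2))  # False: 4 + 4 > 4 at every z
```

Nothing in ThreeDimensionalSpace had to change; only the set of coordinates passed in is different.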

Part 5: classmethod and staticmethod

With mutability out of the way, you should be getting better and better at using data classes. Next up are the non-standard methods that, unlike most methods, do not use self. In particular, the next part will cover classmethod and staticmethod, two tools for writing better classes.

For a complete look at the articles in this series, have a look at this overview page: Getting started with classes in Python