Classes in programming are one of the last things you learn about as a beginner, despite being one of the most important parts of most languages. In Part 1 I went over what the various parts of a class are called and the basics of what they look like. In this part I will go over how to construct one of the simplest and most useful types of class, the data class, and why you might want one at all.
What are data classes for?
When coding, it is extremely common to find that you have a bunch of data that is all associated with a single thing, and it would make sense to store all of it together. For example, a coordinate.
A Cartesian coordinate in 3D space consists of an x position, a y position and a z position. When working with lots of geometries you may often be working with lots of coordinates. Thousands, or even millions. You could store these in lists, or in tuples, but that would be harder to read.
# These are just the coordinates in a simple 3 x 3 x 3 cube.
>>> my_coords = [(i, j, k) for k in range(3) for j in range(3) for i in range(3)]
>>> my_coords
[(0, 0, 0),
(1, 0, 0),
(2, 0, 0),
(0, 1, 0),
(1, 1, 0),
...
(0, 2, 2),
(1, 2, 2),
(2, 2, 2)]
And accessing the different coordinates leaves a lot to be desired as well. For example, if I wanted the y-coordinate of the 3rd coordinate I would have to do this.
>>> my_coords[2][1]
0
Which is fine... but it's not clear when reading the code what's going on. The nested indexing is hard to parse visually and doesn't tell us much. We can do a lot better with a data class.
Building a data class
dataclasses is a module in the Python standard library, introduced in version 3.7 and designed to assist users with class creation. We need the dataclass decorator from that module.
from dataclasses import dataclass
@dataclass
class Coordinate:
    x: int
    y: int
    z: int
We declare our class with the name Coordinate and then follow up with a list of class properties. Their names must follow the same rules as variable names and should follow the same naming conventions as well, that is snake_case_naming. When creating a dataclass, however, you need to specify the expected type of each property in the class definition. In this case each property will be an integer.
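One thing worth knowing at this point: the type annotations tell dataclass which fields exist, but Python does not check them when the code runs. The short sketch below uses deliberately odd values, made up for illustration, to show that a Coordinate will accept a float, a string or None without complaint.

```python
from dataclasses import dataclass


@dataclass
class Coordinate:
    x: int
    y: int
    z: int


# The annotations are hints for readers and for tools such as type checkers;
# nothing stops us passing other types at runtime.
odd = Coordinate(1.5, "two", None)
print(odd)
```

If you want these mistakes caught, a separate type checker is the usual route; the decorator itself will not do it.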
And that's it! We can now use this class and access properties on it.
>>> from dataclasses import dataclass
>>> @dataclass
... class Coordinate:
...     x: int
...     y: int
...     z: int
...
>>> my_coords = [Coordinate(i, j, k) for k in range(3) for j in range(3) for i in range(3)]
>>> my_coords
[Coordinate(x=0, y=0, z=0),
Coordinate(x=1, y=0, z=0),
Coordinate(x=2, y=0, z=0),
Coordinate(x=0, y=1, z=0),
Coordinate(x=1, y=1, z=0),
...
Coordinate(x=0, y=2, z=2),
Coordinate(x=1, y=2, z=2),
Coordinate(x=2, y=2, z=2)]
>>> my_coords[2].y
0
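Part of what makes the decorator useful is the code it writes for you. As a minimal sketch (the variable names are made up for illustration), the example below shows the generated initialiser accepting keyword arguments and the generated equality check comparing instances field by field.

```python
from dataclasses import dataclass


@dataclass
class Coordinate:
    x: int
    y: int
    z: int


# Keyword arguments make it obvious which value is which.
origin = Coordinate(x=0, y=0, z=0)
also_origin = Coordinate(0, 0, 0)

# Two instances with the same field values compare as equal...
print(origin == also_origin)
# ...but they are still separate objects.
print(origin is also_origin)

# Instances are mutable by default, so a field can be reassigned.
origin.z = 5
print(origin)
```

Writing the same behaviour by hand would mean defining __init__, __repr__ and __eq__ yourself for every class.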
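You can nest classes too. For example, let's create a class to store all the coordinates in. (The list[Coordinate] annotation needs Python 3.9 or later; on 3.7 and 3.8, use typing.List[Coordinate] instead.)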
>>> @dataclass
... class ThreeDimensionalSpace:
...     coordinates: list[Coordinate]
...
>>> space = ThreeDimensionalSpace(my_coords)
>>> space.coordinates[-1].z # get the last coordinate's z-coord
2
You can also give the properties of your classes default values using the = syntax you are used to in functions. For example, if we wanted to add a time property to Coordinate and have it default to 0 for now, our new class would look like this.
from dataclasses import dataclass
@dataclass
class Coordinate:
    x: int
    y: int
    z: int
    time: int = 0
new = Coordinate(1, 2, 3)
print(new.time)
This prints 0. And that's really all you need to know to get started. Creating and using data classes is simple and rewarding.
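Two details about defaults tend to come up once you start using them, so here is a quick sketch. Properties with a default have to come after the ones without, and a mutable default such as a list has to be supplied through field(default_factory=...) rather than written directly. The Route class below is a hypothetical example for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinate:
    x: int
    y: int
    z: int
    time: int = 0  # defaults must come after the fields without defaults


@dataclass
class Route:  # hypothetical class, for illustration only
    name: str
    # Writing `points: list = []` raises a ValueError, because every Route
    # would end up sharing one list. default_factory builds a fresh list
    # for each new instance instead.
    points: list = field(default_factory=list)


first = Route("first")
second = Route("second")
first.points.append(Coordinate(1, 2, 3, time=10))

print(first.points)   # only the first route has a point
print(second.points)  # the second route's list is still empty
```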
Part 3: Methods and properties
Whilst this article covers everything that is needed to get started, there is still a lot more to learn. The next step is to start adding your own methods to your data classes. It often makes sense to associate functions that operate on a particular class with that class. And methods on a class have access to all the properties on that class by default.
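As a small taste of what that looks like, here is a sketch of a method on Coordinate. The distance_to method is a hypothetical example chosen for illustration, not something defined elsewhere in this series.

```python
import math
from dataclasses import dataclass


@dataclass
class Coordinate:
    x: int
    y: int
    z: int

    def distance_to(self, other: "Coordinate") -> float:
        """Return the straight-line distance to another coordinate."""
        # `self` gives the method access to this instance's properties.
        return math.sqrt(
            (self.x - other.x) ** 2
            + (self.y - other.y) ** 2
            + (self.z - other.z) ** 2
        )


start = Coordinate(0, 0, 0)
end = Coordinate(2, 3, 6)
print(start.distance_to(end))  # 7.0
```

The method sits inside the class body, indented to the same level as the properties, and is called on an instance with the usual dot syntax.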
For a complete look at the articles in this series, have a look at this overview page: Getting started with classes in Python