Python: how (not) to reset a dataclass
In Python, it can be useful to reset a dataclass back to its initial values, but this can be easy to get subtly wrong.
Resetting a dataclass (the wrong way) §
Can you spot the mistake?
Given this simple dataclass:
import dataclasses

@dataclasses.dataclass
class MyData:
    widgets = 12
    issues = []
We follow these steps to modify and then (try to) reset it:
1. Create an instance of the MyData dataclass:

>>> data = MyData()
>>> data.widgets
12  # ✔️ This is at the default value.
>>> data.issues
[]  # ✔️ This is at the default value.

2. Modify both the widgets and issues fields:

>>> data.widgets = 99
>>> data.issues.append("blah")

3. Check the fields have changed:

>>> data.widgets
99  # ✔️ This has changed as expected.
>>> data.issues
['blah']  # ✔️ This has changed as expected.

4. (Attempt to) reset the dataclass by assigning a new instance:

>>> data = MyData()

5. Check the fields have been reset:

>>> data.widgets
12  # ✔️ This has reset to the default value as expected.
>>> data.issues
['blah']  # 💥 Uh oh! This has *NOT* reset to the default?
What just happened? The widgets field reset but the issues field did not!
I’ll give you a moment to think about it.
🕛…
🕐…
🕑…
🕒…
🕓…
🕔…
🕕…
🕖…
🕗…
🕘…
🕙…
🕚…
🕛…Okay!
The problem §
The problem is that the default value for issues (a list) is mutable, and nothing in this class tells Python to make a fresh list for each instance.
The Python docs on dataclasses have a whole section about mutable default values, with some great examples.
The key piece of information is:
Python stores default member variable values in class attributes.
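The default value is a class attribute, which is shared between all instances of the class. Because neither attribute in MyData is annotated, the dataclass decorator does not treat them as fields, so nothing is ever copied onto the instance. Assigning data.widgets = 99 creates an instance attribute that shadows the class attribute, which is why a new instance sees 12 again. Calling data.issues.append("blah") instead mutates the one list stored on the class, so every instance of MyData, including a brand new one, sees the change. [1]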
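To make the sharing visible, here is a small sketch that reuses the un-annotated MyData class from above and inspects it with dataclasses.fields() and vars(). The variable name fresh is only for illustration.

```python
import dataclasses


@dataclasses.dataclass
class MyData:
    widgets = 12   # no annotation: a plain class attribute, not a field
    issues = []    # no annotation: one list object stored on the class


# The decorator found no annotated names, so there are no fields at all.
print(dataclasses.fields(MyData))  # ()

data = MyData()
data.widgets = 99            # creates an instance attribute that shadows the class one
data.issues.append("blah")   # mutates the single list stored on the class

print(vars(data))                    # {'widgets': 99}
print(data.issues is MyData.issues)  # True

fresh = MyData()
print(fresh.widgets)  # 12
print(fresh.issues)   # ['blah']
```

Only widgets ever lands on the instance. The issues list lives on the class the whole time, which is why "resetting" by creating a new instance does nothing to it.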
Why typing is good §
Did you know that type annotations will catch this issue?
If you annotate the attribute as issues: list = [] it becomes a real dataclass field, and the dataclass decorator will helpfully raise a ValueError as soon as the class is defined!
import dataclasses

@dataclasses.dataclass
class MyData:
    widgets = 12
    issues: list = []  # Added type annotation.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\squirrel\scoop\apps\python310\current\lib\dataclasses.py", line 1184, in dataclass
    return wrap(cls)
  File "C:\Users\squirrel\scoop\apps\python310\current\lib\dataclasses.py", line 1175, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "C:\Users\squirrel\scoop\apps\python310\current\lib\dataclasses.py", line 955, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
  File "C:\Users\squirrel\scoop\apps\python310\current\lib\dataclasses.py", line 812, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field issues is not allowed: use default_factory
That’s one more good reason to keep adding type hints to your Python code!
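If you want to see that the error fires when the class is defined, before any instance exists, a minimal sketch is to wrap the class definition in try/except. The error_message variable is just for illustration.

```python
import dataclasses

try:
    @dataclasses.dataclass
    class MyData:
        widgets: int = 12
        issues: list = []  # annotated, so the decorator treats it as a field
except ValueError as exc:
    # Raised by the decorator while it processes the class body.
    error_message = str(exc)
    print(error_message)
else:
    error_message = None
```

No MyData() call is needed to trigger it, so the mistake cannot slip through to runtime the way the un-annotated version did.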
The solution §
As the ValueError tells us, use a default_factory as follows:
import dataclasses

@dataclasses.dataclass
class MyData:
    widgets: int = 12
    issues: list = dataclasses.field(default_factory=list)
Then if we run the same steps as before:
>>> # 1. Create
>>> data = MyData()
>>> data.widgets
12
>>> data.issues
[]
>>> # 2. Modify
>>> data.widgets = 99
>>> data.issues.append("blah")
>>> # 3. Check
>>> data.widgets
99
>>> data.issues
['blah']
>>> # 4. Reset
>>> data = MyData()
>>> # 5. Check
>>> data.widgets
12
>>> data.issues
[] # ✔️ This has reset to the default value as expected.
All good!
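As a quick sanity check, the sketch below confirms that two instances of the fixed class no longer share a list. It also shows one possible way to reset an existing instance in place, which can be handy when other code holds a reference to it. The reset_in_place helper is hypothetical (not part of the dataclasses module) and assumes every field you care about declares a default or a default_factory.

```python
import dataclasses


@dataclasses.dataclass
class MyData:
    widgets: int = 12
    issues: list = dataclasses.field(default_factory=list)


def reset_in_place(instance):
    """Hypothetical helper: restore every field that declares a default."""
    for f in dataclasses.fields(instance):
        if f.default is not dataclasses.MISSING:
            setattr(instance, f.name, f.default)
        elif f.default_factory is not dataclasses.MISSING:
            # Call the factory so the instance gets a brand new object.
            setattr(instance, f.name, f.default_factory())
        # Fields without any default are left untouched.


first = MyData()
second = MyData()
first.issues.append("blah")
print(second.issues)                  # []
print(first.issues is second.issues)  # False

first.widgets = 99
reset_in_place(first)
print(first)  # MyData(widgets=12, issues=[])
```

Creating a new instance, as in the steps above, is still the simplest reset. The helper is only a sketch for the case where the same object has to be reused.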
[1] Yes, this is very similar to why you should not use mutable default arguments in Python functions. They get shared too.
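For comparison, here is a minimal sketch of the same pitfall with function arguments. Both function names are made up for illustration, and the None sentinel shown is the common workaround.

```python
def add_issue_shared(issue, issues=[]):
    # The default list is created once, when the function is defined.
    issues.append(issue)
    return issues


def add_issue_fresh(issue, issues=None):
    # A None sentinel lets each call build its own list.
    if issues is None:
        issues = []
    issues.append(issue)
    return issues


print(add_issue_shared("a"))  # ['a']
print(add_issue_shared("b"))  # ['a', 'b']
print(add_issue_fresh("a"))   # ['a']
print(add_issue_fresh("b"))   # ['b']
```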