smhk

Python: how (not) to reset a dataclass

In Python, it can be useful to reset a dataclass back to its initial values, but it is easy to get this subtly wrong.

Resetting a dataclass (the wrong way) §

Can you spot the mistake?

Given this simple dataclass:

import dataclasses

@dataclasses.dataclass
class MyData:
    widgets = 12
    issues = []

We follow these steps to modify and then (try to) reset it:

  1. Create an instance of the MyData dataclass.

    >>> data = MyData()
    >>> data.widgets
    12          # ✔️ This is at the default value.
    >>> data.issues
    []          # ✔️ This is at the default value.
  2. Modify both the widgets and issues fields:

    >>> data.widgets = 99
    >>> data.issues.append("blah")
  3. Check the fields have changed:

    >>> data.widgets
    99          # ✔️ This has changed as expected.
    >>> data.issues
    ['blah']    # ✔️ This has changed as expected.
  4. (Attempt to) reset the dataclass by assigning a new instance:

    >>> data = MyData()
  5. Check the fields have been reset:

    >>> data.widgets
    12          # ✔️ This has reset to the default value as expected.
    >>> data.issues
    ['blah']    # 💥 Uh oh! This has *NOT* reset to the default?

What just happened? The widgets field reset but the issues field did not!

I’ll give you a moment to think about it.

🕛…

🕐…

🕑…

🕒…

🕓…

🕔…

🕕…

🕖…

🕗…

🕘…

🕙…

🕚…

🕛…Okay!

The problem §

The problem is that the default value for issues (a list) is mutable.

The Python docs on dataclasses have a whole section about this, with some great examples.

The key piece of information is:

Python stores default member variable values in class attributes.

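The default value is a class attribute, which gets shared between all instances of the class. In the example above, neither widgets nor issues has a type annotation, so the dataclass decorator does not even treat them as fields: both are plain class attributes. Assigning data.widgets = 99 creates an instance attribute that shadows the class attribute, so a new instance sees 12 again. Calling data.issues.append("blah") instead mutates the one list stored on the class, so every instance of MyData, including a brand-new one, sees the same modified list. [1]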
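
The sharing can be checked directly. This is a minimal sketch that reuses the unannotated MyData from the first example; the names first, second and field_names are only for illustration. It also shows that, without annotations, dataclasses.fields() reports no fields at all.

```python
import dataclasses


@dataclasses.dataclass
class MyData:
    widgets = 12   # No annotation: a plain class attribute, not a field.
    issues = []    # No annotation: one list object stored on the class.


first = MyData()
second = MyData()

# No annotated attributes means the dataclass has no fields at all.
field_names = [f.name for f in dataclasses.fields(MyData)]

# Assignment creates an instance attribute that shadows the class attribute.
first.widgets = 99

# append() mutates the single list that lives on the class.
first.issues.append("blah")

print(field_names)                      # []
print(second.widgets)                   # 12
print(second.issues)                    # ['blah']
print(first.issues is MyData.issues)    # True
print("widgets" in vars(first))         # True
print("issues" in vars(first))          # False
```

The last two lines show the asymmetry: widgets now lives on the instance, while issues is still only found on the class.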

Why typing is good §

Did you know that type annotations will catch this issue?

If you annotate issues: list = [] then issues becomes a real dataclass field, and the dataclass decorator will helpfully raise a ValueError as soon as the class is defined!

import dataclasses

@dataclasses.dataclass
class MyData:
    widgets = 12
    issues: list = []  # Added type annotation.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\sam\scoop\apps\python310\current\lib\dataclasses.py", line 1184, in dataclass
    return wrap(cls)
  File "C:\Users\sam\scoop\apps\python310\current\lib\dataclasses.py", line 1175, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
  File "C:\Users\sam\scoop\apps\python310\current\lib\dataclasses.py", line 955, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
  File "C:\Users\sam\scoop\apps\python310\current\lib\dataclasses.py", line 812, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field issues is not allowed: use default_factory

That’s a good reason to keep adding type hints to your Python code!

The solution §

As the ValueError tells us, use a default_factory as follows:

import dataclasses

@dataclasses.dataclass
class MyData:
    widgets: int = 12
    issues: list = dataclasses.field(default_factory=list)
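
The factory is called once per instance, so each instance gets its own list. A minimal sketch of that, where first and second are illustrative names:

```python
import dataclasses


@dataclasses.dataclass
class MyData:
    widgets: int = 12
    issues: list = dataclasses.field(default_factory=list)


first = MyData()
second = MyData()

# Only the list belonging to `first` is modified.
first.issues.append("blah")

print(first.issues is second.issues)   # False
print(second.issues)                   # []
```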

Then if we run the same steps as before:

>>> # 1. Create
>>> data = MyData()
>>> data.widgets
12
>>> data.issues
[]
>>> # 2. Modify
>>> data.widgets = 99
>>> data.issues.append("blah")
>>> # 3. Check
>>> data.widgets
99
>>> data.issues
['blah']
>>> # 4. Reset
>>> data = MyData()
>>> # 5. Check
>>> data.widgets
12
>>> data.issues
[]  # ✔️ This has reset to the default value as expected.

All good!
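
Note that data = MyData() rebinds the name to a new object, so any other reference to the old instance still sees the old values. If the existing object itself needs resetting, one possible approach is sketched below. The reset_in_place helper is hypothetical (it is not part of the dataclasses module), and it assumes a non-frozen dataclass where every field you care about has a default or a default_factory.

```python
import dataclasses


@dataclasses.dataclass
class MyData:
    widgets: int = 12
    issues: list = dataclasses.field(default_factory=list)


def reset_in_place(instance):
    """Hypothetical helper: restore every field that declares a default."""
    for f in dataclasses.fields(instance):
        if f.default is not dataclasses.MISSING:
            setattr(instance, f.name, f.default)
        elif f.default_factory is not dataclasses.MISSING:
            # Call the factory so the instance gets a fresh object.
            setattr(instance, f.name, f.default_factory())
        # Fields without any default are left untouched.


data = MyData()
alias = data                 # A second reference to the same object.
data.widgets = 99
data.issues.append("blah")

reset_in_place(data)

print(alias.widgets)   # 12
print(alias.issues)    # []
```

Because the object is modified rather than replaced, alias sees the reset values too.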


  1. Yes, this is very similar to why you should not use mutable default arguments in Python functions. They get shared too. ↩︎
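
     For comparison, here is a minimal sketch of the function version of the same trap, along with the usual None-sentinel workaround. The function names add_issue and add_issue_safely are made up for illustration.

```python
def add_issue(issue, issues=[]):
    # The default list is created once, when the function is defined.
    issues.append(issue)
    return issues


def add_issue_safely(issue, issues=None):
    # Use None as a sentinel and build a fresh list on each call.
    if issues is None:
        issues = []
    issues.append(issue)
    return issues


print(add_issue("a"))          # ['a']
print(add_issue("b"))          # ['a', 'b']
print(add_issue_safely("a"))   # ['a']
print(add_issue_safely("b"))   # ['b']
```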