Wave-Particle Duality of Mutable Data Structures

We all heard about the problems with mutable data structures, but it wasn’t until today when I made one of the biggest mistakes a noob programmer can make, to drive the problem with mutable data structures home.

I’ll give you a simplified version of what happened by creating a toy version of the double-slit experiment, where instead of firing photons or electrons, we are going to fire tests!

Introducing the Test Cannon

The test cannon is a simple function that returns a dictionary response depending on whether we want to observe our tests or not:

def fire(observe=False):
    output = {"mutable_data": "wave?"}
    if observe:
        output["mutable_data"] = "particle?"
    return output

Now we will create our first test to see what happens when we don’t observe our test case mid-flight:

class WaveParticleTestCase(unittest.TestCase):
    def test_without_observing(self):
        message = {"mutable_data": "wave?"}
        output = fire()
        self.assertEqual(output, message)

When we run this test, we understand that our test is exhibiting wave-like qualities.

Ok, so what happens when we run another test where we do observe our test case mid-flight?

    def test_with_observing(self):
        message = {"mutable_data": "particle?"}
        output = fire(observe=True)
        self.assertEqual(output, message)

This time our test case exhibits particle-like qualities.

When we run them together, both tests succeed, and we can happily stop coding, knowing that our little experiment works as planned.

DRYing the Swamp

As you might notice right now, our code is a wee bit wet, not something to cause significant concerns. Still, if this were a big test base, we would probably realize we should refactor our dictionaries, so we don’t have to copy and paste them every time we want to write a new test case.

Let’s extract the default dictionary, and work with it to make sure that our maps are consistent across the board.

Because the default way to run the experiment is unobserved, we will make the unobserved message the default, and change our code accordingly:

import unittest

default_message = {"mutable_data": "wave?"}
observed = "particle"


def fire(observe=False):
    output = default_message
    if observe:
        output["mutable_data"] = observed
    return output


class WaveParticleTestCase(unittest.TestCase):
    def test_without_observing(self):
        message = default_message
        output = fire()
        self.assertEqual(output, message)

    def test_with_observing(self):
        message = default_message
        message["mutable_data"] = observed
        output = fire(observe=True)
        self.assertEqual(output, message)


if __name__ == '__main__':
    unittest.main()

Running the tests, we are again passing them both, so we can wrap this up, right? Right?

Is It a Wave or a Particle?

There is only one problem with our tests now. They behave differently in different situations. Which is super bad for test cases!

When running test_without_observing alone
When running both test cases

I know, I know, this is a noob mistake, and there are a few ways to rectify the situation:

  • Copying the default dictionary before we want to change it
  • Revert back to not relying on information from outside the functions, as we did in the first version of the code
  • NEVER CHANGE A GLOBAL VARIABLE

But how did it come to this?

Let’s use The 5 whys on this problem!

The problem
Our tests are behaving differently when running in different ways.

Why did our tests behave differently?
When running in a multithreaded environment, the test_with_observing() function changed the default message, which then test_without_observing() used.

Why did one function affect the data in another function?
Because we mutated a global variable.

Now we might be satisfied with this answer and move on, or we might dig one level deeper and ask why we were allowed to mutate global variables?

I think that in the same way we moved from allocating memory to garbage collection (to being forced to handle memory correctly as in Rust?), we can move from mutable data structures to immutable ones.

Mutable data structures can get us into all sorts of trouble; many of them might be really hard to detect.

Immutable data structures make our code easier to reason about, which makes it less buggy and easier to maintain.