Purity and Testability

Only pure functions are truly testable. Moreover, the relationship between purity and testability forms a solid justification for functional programming.

Let me explain…

For starters, what is a pure function? A function without side effects. A function in the mathematical sense. Now, a function is a type of relation (set of ordered pairs), where each element of the domain corresponds to exactly one element of the range. A (total) function is also defined for each element of the domain, unlike a partial function, which is not.

So, what does this mean for testability? It means that I can potentially write a test which runs the function over each element of the domain, and asserts that the correct element of the range is returned. One simple way of doing this is to actually spell out the pairs that consitute the expected definition of the function, then map the function over its domain and zip the results with the domain, and then compare the two.

I’ll give an example with a boolean domain, as it has a small domain. I’ll use Haskell for simplicity.

Let’s look at the function isNot, which flips a boolean.

isNot :: Bool -> Bool
isNot x = if (x == True) False else True

Alternatively, pattern matching gives us a syntax which looks very much like the relation pairs:

isNot’ :: Bool -> Bool
isNot’ True = False
isNot’ False = True

Now, we could test this using the following functions – the both evaluate to True:

testTrue = (isNot True) == False
testFalse = (isNot False) == True

The domain is {True, False}, and the expected relation is {(True, False), (False, True)}. So, let’s do the map and zip thing. The ‘spellOut’ function does this for any function and any domain.

spellOut :: (a -> b) -> [a] -> [(a, b)]
spellOut f domain = zip domain (fmap f domain)

So, then we can say this, which is a complete test of this function.

test = (spellOut isNot [True, False]) == [(True, False), (False, True)]

Pretty cool, huh? Of course, there are many ways to test a function, this just works well for the demonstration.

Now, what happens if we throw a side-effect into the mix? The mathematical notion of a function has no concept of effects – it’s just a (total) relation between a domain and a range. Mathematical functions are defined, they don’t do things, they just exist. The concept of a side effect is introduced just so that software can do useful things.

In the pure function example, we tested by computing the (domain-value, range-value) pairs, and comparing with our expectations. An impure function still has a domain and range, but it also has other stuff that happens. We can compare domain/range pairs… but how do we test the other stuff?

There is nothing in the return value that indicates what side effect has been performed (I’ll get to effect tracking in a minute). Imagine if we could get a value that definitively represented a side effect that was performed… if that value was part of the return value, then we could compare that value against our expected. No such concept exists – a side effect is something that happened, it can be described, but it cannot be bottled up into a value.

So, what can we do instead? Well, we can execute the side-effecting function, then look for evidence that the expected side effect happened. This is better than nothing, but it is still not a true test of the side effect – you are only looking at some state that could have been influenced by the expected side effect. It’s like investigating crime – looking for clues in a crime scene, relying on witness testimony, maybe you’ll get a conviction, maybe not. Maybe someone is lying. Did a crucial piece of evidence go missing? Did someone alter the crime scene? What if we send an innocent person to jail, or let the guilty party go free?

Perhaps the effect didn’t fire, but some background process did a similar thing (false positive). Perhaps the effect did it’s job, but then a background process reverted it (false negative). There is no way of knowing.

Worse, the Observer Effect comes into play – observing an event changes the event. In our case, in order to observe the side effect, we are performing another side effect. As such, there are two effects occurring, whereas in production there would only be one. Your extra effect could give you false positives or false negatives.

It gets even worse than this! You are looking for clues that a certain effect was performed, but what if the function you’re testing performed another effect? You cannot test that a function does not perform any other effects in addition to the one you are observing.

Now, there is some good news here, in the form of Effect Tracking. Effect Tracking is compiler-enforced acknowledgement of side effects. That is, if a function contains code that can performs a side effect of type X, then X must be part of its type signature. This will tell you that no effects of type Y can occur… but you still have many of the problems above when looking for evidence of type X. Effect Tracking also lets you easily see which functions are pure and which are not. It also encourages you to write more pure functions. And, in Haskell, effects are tracked using monads, so you have all the normal monadic tools available to help you deal with them. The flipside is that, at least initially, it feels like you are spending a lot of time fighting with the effect tracking and going in and out of monadic code… there’s a learning curve, but it does become easier, and the benefits are massive. And I’m definitely up for learning something hard if it improves my code.

So, given all of these problems, I’m of the opinion that side effecting code is not testable. Of course, this comes down to the definition of “testable” and, furthermore, the definition of “test”. What does it mean to test a piece of code? What does it mean for a piece of code to be “tested”?

In the first example, I wrote a test that definitively and exhaustively proved the relationship between the domain and range. That code is tested. No arguments there.

What about a side-effecting function? We can write a “test” that executes the code and does some assertions, but it’s not the same. We haven’t proved anything. Certain defects may have resulted in test failures, and certain correct code may have passed, but we have no guarantees. That’s a far weaker position to be in… but maybe that’s the best we can do. But if it’s the best we can do, we want to write code in a way that minimizes how often we are in this position.

This really highlights the importance of keeping your code as pure as possible. Keeping a clear separation between logic and effect… and pushing that boundary hard. This is the essence of functional programming.

For me, testability is the number one justification for functional programming, and is how I got into it. TDD got me into writing tests, and I eventually realised that the tests were far easier to write, and far more reliable, when the code was pure. Of course, there are many other benefits of functional programming, but this has got to be the knockout punch.

2 Responses to “Purity and Testability”

  1. Daniel Waterworth Says:

    Functional purity is certainly important to testability, but it’s not the whole story. You said, “It means that I can potentially write a test which runs the function over each element of the domain”, but the domain can be – and in fact usually is – infinite, because recursive types are more common than non-recursive types.

    So there’s a continuum of testability:

    impure code < pure code with recursive types < pure code with no recursive types

  2. techtangents Says:

    Good point. Thanks.

    Recursive types may not be exhaustively testable due to infinite domain, but properties of them can still be proved inductively and tested over large sample sets a la QuickCheck. While your continuum is correct, I think the testability penalty incurred in introducing recursive types is much lower than when introducing impurity.