June 17, 2025 QSEN#1
QSEN #1 – Testing scientific code
Hello!
Welcome to our very first e-mail of the Quantum Software Newsletter! We’re glad you’re here :)
I remember this one time when a colleague approached me asking for help. He had a bug in the algorithm he was re-implementing from a paper. He was about 3 months into the project, had one huge Python script and one huge notebook and had no clue where the problem might be.
- Do you have any tests?
- Sure!
So he showed me this single test with a very elaborate test case.
- How does this test help you?
- It doesn’t really, I just heard I need to have some tests so I added it.
And then we had a loooong conversation…
Today we wanted to share some strategies for testing scientific code. Testing software is a very broad topic, so here we'll just scratch the surface and focus on a specific case.
Our scenario
Imagine you're in a situation similar to my colleague's. You are implementing some new algorithm and you are at the stage where you're playing with the idea, rapidly prototyping, trying out different scenarios. The code you have is very dirty; things constantly change. You think: "People tell me I should be writing tests, but come on, this code is way too unstable to write any tests. Writing tests now is a waste of time – I don't even know exactly what the output of this algorithm should be, so I'll be rewriting them over and over. I'll write tests once I know what the algorithm should look like."
This is not an unreasonable thing to say! However, 2 months later you might find that you're in exactly the same spot. Your code, now thousands of lines long, does not work. There is something wrong, and you have no clue what it is.
Writing tests is not a silver bullet, but from our experience we can tell you that having at least a rudimentary test suite may save you a lot of headache. Not only do tests improve the reliability of your code and prevent unexpected regressions, but actively thinking about the testability of your code makes it more robust and modular.
Approach 1 – high-level tests with generated data
Let’s start with a fairly common approach. It is better than nothing, but unfortunately comes with a lot of downsides.
It basically works like this – your algorithm is pretty complicated and it produces some numerical data, which is somewhat hard to interpret. Perhaps an array of numbers, perhaps even a single value, but with high precision.
So what do you do? You run the algorithm, save the inputs, save the outputs and then write a simple test which looks like this:
import numpy as np

def test_my_algorithm():
    inputs = [0.1, 0.5, 0.3, 0.1]
    target_outputs = [0.00053123, 0.00133322, 0.995454, 0.00268155]
    outputs = my_algorithm(inputs)
    assert outputs == target_outputs
    # or even better, if you expect some numerical issues:
    np.testing.assert_almost_equal(outputs, target_outputs, decimal=4)
Let’s take a look at pros and cons:
Pros:
- You have a test! If my_algorithm changes its logic, this test will fail.
- It is a good introduction to setting up any tests for your code at all. Once you have one test, it's much easier to add the second one!
Cons:
- But the test will fail even if the changes are insignificant – e.g. they only change numerical precision.
- You risk that this test is based on incorrect data – if your current implementation is buggy, then once you fix the bug the test will start failing. And tests are supposed to work the other way round.
- When this happens, you are likely to just replace target_outputs with some new outputs.
- It's hard to say where exactly the failure happens.
- Overall, you will usually have very low confidence in such a testing scheme.
- It's not the best fit if your algorithm involves some dose of randomness (though you can use seeds to somewhat alleviate this problem).
Approach 2 – high-level test with toy examples
Here's a bit more involved, but often much more effective, way to tackle this problem.
Rather than generating data from the algorithm that you’ve coded up already, come up with some toy examples for which you can calculate the output by hand.
Start from the most trivial cases – all inputs equal to zero, a small 2x2 identity matrix, whatever the input for your algorithm is. Then try coming up with bigger and more complex examples, but such that you can still calculate what the result should be by hand.
Then do the calculations, write them down in the form of tests, and voilà, you have a decent set of test cases!
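To make this concrete, here is a minimal sketch. We assume a hypothetical algorithm, normalize, which scales a vector of weights so they sum to 1 – substitute your own function. The point is that each expected output below can be verified with pen and paper:

```python
import numpy as np

def normalize(weights):
    """Hypothetical algorithm under test: scale weights so they sum to 1."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()

def test_normalize_single_element():
    # Most trivial case: a single weight always normalizes to 1.
    np.testing.assert_allclose(normalize([5.0]), [1.0])

def test_normalize_equal_weights():
    # Hand calculation: four equal weights -> each becomes 1/4.
    np.testing.assert_allclose(normalize([2.0, 2.0, 2.0, 2.0]), [0.25] * 4)

def test_normalize_simple_ratio():
    # Hand calculation: [1, 3] sums to 4 -> [1/4, 3/4].
    np.testing.assert_allclose(normalize([1.0, 3.0]), [0.25, 0.75])
```

Each test encodes a calculation you did yourself, so when one fails you can retrace the arithmetic line by line.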
Pros:
- You have a set of tests which you can really trust.
- They will allow you to debug quickly, as you can always follow your manual calculations.
- They will later serve as a good documentation and examples of what your algorithm is doing.
Cons:
- They might not cover some more complex parts of your algorithm
- There might be cases where you can’t calculate anything by hand even for the toy example.
- Might require some extra effort of going through manual calculations.
Approach 3 – property testing
"Property testing" is a very useful concept in traditional software engineering. Basically, it puts you in a framework where you think less about what the input and output data should be, and more about what properties the output should have, given the input.
For example, let’s say that you’ve written an algorithm for multiplication. What are some of the properties of multiplying numbers “A*B = C”?
- If A & B have the same sign, C will be positive
- If A & B have opposite signs, C will be negative
- If A & B are both greater than 1, C will be greater than A and B
- …
You see where this is going. Even if you have a very complex algorithm, it usually still has some properties you can test. There is a Python library for property testing, hypothesis. While we have not used it ourselves, perhaps it will be a useful tool for you.
Pros:
- You can write them even if your algorithm is very complex.
- Most of the time they are relatively easy to write.
- If they fail they quickly give you information about what fundamental aspect of your algorithm is not working.
Cons:
- They don't check whether the result you get is actually correct – even if they pass, the algorithm might still be wrong
It's worth mentioning that in proper property testing, say for a function f, one generates (typically random) inputs and tests that some predicate p(f(x)) is satisfied. To call something "property testing", this generative aspect is mandatory.
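As a minimal sketch of that generative flavour, here is a property test for the multiplication example above, written with the plain random module rather than any dedicated library (the function multiply stands in for whatever algorithm you are testing):

```python
import random

def multiply(a, b):
    """Stand-in for the algorithm under test."""
    return a * b

def test_multiplication_properties(trials=1000, seed=0):
    # Generate random inputs and check the properties, not exact values.
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.uniform(-100.0, 100.0)
        b = rng.uniform(-100.0, 100.0)
        c = multiply(a, b)
        # Same signs -> positive product; opposite signs -> negative.
        if a != 0 and b != 0:
            assert (c > 0) == ((a > 0) == (b > 0))
        # Both factors greater than 1 -> product exceeds either factor.
        if a > 1 and b > 1:
            assert c > a and c > b
```

A library like hypothesis automates the generation and, when a property fails, shrinks the input to a minimal counterexample – but even this hand-rolled version already captures the idea.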
Approach 4 – detailed unit tests
Ideally, your code would be nice and modular. You would have many functions, each responsible for doing one thing, each with well defined inputs and outputs. In such a world, you want to have separate unit tests for each of the modules you're working with. This allows you to tame the complexity of your algorithm.
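As a hypothetical sketch of what this looks like: an algorithm split into two small steps, each with well defined inputs and outputs, each tested on its own, plus one test for the composed pipeline (the function names here are invented for illustration):

```python
import numpy as np

def center(data):
    """Step 1: subtract the mean from each element."""
    data = np.asarray(data, dtype=float)
    return data - data.mean()

def scale_to_unit_norm(data):
    """Step 2: divide by the Euclidean norm (assumes a nonzero vector)."""
    data = np.asarray(data, dtype=float)
    return data / np.linalg.norm(data)

def standardize(data):
    """The full algorithm, built from the two tested pieces."""
    return scale_to_unit_norm(center(data))

def test_center():
    # Hand calculation: mean of [1, 2, 3] is 2.
    np.testing.assert_allclose(center([1.0, 2.0, 3.0]), [-1.0, 0.0, 1.0])

def test_scale_to_unit_norm():
    # Hand calculation: norm of [3, 4] is 5.
    np.testing.assert_allclose(scale_to_unit_norm([3.0, 4.0]), [0.6, 0.8])

def test_standardize():
    # center([0, 2]) = [-1, 1]; its norm is sqrt(2).
    expected = [-1 / np.sqrt(2), 1 / np.sqrt(2)]
    np.testing.assert_allclose(standardize([0.0, 2.0]), expected)
```

If test_standardize fails while the two step tests pass, you immediately know the bug lives in the composition, not in either step.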
This is the way.
Pros:
- You can pinpoint exactly which function is wrong and what requires fixing
- You can have very high trust that the algorithm works correctly
- You can test behaviours in the code which occur rarely
Cons:
- No cons, this solution is perfect.
- Just kidding, see the next section ;)
Pragmatic approach
The world is not ideal. There are deadlines and constraints. You might lack experience and skills to write beautiful code.
So while we should all strive to have clean, modularized code where we have 100% code coverage, it takes a lot of time and a lot of skill to do it well.
Here is a proposal for how you might approach writing tests for your project. It is definitely suboptimal, but it should gradually increase the reliability of your code while never being a huge burden. Going through this list might take one day or a couple of weeks – that depends on the project.
- Start from approach 1 – take your existing algorithm, run it, save the output and create a very high-level test out of that. Ideally do it for a couple of different sets of inputs and outputs. Make sure to add a comment in this test making it clear that these numbers are not reliable, as they have been generated by an early version of your algorithm!
- Remember to run this test every time you change anything in your code.
- Turn off your computer, take a piece of paper and work through some toy examples.
- Once you did the math, turn on the computer again, code it as unit tests and see if they pass.
- Keep fixing bugs until they pass ;)
- Whenever you run into an issue and fix it, try to write a test that reproduces the issue. This way, if you ever change your code in such a way that you hit this problem again, you'll immediately spot it.
- Think about some properties of your algorithm. Try adding test based on them to your test suite.
- You have a decent test suite, congrats! The beauty of this is that now you should be much more confident when making changes in your code – your tests will yell at you when you break something accidentally.
- You should consider deleting the tests you have introduced in step 1.
- You have also worked with your algorithm long enough that you should have a better idea of which parts are fairly independent and can be moved to separate functions/modules. So you can start refactoring your code into smaller parts and then have small unit tests for those parts. Congrats, you’re on your path to realizing approach 4!
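The "write a test that reproduces the issue" step above deserves a small illustration. Suppose (hypothetically) your normalization routine once crashed or produced NaNs on an all-zero input and you fixed it to fall back to a uniform distribution; a regression test pins the fix down:

```python
import numpy as np

def safe_normalize(weights):
    """Hypothetical fixed function: returns a uniform distribution
    when all weights are zero, instead of dividing by zero."""
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total == 0:
        return np.full_like(weights, 1.0 / len(weights))
    return weights / total

def test_regression_all_zero_input():
    # Regression test for a past bug: an all-zero input used to
    # produce NaNs. If a future change reintroduces the problem,
    # this test will spot it immediately.
    result = safe_normalize([0.0, 0.0, 0.0, 0.0])
    np.testing.assert_allclose(result, [0.25, 0.25, 0.25, 0.25])
```

The test is tiny, but it permanently encodes the lesson you learned while debugging.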
Obviously the details will depend on your particular project – please use your judgement :)
Again – this is not how we would approach implementing and testing an algorithm from scratch. This is our best attempt to give you some advice if you find yourself in the middle of writing some scientific code and would like to do it a bit better.
Well, that was quite lengthy, but we hope also helpful!
In case you wonder what happened to that colleague of mine – during our long conversation we went over his whole project. Not even in the code, just on the whiteboard. His single test case didn't give him any insight into which parts of the algorithm worked and which didn't. We discussed how to modularize his code and it became immediately obvious that, while the whole algorithm was indeed quite daunting, coming up with test cases for specific functions was much easier.
A couple of days later he told me that he added more tests and found the bug.
And don't get me wrong – his tests would still not win any beauty contest. But they didn't have to – they helped him solve his problem and stayed there to protect him (and others) in the future.
Closing notes
If you ever were in the situation similar to the one described above and you have some other ways of solving this problem, please let us know by replying to this e-mail!
And if you found this e-mail useful or insightful, please share this link with your colleagues who might also benefit from it: https://www.qse-newsletter.com .
Have a great day!
Michał & Konrad