2016-06-17 Edit: Use py.test instead of nose.

I would like to explain how I carry out testing when I write code for scientific computing. It relies partly on techniques common to testing in general, and partly on techniques specific to scientific computing.

## The General Idea

The fundamental idea is this: unless you are Chuck Norris, you will test your code. The problem is that you are likely to do that manually. The idea of testing is to automate that process.

To make sure that you never indulge in manual testing, the golden rule of Python programming should be:

> Thou shalt not have an interpreter (neither shell nor notebook) open during development.

This will force you to check even the most minute things with automated tests.

### Unit vs. Functional

There are two kinds of tests, and beginners in automated testing tend to write functional tests instead of unit tests.

So, what is the difference?

Let me take the analogy of car manufacturing. A car consists of many small components, such as brakes, gear box, joints between doors, and so on.

Functional testing for the car product should be obvious: be able to start the car, drive around, brake, change gears, and so on.

Unit tests in the car project would check that small components, such as the brakes or the window joints, work as they should, in isolation.

Gary Bernhardt advocates that less than 10% of all tests should be functional. In other words: write unit tests, not functional tests. At least, most of the time.

Another big advantage of writing unit tests is that they impose modularity on your code. To take the car analogy again, you must develop the brake in isolation, otherwise you cannot test it in isolation.
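
To make this concrete, here is a minimal sketch of a unit test for such an isolated component (the `stopping_distance` function and its formula are invented for illustration):

```python
import unittest

def stopping_distance(speed, deceleration):
    """Distance (m) travelled while braking from `speed` (m/s)
    at a constant `deceleration` (m/s^2): v**2 / (2*a)."""
    return speed ** 2 / (2 * deceleration)

class TestBrake(unittest.TestCase):
    def test_stopping_distance(self):
        # braking from 20 m/s at 5 m/s^2 should take 40 m
        self.assertAlmostEqual(stopping_distance(20.0, 5.0), 40.0)
```

The point is that the test exercises the braking logic alone, with no car object anywhere in sight.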

## Writing and Running Tests

### Directory Structure

Before we continue, note that the directory structure is quite relevant for testing. My typical choice is to structure a Python project as follows:

```
pycar/
    setup.py
    pycar/
        brake.py
        ...
    tests/
        test_brake.py
        ...
```


This makes sure that the tests will not be bundled in the library.
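
One way to enforce this separation, assuming a setuptools-based project, is to exclude the tests directory in a minimal `setup.py` (the names are illustrative):

```python
from setuptools import setup, find_packages

setup(
    name="pycar",
    version="0.1",
    # find the `pycar` package, but leave the tests out of the distribution
    packages=find_packages(exclude=["tests", "tests.*"]),
)
```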

### How to Write Tests?

My choice is to use the standard Python testing library to write tests (but not to run them, as we will see).

So you start by creating a file called `test_thing.py`, and the header of the file should import `unittest`:

```python
import unittest
```

With the `unittest` library, all tests must be methods of a class; the idea is that classes gather tests of the same kind. Let's write a test:

```python
class TestThing(unittest.TestCase):
    def test_thing(self):
        assert False
```
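
Since a class gathers tests of the same kind, `unittest` also provides a `setUp` method, called before each test method, which is the standard place to prepare data shared by the tests. A sketch, with illustrative names:

```python
import unittest

class TestNumbers(unittest.TestCase):
    def setUp(self):
        # runs before every test method of this class
        self.data = [1, 2, 3]

    def test_length(self):
        assert len(self.data) == 3

    def test_sum(self):
        assert sum(self.data) == 6
```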


Great! We have a test. Now let's see it fail.

### How to Run Tests?

My choice is to use py.test to run the tests.

You could also use py.test to write the tests, but I don't use it that way. The reason is that other people can then run my tests with other tools of their choosing, since unittest-style tests are generally compatible with any other testing library out there.

You run the tests by running either:

```shell
py.test tests/test_thing.py
```


or simply:

```shell
py.test
```


as py.test will find all the test files by itself.

## Debugging

### Simple bugs

After you run the tests as above, you will get a failure message. In this case, what is wrong is quite obvious, since we constructed a failing test on purpose, but in general the cause of failure might be less clear. How do you debug it? Remember, you do not have any interactive Python session open!

Well, you can call py.test with the flag `--pdb` to jump into the debugger upon error or failure.

For instance, run `py.test --pdb tests/test_thing.py` again, and you will end up at the `assert` line that (quite obviously in this case) causes the problem.

Even better, if you have several failing tests and just want to debug that particular one, run instead

```shell
py.test --pdb tests/test_thing.py::TestThing::test_thing
```


The format is `<path>::<class>::<method>`.

You can benefit from the better debugger pdb++ by running:

```shell
pip install pdbpp
```


### Hardcore bugs

Sometimes, this is not enough. For instance, this won't do if the error is completely unexpected, and the code is doing something completely different than what you thought. In that case, it is legitimate to run the code in an interpreter.

So, what we want to do is to simply execute all the code, for instance in a Jupyter notebook.

You could try to execute the test directly:

```python
my_test = TestThing()  # this does NOT work
my_test.test_thing()
```


However, due to how unittest is designed, this won't work. Here is the solution instead:

```python
my_test = TestThing(methodName='test_thing')
my_test.debug()
```


Now you are simply running the code that leads to the test. You can use any number of standard debugging tricks to find out what is going on.

## Comparisons in Scientific computing

There are two main issues when testing scientific computing software.

### Comparing floats

The main, general issue is that direct comparison of floats is unreliable. This is simply because the internal representation of floats, as well as round-off errors, introduces minute errors everywhere. For instance, what do you expect is the outcome of the following?

```python
0.1 + 0.2 == 0.3  # False!!
```


Fortunately, this issue is solved in NumPy. The comparison should be written instead:

```python
import numpy as np

np.allclose(0.1 + 0.2, 0.3)  # True
```
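
`np.allclose` also takes relative and absolute tolerances (`rtol` and `atol`, defaulting to `1e-05` and `1e-08`), which you may need to loosen or tighten depending on the accuracy you expect from your computation:

```python
import numpy as np

# default tolerances are fine for simple round-off errors
assert np.allclose(0.1 + 0.2, 0.3)

# a looser relative tolerance accepts a 0.1 % discrepancy...
assert np.allclose(1.0, 1.001, rtol=1e-2)

# ...while a tighter one rejects it
assert not np.allclose(1.0, 1.001, rtol=1e-6)
```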


### Comparing arrays

What about arrays? There are two issues here. The first is again float comparison; for the second, try this if you haven't before:

```python
A = np.array([1., 2.])
if A == A:  # raises an exception although the arrays are obviously equal!
    pass
```


The solution is, again, to use allclose:

```python
np.allclose(A, A)  # True
```


### NumPy testing module

You could now simply use `allclose` to test equality of floats or arrays, with something like:

```python
assert np.allclose(A, B)
```


But there is a better way, using a very important, somewhat overlooked component of NumPy: `numpy.testing`.

In any test file, I import `numpy.testing` as:

```python
import numpy.testing as npt
```


Now, the tests for the comparisons above can be written:

```python
npt.assert_allclose(0.1 + 0.2, 0.3)
npt.assert_allclose(A, A)
```
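
A nice property of `npt.assert_allclose` over a bare `assert np.allclose(...)` is its failure message: on a mismatch, it reports which elements differ and by how much, which makes failures far easier to diagnose. It also accepts the same `rtol` and `atol` arguments. For instance (values invented):

```python
import numpy as np
import numpy.testing as npt

A = np.array([1.0, 2.0])
B = np.array([1.0, 2.1])

try:
    npt.assert_allclose(A, B)
except AssertionError as e:
    # the message lists the mismatched elements and the actual difference
    print(e)
```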


## Other aspects

Other aspects of testing in scientific computing are coverage, and continuous integration, which I hope to cover in later posts.

Another interesting issue is that of automatic generation of tests, or even generation of random tests. This is essentially solved by py.test fixtures and parametrization. I hope to write more on that in subsequent posts.
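
As a preview, here is a minimal sketch of py.test parametrization: one test function, run once per parameter set. Note that, unlike the `unittest`-based tests above, this relies on a py.test-specific decorator:

```python
import numpy.testing as npt
import pytest

@pytest.mark.parametrize("a, b, expected", [
    (0.1, 0.2, 0.3),
    (1.0, 2.0, 3.0),
    (-1.5, 0.5, -1.0),
])
def test_addition(a, b, expected):
    # py.test generates one independent test case per tuple above
    npt.assert_allclose(a + b, expected)
```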