2016-06-17 Edit: Use py.test instead of
I would like to explain how I carry out testing when I write code for scientific computing. This involves partly techniques that are common to the general philosophy of testing, and partly techniques that are specific to scientific computing.
The fundamental idea is this: unless you are Chuck Norris, you will test your code. The problem is that you are likely to do that manually. The idea of testing is to automate that process.
In order to make sure that you never indulge in manual testing, the golden rule of Python programming should be: never run your code by hand in an interactive session. This will force you to check even the most minute things with automated tests.
There are two kinds of tests, functional tests and unit tests, and, as a beginner in automated testing, the tendency is to write functional instead of unit tests.
So, what is the difference?
Let me take the analogy of car manufacturing. A car consists of many small components, such as brakes, gear box, joints between doors, and so on.
Functional testing for the car product should be obvious: be able to start the car, drive around, brake, change gears, and so on.
Unit tests in the car project would check that small components, such as the brakes or the window joints, work as they should, in isolation.
Gary Bernhardt advocates that less than 10% of all the tests should be functional. In other words: write unit tests, not functional tests. At least, most of the time.
Another big advantage in writing unit tests is that it will impose modularity in your code. To take the car analogy again, it will force you to develop the brake in isolation, otherwise you cannot test it in isolation.
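To make the analogy concrete, here is a minimal sketch of what such an isolated unit test could look like. The Brake class and its behaviour are invented for illustration; the point is only that a small component can be exercised on its own, without the rest of the "car".

```python
import unittest

# Hypothetical stand-in for a pycar/brake.py module: a brake whose pad
# wears down a little each time it is applied.
class Brake:
    def __init__(self, pad_thickness=10.0):
        self.pad_thickness = pad_thickness

    def apply(self):
        # Each application wears the pad down by a fixed amount.
        self.pad_thickness -= 0.1

    def is_worn_out(self):
        return self.pad_thickness <= 0.0

class TestBrake(unittest.TestCase):
    def test_apply_wears_pad(self):
        brake = Brake(pad_thickness=1.0)
        brake.apply()
        assert brake.pad_thickness < 1.0

    def test_new_brake_not_worn(self):
        assert not Brake().is_worn_out()
```

Notice that nothing here touches the rest of the project: the component is small enough to be instantiated and checked directly, which is exactly the modularity that unit testing rewards.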
Before we continue, the directory structure is quite relevant for testing. My typical choice is to have a Python project structured as follows:

```
pycar/
    setup.py
    README.md
    pycar/
        brake.py
        ...
    tests/
        test_brake.py
        ...
```
This makes sure that the tests will not be bundled in the library.
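As a sketch of how this separation can be enforced at packaging time (assuming setuptools is used; the package name matches the example above), setup.py can exclude the tests directory from the installed distribution:

```python
# setup.py (sketch): only the pycar package is installed;
# the tests/ directory is left out of the distribution.
from setuptools import setup, find_packages

setup(
    name='pycar',
    packages=find_packages(exclude=['tests', 'tests.*']),
)
```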
My choice is to use the standard Python testing library to write tests (but not to run them, as we will see).
So you start by creating a file called test_something.py, and the header of the file should import unittest:

```python
import unittest
```

In the unittest library, all the tests must be methods of a class.
The idea is that classes gather tests of the same kind.
Let's write some test:

```python
class TestThing(unittest.TestCase):
    def test_thing(self):
        assert False
```
Great! We have a test. Now let's see it fail.
My choice is to use py.test to run the tests.
You could also use
py.test to write the tests, but I don't use it that way.
The reason is that other people can run my tests with some other tools of their choosing (as
unittest is generally compatible with any other testing library out there).
You run the tests by running:

> py.test

py.test will find all the test files by itself.
After running the tests as above, you should get a failure message. In this case, what is wrong is quite obvious, as we constructed a failing test on purpose, but in general, the causes of failure might be less clear. How do you debug that? Remember, you do not have any interactive Python session opened!
Well, you can call py.test with the flag --pdb to jump into the debugger upon error or failure.
For instance, run again py.test --pdb tests/test_something.py, and you will end up at the assert line that (quite obviously in this case) causes the problem.
Even better, if you have several failing tests and just want to debug that particular one, run instead
> py.test --pdb tests/test_something.py::TestThing::test_thing
The format is path/to/test_file.py::TestClass::test_method.
You can benefit from the better debugger pdb++ by running
> pip install pdbpp
Sometimes, this is not enough. For instance, this won't do if the error is completely unexpected, and the code is doing something completely different from what you thought. In that case, it is legitimate to run the code using an interpreter.
So, what we want to do is to simply execute all the code, for instance in a Jupyter notebook.
You could try to execute the test directly:

```python
my_test = TestThing()  # this does NOT work
my_test.test_thing()
```
However, due to how
unittest is designed, this won't work.
Here is the solution instead:

```python
my_test = TestThing(methodName='test_thing')
my_test.debug()
```
Now you are simply running the code that leads to the test. You can use any number of standard debugging tricks to find out what is going on.
There are two issues when testing scientific computing software. Both issues are already discussed in that blog post.
The main, general issue is that direct comparison of floats is impossible. This is simply because the internal representation of floats, as well as round-off errors, introduces minute errors everywhere. For instance, what do you expect is the outcome of the following?
```python
0.1 + 0.2 == 0.3  # False!!
```
Fortunately, this issue is solved in NumPy. The comparison should be written instead:

```python
np.allclose(0.1 + 0.2, 0.3)  # True
```
What about arrays? There are two issues with arrays.
The first issue is float comparison, but there is another one. Try this if you haven't before:

```python
A = np.array([1., 2.])
if A == A:  # raises an exception although the arrays are obviously equal!
    pass
```
The solution is, again, to use np.allclose:

```python
np.allclose(A, A)  # True
```
You could now simply use allclose to test equality of floats or matrices:

```python
assert np.allclose(A, B)
```
But there is a better way, using a very important, somewhat overlooked, component of NumPy, namely numpy.testing.
In any test file, I import it as:

```python
import numpy.testing as npt
```
Now, the tests for the comparisons above can be written:

```python
npt.assert_allclose(0.1 + 0.2, 0.3)
npt.assert_allclose(A, A)
```
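Beyond raising on mismatch, assert_allclose produces a readable report of how far the values are apart, and its rtol and atol arguments let you state the required precision explicitly. A small sketch (the tolerance values are illustrative):

```python
import numpy as np
import numpy.testing as npt

A = np.array([1.0, 2.0])
B = A + 1e-9  # differs from A by a tiny absolute amount

# Passes: the difference is within the requested absolute tolerance.
npt.assert_allclose(A, B, rtol=0, atol=1e-6)

# Fails: the tolerance is now far tighter than the difference,
# and the raised error carries a description of the mismatch.
try:
    npt.assert_allclose(A, B, rtol=0, atol=1e-12)
except AssertionError as error:
    report = str(error)
```

That failure report (maximum absolute and relative difference, number of mismatched elements) is what makes assert_allclose far more pleasant to debug than a bare assert np.allclose(A, B).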
Other aspects of testing in scientific computing are coverage, and continuous integration, which I hope to cover in later posts.
Another interesting issue is that of automatic generation of tests, or even generation of random tests. This is essentially solved in py.test fixtures and py.test parametrizations. I hope to write more on that in subsequent posts.
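As a taste of what parametrization looks like (a sketch; the test values are invented), py.test can expand one test function into several independent test cases:

```python
import numpy as np
import pytest

# Each tuple is expanded by py.test into its own test case,
# reported and debugged independently.
@pytest.mark.parametrize("a, b, expected", [
    (0.1, 0.2, 0.3),
    (1.0, 2.0, 3.0),
    (-1.5, 0.5, -1.0),
])
def test_addition_close(a, b, expected):
    assert np.allclose(a + b, expected)
```

Running py.test on this file reports three tests, one per parameter tuple, so a failure pinpoints exactly which input broke.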