
Testing a function that can return non-deterministic results using Python unittest

I am writing a small job scheduler in Python. The scheduler can be given a series of callables plus dependencies, and should run the callables, making sure that no task is run before any of its predecessors.

I am trying to follow a test-driven approach, and I have run into an issue testing dependency handling. My test code looks like this:

def test_add_dependency(self):
    """Tasks can be added with dependencies"""
    # TODO: Unreliable test, may work sometimes because by default, task
    #       running order is indeterminate.
    self.done = []
    def test(id):
        self.done.append("Test " + id)
    s = Schedule()
    tA = Task("Test A", partial(test, "A"))
    tB = Task("Test B", partial(test, "B"))
    s.add_task(tA)
    s.add_task(tB)
    s.add_dependency(tA, tB)
    s.run()
    self.assertEqual(self.done, ["Test B", "Test A"])

The problem is that this test (sometimes) worked even before I added the dependency handling code. This is because the specification does not state that tasks have to be run in a particular order. So the correct order is a perfectly valid choice even if the dependency information is ignored.

Is there a way of writing tests to avoid this sort of "accidental" success? It seems to me that this is a fairly common sort of situation, particularly when taking the test-driven "don't write code without a failing test" approach.

You are in the situation of every researcher looking at a collection of imperfect data and trying to say whether the hypothesis about it is true or not.

If the results vary between runs, then rerunning many times will give you a sample to which you can apply statistics to decide whether the scheduler is working. However, if one batch of runs gives you similar results but a batch on a different day gives a different result, then your non-determinism depends on events outside the program itself, and you'll need to find a way to control them, ideally so that they maximise the chances of tripping up a bad algorithm.

This is the cost of non-determinism: you have to resort to statistics, and you have to get the statistics right. You need to be able to accept the hypothesis at some confidence level, and also reject the null hypothesis. This requires fewer samples if you can maximise the variance of the results; vary the CPU load, introduce IO interrupts, or schedule tasks with random sleeps in them.
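
As a rough illustration, here is a sketch of one way to take many samples in a single test, reusing Schedule, Task and partial from the question and injecting random sleeps into the tasks to shake up the interleaving (the sample size and sleep range below are arbitrary assumptions):

import random
import time
from functools import partial

def test_add_dependency_many_samples(self):
    """Dependency ordering holds across many jittered runs (a statistical check)."""
    for attempt in range(100):  # arbitrary sample size; more runs give more confidence
        done = []
        def body(id):
            time.sleep(random.uniform(0, 0.01))  # random jitter to vary the interleaving
            done.append("Test " + id)
        s = Schedule()
        tA = Task("Test A", partial(body, "A"))
        tB = Task("Test B", partial(body, "B"))
        s.add_task(tA)
        s.add_task(tB)
        s.add_dependency(tA, tB)
        s.run()
        self.assertEqual(done, ["Test B", "Test A"],
                         "dependency order violated on run %d" % attempt)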

Finding out what external factors such a scheduler is affected by would probably be advisable anyway, for the purpose of defining a worthwhile test.

One option would be to use a different, deterministic, version of the Schedule class (or add an option to make the existing version deterministic) for testing purposes, but that might defeat the purpose of the unit test.
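
For example, a test double might fix the tie-breaking order so that a scheduler which ignores dependencies reliably produces the wrong sequence. This is only a hypothetical sketch: it assumes Schedule exposes (or can be given) a hook for choosing the next ready task, which may not match the real class.

class DeterministicSchedule(Schedule):
    """Hypothetical test double that always picks the oldest ready task.

    If dependencies are ignored, tasks run in insertion order ("Test A" then
    "Test B"), so the test in the question fails until add_dependency is
    actually honoured. The _pick_next hook is invented for illustration;
    adapt it to whatever extension point the real Schedule offers.
    """
    def _pick_next(self, ready_tasks):
        return ready_tasks[0]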

Another option would be to not bother writing test cases for non-deterministic results.


In general, though, the answer to your question...

Is there a way of writing tests to avoid this sort of "accidental" success?

...is probably "no", other than being particularly vigilant when writing them. Although if you have the capacity to be vigilant enough to avoid writing questionable test cases, and you applied that vigilance to writing the code in the first place, then, arguably, you don't even need unit tests. ;-)

If the point of unit tests is to detect bugs in code, then how do you detect bugs in the unit tests?

You could write 'meta' unit tests for your unit tests, but then how do you detect bugs in the 'meta' unit tests? And so on...

Now, that's not to say that unit tests can't be useful, but they're not sufficient, in isolation, to 'prove' that the code is 'correct'. In practice, I find peer-based code reviews to be a much more effective means of detecting flaws in code.

This strategy works much of the time:

First, eliminate any external source of entropy (set your thread pool to use a single thread, mock any RNGs with pre-seeded PRNGs, etc.). Then exercise your test repeatedly, changing only the inputs to the machinery under test, so that every combination of expected outputs is produced:

from functools import partial
from itertools import permutations
def test_add_dependency(self):
    """Tasks can be added with dependencies"""
    for p in permutations("AB"):
        self.done = []
        def test(id):
            self.done.append("Test " + id)
        s = Schedule(threads=1)
        tasks = {id: Task("Test " + id, partial(test, id)) for id in "AB"}
        s.add_task(tasks['A'])
        s.add_task(tasks['B'])
        s.add_dependency(tasks[p[0]], tasks[p[1]])
        s.run()
        self.assertEqual(self.done, ["Test " + p[1], "Test " + p[0]])

This test will fail if Schedule fails to use the information from add_dependency, as that is the only source of entropy (i.e. information) that differs between the test runs.

I would recommend that you determine what needs to be tested before writing the test.

In your code sample above, what is being tested is that the scheduler produces one specific sequence of tasks, even though by your own description of the scheduler the actual sequence is non-deterministic. The test therefore does not really provide any assurance about the code: sometimes it'll pass, sometimes it won't, and when it does, it will be just by accident.

On the other hand, a more valuable test could be to assert the presence (or absence) of tasks in the results without asserting anything about their position: "is in set" rather than "is at array position".
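
For instance, a membership-style assertion checks which tasks ran without pinning down their order. This is just a sketch reusing the Schedule, Task and partial names from the question:

def test_all_tasks_run(self):
    """Every added task runs exactly once, regardless of order."""
    self.done = []
    def test(id):
        self.done.append("Test " + id)
    s = Schedule()
    s.add_task(Task("Test A", partial(test, "A")))
    s.add_task(Task("Test B", partial(test, "B")))
    s.run()
    # "is in set" rather than "is at array position":
    self.assertCountEqual(self.done, ["Test A", "Test B"])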
