
How do you measure the quality of your unit tests?

If you (or your organization) aspires to thoroughly unit test your code, how do you measure the success or quality of your efforts?

  • Do you use code coverage? If so, what percentage do you aim for?
  • Do you find that philosophies like TDD have a better impact than metrics?

My tip is not a way to determine whether you have good unit tests per se, but it's a way to grow a good test suite over time.

Whenever you encounter a bug, either in your development or reported by someone else, fix it twice. You first create a unit test that reproduces the problem. When you've got a failing test, then you go and fix the problem.

If a problem was there in the first place, it hints at a subtlety in the code or the domain. Adding a test for it makes sure it can never be reintroduced in the future.

Another interesting aspect about this approach is that it'll help you understand the problem from a higher level before you actually go and look at the intricacies of the code.
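
A minimal sketch of what that first step can look like, assuming JUnit; the Invoice class and the exact bug are hypothetical stand-ins for whatever code the report was about:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Hypothetical regression test for a reported bug: the invoice total was
    // wrong when a discount brought a line item down to exactly zero.
    public class InvoiceTotalRegressionTest {

        @Test
        public void totalIsZeroWhenDiscountEqualsLineAmount() {
            Invoice invoice = new Invoice();
            invoice.addLine(new LineItem("widget", 10.00));
            invoice.applyDiscount(10.00);

            // Fails against the buggy code, passes once it is fixed, and stays
            // in the suite so the bug cannot quietly come back later.
            assertEquals(0.00, invoice.total(), 0.001);
        }
    }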

Also, +1 for the value and pitfalls of test coverage already mentioned by others.

Code coverage is a useful metric, but it should be used carefully. Some people take code coverage, especially the percentage covered, a bit too seriously and see it as THE metric for good unit testing.

My experience tells me that, rather than trying to reach 100% coverage, which is not that easy, people should focus on making sure the critical sections are covered. But even then coverage can give you false positives.

I am very much pro-TDD, but I don't place much importance on coverage stats. To me, the success and usefulness of unit tests is felt over a period of development time by the development team, as the tests (a) uncover bugs up front, (b) enable refactoring and change without regression, (c) help flesh out a modular, decoupled design, (d) and so on.

Or, as Martin Fowler put it, the anecdotal evidence in support of unit tests and TDD is overwhelming, but you cannot measure productivity. Read more on his bliki here: http://www.martinfowler.com/bliki/CannotMeasureProductivity.html

If it can break, it should be tested. If it can be tested, it should be automated.

To attain a full measure of confidence in your code you need different levels of testing: unit, integration and functional. I agree with the advice given above that testing should be automated (continuous integration) and that unit testing should cover all branches with a variety of edge case datasets. Code coverage tools (eg Cobertura, Clover, EMMA etc) can identify holes in your branches, but not in the quality of your test datasets. Static code analysis tools such as FindBugs, PMD and CPD can identify problem areas in your code before they become an issue, and go a long way towards promoting better development practices.

Testing should attempt to replicate the overall environment that the application will be running in as much as possible. It should start from the simplest possible case (unit) to the most complex (functional). In the case of a web application, getting an automated process to run through all the use cases of your website with a variety of browsers is a must so something like SeleniumRC should be in your toolkit.

However, software exists to meet a business need so there is also testing against requirements. This tends to be more of a manual process based on functional (web) tests. Essentially, you'll need to build a traceability matrix against each requirement in the specification and the corresponding functional test. As functional tests are created they are matched up against one or more requirements (eg Login as Fred, update account details for password, logout again). This addresses the issue of whether or not the deliverable matches the needs of the business.
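
As a sketch of how one entry in such a traceability matrix might map to an automated functional test (using the old Selenium RC Java client mentioned above; the URLs, locators and page text here are made up):

    import com.thoughtworks.selenium.DefaultSelenium;
    import com.thoughtworks.selenium.Selenium;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    // Sketch of a functional test traced to the hypothetical requirement:
    // "Login as Fred, update account details for password, logout again".
    public class ChangePasswordFunctionalTest {

        private Selenium selenium;

        @Before
        public void startBrowser() {
            // Assumes a Selenium RC server is running locally on the default port
            // and the application is deployed at localhost:8080.
            selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://localhost:8080/");
            selenium.start();
        }

        @Test
        public void fredCanChangeHisPassword() {
            selenium.open("/login");
            selenium.type("id=username", "fred");
            selenium.type("id=password", "oldSecret");
            selenium.click("id=loginButton");
            selenium.waitForPageToLoad("30000");

            selenium.open("/account");
            selenium.type("id=newPassword", "newSecret");
            selenium.click("id=saveButton");
            selenium.waitForPageToLoad("30000");
            assertTrue(selenium.isTextPresent("Password updated"));

            selenium.click("id=logoutLink");
            selenium.waitForPageToLoad("30000");
        }

        @After
        public void stopBrowser() {
            selenium.stop();
        }
    }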

Overall, I would advocate a test driven development approach based on some flavour of automated unit testing (JUnit, nUnit etc). For integration testing I would recommend having a test database that is automatically populated at each build with a known dataset that illustrates common use cases but allows for other tests to build on. For functional testing you'll need some kind of user interface robot (SeleniumRC for web, Abbot for Swing etc). Metrics about each can easily be gathered during the build process and displayed on the CI server (eg Hudson) for all developers to see.

If your primary way of measuring test quality is some automated metric, you've already failed.

Metrics can be misleading, and they can be gamed. And if a metric is the primary (or, worse yet, the only) means of judging quality, it will be gamed (perhaps unintentionally).

Code coverage, for example, is deeply misleading because 100% code coverage is nowhere near complete test coverage. Also, a figure like "80% code coverage" is just as misleading without context. If that coverage is in the most complex bits of code and only misses code so simple that it's easy to verify by eye, then that's significantly better than coverage that is skewed the other way.

Also, it's important to distinguish between the test-domain of a test (its feature set, essentially) and its quality. Test quality is not determined by how much it tests, just as code quality isn't determined by a laundry list of features. Test quality is determined by how well a test does its job of testing. That's actually very difficult to sum up in an automated metric.

The next time you go to write a unit test, try this experiment. See how many different ways you can write it such that it has the same code coverage and tests the same code. See whether it's possible to write a very poor test that meets these criteria, and a very good test as well. I think you may be surprised at the results.
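
As a hypothetical illustration of that experiment: both tests below execute exactly the same lines of the same method, so a coverage tool scores them identically, yet only the second one can actually fail when the code is wrong.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class CoverageVsQualityTest {

        // The code under test (hypothetical).
        static int divide(int a, int b) {
            return a / b;
        }

        // Poor test: it runs the code but asserts nothing meaningful, so it
        // produces 100% coverage of divide() while passing for almost any implementation.
        @Test
        public void poorTest() {
            divide(10, 2); // result ignored
        }

        // Better test: the same lines are covered, but the expected behaviour is pinned down.
        @Test
        public void betterTest() {
            assertEquals(5, divide(10, 2));
            assertEquals(-3, divide(-9, 3));
            assertEquals(0, divide(1, 2)); // integer division truncates
        }
    }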

Ultimately there's no substitute for experience and judgment. A human eye, hopefully several eyes, needs to look at the test code and decide if it's good or not.

Code coverage is to testing as testing is to programming: it can only tell you when there is a problem; it can't tell you when everything works. You should have 100% code coverage and beyond. Branches of code logic should be tested with several input values, fully exercising the normal, edge, and corner cases.
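
A minimal sketch of what that can look like, assuming JUnit 4 and a made-up shipping rule: a single branch exercised with several input values covering the normal, edge, and corner cases.

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Parameterized;
    import org.junit.runners.Parameterized.Parameters;
    import java.util.Arrays;
    import java.util.Collection;
    import static org.junit.Assert.assertEquals;

    // Hypothetical rule under test: orders of 100.00 or more ship free, otherwise 5.95.
    @RunWith(Parameterized.class)
    public class ShippingCostTest {

        static double shippingCost(double orderTotal) {
            return orderTotal >= 100.00 ? 0.00 : 5.95;
        }

        private final double orderTotal;
        private final double expectedCost;

        public ShippingCostTest(double orderTotal, double expectedCost) {
            this.orderTotal = orderTotal;
            this.expectedCost = expectedCost;
        }

        @Parameters
        public static Collection<Object[]> cases() {
            return Arrays.asList(new Object[][] {
                { 0.00,   5.95 },  // corner case: empty order
                { 50.00,  5.95 },  // normal case below the threshold
                { 99.99,  5.95 },  // edge case just below the threshold
                { 100.00, 0.00 },  // edge case exactly at the threshold
                { 250.00, 0.00 },  // normal case above the threshold
            });
        }

        @Test
        public void chargesTheExpectedShipping() {
            assertEquals(expectedCost, shippingCost(orderTotal), 0.001);
        }
    }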

I normally do TDD, so I write the tests first, which helps me see how I want to be able to use the objects.

Then, when I'm writing the classes, for the most part I can spot common pitfalls (ie assumptions that I'm making, eg a variable being of a particular type or within a range of values), and when these come up I write a specific test for that specific case.

Aside from that, and getting as good code coverage as possible (sometimes it's not possible to get 100%), you're more or less done. Then, if any bugs do come up in the future, you just make sure you first write a test case that exposes the bug and will pass once it's fixed. Then fix as per normal.

Monitoring code coverage can be useful, but rather than focusing on an arbitrary target rate (80%? 90%? 100%?), I have found it more useful to aim for a positive trend over time.

I think some best practices for unit tests are:

  • They must be self-contained, ie not require much configuration or external dependencies to run. Let tests build their own dependencies, such as the files and Web sites required for them to run (see the sketch after this list).
  • Use unit tests to reproduce bugs before fixing them. This helps prevent the bugs from surfacing again in the future.
  • Use a code coverage tool to spot critical code that is not exercised by any unit tests.
  • Integrate unit tests with nightly builds and release builds.
  • Publish test result reports and code coverage reports to a Web site where everyone in the team can browse them. The publishing should ideally be automated and integrated into the build system.
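
For the first point, here is a minimal sketch of a self-contained test that builds its own file dependency, using JUnit 4's TemporaryFolder rule; ConfigParser is a hypothetical class under test:

    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TemporaryFolder;
    import java.io.File;
    import java.io.FileWriter;
    import static org.junit.Assert.assertEquals;

    // Self-contained test: it creates the file it needs instead of relying on a
    // file that has to exist on every developer's machine or on the build server.
    public class ConfigParserTest {

        @Rule
        public TemporaryFolder folder = new TemporaryFolder();

        @Test
        public void readsTimeoutFromConfigFile() throws Exception {
            File config = folder.newFile("app.properties");
            try (FileWriter writer = new FileWriter(config)) {
                writer.write("timeout=30\n");
            }

            ConfigParser parser = new ConfigParser(config);
            assertEquals(30, parser.getTimeoutSeconds());
        }
    }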

Do not expect to reach 100% code coverage unless you develop mission-critical software. It can be very costly to reach this level, and for most projects it will not be worth the effort.

An additional technique I try to use is to partition your code into two parts. I've recently blogged about it here . The short description is to maintain your production code in two sets of libraries where one set (hopefully the larger set) has 100% line coverage (or better if you can measure it) and the other set (hopefully a tiny amount of code) has 0% coverage, yes zero percent coverage.

Your designs should allow this partitioning. This should make it easy to see the code that is not covered. Over time you may have ideas about how to move code from the smaller set to the larger set.

The concept of mutation testing seems promising as a way to measure (test?) the quality of your unit tests. Mutation testing basically means making small "mutations" to your production code and then seeing if any unit test fails. Small mutations typically mean changing an "and" to an "or", or a "<" to a "<=". If one or more unit tests fail, it means the "mutant" was caught. If the mutant survives your unit test suite, it means you missed a test. When I apply mutation testing to code with 100% line and branch coverage, it typically finds a few spots where I missed tests.

See https://en.wikipedia.org/wiki/Mutation_testing for a description of the concept and links to tools.
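
As a hypothetical illustration: a mutation tool such as PIT might change the comparison below from < to <=. A suite that only checks values clearly below and above the limit has full line and branch coverage, yet that mutant survives; the missing boundary test is what kills it.

    import org.junit.Test;
    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    public class LimitMutationExampleTest {

        // Production code (hypothetical). A mutation tool might change "<" to "<=".
        static boolean isWithinLimit(int value, int limit) {
            return value < limit;
        }

        // These two tests already give 100% line and branch coverage, but the
        // "< changed to <=" mutant survives them: nothing checks value == limit.
        @Test
        public void belowTheLimitIsAccepted() {
            assertTrue(isWithinLimit(1, 10));
        }

        @Test
        public void aboveTheLimitIsRejected() {
            assertFalse(isWithinLimit(20, 10));
        }

        // This boundary test kills the mutant; mutation testing is what
        // revealed that it was missing.
        @Test
        public void exactlyAtTheLimitIsRejected() {
            assertFalse(isWithinLimit(10, 10));
        }
    }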

  • In addition to TDD, I have found myself writing saner tests since adopting BDD (eg http://rspec.info/ ).
  • The everlasting discussion is always to mock or not to mock. Some mocks can become more complex than the code they are testing (which usually points to bad separation of concerns).
    • Therefore I like the idea of having a metric like test complexity per code complexity, or, simplified, the number of test lines per line of code.
