如何测试需要文件存在的方法？

Question

For the first time, I'm using Python to create a library, and I'm trying to take the opportunity in this project to learn unit testing. 我第一次使用Python创建了一个库，我正试图利用这个项目中的机会来学习单元测试。 I've written a first method and I want to write some unit tests for it. 我写了第一个方法，我想为它编写一些单元测试。 (Yes, I know that TDD requires I write the test first, I'll get there, really.) （是的，我知道TDD要求我先写测试，我会到那里，真的。）

The method is fairly simple, but it expects that the class has a file attribute set, that the attribute points to an existing file, and that the file is an archive of some sort (currently only working with zip files, tar, rar, etc., to be added later). 该方法相当简单，但它希望该类具有file属性集，该属性指向现有文件，并且该文件是某种类型的存档（目前仅使用zip文件，tar，rar等）。，稍后补充）。 The method is supposed to return the number of files in the archive. 该方法应该返回存档中的文件数。

I've created a folder in my project called files that contains a few sample files, and I've manually tested the method and it works as it should so far. 我在我的项目中创建了一个名为files的文件夹，其中包含一些示例文件，我已经手动测试了该方法，并且它到目前为止仍然有效。 The manual test looks like this, located in the archive_file.py file: 手动测试如下所示，位于archive_file.py文件中：

if __name__ == '__main__':
    archive = ArchiveFile()

    script_path = path.realpath(__file__)
    parent_dir = path.abspath(path.join(script_path, os.pardir))
    targ_dir = path.join(parent_dir, 'files')
    targ_file = path.join(targ_dir, 'test.zip' )

    archive.file = targ_file

    print(archive.file_count())

All I do then is make sure that what's printed is what I expect given the contents of test.zip . 我所做的就是确保打印的内容是我对test.zip内容的期望。

Here's what file_count looks like: 这是file_count样子：

def file_count(self):
    """Return the number of files in the archive."""
    if self.file == None:
        return -1

    with ZipFile(self.file) as zip:
        members = zip.namelist()
        # Remove folder members if there are any.
        pruned = [item for item in members if not item.endswith('/')]
        return len(pruned)

Directly translating this to a unit test seems wrong to me for a few reasons, some of which may be invalid. 由于某些原因，我直接将其转换为单元测试似乎是错误的，其中一些原因可能无效。 I'm counting on the precise location of the test files in relation to the current script file, I'll need a large sample of manually created archive files to make sure I'm testing enough variations, and, of course, I'm manually comparing the returned value to what I expect because I know how many files are in the test archive. 我指望测试文件相对于当前脚本文件的精确位置，我需要大量手动创建的存档文件样本，以确保我测试的变量足够多，当然，我是手动将返回值与我期望的值进行比较，因为我知道测试档案中有多少文件。

It seems to me that this should be automated as much as possible, but it also seems that doing so is going to be very complicated. 在我看来，这应该尽可能自动化，但似乎这样做会非常复杂。

What's the proper way to create unit tests for such a class method? 为这样的类方法创建单元测试的正确方法是什么？

Answer 1

There are so many different way to approach this. 有很多不同的方法可以解决这个问题。 I like to think what would be valuable to test, off the top of my head, I can think of a couple things: 我喜欢思考什么是有价值的测试，从我的头脑中，我可以想到几件事：

validation logic ( if self.file == None ) 验证逻辑（ if self.file == None ）
pruning logic 修剪逻辑
that all file types claimed to be supported are actually supported 声称支持的所有文件类型实际上都受支持

This testing could take place on two levels: 此测试可以在两个级别进行：

Unittest your logic 单元测试你的逻辑
Test integration (ie supported archive types against the filesystem) 测试集成（即针对文件系统支持的归档类型）

Unittest the logic 单位测试逻辑

Unittesting the logic of your archive objects should be trivial. 单元测试归档对象的逻辑应该是微不足道的。 There looks to be a couple tests in your file_count method that could be valuable: 在你的file_count方法中看起来有几个测试可能很有价值：

test_no_file_returns_negative_one (error conditions are "hopefully" not very frequently executed code paths, and are great candidates for tests. Especially if your clients are expecting this -1 return value. test_no_file_returns_negative_one （错误条件是“希望”不是经常执行的代码路径，并且是测试的理想选择。特别是如果您的客户期望这个-1返回值。
test_zip_file_pruned_logic this looks to be very important functionality in your code, if implemented incorrectly it would completely throw off the count that your code is claiming to be able to return test_zip_file_pruned_logic这看起来是代码中非常重要的功能，如果实现不正确，它会完全抛弃代码声称能够返回的计数
test_happy_path_file_count_successful I like to have a unittest that exercises the whole function, using mocked dependencies ZipFile to make sure that everything is covered, without having to run the integration tests. test_happy_path_file_count_successful我喜欢使用一个执行整个函数的unittest，使用模拟的依赖ZipFile来确保所有内容都被覆盖，而不必运行集成测试。

Test Integrations 测试集成

I think a test for each supported archive type would be very valuable. 我认为对每种支持的存档类型进行测试非常有价值。 These could be static fixtures that live in your repo, and your tests would already know how many files that each archive has and would assert on that. 这些可能是存储在您的仓库中的静态装置，您的测试已经知道每个存档有多少文件并且会在其上断言。

I think all of your concerns are valid, and all of them can be addressed, and tested, in a maintainable way: 我认为您的所有问题都是有效的，所有这些问题都可以通过可维护的方式得到解决和测试：

I'm counting on the precise location of the test files in relation to the current script file 我指望测试文件相对于当前脚本文件的精确位置

This could be addressed by the convention of having your file fixtures stored in a subdirectory of your test package, and then using python to get the file path of your test package: 这可以通过将您的文件夹具存储在测试包的子目录中，然后使用python获取测试包的文件路径的约定来解决：

FIXTURE_DIR = os.path.join(os.path.dirname(__file__), 'fixtures')

For portable code it will be important to dynamically generate these paths. 对于可移植代码，动态生成这些路径非常重要。

I'll need a large sample of manually created archive files to make sure I'm testing enough variations 我需要大量手动创建的归档文件样本，以确保我测试的变量足够多

Yes, how many are good enough? 是的，有多少足够好？ AT LEAST a test per supported archive type. 每个支持的归档类型至少测试一次。 (netflix has to test against every single device that they have an app on :), plenty of companies have to run tests against large matrix of mobile devices) I think the test coverage here is crucial, but try to put all the edge cases that need to be covered in unittests. （netflix必须针对他们拥有应用程序的每一台设备进行测试:)，许多公司必须针对大型移动设备矩阵运行测试）我认为这里的测试覆盖率至关重要，但是尝试将所有边缘情况放到需要在单元测试中涵盖。

I'm manually comparing the returned value to what I expect because I know how many files are in the test archive. 我手动将返回值与我期望的值进行比较，因为我知道测试档案中有多少文件。

The archive will have to become static and your test will store that information. 存档必须变为静态，您的测试将存储该信息。

One thing to keep in mind are the scopes of your tests. 要记住的一件事是测试的范围。 Making a test that exercise ZipFile wouldn't be very valuable because its in the stdlibrary and already has tests. 做一个练习ZipFile的测试不会很有价值，因为它在stdlibrary中并且已经有测试。 Additionally, testing that your code works with all python filesystems/os' probably wouldn't be very valuable either, as python already has those checks. 另外，测试你的代码适用于所有python文件系统/ os'可能也不会很有价值，因为python已经有了这些检查。

But scoping your tests to verify that your application works with all file types that it says it supports, I believe, is extremely valuable, because it is a contract between you and your clients, saying "hey this DOES work, let me show you". 但是，确定您的测试用于验证您的应用程序是否适用于它支持的所有文件类型，我相信，这非常有价值，因为它是您和您的客户之间的合同，并说“嘿，这会有效，让我告诉您” 。 The same way that python's tests are a contract between it and you saying "Hey we support OSX/LINUX/whatever let me show you" 与python的测试相同的方式是它与你之间的合约说“嘿，我们支持OSX / LINUX /无论让我告诉你什么”

Answer 2

Suggest refactoring the pruning logic to a separate method that does not depend on file or ZipFile. 建议将修剪逻辑重构为不依赖于文件或ZipFile的单独方法。 This: 这个：

def file_count(self):
    ...
    with ZipFile(self.file) as zip:
        members = zip.namelist()
        # Remove folder members if there are any.
        pruned = [item for item in members if not item.endswith('/')]
        return len(pruned)

Becomes: 变为：

def file_count(self):
    ...
    with ZipFile(self.file) as zip:
        return len(self.prune_dirs(zip.namelist()))

def prune_dirs(self, members):
    # Remove folder members if there are any.
    pruned = [item for item in members if not item.endswith('/')]
    return pruned

Now, testing prune_dirs can be done without any test files. 现在，测试prune_dirs可以在没有任何测试文件的情况下完成。

members = ["a/", "a/b/", "a/b/c", "a/b/d"]
print archive.prune_dirs(members)

If you want to avoid integration testing, then you have to fake or mock ZipFile. 如果你想避免集成测试，那么你必须伪造或模拟ZipFile。 In this case, any object that provides the method namelist() would do. 在这种情况下，任何提供方法namelist（）的对象都可以。

class FakeZipFile():

    def __init__(self, filename):
        self.filename = filename

    def namelist(self):
        return ['a', 'b/', 'c']

Now I introduce a new method get_helper() on ArchiveFile 现在我在ArchiveFile上引入一个新方法get_helper（）

class ArchiveFile():

    def get_helper(self):
        return ZipFile(self.filename)

    def file_count(self):
        ...
        helper = self.get_helper()
        return len(self.prune_dirs(helper.namelist()))

...and override get_helper() in a Testing child class. ...并覆盖Testing子类中的get_helper（）。

class ArchiveFileTesting(ArchiveFile):

    def get_helper(self):
        return FakeZipFile(self.file);

A testing class let's you override just what you need from ArchiveFile to eliminate the dependency on ZipFile. 测试类允许您覆盖ArchiveFile所需的内容，以消除对ZipFile的依赖。 In your test, use the testing class and I think you have good coverage. 在您的测试中，使用测试类，我认为您有良好的覆盖率。

if __name__ == '__main__':
    archive = ArchiveFileTesting()

You will probably want to think of ways to change namelist so you can test more cases than the one shown here. 您可能想要考虑更改名称列表的方法，以便您可以测试比此处显示的案例更多的案例。

Answer 3

Oasiscircle and dm03514 were very helpful on this and lead me to the right answer eventually, especially with dm's answer to a followup question . Oasiscircle和dm03514对此非常有帮助，并最终引导我找到正确的答案，特别是对于dm对后续问题的回答。

What needs to be done is to use the mock library to create a fake version of ZipFile that won't object to there not actually being a file, but will instead return valid lists when using the nameslist method. 需要做的是使用mock库来创建一个假的ZipFile版本，该版本不会反对实际上不是文件，而是在使用nameslist方法时返回有效列表。

@unittest.mock.patch('comicfile.ZipFile')
def test_page_count(self, mock_zip_file):
    comic_file = ComicFile()
    members = ['dir/', 'dir/file1', 'dir/file2']
    mock_zip_file.return_value.__enter__.return_value.namelist.return_value \
                = members
    self.assertEqual(2, self.comic_file.page_count())

The __enter__.return_value portion above is necessary because in the code being tested the ZipFile instance is being created within a context manager. 上面的__enter__.return_value部分是必需的，因为在测试的代码中，正在上下文管理器中创建ZipFile实例。

Answer 4

Mocking works. 嘲弄工作。
Another approach is setting up an environment. 另一种方法是建立一个环境。

For your use case, setting up an environment would mean creating a temporary directory, copy whatever kinds of files you expect to live there, and run your tests in it. 对于您的用例，设置环境意味着创建一个临时目录，复制您希望在那里生活的任何类型的文件，并在其中运行您的测试。

You do have to add a parameter or a global that tells your code in what directory to look. 您必须添加一个参数或全局，告诉您的代码在哪个目录中查找。

This has been working quite well for me. 这对我来说非常有效。 However, my use case is somewhat different in that I am writing tests for code that uses an external program, so I have no way to mock anything. 但是，我的用例有些不同，因为我正在为使用外部程序的代码编写测试，所以我无法模拟任何东西。

如何测试需要文件存在的方法？

问题描述

4 个解决方案

解决方案1
2 2016-07-22 16:38:38

解决方案2
0 2016-07-22 18:41:52

解决方案3
0 已采纳 2016-07-24 22:52:17

解决方案4
0 2017-11-20 19:20:13

如何测试需要文件存在的方法？

问题描述

4 个解决方案

解决方案1 2 2016-07-22 16:38:38

解决方案2 0 2016-07-22 18:41:52

解决方案3 0 已采纳 2016-07-24 22:52:17

解决方案4 0 2017-11-20 19:20:13

解决方案1
2 2016-07-22 16:38:38

解决方案2
0 2016-07-22 18:41:52

解决方案3
0 已采纳 2016-07-24 22:52:17

解决方案4
0 2017-11-20 19:20:13