简体   繁体   English

python中的unittest Web刮板

[英]unittest web scraper in python

I am new to unit test.I want to write unit test for web scraper that I wrote.My scraper collects the data from website,which is on local disk where inputting different date gives different results 我是单元测试的新手。我想为我编写的Web刮板编写单元测试。我的刮板从网站上收集数据,该网站位于本地磁盘上,在其中输入不同的日期会得出不同的结果

I have the following function in script. 我在脚本中具有以下功能。

get_date [returns date mentioned on web page]
get_product_and_cost [returns product mentioned and their cost]

I am not sure what to test in these functions.So far I have written this 我不确定这些功能要测试什么。到目前为止,我已经写了这个

class SimplisticTest(unittest.TestCase):

    def setUp(self):
        data = read_file("path to file")
        self.soup = BeautifulSoup(data,'html5lib')

    def test_date(self):
        self.assertIsInstance(get_date(self.soup), str)

    def test_date_length(self):
        self.assertEqual(len(get_date(self.soup)),10)

if __name__ == '__main__':
    unittest.main()

Usually, it is good to test a known output from a known input. 通常,最好从已知输入中测试已知输出。 In your case you test for the return type, but it would be even better to test if the returned object corresponds to what you would expect from the input, and that's where static test data (a test web page in your case) becomes useful. 在您的情况下,您将测试返回类型,但最好测试返回的对象是否与您从输入中期望的对象相对应,这就是使用静态测试数据(在您的情况下为测试网页)的地方。 You can also test for exceptions with self.assertRaises(ExceptionType, method, args). 您也可以使用self.assertRaises(ExceptionType,method,args)测试异常。 Refer to https://docs.python.org/3.4/library/unittest.html if you haven't already. 如果尚未使用,请参阅https://docs.python.org/3.4/library/unittest.html

Basically you want to test at least one explicit case (like the test page), the exceptions that can be raised like bad argument type (TypeError or ValueError) or a possible None return type depending on your function. 基本上,您想测试至少一个显式的情况(例如测试页),可以引发异常,例如错误的参数类型(TypeError或ValueError)或可能的None返回类型,具体取决于您的函数。 Make sure not to test only for the type of the return or the amount of the return but explicitly for the data, such that if a change is made that breaks the feature, it is found (whereas a change could still return 10 elements, yet the elements might contain invalid data). 确保不要仅测试返回类型或返回量,而是显式地测试数据,这样,如果进行了更改以破坏功能,则可以找到它(而更改仍然可以返回10个元素,但是元素可能包含无效数据)。 I'd also suggest to have one test method for each method: get_date would have its test method test_get_date. 我还建议每种方法都有一个测试方法:get_date将具有其测试方法test_get_date。

Keep in mind that what you want to find is if the method does its job, so test for extreme cases (big input data, as much as it can support or at least the method defines it can) and try to create them such that if the method outputs differently from what is expected based on its definition (documentation), the test fails and breaking changes are caught early on. 请记住,您要查找的是方法是否有效,因此请测试极端情况(大输入数据,尽可能多地支持或至少由方法定义),并尝试创建它们,以便该方法的输出与基于其定义(文档)的预期结果不同,该测试失败,并且较早发现了重大更改。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM