简体   繁体   中英

unittest web scraper in python

I am new to unit test.I want to write unit test for web scraper that I wrote.My scraper collects the data from website,which is on local disk where inputting different date gives different results

I have the following function in script.

get_date [returns date mentioned on web page]
get_product_and_cost [returns product mentioned and their cost]

I am not sure what to test in these functions.So far I have written this

class SimplisticTest(unittest.TestCase):

    def setUp(self):
        data = read_file("path to file")
        self.soup = BeautifulSoup(data,'html5lib')

    def test_date(self):
        self.assertIsInstance(get_date(self.soup), str)

    def test_date_length(self):
        self.assertEqual(len(get_date(self.soup)),10)

if __name__ == '__main__':
    unittest.main()

Usually, it is good to test a known output from a known input. In your case you test for the return type, but it would be even better to test if the returned object corresponds to what you would expect from the input, and that's where static test data (a test web page in your case) becomes useful. You can also test for exceptions with self.assertRaises(ExceptionType, method, args). Refer to https://docs.python.org/3.4/library/unittest.html if you haven't already.

Basically you want to test at least one explicit case (like the test page), the exceptions that can be raised like bad argument type (TypeError or ValueError) or a possible None return type depending on your function. Make sure not to test only for the type of the return or the amount of the return but explicitly for the data, such that if a change is made that breaks the feature, it is found (whereas a change could still return 10 elements, yet the elements might contain invalid data). I'd also suggest to have one test method for each method: get_date would have its test method test_get_date.

Keep in mind that what you want to find is if the method does its job, so test for extreme cases (big input data, as much as it can support or at least the method defines it can) and try to create them such that if the method outputs differently from what is expected based on its definition (documentation), the test fails and breaking changes are caught early on.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM