How do I correctly test a Scrapy spider's Python generator function?

Question

I have a Scrapy XMLFeedSpider and I'm trying to test the following parse_node function:

def parse_node(self, response, selector):
    date = selector.xpath('pubDate/text()').extract_first()
    url = selector.xpath('link/text()').extract_first()               
    if date < self.cutoff_date:  # TEST VALIDITY OF THE DATE
        print "Invalid date"
        self.log("Article %s before crawler start date" % url)
    else:
        print "Valid date"
        yield scrapy.Request(url, self.parse_post)

I'm trying to test the function for both a valid and an invalid date:

@mock.patch('my_spiders.spiders.myspider.scrapy.Request')               
def test_parse_node(self, scrapy_request):                                      
    scrapy_request.return_value = mock.MagicMock()                              
    self.spider.log = mock.MagicMock()                                          
    mock_response = mock.MagicMock()                                            
    mock_selector = mock.MagicMock()                                            
    date = self.spider.start_date.strftime("%c")                                
    url = "https://google.com"                                                  
    mock_selector.xpath.return_value.extract_first = mock.MagicMock(            
        side_effect=[date, url]                                                 
    )                                                                           
    parsed_node = self.spider.parse_node(mock_response, mock_selector)          
    self.assertEqual(tuple(parsed_node)[0], scrapy_request.return_value)        
    self.spider.log.assert_not_called()                                         
    scrapy_request.assert_called_once_with(url, self.spider.parse_post)         

@mock.patch('my_spiders.spiders.myspider.scrapy.Request')               
def test_parse_node_invalid_date(self, scrapy_request):                         
    scrapy_request.return_value = mock.MagicMock()                              
    self.spider.log = mock.MagicMock()                                          
    mock_response = mock.MagicMock()                                            
    mock_selector = mock.MagicMock()                                            
    date_object = self.spider.start_date - datetime.timedelta(days=1)           
    date = date_object.strftime("%c")                                           
    url = "https://google.com"                                                  
    mock_selector.xpath.return_value.extract_first = mock.MagicMock(            
        side_effect=[date, url]                                                 
    )                                                                           

    parsed_node = self.spider.parse_node(mock_response, mock_selector)          
    # TODO: figure out why this doesn't work                                    
    # self.spider.log.assert_called_once()                                   
    scrapy_request.assert_not_called()

The first test, test_parse_node, runs as expected. The problem is with test_parse_node_invalid_date: if I set a breakpoint inside parse_node, it is never hit, and the print statements never execute either.

I suspect this has something to do with the yield statement turning parse_node into a generator, but I can't figure out what is happening. Why doesn't the second test run through the body of parse_node as I expected?

Answer 1

Calling a Python generator function only returns a generator iterator; none of the function body runs until iteration starts. To actually step into the body and debug it, I had to start the iteration by invoking the generator's next() method:

parsed_node = self.spider.parse_node(mock_response, mock_selector).next()
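
The laziness described above can be seen in isolation with a minimal sketch. The parse_node below is a hypothetical stand-in, not the spider's method: it takes plain strings and records each call in a list, an assumption made only so the example runs without Scrapy. It also shows why the first test already worked (tuple(parsed_node) iterates the generator) and that next() raises StopIteration when the invalid-date branch yields nothing, so draining with list() is often more convenient in that case. The built-in next(gen) is used because it works on both Python 2.6+ and Python 3.

```python
calls = []  # records every time the generator body actually runs


def parse_node(date, cutoff_date):
    """Hypothetical stand-in: yields only when the date is not before the cutoff."""
    calls.append(date)
    if date >= cutoff_date:
        yield "request for %s" % date


# Calling the function only creates the generator; the body has not run yet.
gen = parse_node("2016-05-01", "2016-05-10")
ran_before_iteration = list(calls)

# Draining the generator runs the body. This date is invalid, so nothing is yielded.
results = list(gen)
ran_after_iteration = list(calls)

# next() on a generator that yields nothing raises StopIteration.
try:
    next(parse_node("2016-05-01", "2016-05-10"))
    raised_stop_iteration = False
except StopIteration:
    raised_stop_iteration = True
```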

I also had to make sure that each test instantiated a new generator, because a generator can only be iterated over once.
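
A short illustration of that single-pass behaviour, using a hypothetical numbers() generator that is not part of the spider:

```python
def numbers():
    """Hypothetical generator used only to illustrate exhaustion."""
    yield 1
    yield 2


gen = numbers()
first_pass = list(gen)        # consumes every item
second_pass = list(gen)       # the same generator object is now exhausted
fresh_pass = list(numbers())  # calling the function again gives a new generator
```

This is why each test should call parse_node itself rather than share a generator created in a common setup step.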

Then I could step through and debug/complete my test as necessary.
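
For reference, here is one way the invalid-date test might be completed. It is only a sketch under several assumptions: StandInSpider and its make_request attribute are hypothetical replacements for the real spider and scrapy.Request so the example runs without Scrapy, dates are ISO-formatted strings so that string comparison matches chronological order, and unittest.mock (Python 3) is used in place of the standalone mock package from the question.

```python
import unittest
from unittest import mock


class StandInSpider(object):
    """Hypothetical stand-in for the XMLFeedSpider in the question."""

    def __init__(self, cutoff_date):
        self.cutoff_date = cutoff_date
        self.log = mock.MagicMock()
        self.make_request = mock.MagicMock()  # stands in for scrapy.Request

    def parse_node(self, response, selector):
        date = selector.xpath('pubDate/text()').extract_first()
        url = selector.xpath('link/text()').extract_first()
        if date < self.cutoff_date:
            self.log("Article %s before crawler start date" % url)
        else:
            yield self.make_request(url)


class ParseNodeTest(unittest.TestCase):
    def setUp(self):
        self.spider = StandInSpider(cutoff_date="2016-05-10")

    def _selector(self, date, url):
        # extract_first() returns the date on the first call and the URL on the second.
        selector = mock.MagicMock()
        selector.xpath.return_value.extract_first = mock.MagicMock(
            side_effect=[date, url]
        )
        return selector

    def test_invalid_date_logs_and_yields_nothing(self):
        selector = self._selector("2016-05-09", "https://google.com")
        # list() drains the generator, which is what makes the body execute.
        results = list(self.spider.parse_node(mock.MagicMock(), selector))
        self.assertEqual(results, [])
        self.spider.log.assert_called_once_with(
            "Article https://google.com before crawler start date"
        )
        self.spider.make_request.assert_not_called()
```

Draining with list() rather than next() avoids having to catch StopIteration when the spider yields nothing.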
