I have a Scrapy XMLFeedSpider and I'm trying to test the following parse_node function:
    def parse_node(self, response, selector):
        date = selector.xpath('pubDate/text()').extract_first()
        url = selector.xpath('link/text()').extract_first()
        if date < self.cutoff_date:  # TEST VALIDITY OF THE DATE
            print("Invalid date")
            self.log("Article %s before crawler start date" % url)
        else:
            print("Valid date")
            yield scrapy.Request(url, self.parse_post)
I'm trying to test the function for both a valid and an invalid date:
    @mock.patch('my_spiders.spiders.myspider.scrapy.Request')
    def test_parse_node(self, scrapy_request):
        scrapy_request.return_value = mock.MagicMock()
        self.spider.log = mock.MagicMock()
        mock_response = mock.MagicMock()
        mock_selector = mock.MagicMock()
        date = self.spider.start_date.strftime("%c")
        url = "https://google.com"
        mock_selector.xpath.return_value.extract_first = mock.MagicMock(
            side_effect=[date, url]
        )
        parsed_node = self.spider.parse_node(mock_response, mock_selector)
        self.assertEqual(tuple(parsed_node)[0], scrapy_request.return_value)
        self.spider.log.assert_not_called()
        scrapy_request.assert_called_once_with(url, self.spider.parse_post)

    @mock.patch('my_spiders.spiders.myspider.scrapy.Request')
    def test_parse_node_invalid_date(self, scrapy_request):
        scrapy_request.return_value = mock.MagicMock()
        self.spider.log = mock.MagicMock()
        mock_response = mock.MagicMock()
        mock_selector = mock.MagicMock()
        date_object = self.spider.start_date - datetime.timedelta(days=1)
        date = date_object.strftime("%c")
        url = "https://google.com"
        mock_selector.xpath.return_value.extract_first = mock.MagicMock(
            side_effect=[date, url]
        )
        parsed_node = self.spider.parse_node(mock_response, mock_selector)
        # TODO: figure out why this doesn't work
        # self.spider.log.assert_called_once()
        scrapy_request.assert_not_called()
The first test, test_parse_node, runs as expected. The problem is with test_parse_node_invalid_date: if I set a breakpoint inside parse_node, it is never hit, and the print statements never run either.
I suspect this is some kind of issue with the yield statement/generator, but I can't figure out what's happening. Why isn't the second test running through the parse_node function as I'd expect?
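Calling a Python generator function does not run its body; it only returns a generator object (an iterator). The body executes only when that generator is iterated. This is why the first test works, since tuple(parsed_node) consumes the generator, while the second test never iterates it. To actually run and debug the code, I had to start the iteration by calling next() on the generator (the built-in next() works on both Python 2 and 3, whereas the .next() method exists only on Python 2):
    parsed_node = next(self.spider.parse_node(mock_response, mock_selector))
Note that in the invalid-date case nothing is yielded, so next() raises StopIteration once the body has run. Wrapping the call in list() instead runs the body to completion without raising.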
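As a minimal sketch of this behavior, independent of Scrapy: the function below is a hypothetical stand-in for parse_node (the names parse_node_sketch and log, and the date strings, are made up for illustration and written for Python 3). It suggests that the body does not run until the generator is consumed, and that list() is one way to exhaust a generator that may yield nothing.

```python
from unittest import mock


def parse_node_sketch(date, cutoff_date, url, log):
    """Hypothetical stand-in for the spider's parse_node (not Scrapy code)."""
    if date < cutoff_date:
        log("Article %s before crawler start date" % url)
    else:
        yield url


log = mock.MagicMock()

# Calling the generator function only creates a generator object;
# none of the function body has executed at this point.
gen = parse_node_sketch("2020-01-01", "2021-01-01", "https://google.com", log)
calls_before = log.call_count

# Consuming the generator runs the body. list() returns an empty list
# instead of raising StopIteration when nothing is yielded.
results = list(gen)
calls_after = log.call_count
```

With the generator consumed this way, an assertion such as log.assert_called_once() can be made after the list() call.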
I also had to make sure that each test instantiated a new generator, because a generator can only be iterated over one time.
Then I could step through and debug/complete my test as necessary.
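To illustrate why each test needs its own generator, here is a small sketch using a made-up generator function named numbers (an assumption for illustration, not part of the spider). Once a generator is exhausted, iterating it again produces nothing, so a fresh one has to be created by calling the function again.

```python
def numbers():
    """Hypothetical generator used only to show single-pass iteration."""
    yield 1
    yield 2


gen = numbers()
first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # the same generator is now exhausted

# Calling the generator function again creates a new, independent generator.
fresh_pass = list(numbers())
```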