I have been going through the Scrapy examples and they make sense, but as soon as I try it on a news feed I get nothing but titles and don't know how to proceed.
scrapy shell http://feeds.bbci.co.uk/news/rss.xml
All I can get from this is
response.xpath('//title')
Which outputs
[<Selector xpath='//title' data=u'<title xmlns:media="http://search.yahoo.'>]
How can I possibly find the tags inside?
When I try this:
response.xpath('//div')
it returns an empty list. I have tried Inspect Element in Chrome to check the content, but I can't even get to the body to try things out. Thanks
An rss feed is not an html document; it is an xml document. You can find info on rss at http://www.w3schools.com/xml/xml_rss.asp . rss documents look something like:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title>W3Schools Home Page</title>
<link>http://www.w3schools.com</link>
<description>Free web building tutorials</description>
<item>
<title>RSS Tutorial</title>
<link>http://www.w3schools.com/rss</link>
<description>New RSS tutorial on W3Schools</description>
</item>
<item>
<title>XML Tutorial</title>
<link>http://www.w3schools.com/xml</link>
<description>New XML tutorial on W3Schools</description>
</item>
</channel>
</rss>
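So there are no div tags in it. To get the description of each post or news item you can use response.xpath('//description/text()'). Note that this also matches the channel's own description; response.xpath('//item/description/text()') restricts it to the items.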
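To see how the item structure maps to queries, here is a minimal sketch that walks the sample feed above. It uses Python's standard-library xml.etree.ElementTree rather than Scrapy, so it runs without a crawl; the names RSS_SAMPLE and parse_items are made up for illustration. In the Scrapy shell the analogous step would be looping over response.xpath('//item') and running relative XPath queries on each selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same feed shown above, minus the XML declaration.
RSS_SAMPLE = """<rss version="2.0">
<channel>
<title>W3Schools Home Page</title>
<link>http://www.w3schools.com</link>
<description>Free web building tutorials</description>
<item>
<title>RSS Tutorial</title>
<link>http://www.w3schools.com/rss</link>
<description>New RSS tutorial on W3Schools</description>
</item>
<item>
<title>XML Tutorial</title>
<link>http://www.w3schools.com/xml</link>
<description>New XML tutorial on W3Schools</description>
</item>
</channel>
</rss>"""


def parse_items(xml_text):
    """Return one dict per <item>, skipping the channel-level fields."""
    root = ET.fromstring(xml_text)
    items = []
    # iter("item") visits every <item> element anywhere below the root.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        })
    return items


items = parse_items(RSS_SAMPLE)
for entry in items:
    print(entry["title"], "-", entry["description"])
```

Grouping by item keeps each title paired with its own link and description, which a flat query such as //description/text() does not guarantee.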
The Scrapy docs can be found at http://doc.scrapy.org/en/latest/intro/tutorial.html