
Using scrapy to get links in Python?

Sorry if this is a dumb question, but I have absolutely no idea how to use Scrapy. I don't want to create a Scrapy crawler (or whatever); I want to incorporate it into my existing code. I've looked at the docs, but I found them a bit confusing.

What I need to do is get links from a list on the site. I just need an example to better understand it. Also, is it possible to have a for loop do something with each list item? They are ordered like

<ul>
  <li>example</li>
</ul>

Thanks!

Maybe you don't need Scrapy if it's that simple.

cat local.html

<html><body>
<ul>  
<li>example</li>  
<li>example2</li>
</ul>
<div><a href="test">test</a><div><a href="hi">hi</a></div></div>
</body></html>

then...

import urllib2
from lxml import html

# Python 2: fetch the local file and parse it with lxml
page = urllib2.urlopen("file:///root/local.html")
root = html.parse(page).getroot()

# print the text of every <li>
for x in root.cssselect("li"):
    print(x.text_content())

# print the target of every <a href="...">
for x in root.xpath('//a/@href'):
    print(x)
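
If you're on Python 3, urllib2 was merged into urllib.request; a minimal equivalent sketch (assuming lxml and its cssselect dependency are installed, with the same local file as input):

# Python 3 sketch: urllib2 became urllib.request; behavior is
# otherwise the same. Assumes lxml and cssselect are installed.
from urllib.request import urlopen
from lxml import html

page = urlopen("file:///root/local.html")
root = html.parse(page).getroot()

for li in root.cssselect("li"):        # text of every list item
    print(li.text_content())

for href in root.xpath('//a/@href'):   # every link target
    print(href)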

You might want to consider BeautifulSoup, which is great for parsing HTML/XML; its documentation is quite helpful as well. Getting the links would be something like:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

# Python 2 / BeautifulSoup 3: fetch the page with httplib2
http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

# only parse <a> tags, then print each link target
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        print link['href']

SoupStrainer removes the need to parse the entire thing when all you're after are the links. 当您只需要链接时,SoupStrainer无需解析整个内容。
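
For what it's worth, in the newer beautifulsoup4 package the keyword is parse_only rather than parseOnlyThese, and Tag.has_key() is gone; a rough equivalent sketch, assuming the bs4 and requests packages are installed:

# beautifulsoup4 sketch: parse_only replaces parseOnlyThese, and
# find_all(..., href=True) replaces the removed has_key() check.
import requests
from bs4 import BeautifulSoup, SoupStrainer

response = requests.get('http://www.nytimes.com')
soup = BeautifulSoup(response.text, 'html.parser',
                     parse_only=SoupStrainer('a'))  # only parse <a> tags

for link in soup.find_all('a', href=True):          # each anchor with an href
    print(link['href'])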

EDIT: Just saw that you need to use Scrapy. I'm afraid I've not used it, but try looking at the official documentation; it looks like it has what you might be after.
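
For reference, Scrapy's selectors can also be used standalone, without writing a spider, which sounds like what the question is after; a minimal sketch assuming a recent Scrapy install, with the HTML passed in as a plain string:

# Standalone use of Scrapy's Selector: no crawler/spider needed.
# Assumes a recent Scrapy version (for the .getall() shorthand).
from scrapy.selector import Selector

html = """
<ul>
  <li>example</li>
  <li>example2</li>
</ul>
<a href="test">test</a>
"""

sel = Selector(text=html)

for item in sel.css('li::text').getall():     # text of each <li>
    print(item)

for href in sel.xpath('//a/@href').getall():  # every href value
    print(href)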
