Using Python Scrapy to extract links from a webpage
I am a Python beginner and am using Scrapy to extract links from the following page: http://www.basketball-reference.com/leagues/NBA_2015_games.html
The code I wrote is:
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors import LinkExtractor
    from basketball.items import BasketballItem

    class BasketballSpider(CrawlSpider):
        name = 'basketball'
        allowed_domains = ['basketball-reference.com/']
        start_urls = ['http://www.basketball-reference.com/leagues/NBA_2015_games.html']
        rules = [Rule(LinkExtractor(allow=['http://www.basketball-reference.com/boxscores/^\w+$']), 'parse_item')]

        def parse_item(self, response):
            item = BasketballItem()
            item['url'] = response.url
            return item
I run this code from the command prompt, but the file it creates does not contain any links. Can anyone help?
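It cannot find the links because the regular expression in the rule never matches: the `^` and `$` anchors sit in the middle of the pattern. Fix the regular expression in the rule: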
    rules = [
        Rule(LinkExtractor(allow='boxscores/\w+'))
    ]
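The difference between the two patterns can be checked with the standard `re` module alone. This is a minimal sketch, assuming the link extractor applies each `allow` pattern as a regex search against the absolute URL; the box score URL below is a hypothetical example and was not taken from the page.

```python
import re

# Hypothetical box score URL, used only to exercise the two patterns.
url = 'http://www.basketball-reference.com/boxscores/201410280LAL.html'

# Pattern from the question: '^' appears after literal text, so it can never match.
original = r'http://www.basketball-reference.com/boxscores/^\w+$'
# Pattern from the answer: a plain fragment that may match anywhere in the URL.
fixed = r'boxscores/\w+'

original_match = re.search(original, url)
fixed_match = re.search(fixed, url)

print(original_match)        # None
print(fixed_match.group(0))  # boxscores/201410280LAL
```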
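Note that the rule above only fixes the pattern. To have parse_item called, the callback has to be passed explicitly: a Rule's callback defaults to None, and a rule without a callback only follows the matching links.
Also, allow can be set as a single string instead of a list. The complete rule: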
    rules = [
        Rule(LinkExtractor(allow='boxscores/\w+'), callback='parse_item')
    ]