Python Scrapy - populating start_urls from MySQL

Question

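I am trying to populate start_urls in spider.py with the results of a SELECT on a MySQL table. When I run "scrapy runspider spider.py" I get no output; the run simply finishes without errors.

I have tested the SELECT query in a standalone Python script, and there it does fill start_urls with the entries from the MySQL table.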

spider.py

from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import MySQLdb


class ProductsSpider(BaseSpider):
    name = "Products"
    allowed_domains = ["test.com"]
    start_urls = []

    def parse(self, response):
        print self.start_urls

    def populate_start_urls(self, url):
        conn = MySQLdb.connect(
                user='user',
                passwd='password',
                db='scrapy',
                host='localhost',
                charset="utf8",
                use_unicode=True
                )
        cursor = conn.cursor()
        cursor.execute(
            'SELECT url FROM links;'
            )
        rows = cursor.fetchall()

        for row in rows:
            self.start_urls.append(row[0])
        conn.close()

Answer 1

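A better approach is to override the start_requests method.

It can query your database, much as populate_start_urls does, and return a sequence of Request objects.

You only need to rename your populate_start_urls method to start_requests (dropping its unused url parameter, since Scrapy calls start_requests with no arguments) and change the following lines: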

for row in rows:
    yield self.make_requests_from_url(row[0])
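
The database half of that start_requests can be sketched on its own. The helper name iter_start_urls is hypothetical, and the demo uses an in-memory SQLite table purely as a stand-in so it runs without a MySQL server; a MySQLdb connection opened as in the question should work the same way, since only the common DB-API cursor calls are used.

```python
import sqlite3


def iter_start_urls(conn, query="SELECT url FROM links"):
    """Yield the first column of every row returned by `query`.

    `conn` is any DB-API 2.0 connection (sqlite3 here, MySQLdb in the question).
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        for row in cursor.fetchall():
            yield row[0]
    finally:
        cursor.close()


# Demo against an in-memory SQLite table shaped like the question's `links` table.
# The two URLs are made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT)")
conn.executemany(
    "INSERT INTO links (url) VALUES (?)",
    [("http://test.com/a",), ("http://test.com/b",)],
)
urls = list(iter_start_urls(conn))
conn.close()
print(urls)
```

Inside the spider, start_requests would open the connection and then do something like "for url in iter_start_urls(conn): yield scrapy.Request(url, callback=self.parse)", closing the connection afterwards. As far as I know, later Scrapy releases deprecate make_requests_from_url, so building the Request directly is probably the safer form on a current install.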

Answer 2

Populate it in __init__:

def __init__(self):
    super(ProductsSpider,self).__init__()
    self.start_urls = get_start_urls()

This assumes get_start_urls() returns the URLs.
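
The answer leaves get_start_urls() undefined, so here is one possible shape for it, as a rough sketch only. open_connection is a hypothetical factory; in the real spider it would call MySQLdb.connect with the question's settings, whereas here it builds a small in-memory SQLite table with made-up rows so the example can run anywhere. Unlike start_requests, this reads the whole table eagerly when the spider is constructed.

```python
import sqlite3


def open_connection():
    # Stand-in for the question's MySQLdb.connect(...) call (assumption:
    # any DB-API 2.0 connection with a `links` table would do).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT)")
    conn.executemany(
        "INSERT INTO links (url) VALUES (?)",
        [("http://test.com/a",), ("http://test.com/b",)],
    )
    return conn


def get_start_urls(connect=open_connection):
    """Return every URL in the links table as a list."""
    conn = connect()
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT url FROM links")
        return [row[0] for row in cursor.fetchall()]
    finally:
        # Close the connection even if the query fails.
        conn.close()


start_urls = get_start_urls()
print(start_urls)
```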

Answer 1: score 13, accepted, 2013-11-22 04:43:19
Answer 2: score 5, 2013-11-21 15:20:22
