
How can we get a list of URLs after crawling a website with Scrapy in a custom Python script?

I am working on a script where I need to crawl websites, and I only need to crawl the base_url site. Does anyone have a good idea of how I can launch Scrapy from a custom Python script and get the crawled URLs back as a list?

You can use a file to pass the URLs from Scrapy to your Python script.
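For example, here is a minimal sketch of the file-based handoff, assuming a spider named url_collector, a domain of example.com, and a links.txt output path (all of these names are illustrative, not from the answer):

import scrapy
from scrapy.linkextractors import LinkExtractor

class UrlCollectorSpider(scrapy.Spider):
    name = 'url_collector'
    start_urls = ['http://example.com']  # assumed base_url

    def parse(self, response):
        # Append every same-domain link to a file the outer script reads later.
        extractor = LinkExtractor(allow_domains=['example.com'])
        with open('links.txt', 'a') as f:
            for link in extractor.extract_links(response):
                f.write(link.url + '\n')
                yield scrapy.Request(link.url, callback=self.parse)

After the crawl finishes, the external script just reads the file back into a list:

with open('links.txt') as f:
    urls = [line.strip() for line in f if line.strip()]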

Or you can print the URLs with a marker in your spider, capture Scrapy's stdout from your Python script, and then parse it into a list.
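A hedged sketch of the stdout approach; the 'URL::' marker, the spider name, and the print call in the spider are assumptions for illustration. With default settings Scrapy writes its own log to stderr, so only the marked lines land on stdout:

# Inside the spider's parse(), print each URL with a marker:
#     print('URL::' + response.url)

import subprocess

MARK = 'URL::'

proc = subprocess.run(
    ['scrapy', 'crawl', 'url_collector'],
    capture_output=True, text=True, check=True,
)

# Keep only the lines the spider printed with our marker.
urls = [line[len(MARK):].strip()
        for line in proc.stdout.splitlines()
        if line.startswith(MARK)]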

You can add Scrapy commands from an external library by adding a scrapy.commands section to the entry_points in setup.py.

from setuptools import setup, find_packages

setup(
    name='scrapy-mymodule',
    packages=find_packages(),  # so my_scrapy_module (and its entry point) gets installed
    entry_points={
        'scrapy.commands': [
            'my_command=my_scrapy_module.commands:MyCommand',
        ],
    },
)
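The entry point above refers to a MyCommand class that the answer does not show. Here is a minimal sketch of what my_scrapy_module/commands.py might contain, assuming a spider named url_collector; the class body is an assumption, only the module path comes from the setup.py above:

from scrapy.commands import ScrapyCommand

class MyCommand(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "Crawl the site and print the collected URLs"

    def run(self, args, opts):
        # crawler_process is attached by Scrapy before run() is called;
        # the spider name 'url_collector' is assumed for illustration.
        self.crawler_process.crawl('url_collector')
        self.crawler_process.start()

Once the package is installed, the command becomes available as scrapy my_command inside a Scrapy project.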

http://doc.scrapy.org/en/latest/experimental/index.html?highlight=library#add-commands-using-external-libraries

Also see Scrapy Very Basic Example.
