
How can we get a list of URLs after crawling a website with Scrapy in a custom Python script?

I am working on a script where I need to crawl websites, and I only need to crawl the base_url site. Does anyone have a good idea of how I can launch Scrapy from a custom Python script and get the crawled URLs back as a list?

You can use a file to pass the URLs from Scrapy to your Python script.
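For example, here is a minimal sketch of the file-based handoff, assuming a spider named url_collector, a domain of example.com, and a links.txt output path (all of these names are illustrative, not from the answer):

import scrapy
from scrapy.linkextractors import LinkExtractor

class UrlCollectorSpider(scrapy.Spider):
    name = 'url_collector'
    start_urls = ['http://example.com']  # assumed base_url

    def parse(self, response):
        # Append every same-domain link to a file the outer script reads later.
        extractor = LinkExtractor(allow_domains=['example.com'])
        with open('links.txt', 'a') as f:
            for link in extractor.extract_links(response):
                f.write(link.url + '\n')
                yield scrapy.Request(link.url, callback=self.parse)

After the crawl finishes, the external script just reads the file back into a list:

with open('links.txt') as f:
    urls = [line.strip() for line in f if line.strip()]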

Or you can print the URLs with a marker in your spider, capture Scrapy's stdout from your Python script, and then parse it into a list.
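A hedged sketch of the stdout approach; the 'URL::' marker, the spider name, and the print call in the spider are assumptions for illustration. With default settings Scrapy writes its own log to stderr, so only the marked lines land on stdout:

# Inside the spider's parse(), print each URL with a marker:
#     print('URL::' + response.url)

import subprocess

MARK = 'URL::'

proc = subprocess.run(
    ['scrapy', 'crawl', 'url_collector'],
    capture_output=True, text=True, check=True,
)

# Keep only the lines the spider printed with our marker.
urls = [line[len(MARK):].strip()
        for line in proc.stdout.splitlines()
        if line.startswith(MARK)]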

You can add Scrapy commands from an external library by adding a scrapy.commands section to the entry_points in setup.py.

from setuptools import setup, find_packages

setup(
    name='scrapy-mymodule',
    packages=find_packages(),  # so my_scrapy_module (and its entry point) gets installed
    entry_points={
        'scrapy.commands': [
            'my_command=my_scrapy_module.commands:MyCommand',
        ],
    },
)
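The entry point above refers to a MyCommand class that the answer does not show. Here is a minimal sketch of what my_scrapy_module/commands.py might contain, assuming a spider named url_collector; the class body is an assumption, only the module path comes from the setup.py above:

from scrapy.commands import ScrapyCommand

class MyCommand(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "Crawl the site and print the collected URLs"

    def run(self, args, opts):
        # crawler_process is attached by Scrapy before run() is called;
        # the spider name 'url_collector' is assumed for illustration.
        self.crawler_process.crawl('url_collector')
        self.crawler_process.start()

Once the package is installed, the command becomes available as scrapy my_command inside a Scrapy project.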

http://doc.scrapy.org/en/latest/experimental/index.html?highlight=library#add-commands-using-external-libraries

Also see Scrapy Very Basic Example.
