
How can we get a list of URLs after crawling a website with Scrapy in a custom Python script?

I am working on a script in which I need to crawl websites, and only the base_url site should be crawled. Does anyone have a good idea of how I can launch Scrapy from a custom Python script and get the crawled URLs back as a list?

You can use a file to pass the URLs from Scrapy to your Python script.
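A minimal sketch of the file-based handoff, run entirely from a Python script via Scrapy's CrawlerProcess. The spider name LinkSpider, the domain example.com, and the file name urls.txt are assumptions for illustration:

import scrapy
from scrapy.crawler import CrawlerProcess

class LinkSpider(scrapy.Spider):
    name = "links"
    allowed_domains = ["example.com"]    # assumption: keeps the crawl on the base site
    start_urls = ["http://example.com"]  # assumption: replace with your base_url

    def parse(self, response):
        # Append every link found on the page to the shared file.
        with open("urls.txt", "a") as f:
            for href in response.css("a::attr(href)").getall():
                f.write(response.urljoin(href) + "\n")
                # Follow in-site links; allowed_domains filters external ones.
                yield response.follow(href, self.parse)

open("urls.txt", "w").close()  # start each run with an empty file
process = CrawlerProcess()
process.crawl(LinkSpider)
process.start()  # blocks until the crawl finishes

# Back in the calling script: read the file into a list.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]
print(urls)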

Or you can print the URLs with a marker in your Scrapy spider, have your Python script capture Scrapy's stdout, and then parse it into a list.
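A minimal sketch of the stdout approach, assuming an existing Scrapy project whose spider (named "links" here, an assumption) prints each URL prefixed with a marker such as URL: (Scrapy's own log output goes to stderr, so stdout contains only the printed lines):

import subprocess

# Inside the spider's parse() you would print marked lines, e.g.:
#     print("URL:" + response.urljoin(href))

result = subprocess.run(
    ["scrapy", "crawl", "links"],
    capture_output=True, text=True,
)

# Keep only the marked lines and strip the marker to build the list.
urls = [
    line[len("URL:"):]
    for line in result.stdout.splitlines()
    if line.startswith("URL:")
]
print(urls)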

You can add Scrapy commands from an external library by adding a scrapy.commands section to entry_points in setup.py (a sketch of the referenced command class follows the snippet below).

from setuptools import setup, find_packages

setup(
    name='scrapy-mymodule',
    packages=find_packages(),
    entry_points={
        'scrapy.commands': [
            # <command name> = <module path>:<command class>
            'my_command=my_scrapy_module.commands:MyCommand',
        ],
    },
)
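The entry point names a class my_scrapy_module.commands:MyCommand that the package must provide. A hedged sketch of what that class might look like, built on Scrapy's ScrapyCommand base class; the spider name "links" is an assumption, not part of the original answer:

from scrapy.commands import ScrapyCommand

class MyCommand(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "Crawl the base site and collect its URLs"

    def run(self, args, opts):
        # self.crawler_process is supplied by Scrapy for project commands.
        self.crawler_process.crawl("links")  # assumption: a spider named "links"
        self.crawler_process.start()

Once the package is installed (for example with pip install -e .), the command becomes available as scrapy my_command inside a Scrapy project.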

http://doc.scrapy.org/en/latest/experimental/index.html?highlight=library#add-commands-using-external-libraries

Also see Scrapy Very Basic Example.
