
How do I run a user defined function in a scrapy spider?

Scrapy spiders are started with Scrapy's own terminal commands, so how can I run functions that I define myself?

Example below:

import scrapy


class Fcc(scrapy.Spider):

    name = "fcc"
    start_urls = ["http://freecodecamp.org/"]

    def parse(self, response):
        for link in response.css("a::attr(href)").getall():
            yield {
                "url": link,
            }

    def add(self):
        with open("links.txt", "a") as f:
            f.write(next(self.parse()))

If I run the spider from the terminal with the command below, it only executes the parse function. How can I run the add function when I want to?

scrapy runspider fcc_spider.py

This would help me work with the data I crawl from any website.

P.S. This is just an example. Please don't give solutions specific to this code; give solutions that can be used in any situation.

By default, Scrapy calls the spider's start_requests method and then parse as the callback for each response. You can define __init__ to read command-line parameters and use them to decide when to run your target function.

You can run your user defined functions by calling them in one of your Scrapy callbacks.

You could call it before or after the for loop inside the parse method (keep Scrapy's asynchronous nature in mind: callbacks run once per response, in no guaranteed order).

You could also define a constructor for your Spider and pass the contents of the links.txt file to it.

Here is an example from the Scrapy documentation: https://docs.scrapy.org/en/latest/topics/spiders.html#spider-arguments

In Python, you can define inner functions (a function inside another function).

A function defined inside another function is known as an inner function or a nested function. In Python, this kind of function can access names in the enclosing function. Here's an example of how to create an inner function in Python:

def outer_func():
    def inner_func():
        print("Hello, World!")
    inner_func()

outer_func()

Output:

Hello, World!

In this code, you define inner_func() inside outer_func() to print the Hello, World! message to the screen. To do that, you call inner_func() on the last line of outer_func(). This is the quickest way to write an inner function in Python. However, inner functions provide a lot of interesting possibilities beyond what you see in this example.
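The example above does not show the inner function using names from the enclosing function. A small sketch of that, with make_greeter and greet as made-up names for illustration:

```python
def make_greeter(greeting):
    def greet(name):
        # "greeting" is read from the enclosing function's scope.
        return "{}, {}!".format(greeting, name)
    # Return the inner function so the caller can use it later.
    return greet


hello = make_greeter("Hello")
print(hello("World"))  # prints "Hello, World!"
```

The returned greet function keeps access to greeting even after make_greeter has finished running.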


INTEGRATION Example

Based on that, you can define a function inside one of your Scrapy callbacks and call it from within that callback.

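def parse_disease(self, response):
    # Inner helper, visible only inside parse_disease.
    def function_name(name):
        to_return = "hello {}".format(name)
        return to_return

    # Some code here...

    # Example input only: take the page title as the name.
    name = response.css("title::text").get()
    param = function_name(name)
    yield {"greeting": param}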
