
HTTP POST and parsing JSON with Scrapy

I have a site that I want to extract data from, and the data retrieval is very straightforward.

It takes its parameters via HTTP POST and returns a JSON object. I have a list of queries that I want to run, and then repeat at certain intervals to update a database. Is Scrapy suitable for this, or should I be using something else?

I don't actually need to follow links, but I do need to send multiple requests at the same time.

What does the POST request look like? There are many variations, such as simple query parameters (?a=1&b=2), a form-like payload (the body contains a=1&b=2), or some other kind of payload (the body contains a string in some format, like JSON or XML).
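To illustrate the distinction, here is a rough sketch of how each shape could be built by hand (the endpoint URL is hypothetical):

    # Sketch only: a hypothetical endpoint, showing the three payload shapes.
    import json
    import urllib.parse

    url = "http://example.com/api"  # hypothetical endpoint

    # 1. Simple query parameters appended to the URL: ?a=1&b=2
    query_url = url + "?" + urllib.parse.urlencode({"a": 1, "b": 2})

    # 2. Form-like payload: the request body contains a=1&b=2
    form_body = urllib.parse.urlencode({"a": 1, "b": 2})

    # 3. Other payload: e.g. the body contains a JSON-formatted string
    json_body = json.dumps({"a": 1, "b": 2})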

In Scrapy it is fairly straightforward to make POST requests; see: http://doc.scrapy.org/en/latest/topics/request-response.html#request-usage-examples

For example, you may need something like this:

    # Warning: take care of the undefined variables (e.g. `url`)!
    import json
    import urllib.parse

    from scrapy import Request

    def start_requests(self):
        payload = {"a": 1, "b": 2}
        # Encode the payload as a form body and declare its content type.
        yield Request(url, self.parse_data, method="POST",
                      body=urllib.parse.urlencode(payload),
                      headers={"Content-Type": "application/x-www-form-urlencoded"})

    def parse_data(self, response):
        # The response body is the JSON document returned by the server.
        data = json.loads(response.body)
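As a side note, if the endpoint expects a form-encoded body, Scrapy's FormRequest can build the body and the Content-Type header from a dict for you. A minimal sketch, with the same assumed url and callback as above:

    from scrapy import FormRequest

    def start_requests(self):
        # formdata values are strings; FormRequest form-encodes them and
        # sets the application/x-www-form-urlencoded Content-Type header.
        yield FormRequest(url, formdata={"a": "1", "b": "2"},
                          callback=self.parse_data)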

For handling requests and retrieving responses, Scrapy is more than enough. And to parse the JSON, just use the json module from the standard library:

    import json

    data = ...  # a JSON string, e.g. response.body from the callback above
    json_data = json.loads(data)

Hope this helps!

Based on my understanding of the question, you just want to fetch/scrape data from a web page at certain intervals. Scrapy is generally used for crawling.

If you just want to make HTTP POST requests, you might consider using the Python requests library.
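For instance, a minimal sketch with requests (the endpoint URL, query list, and interval are placeholders):

    import time

    import requests

    URL = "http://example.com/api"  # placeholder endpoint
    QUERIES = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]  # placeholder queries

    while True:
        for payload in QUERIES:
            # data= sends a form-encoded body; use json=payload if the
            # endpoint expects a JSON body instead.
            response = requests.post(URL, data=payload)
            data = response.json()
            # ... update the database with `data` here ...
        time.sleep(3600)  # re-run the queries at the desired interval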
