
Pass a list of URLs to Scrapy function

I have a Python API that takes two parameters (a URL and a user-defined word) and returns, as a JSON response, how many times the given word appears in the content at that URL.

At the same time, I would like to pass several URLs at once, as a list. I would also like to make the requests with AsyncIO. Any suggestions?

Here is the code:

from flask import Flask
from flask_restful import Resource, Api, reqparse, abort
import requests

app = Flask(__name__)
api = Api(app)

parser = reqparse.RequestParser()
parser.add_argument('url')
parser.add_argument('word')
parser.add_argument('ignorecase')

# Performs a GET on the URL and returns how many times `word` appears in the response body
def count_words_in(url, word, ignore_case):
    r = requests.get(url)
    data = r.text
    if str(ignore_case).lower() == 'true':
        return data.lower().count(word.lower())
    return data.count(word)

# Prepends 'http://' to the URL if missing and returns a valid URL
def validate_url(url):
    if not(url.startswith('http')):
        url = 'http://' + url
    return url


class UrlCrawlerAPI(Resource):
    def get(self):
        try:
            args = parser.parse_args()
            valid_url = validate_url(args['url'])
            return { valid_url : { args['word'] : count_words_in(valid_url, args['word'], args['ignorecase']) }}
        except AttributeError:
            return { 'message' : 'Please provide URL and WORD arguments' }
        except Exception as e:
            return { 'message' : 'Unhandled Exception: ' + str(e) }


api.add_resource(UrlCrawlerAPI, "/")

if __name__ == '__main__':
    app.run(debug=True)

