Scrapy spider - Send POST request

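I am trying to scrape the table on this page (https://www.ftse.com/products/indices/uk). When I inspect the page in the Network tab, I can see that it gets its data from an API through an AJAX request (type POST), which the browser makes after the layout has loaded. So I am trying to build a spider that sends a POST request to that endpoint using the form_data shown in the request. I ran a quick test with the following shell command and got the data back.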

curl 'https://www.ftse.com/products/indices/home/ra_getIndexData/' --data 'indexName=GEISAC&currency=GBP&rtn=CAPITAL&ctry=Regions&Indices=ASX%2CFTSE+All-Share%2C%3AUKX%2CFTSE+100%2C%3AMCX%2CFTSE+250%2C%3AMCXNUK%2CFTSE+250+Net+Tax%2C%3ANMX%2CFTSE+350%2C%3ASMX%2CFTSE+Small+Cap%2C%3ANSX%2CFTSE+Fledgling%2C%3AAS0%2CFTSE+All-Small%2C%3AASXX%2CFTSE+All-Share+ex+Invt+Trust%2C%3AUKXXIT%2CFTSE+100+Index+ex+Invt+Trust%2C%3AMCIX%2CFTSE+250+Index+ex+Invt+Trust%2C%3ANMIX%2CFTSE+350+Index+ex+Invt+Trust%2C%3ASMXX%2CFTSE+Small+Cap+ex+Invt+Trust%2C%3AAS0X%2CFTSE+All-Small+ex+Invt+Trust%2C%3AUKXDUK%2CFTSE+100+Total+Return+Declared+Dividend%2C%3A&type='

However, when I try to code the same request in a spider using the FormRequest class, the spider fails.

class FtseSpider(scrapy.Spider):
    name = 'ftse'
    #allowed_domains = ['www.ftserussell.com', 'www.ftse.com']
    start_urls = [
            'https://www.ftse.com/products/indices/uk']


    def parse(self, request):
        # URL-encoded parameters for the request
        data = 'indexName=GEISAC&currency=GBP&rtn=CAPITAL&ctry=Regions&Indices=ASX%2CFTSE+All-Share%2C%3AUKX%2CFTSE+100%2C%3AMCX%2CFTSE+250%2C%3AMCXNUK%2CFTSE+250+Net+Tax%2C%3ANMX%2CFTSE+350%2C%3ASMX%2CFTSE+Small+Cap%2C%3ANSX%2CFTSE+Fledgling%2C%3AAS0%2CFTSE+All-Small%2C%3AASXX%2CFTSE+All-Share+ex+Invt+Trust%2C%3AUKXXIT%2CFTSE+100+Index+ex+Invt+Trust%2C%3AMCIX%2CFTSE+250+Index+ex+Invt+Trust%2C%3ANMIX%2CFTSE+350+Index+ex+Invt+Trust%2C%3ASMXX%2CFTSE+Small+Cap+ex+Invt+Trust%2C%3AAS0X%2CFTSE+All-Small+ex+Invt+Trust%2C%3AUKXDUK%2CFTSE+100+Total+Return+Declared+Dividend%2C%3A&type='
        # convert the URL parameters into a dict
        params_raw_ = urllib.parse.parse_qs(data)
        prams_dict_ = {k: v[0] for k, v in params_raw_.items()}
        # return the response
        yield [scrapy.FormRequest('https://www.ftse.com/products/indices/home/ra_getIndexData/',
                    method='POST',
                    body=prams_dict_)]
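
One detail of the conversion step above: `urllib.parse.parse_qs` drops parameters with blank values by default, so the trailing `type=` field would not survive the round trip into a dict. The sketch below illustrates this with a shortened payload (the shortening is for readability only and is not part of the original post). Whether the API actually requires the empty `type` field is not known from the post.

```python
from urllib.parse import parse_qs, parse_qsl

# Shortened payload for illustration only; the real string is the one in the curl command.
data = "indexName=GEISAC&currency=GBP&Indices=ASX%2CFTSE+All-Share%2C%3A&type="

# Same conversion as in the spider: blank values are silently dropped.
default_fields = {k: v[0] for k, v in parse_qs(data).items()}

# Keeping blank values preserves the empty `type` field.
all_fields = dict(parse_qsl(data, keep_blank_values=True))

print(sorted(default_fields))
print(sorted(all_fields))
```

The decoded values also show that the payload is a flat set of key/value pairs, with `Indices` being one long comma-separated string.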

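Because the data contains a nested dict, it cannot be represented as formdata in Scrapy, so the serialized payload has to be passed in the request body instead, which is the same as the original `data` string. Also, use `yield from` when yielding from an iterable, or yield a single item or Request object directly.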

yield from [scrapy.FormRequest('https://www.ftse.com/products/indices/home/ra_getIndexData/',
                method='POST', body=data)]

