Using a for loop in Scrapy to append XPath values to a list

Question

I'm trying to automate my HTML table scrape in Scrapy. This is what I have so far:

import scrapy
import pandas as pd

class XGSpider(scrapy.Spider):

    name = 'expectedGoals'

    start_urls = [
        'https://fbref.com/en/comps/9/schedule/Premier-League-Scores-and-Fixtures',
    ]

    def parse(self, response):

        matches = []

        for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):

            match = {
                'home': row.xpath('td[4]//text()').extract_first(),
                'homeXg': row.xpath('td[5]//text()').extract_first(),
                'score': row.xpath('td[6]//text()').extract_first(),
                'awayXg': row.xpath('td[7]//text()').extract_first(),
                'away': row.xpath('td[8]//text()').extract_first()
            }

            matches.append(match)

        x = pd.DataFrame(
            matches, columns=['home', 'homeXg', 'score', 'awayXg', 'away'])

        yield x.to_csv("xG.csv", sep=",", index=False)

It works fine, but as you can see I am hardcoding the keys (home, homeXg, etc.) for the match object. I'd like to scrape the keys into a list and then initialize a dict with keys from that list. The problem is that I don't know how to loop through XPath expressions by index. As an example:

headers = []
for row in response.xpath('//*[@id="sched_ks_3260_1"]/thead/tr'):
    yield {
        'first': row.xpath('th[1]/text()').extract_first(),
        'second': row.xpath('th[2]/text()').extract_first()
    }

Is it possible to put th[1], th[2], th[3], etc. into a for loop, with the numbers as indexes, and append the values to a list? For example:

row.xpath('th[i]/text()').extract_first()?

Answer 1

Not tested, but this should work:

# Map each header name to its 1-based column position (XPath positions start at 1).
column_index = 1
columns = {}
for column_node in response.xpath('//*[@id="sched_ks_3260_1"]/thead/tr/th'):
    column_name = column_node.xpath('./text()').extract_first()
    columns[column_name] = column_index
    column_index += 1

# Build one dict per body row, reading each cell by its column position.
matches = []
for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):
    match = {}
    for column_name in columns.keys():
        match[column_name] = row.xpath('./td[{index}]//text()'.format(index=columns[column_name])).extract_first()
    matches.append(match)
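For readers who want the index loop from the question spelled out, here is a minimal sketch of the same idea. The point is that `i` inside the string `'th[i]/text()'` is not substituted by Python, so the number has to be formatted into the XPath string. The helper names `cell_texts` and `rows_to_dicts` are hypothetical, not part of Scrapy. The sketch assumes the header row uses `th` cells, the body rows use `td` cells, and both have the same number of cells. If a body row starts with a `th` cell, the positions would be shifted and would need an offset.

```python
def cell_texts(row, tag, count):
    """Read cells tag[1]..tag[count] from a row selector, one XPath per index."""
    values = []
    for i in range(1, count + 1):  # XPath positions are 1-based
        # Format the loop index into the expression, e.g. 'th[3]//text()'.
        xpath = '{tag}[{i}]//text()'.format(tag=tag, i=i)
        values.append(row.xpath(xpath).extract_first())
    return values


def rows_to_dicts(header_row, body_rows):
    """Use the header cell texts as dict keys for every body row."""
    count = len(header_row.xpath('th'))  # number of header cells
    headers = cell_texts(header_row, 'th', count)
    # Note: if two header cells share the same text, the later value
    # overwrites the earlier one in the resulting dict.
    return [dict(zip(headers, cell_texts(row, 'td', count)))
            for row in body_rows]
```

Inside `parse`, `header_row` would be the selector for a single `thead/tr` row and `body_rows` the selectors returned for `tbody/tr`. The resulting list of dicts can be passed to `pd.DataFrame` in the same way as `matches` in the question.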

Answer 1 (accepted), 2020-08-13 15:19:13
