Appending XPath values to a list using a for loop in Scrapy

Question

I am trying to automate scraping an HTML table in Scrapy. This is what I have so far:

import scrapy
import pandas as pd

class XGSpider(scrapy.Spider):

    name = 'expectedGoals'

    start_urls = [
        'https://fbref.com/en/comps/9/schedule/Premier-League-Scores-and-Fixtures',
    ]

    def parse(self, response):

        matches = []

        for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):

            match = {
                'home': row.xpath('td[4]//text()').extract_first(),
                'homeXg': row.xpath('td[5]//text()').extract_first(),
                'score': row.xpath('td[6]//text()').extract_first(),
                'awayXg': row.xpath('td[7]//text()').extract_first(),
                'away': row.xpath('td[8]//text()').extract_first()
            }

            matches.append(match)

        x = pd.DataFrame(
            matches, columns=['home', 'homeXg', 'score', 'awayXg', 'away'])

        yield x.to_csv("xG.csv", sep=",", index=False)

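It works fine, but as you can see, I am hard-coding the keys (home, homeXg, etc.) of the match object. I would like to scrape the keys into a list automatically and then initialise the dict with the keys from that list. The problem is that I don't know how to loop over an XPath by index. For example,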

headers = []
for row in response.xpath('//*[@id="sched_ks_3260_1"]/thead/tr'):
    yield {
        'first': row.xpath('th[1]/text()').extract_first(),
        'second': row.xpath('th[2]/text()').extract_first()
    }

Is it possible to put th[1], th[2], th[3], etc. into a for loop, using the number as an index, and append the values to a list? For example:

row.xpath('th[i]/text()').extract_first()?

Answer 1

Untested, but it should work:

# Map each header's text to its 1-based column position (XPath positions start at 1).
columns = {}
for column_index, column_node in enumerate(
        response.xpath('//*[@id="sched_ks_3260_1"]/thead/tr/th'), start=1):
    column_name = column_node.xpath('./text()').extract_first()
    columns[column_name] = column_index

# Create the result list once, outside the header loop.
matches = []
for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):
    match = {}
    for column_name, column_index in columns.items():
        # Interpolate the column position into the XPath expression.
        match[column_name] = row.xpath(
            './td[{index}]//text()'.format(index=column_index)).extract_first()
    matches.append(match)
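
The same idea can be checked offline before running the spider. The sketch below is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors so that it runs without a network request, and SAMPLE_TABLE and table_to_dicts are hypothetical names that do not come from the question. In a real spider the lookups would be row.xpath(...) calls as above; the way the index is formatted into the path is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a schedule table; the real page has more columns.
SAMPLE_TABLE = """
<table>
  <thead>
    <tr><th>Home</th><th>xG</th><th>Score</th></tr>
  </thead>
  <tbody>
    <tr><td>Arsenal</td><td>1.8</td><td>2-0</td></tr>
    <tr><td>Fulham</td><td>0.4</td><td>0-3</td></tr>
  </tbody>
</table>
"""


def table_to_dicts(table):
    """Build one dict per body row, keyed by the header text."""
    # Map each header label to its 1-based position (paths count from 1).
    columns = {}
    for index, th in enumerate(table.findall("./thead/tr/th"), start=1):
        columns[th.text] = index

    matches = []
    for row in table.findall("./tbody/tr"):
        match = {}
        for name, index in columns.items():
            # The index is interpolated into the path, e.g. "td[2]".
            cell = row.find("td[{index}]".format(index=index))
            match[name] = cell.text if cell is not None else None
        matches.append(match)
    return matches


matches = table_to_dicts(ET.fromstring(SAMPLE_TABLE.strip()))
print(matches)
```

One caveat with keying by header text: if two columns share the same label (the question's homeXg and awayXg fields suggest the real table may have two xG columns), the later column overwrites the earlier one in the dict, so the labels may need to be made unique first.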

Solution 1
1 accepted 2020-08-13 15:19:13
