How can I format my data so that I have the corresponding teams with their scores? (Scrapy Python)

I'm having trouble formatting my scraped data. Any advice on how I could extract my data into four columns (Winning Team, Losing Team, Winning Score, Losing Score)?

import scrapy


class sportsDataSpider(scrapy.Spider):
    name = "sportsSite"
    allowed_domains = ["www.espn.com"]
    start_urls = ["https://www.espn.com/nhl/scoreboard/_/date/20220504"]

    handle_httpstatus_list = [404]

    def parse(self, response, **kwargs):
        hockey_score_selector = response.css(".ScoreCell__Team--scoreboard").extract()
        loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
        winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"
        team_sel = ".ScoreboardPage .ScoreCell__TeamName--shortDisplayName::text"

        loser_score = response.css(loser_sel).extract()
        winner_score = response.css(winner_sel).extract()
        teams = response.css(team_sel).extract()

        yield {
            'losing score': loser_score,
            'winning score': winner_score,
            'teams': teams
        }

This is the output I currently get from this code:

{'losing score': ['2', '3', '2', '0'], 'winning score': ['5', '5', '6', '6'], 'teams': ['Bruins', 'Hurricanes', 'Lightning', 'Maple Leafs', 'Blues', 'Wild', 'Kings', 'Oilers']}

Instead of collecting all the teams at once based on .ScoreboardPage, try collecting two sets: one scoped to .ScoreboardScoreCell__Item--loser and one scoped to .ScoreboardScoreCell__Item--winner, the same cells you already use for the scores. That way each team list lines up with its score list. So:

    def parse(self, response, **kwargs):
        # Scores, taken from the loser and winner cells separately
        loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
        winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"

        # Team names, scoped to the same loser/winner cells instead of .ScoreboardPage
        loser_team_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__TeamName--shortDisplayName::text"
        winner_team_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__TeamName--shortDisplayName::text"

        loser_score = response.css(loser_sel).extract()
        winner_score = response.css(winner_sel).extract()
        loser_teams = response.css(loser_team_sel).extract()
        winner_teams = response.css(winner_team_sel).extract()

        yield {
            'losing score': loser_score,
            'winning score': winner_score,
            'losing team': loser_teams,
            'winning team': winner_teams
        }

Output:

{'losing score': ['2', '3', '2', '0'],
 'winning score': ['5', '5', '6', '6'],
 'losing team': ['Bruins', 'Maple Leafs', 'Blues', 'Kings'],
 'winning team': ['Hurricanes', 'Lightning', 'Wild', 'Oilers']}

E.g. the Bruins lost to the Hurricanes, and so on.
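The output above still holds four parallel lists rather than the four columns the question asks for. A minimal sketch, assuming every game on the page has exactly one winner cell and one loser cell (so index i refers to the same game in all four lists), is to zip the lists into one dict per game. The helper names pair_games and rows_to_csv are hypothetical, and converting the scores to int is an optional choice, not something the page requires.

```python
import csv
import io

# Column order requested in the question
FIELDS = ["winning team", "losing team", "winning score", "losing score"]


def pair_games(winner_teams, loser_teams, winner_scores, loser_scores):
    """Hypothetical helper: combine four parallel lists into one dict per game.

    Assumes all four lists were scraped from the same page in the same
    order, so index i refers to the same game in every list.
    """
    lengths = {len(winner_teams), len(loser_teams), len(winner_scores), len(loser_scores)}
    if len(lengths) != 1:
        # zip() would silently drop the extra entries, so fail loudly instead
        raise ValueError(f"lists have different lengths: {sorted(lengths)}")
    return [
        {
            "winning team": wt,
            "losing team": lt,
            "winning score": int(ws),  # scraped text such as '5' -> 5
            "losing score": int(ls),
        }
        for wt, lt, ws, ls in zip(winner_teams, loser_teams, winner_scores, loser_scores)
    ]


def rows_to_csv(rows):
    """Hypothetical helper: render the per-game dicts as CSV text with four columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Values copied from the output above
    games = pair_games(
        ["Hurricanes", "Lightning", "Wild", "Oilers"],
        ["Bruins", "Maple Leafs", "Blues", "Kings"],
        ["5", "5", "6", "6"],
        ["2", "3", "2", "0"],
    )
    print(rows_to_csv(games))
```

Inside the spider, replacing the final yield with yield from pair_games(winner_teams, loser_teams, winner_score, loser_score) would emit one item per game, and running scrapy crawl sportsSite -o scores.csv should then write one CSV row per game through Scrapy's feed exports. If the page structure differs from the assumption above (for example, a game missing one of these cells), the lists may not line up; the length check catches some, but not all, such cases.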
