How can I format my data so that I have the corresponding teams with their scores? (Scrapy Python)
I'm having trouble formatting my scraped data. Any advice on how I could extract it into four columns (Winning Team, Losing Team, Winning Score, Losing Score)?
import scrapy


class sportsDataSpider(scrapy.Spider):
    name = "sportsSite"
    allowed_domains = ["www.espn.com"]
    start_urls = ["https://www.espn.com/nhl/scoreboard/_/date/20220504"]
    handle_httpstatus_list = [404]

    def parse(self, response, **kwargs):
        hockey_score_selector = response.css(".ScoreCell__Team--scoreboard").extract()
        loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
        winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"
        team_sel = ".ScoreboardPage .ScoreCell__TeamName--shortDisplayName::text"
        loser_score = response.css(loser_sel).extract()
        winner_score = response.css(winner_sel).extract()
        teams = response.css(team_sel).extract()
        yield {
            'losing score': loser_score,
            'winning score': winner_score,
            'teams': teams
        }
This is the output I currently get from this code:
{'losing score': ['2', '3', '2', '0'], 'winning score': ['5', '5', '6', '6'], 'teams': ['Bruins', 'Hurricanes', 'Lightning', 'Maple Leafs', 'Blues', 'Wild', 'Kings', 'Oilers']}
Instead of collecting all the teams at once based on .ScoreboardPage, try collecting two sets based on .ScoreboardScoreCell__Item--loser and .ScoreboardScoreCell__Item--winner. So:
def parse(self, response, **kwargs):
    hockey_score_selector = response.css(".ScoreCell__Team--scoreboard").extract()
    loser_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__Score::text"
    winner_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__Score::text"
    # team_sel = ".ScoreboardPage .ScoreCell__TeamName--shortDisplayName::text"
    loser_team_sel = ".ScoreboardScoreCell__Item--loser .ScoreCell__TeamName--shortDisplayName::text"
    winner_team_sel = ".ScoreboardScoreCell__Item--winner .ScoreCell__TeamName--shortDisplayName::text"
    loser_score = response.css(loser_sel).extract()
    winner_score = response.css(winner_sel).extract()
    # teams = response.css(team_sel).extract()
    loser_teams = response.css(loser_team_sel).extract()
    winner_teams = response.css(winner_team_sel).extract()
    yield {
        'losing score': loser_score,
        'winning score': winner_score,
        # 'teams': teams,
        'losing team': loser_teams,
        'winning team': winner_teams
    }
Output:
{'losing score': ['2', '3', '2', '0'],
'winning score': ['5', '5', '6', '6'],
'losing team': ['Bruins', 'Maple Leafs', 'Blues', 'Kings'],
'winning team': ['Hurricanes', 'Lightning', 'Wild', 'Oilers']}
E.g. the Bruins lost to the Hurricanes, etc.
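The question asked for four columns, while the dict above still holds four parallel lists. One possible way to get one row per game is to zip the lists together. The following is only a sketch: the helper name `rows_from_lists` is made up for illustration, the data is copied from the output above, and it assumes the four lists line up game by game (which may not hold for games that are unfinished or tied).

```python
def rows_from_lists(winner_teams, loser_teams, winner_scores, loser_scores):
    """Pair up four parallel lists into one dict per game.

    Hypothetical helper, not part of Scrapy. Assumes index i of every
    list refers to the same game.
    """
    lengths = {len(winner_teams), len(loser_teams),
               len(winner_scores), len(loser_scores)}
    if len(lengths) != 1:
        # Mismatched lengths mean the lists no longer line up per game.
        raise ValueError("lists have different lengths; cannot pair them safely")

    rows = []
    for w_team, l_team, w_score, l_score in zip(
        winner_teams, loser_teams, winner_scores, loser_scores
    ):
        rows.append({
            "Winning Team": w_team,
            "Losing Team": l_team,
            "Winning Score": int(w_score),  # scraped scores arrive as strings
            "Losing Score": int(l_score),
        })
    return rows


# Data copied from the output shown above.
scraped = {
    "losing score": ["2", "3", "2", "0"],
    "winning score": ["5", "5", "6", "6"],
    "losing team": ["Bruins", "Maple Leafs", "Blues", "Kings"],
    "winning team": ["Hurricanes", "Lightning", "Wild", "Oilers"],
}

rows = rows_from_lists(
    scraped["winning team"],
    scraped["losing team"],
    scraped["winning score"],
    scraped["losing score"],
)

for row in rows:
    print(row)
```

Inside `parse`, yielding each of these rows separately (instead of one dict of lists) should give Scrapy one item per game, so a feed export such as `scrapy crawl sportsSite -o scores.csv` would produce the four columns.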