How can I write data frames to different worksheets?
I'm currently working on a script that scrapes a website for football match data, and I want it to put the data tables from the last 5 seasons into one Excel workbook, with a separate sheet for each season.
The code below works fine until the point where it actually has to create the Excel workbook. Currently, it creates 5 different workbooks, one for each id, but only the workbooks for ids 1631 and 1889 contain the correct data. The other three workbooks contain the data from id=1889.
I already looked up several solutions but couldn't find one that matches my problem, so I'm not entirely sure it can even be done. Thank you in advance!
import requests
from bs4 import BeautifulSoup
import pandas as pd
import xlsxwriter


def get_df(team_name, team_id):
    seasons_url = (
        f"https://fbref.com/de/mannschaften/{team_id}/2021-2022/{team_name}-Statistiken",
        f"https://fbref.com/de/mannschaften/{team_id}/2020-2021/{team_name}-Statistiken",
        f"https://fbref.com/de/mannschaften/{team_id}/2019-2020/{team_name}-Statistiken",
        f"https://fbref.com/de/mannschaften/{team_id}/2018-2019/{team_name}-Statistiken",
        f"https://fbref.com/de/mannschaften/{team_id}/2017-2018/{team_name}-Statistiken",)
    ids = ["stats_standard_11160",
           "stats_standard_10728",
           "stats_standard_3232",
           "stats_standard_1889",
           "stats_standard_1631"]
    heads = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                           '(KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}
    for season in seasons_url:
        url = season
        response = requests.get(url, headers=heads)
        html = response.text
        soup = BeautifulSoup(html, "html.parser")
        for id in ids:
            tables = (soup.find(id=id))
            if tables is not None:
                table = tables
            for table_per_season in table:
                df = pd.read_html(str(table))[0]
            writer = pd.ExcelWriter(f"{team_name}{id}.xlsx", engine="xlsxwriter")
            df.to_excel(writer, sheet_name=f"{id}", index=True)
            writer.save()
I believe your issue is this bit of code:
if tables is not None:
    table = tables
First, if tables is None the first time through the loop, you get an error because table is referenced before assignment.
But more importantly, if tables is None later in the loop, the code will not tell you that it didn't get the data; it will just write the previous value of table, which explains why you get the same data several times.
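To make the stale-value effect concrete, here is a minimal sketch that uses plain strings in place of scraped tables. The function names and the sample lookups are made up for illustration; only the shape of the if check mirrors the code in the question.

```python
def write_with_stale_value(lookups):
    """Same shape as the question's loop: the write happens outside the if."""
    written = []
    for key, found in lookups:
        if found is not None:
            table = found
        # Runs even when `found` is None, so `table` still holds the previous hit.
        written.append((key, table))
    return written


def write_only_when_found(lookups):
    """Guarded version: skip the write and report the miss instead."""
    written = []
    for key, found in lookups:
        if found is None:
            print(f"{key}: no table found, skipping")
            continue
        written.append((key, found))
    return written


# Hypothetical lookups: only the first id is present on the page.
lookups = [("id_a", "table A"), ("id_b", None), ("id_c", None)]
stale = write_with_stale_value(lookups)
clean = write_only_when_found(lookups)
print(stale)
print(clean)
```

In the first function every key ends up paired with the last table that was actually found, which is the same pattern as several workbooks containing identical data.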
Depending on what you expect from your function, you could move the writing part inside the if block so that it writes only when data was found, and print a message otherwise.
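As for getting one workbook with one sheet per season, a possible approach is to open a single pd.ExcelWriter and call to_excel once per season with a different sheet_name. The sketch below assumes the scraping step has already produced one DataFrame (or None) per season; the helper name write_season_sheets, the sample data, and the output file name are hypothetical. It uses the writer as a context manager, which saves the file on exit, so no explicit writer.save() call is needed (that method is no longer available in recent pandas versions).

```python
import os
import tempfile

import pandas as pd


def write_season_sheets(frames_by_season, path):
    """Write every season that has data to its own sheet of one workbook.

    Returns the list of sheet names that were written.
    """
    found = {}
    for season, df in frames_by_season.items():
        if df is None:
            # Report the miss instead of silently reusing older data.
            print(f"No table found for season {season}, skipping")
        else:
            found[season] = df
    if not found:
        return []  # nothing to write, so do not create an empty workbook

    # One writer for the whole workbook; each to_excel call adds a sheet.
    with pd.ExcelWriter(path) as writer:
        for season, df in found.items():
            df.to_excel(writer, sheet_name=season, index=True)
    return list(found)


# Made-up sample data standing in for the scraped tables.
frames = {
    "2021-2022": pd.DataFrame({"Spieler": ["A", "B"], "Tore": [3, 1]}),
    "2020-2021": None,  # simulates a season whose table was not found
    "2019-2020": pd.DataFrame({"Spieler": ["C"], "Tore": [7]}),
}
out_path = os.path.join(tempfile.gettempdir(), "team_seasons.xlsx")
sheets = write_season_sheets(frames, out_path)
print(sheets)
```

Using the season string as the sheet name keeps it well under Excel's 31-character limit for sheet names. In the original function, the same idea would mean creating the writer once before the season loop rather than once per id.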