Create multiple tables from multiple web sites
I'm in the process of creating a table with pandas that contains a certain value. For example, I want to pass in the links for different years of the Premier League and get, in multiple rows, how a particular team did in each of those years. I would also like a column holding the link the information comes from.
import requests
import pandas as pd

url = 'https://www.skysports.com/premier-league-table'
html = requests.get(url).content

# read_html returns a list of every table found on the page
df_list = pd.read_html(html)
df = df_list[-1]

# keep only the rows whose Team column contains "Liverpool"
contain = df[df["Team"].str.contains("Liverpool")]
print(contain)
Here I already have a first approach for a specific year: it tells me how Liverpool is doing this year. However, I would still like to get more information on how Liverpool has fared in the other years, for example 21/22 ( https://www.skysports.com/premier-league-table/2021 ).
So I would like to add further rows for the 21/22, 20/21, etc. seasons. At the end there should be several rows, one per season, with the information and the source.
At the moment I get this:
# Team Pl W ... A GD Pts Last 6
9 10 Liverpool 8 2 ... 12 8 10 NaN
I would like to get this:
# Team Pl W ... A GD Pts Last 6 Link
9 10 Liverpool 8 2 ... 12 8 10 NaN https://www.sky...
1 2 Liverpool 8 28 ... 12 68 92 NaN https://www.sky...
...
You can create a one-column DataFrame and merge it on the default index:
urldf = pd.DataFrame([url], columns=["Link"])
contain = contain.reset_index()
contain = pd.merge(contain, urldf, left_index=True, right_index=True)
Here is a related question: Merge two dataframes by index
You can do this for all the years and use pandas.concat to build the desired outcome dataframe.
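Putting the pieces together, the loop over seasons could be sketched as below. The `tag_with_link` helper and the season URL pattern are assumptions based on the URLs in the question (the current season has no year suffix); to keep the sketch self-contained, small hard-coded DataFrames stand in for the tables that `pd.read_html` would return.

```python
import pandas as pd

def tag_with_link(table: pd.DataFrame, team: str, url: str) -> pd.DataFrame:
    """Keep the rows for one team and record the source URL in a Link column."""
    rows = table[table["Team"].str.contains(team)].reset_index(drop=True)
    rows["Link"] = url
    return rows

# In the real script each `table` would come from
# pd.read_html(requests.get(url).content)[-1].
base = "https://www.skysports.com/premier-league-table"
urls = [base] + [f"{base}/{year}" for year in (2021, 2020)]

# Simulated per-season tables standing in for the scraped ones.
tables = [
    pd.DataFrame({"Team": ["Arsenal", "Liverpool"], "Pts": [24, 10]}),
    pd.DataFrame({"Team": ["Liverpool", "Chelsea"], "Pts": [92, 74]}),
    pd.DataFrame({"Team": ["Liverpool", "Man City"], "Pts": [69, 86]}),
]

# One tagged frame per season, stacked into a single result.
result = pd.concat(
    [tag_with_link(t, "Liverpool", u) for t, u in zip(tables, urls)],
    ignore_index=True,
)
print(result)
```

With real pages, each iteration fetches one season's URL, filters for the team, tags the rows with that URL, and `pd.concat` stacks everything into one DataFrame with the Link column from the desired output.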