Create multiple tables from multiple web sites

I'm building a table with pandas that contains only the rows with a certain value. For example, I want to paste in the links to the Premier League tables for different years and get multiple rows showing how a particular team did in each year. I would also like the first column to hold the link the information comes from.

import requests
import pandas as pd

url = 'https://www.skysports.com/premier-league-table'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]

contain = df[df["Team"].str.contains("Liverpool")]

print(contain)

Here is my first attempt for a specific year. It tells me how Liverpool is doing this year. However, I would also like to see how Liverpool fared in other years, for example 21/22 ( https://www.skysports.com/premier-league-table/2021 ).

So I would like to add another row with the data for 21/22, 20/21, etc. In the end there should be several rows, each with the information and its source.

At the moment I get this:
    #       Team  Pl  W  ...   A  GD  Pts  Last 6
9  10  Liverpool   8  2  ...  12   8   10     NaN
I would like to get this:
    #       Team  Pl  W  ...   A  GD  Pts  Last 6  Link
9  10  Liverpool   8  2  ...  12   8   10     NaN  https://www.sky...
1  2   Liverpool   8  28 ...  12  68   92     NaN  https://www.sky...
...

You can create a one-column df that holds the URL and merge it with your filtered row on the default index 0.

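# One-row frame holding the source URL; its default index is 0
urldf = pd.DataFrame([url], columns=["Link"])
# Reset the index so the matching row also sits at index 0
contain = contain.reset_index()
contain = pd.merge(contain, urldf, left_index=True, right_index=True)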
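
As a rough check of the step above, the sketch below runs the index merge on a small hand-made table instead of the scraped one. The Arsenal row and its points value are made up purely for illustration. It also shows `DataFrame.assign` as a possible alternative, which is not part of the original answer. Note that `reset_index(drop=True)` is used here so that the old index is discarded rather than kept as an extra column.

```python
import pandas as pd

url = "https://www.skysports.com/premier-league-table"

# Toy stand-in for the scraped table (values are illustrative only)
df = pd.DataFrame({"Team": ["Arsenal", "Liverpool"], "Pts": [24, 10]})
contain = df[df["Team"].str.contains("Liverpool")]

# Merge approach: both frames must share index 0, hence the reset
urldf = pd.DataFrame([url], columns=["Link"])
merged = pd.merge(
    contain.reset_index(drop=True), urldf, left_index=True, right_index=True
)

# Alternative: assign() broadcasts the single URL to every matching row
assigned = contain.assign(Link=url)

print(merged)
print(assigned)
```

One difference worth keeping in mind: the merge only attaches the link to the row at index 0, so if the filter ever matched more than one row, the extra rows would be dropped by the default inner join, whereas `assign` would label all of them.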

Here is a related question: "Merge two dataframes by index".

You can do this for every year and then use pandas.concat to combine the results into the desired DataFrame.
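
A minimal sketch of that loop could look like the following. The helper names `load_table` and `collect_team_rows` are hypothetical, and the sketch assumes every season page exposes a "Team" column in its last table, as the current-season page in the question does. It uses `assign` rather than the index merge to attach the link, which is a swapped-in simplification.

```python
from io import StringIO

import pandas as pd
import requests


def load_table(url):
    """Download a page and return the last HTML table on it as a DataFrame."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Wrap the markup in StringIO; recent pandas versions deprecate
    # passing literal HTML to read_html.
    return pd.read_html(StringIO(response.text))[-1]


def collect_team_rows(urls, team, loader=load_table):
    """Return the rows for `team` from every URL, with a Link column added."""
    frames = []
    for url in urls:
        table = loader(url)
        rows = table[table["Team"].str.contains(team, na=False)]
        # Record which page each row came from
        frames.append(rows.assign(Link=url))
    # Stack the per-season rows into one DataFrame with a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

It could then be called with a list such as `["https://www.skysports.com/premier-league-table", "https://www.skysports.com/premier-league-table/2021"]` and the team name `"Liverpool"`. Only the `/2021` address appears in the question; whether other seasons follow the same URL pattern is an assumption to verify. The `loader` parameter is there so the function can be tried with locally built tables before hitting the network.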
