Create multiple tables from multiple web sites

I'm building a table with pandas that contains the rows matching a certain value. For example, I want to paste the links for different Premier League seasons and get one row per season showing how a particular team did that year. I would also like a column with the link the information comes from.

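import requests
import pandas as pd
from io import StringIO

url = 'https://www.skysports.com/premier-league-table'
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error
# Newer pandas versions deprecate passing literal HTML to read_html, so wrap it in StringIO
df_list = pd.read_html(StringIO(response.text))
df = df_list[-1]  # assumes the league table is the last table on the page

# Keep only the rows whose "Team" value contains "Liverpool"
contain = df[df["Team"].str.contains("Liverpool", na=False)]

print(contain)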

This is my first approach, and it works for a single season: it shows how Liverpool is doing this year. However, I would also like to see how Liverpool fared in other years, for example 21/22 (https://www.skysports.com/premier-league-table/2021).

So I would like to add one row for each season (21/22, 20/21, etc.). In the end there should be several rows, each with that season's data and its source link.

At the moment I get this:
    #       Team  Pl  W  ...   A  GD  Pts  Last 6
9  10  Liverpool   8  2  ...  12   8   10     NaN
I would like to get this:
    #       Team  Pl  W  ...   A  GD  Pts  Last 6  Link
9  10  Liverpool   8  2  ...  12   8   10     NaN  https://www.sky...
1  2   Liverpool   8  28 ...  12  68   92     NaN  https://www.sky...
...

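You can create a one-column DataFrame holding the URL and merge it on the index. After reset_index(), the matching row has index 0, which lines up with the single row (index 0) of the URL DataFrame: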

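# One-row DataFrame holding the source URL
urldf = pd.DataFrame([url], columns=["Link"])
# Renumber the filtered rows from 0 so they align with urldf's index;
# drop=True stops the old index from being added as an extra "index" column
contain = contain.reset_index(drop=True)
# Inner join on the index: row 0 of contain gets row 0 of urldf
contain = pd.merge(contain, urldf, left_index=True, right_index=True)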
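To check what this merge does without downloading anything, here is a small offline sketch. The one-row stand-in for `contain` uses a subset of the columns and values from the question's printed output (index 9); everything else is just the merge itself. It also shows why renumbering the index matters: the default merge is an inner join, so index 9 finds no partner at index 0 in `urldf`.

```python
import pandas as pd

url = "https://www.skysports.com/premier-league-table"

# Stand-in for the filtered row, copied from the question's output (subset of columns)
contain = pd.DataFrame(
    {"#": [10], "Team": ["Liverpool"], "Pl": [8], "W": [2], "A": [12], "GD": [8], "Pts": [10]},
    index=[9],
)

urldf = pd.DataFrame([url], columns=["Link"])

# Without renumbering, index 9 has no match in urldf (index 0): the inner merge is empty
print(len(pd.merge(contain, urldf, left_index=True, right_index=True)))  # 0

# After renumbering, row 0 matches row 0 and gains the "Link" column
contain = contain.reset_index(drop=True)
merged = pd.merge(contain, urldf, left_index=True, right_index=True)
print(merged)
```

Note that with more than one matching row, only row 0 would get a link and the other rows would be dropped by the inner join.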

Here is a related question: Merge two dataframes by index.

You can do this for every season's URL and then combine the results with pandas.concat to get the desired DataFrame.
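A minimal sketch of that loop, under some assumptions: each season page follows the `https://www.skysports.com/premier-league-table/<year>` pattern from the question (with /2021 being 21/22, /2020 is assumed to be 20/21), and the league table is the last table on each page, as in the question's code. The helper names `fetch_league_table` and `team_rows_by_season` are hypothetical. Instead of the index merge, this sketch swaps in `DataFrame.assign`, which puts the same URL on every matching row and so also works when several rows match.

```python
from io import StringIO

import pandas as pd

BASE_URL = "https://www.skysports.com/premier-league-table"


def fetch_league_table(url):
    """Download a page and return its last HTML table (assumed to be the league table)."""
    import requests  # only needed when actually downloading

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return pd.read_html(StringIO(response.text))[-1]


def team_rows_by_season(tables, team):
    """Filter each table to `team` and stack the results with a "Link" column.

    `tables` maps a source URL to the DataFrame parsed from that URL.
    """
    frames = []
    for url, table in tables.items():
        # Plain substring match; NaN team names count as "no match"
        rows = table[table["Team"].str.contains(team, na=False, regex=False)]
        frames.append(rows.assign(Link=url))  # same URL on every matching row
    # Keeps each row's original index, as in the desired output in the question
    return pd.concat(frames)


# Example usage (downloads the pages; URL pattern for past seasons is an assumption):
# urls = [BASE_URL] + [f"{BASE_URL}/{year}" for year in (2021, 2020)]
# tables = {url: fetch_league_table(url) for url in urls}
# print(team_rows_by_season(tables, "Liverpool"))
```

Separating the download from the filtering keeps the combining step testable offline; pass `ignore_index=True` to `pd.concat` if you prefer a fresh 0..n-1 index.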
