
Web scraping with Python and Pandas - Pagination

With this short piece of code I can get the data from the table:

import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
df = pd.read_html('https://www.worldathletics.org/records/toplists/middle-long/800-metres/indoor/men/senior/2023?regionType=world&timing=electronic&page=1&bestResultsOnly=false&oversizedTrack=regular', parse_dates=True)

# the toplist is the first table on the page
df[0].to_csv('2023_I_M_800.csv')

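I'm trying to get the data from all pages, or from a fixed number of pages, but since this website doesn't use ul or li elements for its pagination, I don't know exactly how to build it.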

Any help or ideas would be appreciated.

Try this:

import pandas as pd

for page in range(1, 10):  # pages 1 to 9
    df = pd.read_html(f'https://www.worldathletics.org/records/toplists/middle-long/800-metres/indoor/men/senior/2023?regionType=world&timing=electronic&page={page}&bestResultsOnly=false&oversizedTrack=regular', parse_dates=True)
    df[0].to_csv(f'2023_I_M_800_page_{page}.csv')  # one CSV file per page
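
If the number of pages is not known in advance, the same loop can run until a page comes back without data. The following is only a sketch under assumptions: `fetch_toplist_page` and `scrape_all_pages` are hypothetical helper names, and how the site responds to a page number past the last one has not been checked. The sketch relies on `pd.read_html` raising `ValueError` when it finds no table, and it also stops on an empty table.

```python
import pandas as pd

# URL pattern from the question; only the page number changes.
BASE_URL = (
    "https://www.worldathletics.org/records/toplists/middle-long/800-metres/"
    "indoor/men/senior/2023?regionType=world&timing=electronic"
    "&page={page}&bestResultsOnly=false&oversizedTrack=regular"
)


def fetch_toplist_page(page):
    """Download one page and return its first table (hypothetical helper)."""
    return pd.read_html(BASE_URL.format(page=page), parse_dates=True)[0]


def scrape_all_pages(fetch_page, max_pages=50):
    """Collect pages 1, 2, 3, ... until a page has no table or no rows.

    ASSUMPTION: a page past the last one yields either no <table>
    (pd.read_html raises ValueError) or an empty table. max_pages is a
    safety cap in case neither happens.
    """
    frames = []
    for page in range(1, max_pages + 1):
        try:
            table = fetch_page(page)
        except ValueError:  # read_html found no tables on this page
            break
        if table.empty:
            break
        table.insert(0, "page_number", page)  # remember where each row came from
        frames.append(table)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling `scrape_all_pages(fetch_toplist_page)` would query the live site. Passing the fetch function as an argument also makes it possible to try the loop with a stand-in function first, and `max_pages` can be set to a fixed number when only some pages are wanted.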

Since the URL contains the page number, why not simply loop over the pages and concat the results?

https://www.worldathletics.org/records/toplists/middle-long/800-metres/indoor/men/senior/2023?regionType=world&timing=electronic&page=1&bestResultsOnly=false&oversizedTrack=regular

import pandas as pd

F, L = 1, 4  # first and last pages

dico = {}
for page in range(F, L+1):
    url = f'https://www.worldathletics.org/records/toplists/middle-long/800-metres/indoor/men/senior/2023?regionType=world&timing=electronic&page={page}&bestResultsOnly=false&oversizedTrack=regular'
    sub_df = pd.read_html(url, parse_dates=True)[0]
    sub_df.insert(0, "page_number", page)
    dico[page] = sub_df

out = pd.concat(dico, ignore_index=True)
# out.to_csv('2023_I_M_800.csv') # <- uncomment this line to make a .csv

Note: you can access each sub_df individually using key-indexing notation, dico[num_page].

Output:

print(out)

     page_number  Rank  ...         Date Results Score
0              1     1  ...  22 JAN 2023          1230
1              1     2  ...  22 JAN 2023          1204
2              1     3  ...  29 JAN 2023          1204
3              1     4  ...  27 JAN 2023          1192
4              1     5  ...  28 JAN 2023          1189
..           ...   ...  ...          ...           ...
395            4   394  ...  21 JAN 2023           977
396            4   394  ...  28 JAN 2023           977
397            4   398  ...  27 JAN 2023           977
398            4   399  ...  28 JAN 2023           977
399            4   399  ...  29 JAN 2023           977

[400 rows x 11 columns]
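
The note about dico[num_page] can be illustrated with a small in-memory sketch. The two frames below are made-up stand-ins, not scraped data. The sketch also shows a variation that is not in the answer: leaving out `ignore_index=True` so that `pd.concat` keeps the dict keys as an outer index level, which allows one page to be selected from the combined frame as well.

```python
import pandas as pd

# Stand-ins for the per-page tables (made-up values, not scraped data).
dico = {
    1: pd.DataFrame({"Rank": [1, 2]}),
    2: pd.DataFrame({"Rank": [3, 4]}),
}

# Key-indexing on the dict returns the DataFrame stored for that page.
page_2 = dico[2]

# Without ignore_index, the dict keys become the outer level of a MultiIndex;
# "page_number" and "row" are just illustrative names for the two levels.
by_page = pd.concat(dico, names=["page_number", "row"])

# Selecting on the outer level gives back the rows of that page.
page_2_again = by_page.loc[2]
```

With `ignore_index=True`, as in the answer, the result instead gets a plain 0..n-1 index, which is why the answer stores the page number in its own column.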

