Scraping multiple tables with BeautifulSoup

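How can I get the number of goals from the "Goal times" table at this URL: https://www.soccerstats.com/pmatch.asp?league=argentina3&stats=114-3-8-2022-almagro-d.-de-belgrano ?

PS: The main page is https://www.soccerstats.com/matches.asp?matchday=1

I am able to find the table, but when I try to get the stats, nothing happens.

Code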

import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
session = requests.Session()
session.headers.update(headers)

response = session.get('https://www.soccerstats.com/pmatch.asp?league=argentina3&stats=114-3-8-2022-almagro-d.-de-belgrano')
response.raise_for_status()  # fail early instead of continuing without a soup
soup = BeautifulSoup(response.text, 'html.parser')

# Find the table whose nearest preceding <h2> sibling reads "Goal times".
goal_scoring_stats_table = None
for ta in soup.find_all('table'):
    for sib in ta.find_previous_siblings():
        if sib.name == 'h2':
            if sib.text == 'Goal times':
                goal_scoring_stats_table = ta
            else:
                break

# Look for a nested table that has a <b>Home</b> among its previous siblings.
for ta in goal_scoring_stats_table.find_all('table'):
    for sib in ta.find_previous_siblings():
        if sib.name == 'b':
            if sib.text == 'Home':
                print(ta)
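
One possible reason the second loop prints nothing is that previous-sibling searches in BeautifulSoup only return nodes sharing the same parent as the tag they are called on. The snippet below is a minimal sketch on made-up markup (the HTML string and the `id` values are assumptions, not the real page) showing that a `<b>Home</b>` label in a neighbouring cell is never reported as a sibling of the nested table, while `find_previous('b')` searches everything earlier in the document. Whether the real page is laid out this way has not been checked here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the label and the nested table sit in different cells.
html = """
<table id="outer">
  <tr>
    <td><b>Home</b></td>
    <td><table id="inner"><tr><td>0-15</td><td>2</td></tr></table></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
inner = soup.find("table", id="inner")

# Siblings share a parent, so the <b> in the other cell is not among them.
sibling_names = [sib.name for sib in inner.find_previous_siblings()]

# find_previous() looks at every element that appears earlier in the document.
label = inner.find_previous("b")

print(sibling_names)
print(label.get_text())
```

If the real markup does resemble this, swapping the sibling search for `find_previous` (or navigating via `.parent`) might be enough to reach the label.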

You can use pandas to fetch every table on the page and then work out which one you want. Finally, massage the table to your liking.
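
Rather than counting tables by hand, one option is to scan the list of DataFrames for cell values you expect to see in the target table. This is only a sketch: `find_tables` is a hypothetical helper, and the two small frames stand in for the list that `pd.read_html` would return, so the snippet runs without touching the network.

```python
import pandas as pd


def find_tables(tables, required):
    """Return (index, frame) pairs whose cells contain every string in `required`."""
    hits = []
    for i, frame in enumerate(tables):
        # Compare on string form so numbers and text are handled alike.
        cells = set(frame.astype(str).to_numpy().ravel())
        if all(text in cells for text in required):
            hits.append((i, frame))
    return hits


# Stand-ins for the result of pd.read_html(); the contents are illustrative.
tables = [
    pd.DataFrame([["Home", "Away"], ["Almagro", "D. de Belgrano"]]),
    pd.DataFrame([["0-15", "GF", 0.0], ["0-15", "GA", 0.0], ["1st half", "GF", 4.0]]),
]

hits = find_tables(tables, {"0-15", "GF", "1st half"})
print([i for i, _ in hits])
```

On a page with nested tables several frames may match, so it is worth printing the candidates before settling on one. The index found this way can then be used to select the table from the real `pd.read_html` result.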

For example:

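from io import StringIO

import pandas as pd
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:98.0) Gecko/20100101 Firefox/98.0",
}
url = "https://www.soccerstats.com/pmatch.asp?league=argentina3&stats=114-3-8-2022-almagro-d.-de-belgrano"
html = requests.get(url, headers=headers).text
# Newer pandas versions expect a file-like object rather than a raw HTML string.
tables = pd.read_html(StringIO(html), flavor="lxml")
# Index 106 is tied to the current page layout and will shift if the page changes.
df = tables[106]
print(df)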

Output:

           0    1    2   3     4
0       0-15   GF  0.0 NaN   NaN
1       0-15   GA  0.0 NaN   NaN
2      16-30   GF  3.0 NaN   NaN
3      16-30   GA  1.0 NaN   NaN
4      31-45   GF  1.0 NaN   NaN
5      31-45   GA  0.0 NaN   NaN
6        NaN  NaN  NaN NaN   NaN
7      46-60   GF  0.0 NaN   NaN
8      46-60   GA  0.0 NaN   NaN
9      61-75   GF  0.0 NaN   NaN
10     61-75   GA  0.0 NaN   NaN
11     76-90   GF  0.0 NaN   NaN
12     76-90   GA  1.0 NaN   NaN
13       NaN  NaN  NaN NaN   NaN
14  1st half   GF  4.0 NaN  100%
15  1st half   GA  1.0 NaN   50%
16       NaN  NaN  NaN NaN   NaN
17  2nd half   GF  0.0 NaN    0%
18  2nd half   GA  1.0 NaN   50%
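
As for massaging the table, here is a rough sketch of one way to do it. The rows are the printed output above retyped by hand so the snippet runs offline, and the column names (`period`, `kind`, `goals`, `share`) are my own labels, not anything the site provides. GF and GA are presumably goals for and goals against.

```python
import pandas as pd

blank = [None] * 5  # separator rows in the scraped table
rows = [
    ["0-15", "GF", 0.0, None, None],
    ["0-15", "GA", 0.0, None, None],
    ["16-30", "GF", 3.0, None, None],
    ["16-30", "GA", 1.0, None, None],
    ["31-45", "GF", 1.0, None, None],
    ["31-45", "GA", 0.0, None, None],
    blank,
    ["46-60", "GF", 0.0, None, None],
    ["46-60", "GA", 0.0, None, None],
    ["61-75", "GF", 0.0, None, None],
    ["61-75", "GA", 0.0, None, None],
    ["76-90", "GF", 0.0, None, None],
    ["76-90", "GA", 1.0, None, None],
    blank,
    ["1st half", "GF", 4.0, None, "100%"],
    ["1st half", "GA", 1.0, None, "50%"],
    blank,
    ["2nd half", "GF", 0.0, None, "0%"],
    ["2nd half", "GA", 1.0, None, "50%"],
]
df = pd.DataFrame(rows)

tidy = (
    df.dropna(how="all")          # drop the blank separator rows
    .dropna(axis=1, how="all")    # drop column 3, which is entirely empty
    .rename(columns={0: "period", 1: "kind", 2: "goals", 4: "share"})
    .astype({"goals": int})
)

# One row per period, with GF and GA side by side.
goals = tidy.pivot(index="period", columns="kind", values="goals")
print(goals)
```

After this, a single value can be looked up with something like `goals.loc["16-30", "GF"]`.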
