Scraping multiple tables with BeautifulSoup

How can I get the number of goals from the 'Goal times' table at this URL: https://www.soccerstats.com/pmatch.asp?league=argentina3&stats=114-3-8-2022-almagro-d.-de-belgrano ?

PS: The main page is https://www.soccerstats.com/matches.asp?matchday=1

I am able to find the table, but when I try to get the stats, nothing changes.

Code:

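import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
url = 'https://www.soccerstats.com/pmatch.asp?league=argentina3&stats=114-3-8-2022-almagro-d.-de-belgrano'

# Send the headers through the session so every request carries them
session = requests.Session()
session.headers.update(headers)

response = session.get(url)
response.raise_for_status()  # fail early instead of continuing without a parsed page
soup = BeautifulSoup(response.text, 'html.parser')

# Keep the table whose nearest preceding <h2> sibling reads 'Goal times'
goal_scoring_stats_table = None
for ta in soup.find_all('table'):
    for sib in ta.find_previous_siblings():  # 'sib', so the session is not shadowed
        if sib.name == 'h2':
            if sib.text == 'Goal times':
                goal_scoring_stats_table = ta
            break  # only the closest <h2> matters

# Inside that table, print the nested table that follows a <b>Home</b> label
if goal_scoring_stats_table is not None:
    for ta in goal_scoring_stats_table.find_all('table'):
        for sib in ta.find_previous_siblings():  # was misspelled 'findPreviuosSiblings'
            if sib.name == 'b' and sib.text == 'Home':
                print(ta)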
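
For comparison, the "find the table under a heading" step can also be written with find_next. This is only a minimal sketch on made-up inline HTML: SAMPLE_HTML and the helper table_after_heading are hypothetical, and it assumes the heading is an <h2> whose text is exactly 'Goal times' and that the wanted table is the first one after it. The real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption, not the site's HTML)
SAMPLE_HTML = """
<div>
  <h2>Scoring</h2>
  <table><tr><td>other</td></tr></table>
  <h2>Goal times</h2>
  <table>
    <tr><td>0-15</td><td>GF</td><td>0</td></tr>
    <tr><td>16-30</td><td>GF</td><td>3</td></tr>
  </table>
</div>
"""


def table_after_heading(soup, heading_text):
    """Return the first <table> that follows the <h2> with the given text, or None."""
    heading = soup.find("h2", string=lambda t: t is not None and t.strip() == heading_text)
    return heading.find_next("table") if heading is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = table_after_heading(soup, "Goal times")

# Flatten the table into a list of rows, one list of cell strings per <tr>
rows = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in table.find_all("tr")]
print(rows)
```

find_next walks forward in document order, so it does not require the table to be a direct sibling of the heading.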

You could use pandas to get all the tables and then fish out the one you're after. Finally, massage the table to your liking.

For example:

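from io import StringIO

import pandas as pd
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:98.0) Gecko/20100101 Firefox/98.0",
}
url = "https://www.soccerstats.com/pmatch.asp?league=argentina3&stats=114-3-8-2022-almagro-d.-de-belgrano"
html = requests.get(url, headers=headers).text
# read_html returns every table on the page; index 106 is a position, not a name,
# so it will break if the page layout changes.
# StringIO avoids the deprecation warning newer pandas versions emit for literal HTML strings.
df = pd.read_html(StringIO(html), flavor="lxml")[106]
print(df)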

Output:

           0    1    2   3     4
0       0-15   GF  0.0 NaN   NaN
1       0-15   GA  0.0 NaN   NaN
2      16-30   GF  3.0 NaN   NaN
3      16-30   GA  1.0 NaN   NaN
4      31-45   GF  1.0 NaN   NaN
5      31-45   GA  0.0 NaN   NaN
6        NaN  NaN  NaN NaN   NaN
7      46-60   GF  0.0 NaN   NaN
8      46-60   GA  0.0 NaN   NaN
9      61-75   GF  0.0 NaN   NaN
10     61-75   GA  0.0 NaN   NaN
11     76-90   GF  0.0 NaN   NaN
12     76-90   GA  1.0 NaN   NaN
13       NaN  NaN  NaN NaN   NaN
14  1st half   GF  4.0 NaN  100%
15  1st half   GA  1.0 NaN   50%
16       NaN  NaN  NaN NaN   NaN
17  2nd half   GF  0.0 NaN    0%
18  2nd half   GA  1.0 NaN   50%
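
As one possible way to "massage" a frame shaped like the output above, the sketch below drops the all-NaN separator rows and the empty column, names the remaining columns, and pivots so each period has a GF and a GA value. It runs on a small hand-copied sample rather than the live page. The function tidy_goal_times and the column names period, type, goals and share are assumptions made for illustration, as is the expectation that exactly four columns survive the cleanup.

```python
import numpy as np
import pandas as pd

# A few rows copied from the output above, so the sketch runs without a network call
raw = pd.DataFrame(
    [
        ["0-15", "GF", 0.0, np.nan, np.nan],
        ["0-15", "GA", 0.0, np.nan, np.nan],
        ["16-30", "GF", 3.0, np.nan, np.nan],
        ["16-30", "GA", 1.0, np.nan, np.nan],
        [np.nan, np.nan, np.nan, np.nan, np.nan],
        ["1st half", "GF", 4.0, np.nan, "100%"],
        ["1st half", "GA", 1.0, np.nan, "50%"],
    ]
)


def tidy_goal_times(df):
    """Drop empty rows and columns, label the rest, and pivot to one row per period."""
    out = df.dropna(how="all").dropna(axis=1, how="all")
    # Assumed labels: the scraped table has no header row of its own
    out.columns = ["period", "type", "goals", "share"]
    out = out.astype({"goals": int})
    return out.pivot(index="period", columns="type", values="goals")


tidy = tidy_goal_times(raw)
print(tidy)
```

After the pivot, a single value can be read with tidy.loc["16-30", "GF"].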
