BeautifulSoup only retrieves the first element

Question

I am currently trying to scrape some websites.

Here is part of my code:

import pandas as pd
from bs4 import BeautifulSoup
import requests


#Get the source code
r = requests.get('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard')
c = r.content
soup = BeautifulSoup(c, "html.parser")
print(soup.prettify())

for a in soup:
    print (soup.find("td", {"data-stat" : "avg_age"}).text)

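Basically, I have the full source code in "soup". However, when I look up an element such as "td", {"data-stat": "avg_age"}, I only get the result from the first row, {"data-row": "0"}, repeated as output: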

29.1
29.1
29.1
29.1
29.1

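So here are my questions:

-> Why is my code stuck on the first row, even though nothing is preselected in my "soup" variable?

-> Is there a way to write a loop that checks the wanted element in each different row? For example, from "data-row": "0" to "data-row": "19".

Thanks for your support, and have a nice day!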

Answer 1

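It is stuck on the first row for a couple of reasons:

You are using .find(), which only returns the first element it finds in the soup object.
You never actually iterate over anything. The loop body ignores the loop variable and calls soup.find("td", {"data-stat": "avg_age"}).text on the whole document every time, so it always returns the same thing. Take a look at your loop.

Essentially, this is the same logic as what you have there: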

for x in [1, 2, 3, 4]:
    print(1)

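As it iterates over that list, it just prints 1 each time, so you get 1 four times in the console.

You need to get all the rows in the soup with soup.find_all('tr'). Then, as you iterate, call .find() on each row and get the text only if that row contains a <td> tag with the attribute data-stat="avg_age".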
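To make the difference between .find() and .find_all() concrete, here is a minimal sketch. The three-row HTML snippet is a made-up stand-in for the real page (only the data-row and data-stat attributes are taken from the question), so it runs without any network access.

```python
from bs4 import BeautifulSoup

# Hypothetical three-row table standing in for the real page.
html = """
<table>
  <tr data-row="0"><td data-stat="avg_age">29.1</td></tr>
  <tr data-row="1"><td data-stat="avg_age">26.8</td></tr>
  <tr data-row="2"><td data-stat="avg_age">29.4</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# .find() returns only the first matching tag, however often it is called.
first = soup.find("td", {"data-stat": "avg_age"}).text

# .find_all() returns every matching tag, so iterating over it visits each row.
ages = [td.text for td in soup.find_all("td", {"data-stat": "avg_age"})]

print(first)  # 29.1
print(ages)   # ['29.1', '26.8', '29.4']
```

Calling soup.find(...) inside a loop, as in the question, would print 29.1 once per iteration for this snippet too.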

import pandas as pd
from bs4 import BeautifulSoup
import requests


#Get the source code
r = requests.get('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard')
c = r.content
soup = BeautifulSoup(c, "html.parser")
print(soup.prettify())


# Visit every table row and print the average age where that cell exists
rows = soup.find_all('tr')
for row in rows:
    cell = row.find("td", {"data-stat": "avg_age"})
    if cell is not None:
        print(cell.text)

Output:

29.1
26.8
29.4
26.8
27.8
26.2
27.2
25.8
26.0
26.9
24.8
25.5
26.9
25.9
27.6
24.5
26.3
28.8
25.6
26.7
26.1
28.2
26.9
26.6
26.0
27.7
28.0
26.8
29.9
25.5
27.1
27.1
27.1
27.2
27.0
27.0
25.1
25.8
25.9
25.8
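The output above has 40 values, while the question asks about rows 0 to 19, which suggests that more than one table on the page carries avg_age cells. One possible way to limit the loop to a single table and keep track of each data-row is sketched below. The table ids "squads" and "opponents" and the inline HTML are hypothetical; the ids used on the real page would have to be checked in its source.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; the ids are invented for illustration.
html = """
<table id="squads">
  <tbody>
    <tr data-row="0"><td data-stat="avg_age">29.1</td></tr>
    <tr data-row="1"><td data-stat="avg_age">26.8</td></tr>
  </tbody>
</table>
<table id="opponents">
  <tbody>
    <tr data-row="0"><td data-stat="avg_age">26.1</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Search inside one table only, instead of the whole document.
table = soup.find("table", id="squads")

ages_by_row = {}
# attrs={"data-row": True} matches any <tr> that has a data-row attribute.
for row in table.find_all("tr", attrs={"data-row": True}):
    cell = row.find("td", {"data-stat": "avg_age"})
    if cell is not None:
        ages_by_row[int(row["data-row"])] = float(cell.text)

print(ages_by_row)  # {0: 29.1, 1: 26.8}
```

Keying the result by data-row makes it easy to confirm that every expected row (for example 0 through 19) was actually found.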

Note that pandas' .read_html() parses <table> tags for you (it can use bs4 under the hood). Use that instead. It is much easier.

import pandas as pd

df = pd.read_html('https://fbref.com/fr/comps/13/stats/Statistiques-Ligue-1#all_stats_standard', header=1)[0]

Output:

print(df)
           Équipe  # JC   Âge  Poss  MJ  ...  xG.1  xA.1  xG+xA  npxG.1  npxG+xA.1
0         Ajaccio    18  29.1  34.5   2  ...  0.59  0.14   0.73    0.20       0.34
1          Angers    18  26.8  55.0   2  ...  1.00  0.49   1.49    1.00       1.49
2         Auxerre    15  29.4  39.5   2  ...  0.43  0.43   0.85    0.43       0.85
3           Brest    18  26.8  42.5   2  ...  0.63  0.23   0.86    0.23       0.47
4   Clermont Foot    18  27.8  48.5   2  ...  0.17  0.07   0.24    0.17       0.24
5            Lens    16  26.2  63.0   2  ...  1.48  0.94   2.41    1.08       2.02
6           Lille    18  27.2  65.0   2  ...  2.02  1.65   3.66    2.02       3.66
7         Lorient    14  25.8  36.0   1  ...  0.37  0.26   0.63    0.37       0.63
8            Lyon    15  26.0  68.0   1  ...  1.52  0.49   2.00    0.73       1.22
9       Marseille    17  26.9  55.0   2  ...  1.10  0.89   1.99    1.10       1.99
10         Monaco    19  24.8  40.5   2  ...  2.75  1.21   3.96    2.36       3.57
11    Montpellier    19  25.5  47.5   2  ...  0.93  0.66   1.59    0.93       1.59
12         Nantes    16  26.9  40.5   2  ...  1.37  0.60   1.97    1.37       1.97
13           Nice    18  25.9  54.0   2  ...  0.49  0.40   0.88    0.49       0.88
14      Paris S-G    18  27.6  60.0   2  ...  3.05  1.76   4.81    2.27       4.03
15          Reims    18  24.5  43.0   2  ...  0.54  0.42   0.96    0.54       0.96
16         Rennes    17  26.3  65.0   2  ...  1.86  1.15   3.01    1.86       3.01
17     Strasbourg    18  28.8  49.5   2  ...  0.60  0.57   1.17    0.60       1.17
18       Toulouse    18  25.6  57.0   2  ...  0.58  0.58   1.15    0.58       1.15
19         Troyes    16  26.7  39.0   2  ...  0.91  0.23   1.14    0.52       0.75

[20 rows x 29 columns]

To print only the age column: print(df['Âge'])
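For trying this out offline, here is a small sketch of the same pd.read_html() call on an inline table. The HTML is invented and only mimics a table with two header rows, which is why header=1 is passed; it assumes an HTML parser that pandas can use (lxml, or bs4 together with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table with a grouping header row above the real column names.
html = """
<table>
  <thead>
    <tr><th>Général</th><th>Général</th></tr>
    <tr><th>Équipe</th><th>Âge</th></tr>
  </thead>
  <tbody>
    <tr><td>Ajaccio</td><td>29.1</td></tr>
    <tr><td>Angers</td><td>26.8</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
# header=1 uses the second header row as column names.
tables = pd.read_html(StringIO(html), header=1)
df = tables[0]

# Select a single column by its name.
ages = df["Âge"].tolist()
print(ages)
```

Because read_html returns a list, the [0] index picks the first table; other tables on a page would sit at later positions in that list.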

Solution 1
1 accepted 2022-08-15 09:38:07