I am not able to find relevant data using soup.findAll() while web scraping with BeautifulSoup
I am trying to scrape data (names, ages, teams) from this website: https://sofifa.com/players?offset=0. When I try to find the relevant data using soup.findAll(), I get an empty list.
import pandas as pd
import re
import requests
from bs4 import BeautifulSoup
k=[]
url="https://sofifa.com/players?offset=0"
resp=requests.get(url)
soup=BeautifulSoup(resp.content,'lxml')
for omk in soup.find_all('><div class="bp3-text-overflow-ellipsis">'):
    k.append(str(omk))
print(k)
I read some answers that mentioned tags and classes, but I don't know what these are.
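Tags and classes, briefly: a tag is the element name in the HTML (`div`, `a`, `td`), and `class` is one of that element's attributes. The sketch below runs on a small made-up HTML snippet (hypothetical, not sofifa.com's real markup) to show how BeautifulSoup exposes a tag's name, its class, and its text:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for illustration; not sofifa.com's real markup.
html = """
<table>
  <tr>
    <td class="col-name"><a href="/player/1"><div class="ellipsis">L. Messi</div></a></td>
    <td class="col-ae">34</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

cell = soup.find("td")            # first <td> tag in the document
print(cell.name)                  # the tag name: td
print(cell.get("class"))          # class is multi-valued, so it is a list: ['col-name']
print(cell.get_text(strip=True))  # the text nested inside the tag: L. Messi
```

So `find("td")` searches by tag name, and the class is a separate filter you add on top of that.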
Based on your question, here is an example of a working solution:
Code:
import pandas as pd
import re
import requests
from bs4 import BeautifulSoup
k = []
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}
url = "https://sofifa.com/players?offset=0"
resp = requests.get(url, headers = headers)
soup = BeautifulSoup(resp.content, 'lxml')
for omk in soup.select('table.table.table-hover.persist-area tbody tr'):
    name = omk.select_one('td.col-name a:nth-child(1) div').get_text(strip=True)
    print(name)
Output:
M. Sarr
É. Mendy
P. Daka
F. Wirtz
J. Timber
C. De Ketelaere
Cristiano Ronaldo
D. Maldini
J. Bellingham
Lucas Paquetá
Gavi
Antony
A. Spörle
K. Adeyemi
E. Haaland
D. Kamada
M. Salah
N. Madueke
A. Tchouaméni
M. Greenwood
M. Lacroix
R. Gravenberch
Pedri
J. Gvardiol
N. Lang
Raphinha
A. Hložek
J. Musiala
F. Chiesa
L. Messi
B. Brereton Díaz
R. Cherki
D. Vlahović
Ansu Fati
Pedro Benito
G. Raspadori
Yeremy Pino
Y. Tielemans
K. Mbappé
E. Camavinga
D. Scarlett
A. Bastoni
J. Sancho
T. Hernández
A. Davies
J. Koundé
A. Saint-Maximin
H. Elliott
S. Tonali
A. Broja
A. Isak
M. Vandevoordt
P. Foden
F. Kessié
J. Doku
E. Tapsoba
K. Mitoma
Luiz Felipe
Nuno Mendes
S. Dest
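The code above uses CSS selectors: `select()` returns a list of every match, while `select_one()` returns the first match or `None`. Calling `.get_text()` directly on `select_one()` therefore raises `AttributeError` on any row without the expected structure. A minimal sketch with a guard, run on hypothetical markup loosely modelled on the selectors above (the live page may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the selectors used above; the real page may differ.
html = """
<table class="table table-hover persist-area">
  <tbody>
    <tr><td class="col-name"><a href="/p/1"><div>M. Sarr</div></a></td></tr>
    <tr><td class="col-name"><a href="/p/2"><div>P. Daka</div></a></td></tr>
    <tr><td class="col-name">no link in this row</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
# select() takes a CSS selector and returns a list of every matching tag.
for row in soup.select("table.table.table-hover.persist-area tbody tr"):
    # select_one() returns the first match, or None if nothing matches.
    div = row.select_one("td.col-name a:nth-child(1) div")
    if div is None:
        continue  # skip rows without the expected structure instead of raising AttributeError
    names.append(div.get_text(strip=True))

print(names)  # ['M. Sarr', 'P. Daka']
```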
There are a couple of issues with your code snippet.
The first concerns the parser you specify when instantiating your BeautifulSoup instance. 'lxml' works only if the lxml package is installed, whereas Python's built-in 'html.parser' needs nothing extra:
soup=BeautifulSoup(resp.content,'html.parser')
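If you would rather keep lxml when it is available, one option is to try it first and fall back to the built-in parser. This is a minimal sketch; `make_soup` is a hypothetical helper name, and it relies on BeautifulSoup raising `FeatureNotFound` when the requested parser is not installed:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: parse with lxml if installed, else use the built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when the requested parser (here lxml) is not available.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<div class='x'>hello</div>")
print(soup.find("div").text)  # hello
```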
Then, when searching for a div element with a class of bp3-text-overflow-ellipsis, the proper syntax is the following:
soup.find_all("div", class_="bp3-text-overflow-ellipsis")
Here is the documentation for find_all: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
Your way of using .find_all() is incorrect. For correct usage, refer to the BeautifulSoup documentation.
This is how you need to do it:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url="https://sofifa.com/players?offset=0"
resp=requests.get(url)
soup=BeautifulSoup(resp.content,'lxml')
t = soup.find('tbody', class_='list')
items = []
for i in t.find_all('tr'):
    name = i.find('td', class_='col-name').find('a', class_='tooltip').text.strip()
    age = i.find('td', class_='col-ae').text.strip()
    team = i.find_all('td', class_='col-name')[-1].find('a').text.strip()
    items.append([name, age, team])
df = pd.DataFrame(items,columns=['Name', 'Age', 'Team'])
print(df)
              Name  Age                 Team
0       C. Musonda   24              Chelsea
1       J. Álvarez   21          River Plate
2        Ansu Fati   18         FC Barcelona
3    Arthur Cabral   23        FC Basel 1893
4   A. Schjelderup   17      FC Nordsjælland
5        J. Correa   26                Inter
6         F. Wirtz   18  Bayer 04 Leverkusen
...
58         A. Isak   21        Real Sociedad
59     Eric García   20         FC Barcelona
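The loop above assumes every lookup succeeds; if the site returns a different page (for example, when a request without a browser-like User-Agent is blocked), `soup.find('tbody', class_='list')` is `None` and the script crashes. A minimal sketch of the same extraction with guards, run on hypothetical rows that reuse the class names from this answer (`tbody.list`, `td.col-name`, `a.tooltip`, `td.col-ae`); `parse_players` is a hypothetical helper name and the live markup may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical rows that mimic the class names used in the answer above
# (tbody.list, td.col-name, a.tooltip, td.col-ae); the live page may differ.
html = """
<table><tbody class="list">
  <tr>
    <td class="col-name"><a class="tooltip" href="/p/1">C. Musonda</a></td>
    <td class="col-ae">24</td>
    <td class="col-name"><a href="/t/1">Chelsea</a></td>
  </tr>
  <tr>
    <td class="col-name"><a class="tooltip" href="/p/2">F. Wirtz</a></td>
    <td class="col-ae">18</td>
    <td class="col-name"><a href="/t/2">Bayer 04 Leverkusen</a></td>
  </tr>
</tbody></table>
"""


def parse_players(markup):
    """Hypothetical helper: return (name, age, team) tuples, skipping incomplete rows."""
    soup = BeautifulSoup(markup, "html.parser")
    body = soup.find("tbody", class_="list")
    if body is None:  # e.g. the request was blocked and a different page came back
        return []
    players = []
    for row in body.find_all("tr"):
        name_cells = row.find_all("td", class_="col-name")
        age_cell = row.find("td", class_="col-ae")
        if not name_cells or age_cell is None:
            continue
        name_link = name_cells[0].find("a", class_="tooltip")
        team_link = name_cells[-1].find("a")  # last col-name cell holds the team, as above
        if name_link is None or team_link is None:
            continue
        players.append((name_link.get_text(strip=True),
                        age_cell.get_text(strip=True),
                        team_link.get_text(strip=True)))
    return players


rows = parse_players(html)
print(rows)
```

The result can be passed to pandas exactly as in the answer, e.g. `pd.DataFrame(rows, columns=['Name', 'Age', 'Team'])`.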