
I am not able to find relevant data using soup.findAll() while web scraping with BeautifulSoup

I am trying to scrape data (names, ages, teams) from this website: https://sofifa.com/players?offset=0. When I try to find the relevant data using soup.findAll(), I get an empty list.

    import pandas as pd
    import re
    import requests
    from bs4 import BeautifulSoup

    k=[]
    url="https://sofifa.com/players?offset=0"
    resp=requests.get(url)
    soup=BeautifulSoup(resp.content,'lxml')
    for omk in soup.find_all('><div class="bp3-text-overflow-ellipsis">'):
      k.append(str(omk))
    print(k)

I read some answers that mentioned tags and classes, but I don't know what these are.

According to your question, here is an example of a working solution:

Code:

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; without it the site may not return the player table.
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}

url = "https://sofifa.com/players?offset=0"
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.content, 'lxml')

# Each <tr> in the players table is one player.
for omk in soup.select('table.table.table-hover.persist-area tbody tr'):
    # The player's name is the <div> inside the first link of the name column.
    name = omk.select_one('td.col-name a:nth-child(1) div').get_text(strip=True)
    print(name)

Output:

M. Sarr
É. Mendy
P. Daka
F. Wirtz
J. Timber        
C. De Ketelaere  
Cristiano Ronaldo
D. Maldini       
J. Bellingham    
Lucas Paquetá    
Gavi
Antony
A. Spörle        
K. Adeyemi       
E. Haaland       
D. Kamada     
M. Salah      
N. Madueke    
A. Tchouaméni 
M. Greenwood  
M. Lacroix    
R. Gravenberch
Pedri
J. Gvardiol   
N. Lang
Raphinha
A. Hložek
J. Musiala
F. Chiesa
L. Messi
B. Brereton Díaz
R. Cherki
D. Vlahović
Ansu Fati
Pedro Benito
G. Raspadori
Yeremy Pino
Y. Tielemans
K. Mbappé
E. Camavinga
D. Scarlett
A. Bastoni
J. Sancho
T. Hernández
A. Davies
J. Koundé
A. Saint-Maximin
H. Elliott
S. Tonali
A. Broja
A. Isak
M. Vandevoordt
P. Foden
F. Kessié
J. Doku
E. Tapsoba
K. Mitoma
Luiz Felipe
Nuno Mendes
S. Dest
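
The selector string in the code above can be tried out offline. The following is a minimal sketch that runs the same select and select_one calls against a small hand-written HTML string. The markup is hypothetical and heavily simplified, not the real sofifa.com page; only the class names are taken from the answer. It also skips rows where select_one returns None, which the original loop does not guard against.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup written only for this example.
html = """
<table class="table table-hover persist-area">
  <tbody>
    <tr><td class="col-name"><a href="#"><div>F. Wirtz</div></a></td></tr>
    <tr><td class="col-name"><a href="#"><div>E. Haaland</div></a></td></tr>
    <tr><td class="col-name"></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
# 'table.table.table-hover.persist-area' matches a <table> carrying all three
# classes; 'tbody tr' then descends to every row inside its <tbody>.
for row in soup.select("table.table.table-hover.persist-area tbody tr"):
    # First <a> child of the name cell, then the <div> inside it.
    cell = row.select_one("td.col-name a:nth-child(1) div")
    if cell is None:
        # Row without the expected structure: skip it instead of crashing.
        continue
    names.append(cell.get_text(strip=True))

print(names)
```

The third row has no link, so select_one returns None for it and the row is skipped.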

There are a couple of issues with your code snippet.

The first concerns the parser passed to BeautifulSoup. Your snippet uses 'lxml', which requires the lxml package to be installed; the built-in 'html.parser' works without any extra dependency:

soup=BeautifulSoup(resp.content,'html.parser')

Then, when searching for a div element with a class of bp3-text-overflow-ellipsis, the proper syntax is the following:

soup.find_all("div", class_="bp3-text-overflow-ellipsis")

Here is the documentation related to find_all: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
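
To see the difference between the two calls without touching the network, here is a minimal sketch that parses a small hand-written HTML string. The markup is made up for illustration and is not the actual sofifa.com page; only the class name comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written only for this example.
html = """
<div class="bp3-text-overflow-ellipsis">F. Wirtz</div>
<div class="bp3-text-overflow-ellipsis">E. Haaland</div>
<div class="other">not a player</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A string passed as the first argument is treated as a tag name, so this
# looks for a tag literally named '><div class="...">' and matches nothing.
wrong = soup.find_all('><div class="bp3-text-overflow-ellipsis">')

# Tag name first, then the class as a keyword argument. The trailing
# underscore in class_ is needed because class is a reserved word in Python.
right = soup.find_all("div", class_="bp3-text-overflow-ellipsis")
names = [div.get_text(strip=True) for div in right]

print(len(wrong))
print(names)
```

In short, a tag is the element name (div, td, a) and a class is the value of the element's class attribute; find_all takes them as separate arguments rather than as a piece of raw HTML.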

Your way of using .find_all() is incorrect. For correct usage, refer to the BeautifulSoup docs.

This is how you need to do it.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url="https://sofifa.com/players?offset=0"
resp=requests.get(url)
soup=BeautifulSoup(resp.content,'lxml')

# The player rows live in <tbody class="list">.
t = soup.find('tbody', class_='list')
items = []
for i in t.find_all('tr'):
    # The first 'col-name' cell holds the player link, the last one holds the team.
    name = i.find('td', class_='col-name').find('a', class_='tooltip').text.strip()
    age = i.find('td', class_='col-ae').text.strip()
    team = i.find_all('td', class_='col-name')[-1].find('a').text.strip()
    items.append([name, age, team])

df = pd.DataFrame(items,columns=['Name', 'Age', 'Team'])
print(df)

Output:

                 Name Age                    Team
0          C. Musonda  24                 Chelsea
1          J. Álvarez  21             River Plate
2           Ansu Fati  18            FC Barcelona
3       Arthur Cabral  23           FC Basel 1893
4      A. Schjelderup  17         FC Nordsjælland
5           J. Correa  26                   Inter
6            F. Wirtz  18     Bayer 04 Leverkusen
.
.
.
58            A. Isak  21           Real Sociedad
59        Eric García  20            FC Barcelona
