During some tests I noticed that BeautifulSoup seems to automatically "translate" an a tag inside a div tag into plain text:
<div class='a'>
<a href='....'>TEXT</a>
<i .....
</div>
When I search for the div tag with find_all('div', {'class': 'a'}) and print div.a, bs4 shows me None. But when I print div.text, bs4 shows only TEXT and not the a tag.
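This is the relevant section of code:
from bs4 import BeautifulSoup
# html holds the page source that contains the markup above
soup = BeautifulSoup(html, 'lxml')
data = soup.find_all('div', {'class': 'a'})
for div in data:
    print(div.a)
Output:
None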
Why?
UPDATE: I've just noticed another problem. The source code contains an a tag, but looking at the output of prettify(), bs4 shows that tag as a div when in reality it is an a tag. Strange!
Is this a bug?
SOLVED: I did some cleanup and removed all the packages for requests and urllib3, then reinstalled everything with apt, and now it works. The package versions of requests and urllib3 are 2.12.4-1 and 1.19.1-1, respectively.
I couldn't replicate your problem, but there's a typo in the HTML you're using: clas='a' should be class='a'.
The code I used:
from bs4 import BeautifulSoup
html = '''<div class='a'>
<a href='....'>TEXT</a>
<i> .....</i>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
data = soup.find_all('div', {'class': 'a'})
for div in data:
    print(div.a)
The output I got:
<a href="....">TEXT</a>
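One difference from the question's code is the parser: html.parser here, lxml there. Nothing in the question confirms that the parser was the cause, so the following is only a rough diagnostic sketch. The helper name first_link_per_parser is hypothetical, and lxml and html5lib are assumed to be optional installs that may be missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = """<div class='a'>
<a href='....'>TEXT</a>
<i> .....</i>
</div>"""

def first_link_per_parser(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each parser and report what div.a returns."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is not installed
            results[name] = "parser not installed"
            continue
        div = soup.find("div", {"class": "a"})
        found = div.a if div is not None else None
        results[name] = str(found) if found is not None else None
    return results

results = first_link_per_parser(html)
for name, found in results.items():
    print(name, "->", found)
```

If the parsers disagree on the real page, the markup or the installed parser version is worth a closer look.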
The .text attribute will not show the tags, only the text inside the selected tag and its children. You also need a BeautifulSoup object (or a Tag taken from one) in order to use the find_all method.
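To illustrate the difference between the tag and its text, here is a small sketch that reuses the same HTML with html.parser. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class='a'>
<a href='....'>TEXT</a>
<i> .....</i>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "a"})

link = div.a                     # the first <a> Tag inside the div
link_markup = str(link)          # the tag with its markup
link_text = link.get_text()      # only the text inside the tag
link_href = link["href"]         # attribute access on the Tag
div_text = div.text              # text of the div and all its children, no tags
nested_links = div.find_all("a") # find_all is also available on a Tag

print(link_markup)
print(link_text)
print(link_href)
print(repr(div_text))
print(len(nested_links))
```

So div.text dropping the a tag is expected behaviour, while div.a returning None is not expected for this markup.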