
BeautifulSoup does not show the <a> tag inside a <div> tag

During some tests, I noticed that BeautifulSoup seems to treat the <a> tag inside a <div> tag as plain text:

<div class='a'>
   <a href='....'>TEXT</a>
   <i .....
</div>

When I search for the div with find_all('div', {'class': 'a'}) and print div.a, bs4 shows None. But if I print div.text, bs4 shows only TEXT (and not the <a> tag).

This is the relevant section of the code:

soup = BeautifulSoup(html, 'lxml')
data=soup.find_all('div', {'class' : 'a'})    

for div in data:
   print div.a

$ None

Why?

UPDATE: I have just noticed another problem. The source code contains the <a> tag, but looking at the output of prettify, I realized that bs4 shows that tag as a div when it is really an <a> tag. Strange!

Is this a bug?

SOLVED: I cleaned up and removed all the packages for requests and urllib3, then reinstalled everything with apt, and now it works. The package versions of requests and urllib3 are 2.12.4-1 and 1.19.1-1, respectively.
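When a reinstall changes the behaviour, it can help to confirm which package versions Python actually imports. The following is only a sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version and the list of package names are illustrative and not part of the original post. The versions it reports are the Python distribution versions, which may be formatted differently from the apt package versions quoted above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical list: the two packages from the post plus the parser stack.
for name in ("requests", "urllib3", "beautifulsoup4", "lxml"):
    print(name, installed_version(name))
```

A value of None means the distribution is not visible to the interpreter running the script, which is one way a mixed or broken installation can show up.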

I couldn't reproduce your problem, but there's a typo in the HTML you're using: clas='a' should be class='a'.

The code I used:

from bs4 import BeautifulSoup

html = '''<div class='a'>
   <a href='....'>TEXT</a>
   <i> .....</i>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
data = soup.find_all('div', {'class': 'a'})
for div in data:
    print(div.a)  # first <a> tag inside the div

The output I got:

<a href="....">TEXT</a>

.text does not include the tags; it returns only the text inside the selected tag and its children. You also need to build a BeautifulSoup object from the HTML before you can call find_all.
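To make the difference between .text and tag access concrete, here is a small sketch using a shortened version of the HTML from the question (the <i> element is left out for brevity). The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<div class='a'>
   <a href='....'>TEXT</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "a"})

# .text joins the strings of the tag and its descendants; the markup is dropped.
text_only = div.text.strip()

# Attribute access returns the first matching child Tag, or None if there is none.
link = div.a

# str() on a Tag serializes it back to markup.
link_markup = str(link)

# find_all can also be called on a Tag to search only inside that tag.
links_in_div = div.find_all("a")

print(text_only)
print(link_markup)
print(link["href"])
print(len(links_in_div))
```

If div.a prints None while the <a> tag is clearly present in the raw HTML, printing soup.prettify() shows the tree the parser actually built, which is usually the quickest way to see where the two diverge.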
