For example, I have:
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
How can I get:
['<a href="http://example2.com" class="banana"><img ... /></a>','<a href="http://google.com">link3</a>']
You can use the CSS selector a[href] to get the a tags that have an href attribute:
h = '''
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
'''
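from bs4 import BeautifulSoup
# Name the parser explicitly; otherwise bs4 picks one itself and warns about it
soup = BeautifulSoup(h, 'html.parser')
# a[href] matches every <a> tag that has an href attribute, whatever its value
print(soup.select('a[href]'))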
output:
[<a class="banana" href="http://example.com">link1</a>,
<a class="banana" href="http://example2.com"><img ...=""/></a>,
<a href="http://google.com">link3</a>]
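Since the question asks for a list of strings rather than Tag objects, a minimal sketch of getting the markup text is to call str() on each matched tag. It also shows find_all('a', href=True), which (as far as I know) selects the same tags without a CSS selector. The variable names are only illustrative, and the 'html.parser' choice is an assumption; another installed parser may serialize attributes slightly differently.

```python
from bs4 import BeautifulSoup

html = '''
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# CSS attribute selector: every <a> that has an href attribute
via_select = [str(a) for a in soup.select('a[href]')]

# Same match without CSS: href=True keeps tags where href is present at all
via_find_all = [str(a) for a in soup.find_all('a', href=True)]

print(via_select)
```

Both lists keep document order, so either approach can be used interchangeably here.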