<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>
How do I grab the value of the <a> tag (Google)?
print soup.select("h2 > a")
This returns the entire <a> tag, and I just want the value. Also, there could be multiple <h2> elements on the page. How do I filter for the one with the class hello-word?
You can add .hello-word to h2 in the CSS selector to match only <h2> tags with the class hello-word, and then select the child <a>. Also, soup.select() returns a list of all matches, so you can iterate over it and read each element's .text to get the text. Example:
for i in soup.select("h2.hello-word > a"):
    print(i.text)
Example/demo (I added a few elements of my own, one with a slightly different class, to show how the selector works):
>>> from bs4 import BeautifulSoup
>>> s = """<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>
... <h2 class="hello-word"><a href="http://www.google.com">Google12</a></h2>
... <h2 class="hello-word2"><a href="http://www.google.com">Google13</a></h2>"""
>>> soup = BeautifulSoup(s,'html.parser')
>>> for i in soup.select("h2.hello-word > a"):
...     print(i.text)
...
Google
Google12
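If you want the values in a list rather than printed one by one, or only the first match, something along these lines should work. This is a minimal sketch: the variable names (html, link_texts, first_link, first_text) are my own, and it assumes a Beautiful Soup version that provides select_one().

```python
from bs4 import BeautifulSoup

html = """
<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>
<h2 class="hello-word"><a href="http://www.google.com">Google12</a></h2>
<h2 class="hello-word2"><a href="http://www.google.com">Google13</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <a> that is a direct child of h2.hello-word.
link_texts = [a.get_text(strip=True) for a in soup.select("h2.hello-word > a")]
print(link_texts)

# select_one() returns the first match, or None when nothing matches,
# so guard against None before reading the text.
first_link = soup.select_one("h2.hello-word > a")
first_text = first_link.get_text(strip=True) if first_link is not None else None
print(first_text)
```

The h2 with class hello-word2 is skipped because a CSS class selector matches whole class names, not prefixes.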
Try this:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>', 'html.parser')
>>> soup.text
'Google'
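Note that soup.text only gives 'Google' here because the document contains nothing else; on a real page it returns the text of every element. A small sketch of the difference, using a made-up <p>Intro</p> element and my own variable names:

```python
from bs4 import BeautifulSoup

# Hypothetical document with one extra element before the heading.
html = '<p>Intro</p><h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>'
soup = BeautifulSoup(html, "html.parser")

# .text on the whole soup concatenates the text of all elements.
whole_text = soup.text
print(repr(whole_text))

# Selecting the link first limits the text to that one element.
link = soup.select_one("h2.hello-word > a")
link_text = link.text
print(repr(link_text))
```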
You can also use the lxml.html library instead:
>>> import lxml.html
>>> from lxml.cssselect import CSSSelector
>>> txt = '<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>'
>>> tree = lxml.html.fromstring(txt)
>>> sel = CSSSelector('h2 > a')
>>> element = sel(tree)[0]
>>> element.text
'Google'