Is there a way to use BeautifulSoup to match a tag that has only the indicated class attribute, rather than the indicated class attribute plus others? For example, in this simple HTML:
<html>
<head>
<title>
Title here
</title>
</head>
<body>
<div class="one two">
some content here
</div>
<div class="two">
more content here
</div>
</body>
</html>
is it possible to match only the div with class="two", but not the div with class="one two"? Unless I'm missing something, that section of the documentation doesn't give me any ideas. This is the code I'm using currently:
from bs4 import BeautifulSoup
html = '''
<html>
<head>
<title>
Title here
</title>
</head>
<body>
<div class="one two">
should not be matched
</div>
<div class="two">
this should be matched
</div>
</body>
</html>
'''
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
div_two = soup.find("div", "two")
print(div_two.contents[0].strip())
I'm trying to get this to print "this should be matched" instead of "should not be matched".
EDIT: In this simple example, I know that the only options for classes are "one two" or "two", but in production code, I'll only know that what I want to match will have class "two"; other tags could have a large number of other classes in addition to "two", which may not be known.
On a related note, it's also helpful to read the documentation for version 4, not version 3 as I previously linked.
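Try:
# "class" is a reserved word in Python, so bs4 uses the keyword argument class_
divs = soup.find_all('div', class_="two")
for div in divs:
    # bs4 stores the class attribute as a list of class names
    if div['class'] == ['two']:
        pass  # handle class="two"
    else:
        pass  # handle other cases, including but not limited to "one two"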
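The exact-list comparison above can also be pushed into the search itself by passing a function as the filter. This is only a minimal sketch: the helper name has_only_class_two, the extra "two three" div, and the choice of the html.parser backend are assumptions for illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

html = """
<div class="one two">should not be matched</div>
<div class="two">this should be matched</div>
<div class="two three">should not be matched either</div>
"""

soup = BeautifulSoup(html, "html.parser")


def has_only_class_two(tag):
    # Hypothetical helper: bs4 stores a multi-valued class attribute
    # as a list, so comparing against ["two"] rejects any tag that
    # carries additional classes (or no class at all).
    return tag.name == "div" and tag.get("class") == ["two"]


exact_matches = soup.find_all(has_only_class_two)
exact_texts = [div.get_text(strip=True) for div in exact_matches]
print(exact_texts)
```

Because the check lives in one function, it keeps working when other tags carry any number of unknown extra classes, which is the production case described in the question.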
I hope the code below helps, though I haven't tested it.
soup.find("div", { "class" : "two" })
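For anyone trying the dictionary form: as far as I can tell it filters on class the same way soup.find("div", "two") does, meaning it accepts any div whose class list contains "two". A rough sketch comparing it with a CSS attribute selector, which compares the whole attribute value instead. The trimmed-down HTML and the html.parser backend are assumptions here, and select() behaviour may differ on very old bs4 releases.

```python
from bs4 import BeautifulSoup

html = """
<div class="one two">should not be matched</div>
<div class="two">this should be matched</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Dictionary form: matches any div that has "two" among its classes,
# so the first hit is still the div with class="one two".
first_hit = soup.find("div", {"class": "two"})
first_hit_classes = first_hit["class"]
print(first_hit_classes)

# CSS attribute selector: [class="two"] compares the full attribute
# value, so a div with class="one two" is not selected.
exact = soup.select('div[class="two"]')
exact_texts = [div.get_text(strip=True) for div in exact]
print(exact_texts)
```

The selector route is compact, but it is an exact string comparison on the attribute, so something like class="two " with stray whitespace in the source may not match.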