
How do I match a tag containing only the stated class, not any others, using BeautifulSoup?

Is there a way to use BeautifulSoup to match a tag with only the indicated class attribute, not the indicated class attribute and others? For example, in this simple HTML:

<html>
 <head>
  <title>
   Title here
  </title>
 </head>
 <body>
  <div class="one two">
   some content here
  </div>
  <div class="two">
   more content here
  </div>
 </body>
</html>

is it possible to match only the div with class="two", but not the div with class="one two"? Unless I'm missing something, that section of the documentation doesn't give me any ideas. This is the code I'm currently using:

from bs4 import BeautifulSoup

html = '''
<html>
 <head>
  <title>
   Title here
  </title>
 </head>
 <body>
  <div class="one two">
   should not be matched
  </div>
  <div class="two">
   this should be matched
  </div>
 </body>
</html>
'''

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning
div_two = soup.find("div", "two")
print(div_two.contents[0].strip())

I'm trying to get this to print "this should be matched" instead of "should not be matched".

EDIT: In this simple example, I know that the only options for classes are "one two" or "two", but in production code I'll only know that the tag I want to match has class "two"; other tags could have any number of other, possibly unknown, classes in addition to "two".

On a related note, it's also helpful to read the documentation for version 4, not version 3 as I previously linked.

Try:

divs = soup.find_all('div', class_="two")  # class is a reserved word in Python, so bs4 uses class_

for div in divs:
    if div['class'] == ['two']:
        pass # handle class="two"
    else:
        pass # handle other cases, including but not limited to "one two"
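The same check can also be pushed into the search itself by passing a function to find_all. The following is only a minimal sketch: the helper name has_only_class_two is made up for illustration, the HTML is a trimmed-down version of the question's example, and it assumes BeautifulSoup 4 with the built-in "html.parser", which parses the class attribute into a list.

```python
from bs4 import BeautifulSoup

html = """
<div class="one two">should not be matched</div>
<div class="two">this should be matched</div>
"""

soup = BeautifulSoup(html, "html.parser")

def has_only_class_two(tag):
    # Hypothetical helper: bs4 stores class as a list of names,
    # so comparing against ["two"] rejects tags with extra classes.
    return tag.name == "div" and tag.get("class") == ["two"]

exact = soup.find_all(has_only_class_two)
print([div.get_text(strip=True) for div in exact])
# prints ['this should be matched']
```

Using tag.get("class") rather than tag["class"] means tags without any class attribute are skipped instead of raising a KeyError.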

I hope the code below helps, though I haven't tested it.

soup.find("div", { "class" : "two" })
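For what it's worth, the dictionary form above appears to behave like the plain class search: it matches any div whose class list contains "two", so on the question's HTML it returns the "one two" div first. One possible alternative, sketched below, is a CSS attribute selector, which compares the whole attribute value. This assumes a BeautifulSoup version (4.7 or later) in which select is backed by the soupsieve package; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div class="one two">should not be matched</div>
<div class="two">this should be matched</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The dict filter matches on class membership, so the first hit is "one two".
first_by_dict = soup.find("div", {"class": "two"})

# [class="two"] compares the full attribute value, so only the
# div whose class attribute is exactly "two" is selected.
exact_by_css = soup.select('div[class="two"]')

print(first_by_dict.get_text(strip=True))
print([div.get_text(strip=True) for div in exact_by_css])
```

Note that the attribute selector is a string comparison on the full value, so it would not match class="two one" either, which is the desired behaviour here.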
