
Why doesn't checking an attribute's value imply that the attribute exists?

While using BeautifulSoup I very often have to make some action conditional on the value of a tag's class. For example, imagine that I want to perform one action on a <p> only when it has class="box" (the example below uses "sidenote"), and a different action otherwise. What I do is:

from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup('''
<html><body>
<h1>Titolo</h1>
<p>Testo che sta sotto il titolo</p>
<p class="sidenote">Questo da stampare</p>
<p>Questo è il testo della nota marginale</p>
</body></html>
''', "lxml")

# Iterating over soup.body yields tags and the text nodes between them.
for sel in soup.body:
    if not isinstance(sel, NavigableString) and \
       "class" in sel.attrs and "sidenote" in sel["class"]:
        print(sel)
    else:
        print("not found")

This is a bit clumsy. I wonder if there is a way to make the condition more compact. Ideally, checking the final condition (that class contains sidenote) would imply that the element has a class attribute and, consequently, that it is a tag and not a NavigableString.

Using a ternary operator (a conditional expression) would remove some of the bulk, though not all of it. The number of branches stays the same, so I am not sure whether that works for you.

http://book.pythontips.com/en/latest/ternary_operators.html

So the current code would become:

# do_something and do_something_else are placeholders for your own functions
if "class" in div.attrs:
    do_something(div) if "box" in div["class"] else do_something_else(div)
else:
    do_something_else(div)

Or, if you want to compress it further (though I feel that would affect readability):

(do_something(div) if "box" in div["class"] else do_something_else(div)) if "class" in div.attrs else do_something_else(div)
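
Since both fallback branches do the same thing, the two tests can also be joined with and, which short-circuits so that the class value is only read when the key exists. Below is a minimal sketch of that idea. It uses a plain dict as a stand-in for a tag's attrs, and do_something, do_something_else and dispatch are hypothetical names chosen for illustration, not part of BeautifulSoup.

```python
def do_something(attrs):
    # Hypothetical action for elements that carry the "box" class.
    return "matched"


def do_something_else(attrs):
    # Hypothetical fallback action for everything else.
    return "not matched"


def dispatch(attrs):
    # `and` short-circuits: attrs["class"] is only evaluated
    # when the "class" key is present, so no KeyError can occur.
    return (do_something(attrs)
            if "class" in attrs and "box" in attrs["class"]
            else do_something_else(attrs))


# Plain dicts standing in for tag.attrs (an assumption for this sketch).
samples = [{"class": ["box"]}, {"class": ["sidenote"]}, {}]
results = [dispatch(a) for a in samples]
print(results)
```

This keeps a single conditional expression instead of a nested one, at the cost of repeating the attribute name.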

As with Python dictionaries, you can use the get method instead of indexing the element with [...]. That way a missing attribute does not raise a KeyError; get simply returns None, or a default value if you provide one. The isinstance check has to stay, because a NavigableString has no get method. With an empty list as the default, the code simplifies to:

for sel in soup.body:
    if not isinstance(sel, NavigableString) and \
       "sidenote" in sel.get("class", []):
        print(sel)
    else:
        print("not found")
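
If the same test is needed in several places, it could be wrapped in a small helper. The sketch below is one possible way to do that, not an official BeautifulSoup idiom: has_class is a hypothetical name, the check is written as isinstance(node, Tag) rather than the negated NavigableString test, and the built-in "html.parser" is used instead of "lxml" only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<html><body>
<h1>Titolo</h1>
<p>Testo che sta sotto il titolo</p>
<p class="sidenote">Questo da stampare</p>
<p>Questo è il testo della nota marginale</p>
</body></html>
"""


def has_class(node, name):
    # Hypothetical helper: True only for tags whose class list contains `name`.
    # Text nodes are not Tag instances, so the first test filters them out
    # before .get() is ever called.
    return isinstance(node, Tag) and name in node.get("class", [])


soup = BeautifulSoup(html, "html.parser")
matches = [node for node in soup.body if has_class(node, "sidenote")]

for node in matches:
    print(node)
```

The loop body then reduces to a single call such as if has_class(sel, "sidenote"): and the guard logic lives in one place.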


 