While using BeautifulSoup I very often have to condition some action on the value of a tag's class. For example, imagine that I want to do some action on <p> only when the attribute is class="box", and a different action otherwise. What I do is:
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('''
<html><body>
<h1>Titolo</h1>
<p>Testo che sta sotto il titolo</p>
<p class="sidenote">Questo da stampare</p>
<p>Questo è il testo della nota marginale</p>
</body></html>
''', "lxml")

# soup.body yields both tags and the whitespace strings between them
for sel in soup.body:
    if not isinstance(sel, NavigableString) and \
            "class" in sel.attrs and "sidenote" in sel["class"]:
        print(sel)
    else:
        print("not found")
This is a bit clumsy. I wonder if there's a way to make the condition a little more compact than this. The ideal would be that a check on the final condition (that class contains sidenote) implied that the element does have a class attribute and, consequently, that it is a tag, not a NavigableString.
Using a ternary operator would remove some of the bulk, though not all of it. The number of if/else branches stays the same, so I'm not sure it would work for you.
http://book.pythontips.com/en/latest/ternary_operators.html
So the current code would become:
# do_something and do_something_else are placeholders for the two actions
if "class" in div.attrs:
    do_something(div) if "box" in div["class"] else do_something_else(div)
else:
    do_something_else(div)
or, if you want to compress it further (though I feel that would hurt readability):
(do_something(div) if "box" in div["class"] else do_something_else(div)) if "class" in div.attrs else do_something_else(div)
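For what it's worth, here is a small runnable sketch of the same idea. It assumes two placeholder functions (do_something and do_something_else are made-up names that just return a label) and uses "html.parser" so that lxml is not required. The ternary picks the action and a short-circuiting `and` guards the class lookup:

```python
from bs4 import BeautifulSoup


# Hypothetical placeholder actions; they only return a label here.
def do_something(tag):
    return "box: " + tag.get_text()


def do_something_else(tag):
    return "other: " + tag.get_text()


soup = BeautifulSoup('<div class="box">A</div><div>B</div>', "html.parser")

results = []
for div in soup.find_all("div"):
    # `and` short-circuits, so div["class"] is only read when the attribute exists.
    has_box = "class" in div.attrs and "box" in div["class"]
    # The ternary selects which function to call; the call happens once, below.
    action = do_something if has_box else do_something_else
    results.append(action(div))

print(results)
```

Selecting the function first and calling it afterwards keeps the fallback action written only once.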
As with Python dictionaries, you can use the get method instead of accessing the element using [...]. This way, it does not raise a KeyError if the attribute is not present but just returns None. You can also provide a default value, so you can simplify the code to:
for sel in soup.body:
    # get() falls back to [] when the tag has no class attribute
    if not isinstance(sel, NavigableString) and \
            "sidenote" in sel.get("class", []):
        print(sel)
    else:
        print("not found")
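To make the behaviour of get concrete, here is a minimal sketch on a made-up snippet of HTML (parsed with "html.parser" so lxml is not needed). The last part is an optional variation that is not in the code above: as far as I know, find_all(True, recursive=False) returns only the direct child tags, which would make the NavigableString check unnecessary.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html = '<body><h1>Title</h1><p class="sidenote extra">Note</p><p>Plain</p></body>'
soup = BeautifulSoup(html, "html.parser")

note, plain = soup.find_all("p")

# "class" is a multi-valued attribute, so get() returns a list of names.
note_classes = note.get("class")
# A missing attribute gives None, or the default if one is supplied.
plain_missing = plain.get("class")
plain_default = plain.get("class", [])

# Optional variation: iterate over direct child tags only,
# so text nodes never reach the class check.
matches = [
    tag.name + ":" + str("sidenote" in tag.get("class", []))
    for tag in soup.body.find_all(True, recursive=False)
]

print(note_classes, plain_missing, plain_default, matches)
```

Because the default is an empty list, the `in` test is safe for tags without a class attribute.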