I am trying to find the following <li>
element in an HTML document using Python 3, Beautiful Soup and regular expressions.
<li style="text-indent:0pt; margin-top:0pt; margin-bottom:0pt;" value="394">KEANE J.
The plaintiff is a Sri Lankan national of Tamil ethnicity. While he was a
passenger on a vessel travelling from India to
Australia, that vessel ("the
Indian vessel") was intercepted by an Australian border protection vessel ("the
Commonwealth ship")
in Australia's contiguous
zone<span class="sup"><b><a name="fnB313" href="http://www.austlii.edu.au/au/cases/cth/HCA/2015/1.html#fn313">[313]</a></b></span>.
</li>
I have tried using the following find_all
function, which returns an empty list.
html.find_all('li', string='KEANE J.')
I have also tried the find
function with a regex, which returns None:
html.find('li', string=re.compile(r'^KEANE\sJ\.\s'))
How would I find this element in the HTML document?
Does it have something to do with the child element present inside the <li>?
Absolutely. In this case, aside from the text node, the li
element has other children. This is documented in the .string
paragraph of the Beautiful Soup documentation:
If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None
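To see that rule in action, here is a minimal sketch. The SAMPLE markup is a hypothetical, shortened stand-in for the <li> in the question, and the names mixed and plain are only illustrative. It shows that an li with a text node plus a span has .string set to None, which is why a string= filter on the tag finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup modelled on the <li> from the question.
SAMPLE = (
    '<ul>'
    '<li value="394">KEANE J.\nThe plaintiff<span class="sup">[313]</span>.</li>'
    '<li>Only text</li>'
    '</ul>'
)

soup = BeautifulSoup(SAMPLE, "html.parser")
mixed, plain = soup.find_all("li")

# The first <li> has three children (text, <span>, text), so .string is None.
print(mixed.string)

# The second <li> has a single text child, so .string is that text.
print(plain.string)

# string= on a tag search is compared against tag.string, so nothing matches.
print(soup.find_all("li", string="KEANE J."))
```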
What you can do is to locate the text node itself and then get its parent:
li = html.find(string=re.compile(r'^KEANE\sJ\.\s')).parent
print(li)
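For completeness, a self-contained sketch of that approach follows. The SAMPLE markup is a hypothetical, trimmed version of the page, and li_starts_with is an illustrative helper name, not part of Beautiful Soup. It also guards against find returning None, and shows one possible alternative that tests each li's full text with get_text().

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, trimmed markup standing in for the real page.
SAMPLE = """<ul>
<li value="393">GAGELER J.
Some other paragraph.</li>
<li value="394">KEANE J.
The plaintiff is a Sri Lankan national<span class="sup"><b><a name="fnB313">[313]</a></b></span>.
</li>
</ul>"""

html = BeautifulSoup(SAMPLE, "html.parser")
pattern = re.compile(r'^KEANE\sJ\.\s')

# Search the text nodes directly, then step up to the enclosing tag.
text_node = html.find(string=pattern)
li = text_node.parent if text_node is not None else None


def li_starts_with(tag):
    """Illustrative filter: an <li> whose combined text matches the pattern."""
    return tag.name == "li" and pattern.match(tag.get_text()) is not None


# Alternative: pass a function to find() and test the full text of each <li>.
li_alt = html.find(li_starts_with)

print(li["value"] if li is not None else "not found")
```

The .parent approach assumes the matching text is a direct child of the li. If the text could be wrapped in another tag (for example a b element), the function-based filter is the more forgiving of the two.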