
How to find an html element using Beautiful Soup and regex strings

I am trying to find the following <li> element in an HTML document using Python 3, Beautiful Soup and regular expressions.

<li style="text-indent:0pt; margin-top:0pt; margin-bottom:0pt;" value="394">KEANE J.
The plaintiff is a Sri Lankan national of Tamil ethnicity.  While he was a
passenger on a vessel travelling from India to
Australia, that vessel ("the
Indian vessel") was intercepted by an Australian border protection vessel ("the
Commonwealth ship")
in Australia's contiguous
zone<span class="sup"><b><a name="fnB313" href="http://www.austlii.edu.au/au/cases/cth/HCA/2015/1.html#fn313">[313]</a></b></span>. 
</li>

I have tried using the following find_all function, which returns an empty list.

html.find_all('li', string='KEANE J.')

I have also tried the find function with a regex, which returns None:

html.find('li', string=re.compile(r'^KEANE\sJ\.\s'))

How would I find this element in the HTML document?

Does it have something to do with the nested <span> element being present?

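Absolutely. In this case, aside from the text node, the li element has other children. When a tag name is given, the string argument is matched against the tag's .string, and that is None here. This is documented in the .string section of the Beautiful Soup documentation: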

If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None
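A minimal sketch of that rule, using a shortened stand-in for the <li> from the question (the markup and the names only_text and mixed are illustrative assumptions, not part of the original document):

```python
import re
from bs4 import BeautifulSoup

# A tag whose only child is a text node: .string is that text.
only_text = BeautifulSoup("<li>KEANE J.</li>", "html.parser").li

# A tag with a text node plus a <span> child: .string is None.
soup = BeautifulSoup(
    '<li>KEANE J.\nThe plaintiff <span class="sup">[313]</span>.</li>',
    "html.parser",
)
mixed = soup.li

print(only_text.string)
print(mixed.string)

# Because string= is compared against .string when a tag name is given,
# this lookup finds nothing for the mixed-content <li>.
not_found = soup.find("li", string=re.compile(r"^KEANE\sJ\.\s"))
print(not_found)
```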

What you can do is to locate the text node itself and then get its parent:

# Find the matching text node first, then step up to the enclosing <li>.
li = html.find(string=re.compile(r'^KEANE\sJ\.\s')).parent
print(li)
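For reference, here is a self-contained sketch of the same idea. The doc string is a shortened, made-up version of the page from the question, and the second lookup (a function filter on get_text()) is an alternative not mentioned above, shown only as one possible option:

```python
import re
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the document in the question.
doc = """<ul>
<li value="393">Some other paragraph.</li>
<li value="394">KEANE J.
The plaintiff is a Sri Lankan national of Tamil ethnicity<span class="sup"><b><a name="fnB313">[313]</a></b></span>.
</li>
</ul>"""

html = BeautifulSoup(doc, "html.parser")
pattern = re.compile(r"^KEANE\sJ\.\s")

# Approach from the answer: match the text node, then take its parent.
# find() returns None when nothing matches, so guard before using .parent.
text_node = html.find(string=pattern)
li = text_node.parent if text_node is not None else None

# Alternative: filter <li> tags on their combined text content.
li_alt = html.find(lambda tag: tag.name == "li" and pattern.search(tag.get_text()))

print(li["value"] if li is not None else "no match")
```

Note that the ^ anchor requires the text node to begin with "KEANE"; if the real page has whitespace between <li ...> and the name, the pattern may need a leading \s* to match.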

