How to find an html element using Beautiful Soup and regex strings

Question

I am trying to find the following <li> element in an HTML document using Python 3, Beautiful Soup and regex strings.

<li style="text-indent:0pt; margin-top:0pt; margin-bottom:0pt;" value="394">KEANE J.
The plaintiff is a Sri Lankan national of Tamil ethnicity.  While he was a
passenger on a vessel travelling from India to
Australia, that vessel ("the
Indian vessel") was intercepted by an Australian border protection vessel ("the
Commonwealth ship")
in Australia's contiguous
zone<span class="sup"><b><a name="fnB313" href="http://www.austlii.edu.au/au/cases/cth/HCA/2015/1.html#fn313">[313]</a></b></span>. 
</li>

I have tried using the following find_all function, which returns an empty list.

html.find_all('li', string='KEANE J.')

I have also tried the find function with a regex, which returns None:

html.find('li', string=re.compile(r'^KEANE\sJ\.\s'))

How would I find this element in the html document?

Answer 1

it has something to do with the element present?

Absolutely. In this case the li element has other children besides the text node, and the string argument is matched against a tag's .string. This is documented in the .string section of the Beautiful Soup documentation:

If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None
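A small sketch of that behaviour, using a shortened, made-up version of the markup from the question. The `doc` string, the variable names and the choice of `html.parser` are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Shortened, illustrative markup: the <li> holds a text node AND a <span>.
doc = '<li value="394">KEANE J.\nThe plaintiff ...<span class="sup">[313]</span>.\n</li>'
soup = BeautifulSoup(doc, "html.parser")
li = soup.find("li")

# More than one child, so .string is not defined for this tag.
print(li.string)
print(len(li.contents))

# string= is compared against tag.string, so this search comes back empty.
matches = soup.find_all("li", string="KEANE J.")
print(matches)

# For contrast: an <li> whose only child is a text node does have a .string.
simple = BeautifulSoup("<li>KEANE J.</li>", "html.parser").find("li")
print(simple.string)
```

The last two lines are only there for contrast: with a single text child, `.string` is set and a `string=` search on the tag would have something to match.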

What you can do is to locate the text node itself and then get its parent:

li = html.find(string=re.compile(r'^KEANE\sJ\.\s')).parent
print(li)
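For completeness, here is a self-contained sketch of the same approach. It assumes a shortened stand-in for the real document (`doc`), the built-in `html.parser`, and it adds a `None` check that is not part of the snippet above, since `find` returns `None` when nothing matches and `.parent` would then raise an `AttributeError`:

```python
import re
from bs4 import BeautifulSoup

# Shortened, illustrative stand-in for the document in the question.
doc = '<ul><li value="394">KEANE J.\nThe plaintiff ...<span class="sup">[313]</span>.\n</li></ul>'
soup = BeautifulSoup(doc, "html.parser")
pattern = re.compile(r"^KEANE\sJ\.\s")

# Searching on the tag fails, because <li> has several children.
by_tag = soup.find("li", string=pattern)

# Searching for the text node itself works; its parent is the <li>.
text_node = soup.find(string=pattern)
li = text_node.parent if text_node is not None else None

print(by_tag)
print(li.name if li is not None else "no match")
```

Calling `find` with only `string=` returns the matching text node rather than a tag, which is why `.parent` is needed to get back to the element.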


1 answers

solution1
1 ACCEPTED 2016-09-24 13:02:57
