I have been googling and looking at other questions here about searching for a string in a BeautifulSoup object.
From what I found, the following should detect the string, but it doesn't:
strings = soup.find_all(string='Results of Operations and Financial Condition')
However, the following detects the string:
tags = soup.find_all('div',{'class':'info'})
for tag in tags:
    if re.search('Results of Operations and Financial Condition', tag.text):
        ''' Do Something'''
Why does one work and the other not?
You might want to use:
strings = soup.find_all(string=lambda x: 'Results of Operations and Financial Condition' in x)
This happens because find_all, when given a plain string as the string argument, only matches a text node whose entire value equals that string. I suspect the text node in your document contains some other text next to 'Results of Operations and Financial Condition'.
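To see the exact-match behaviour without the original page, here is a minimal offline sketch. The `div class="info"` markup and the "Item 2.02" prefix are hypothetical stand-ins for the real document; the point is only that the phrase shares its text node with other text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the phrase shares its text node with other text.
html = '<div class="info">Item 2.02 Results of Operations and Financial Condition.</div>'
soup = BeautifulSoup(html, "html.parser")
phrase = "Results of Operations and Financial Condition"

# A plain string must equal the whole text node, so nothing is found.
exact = soup.find_all(string=phrase)

# A function is called with each text node and can test for containment.
contains = soup.find_all(string=lambda x: phrase in x)

# The question's loop works because tag.text joins all the text inside the
# tag, and re.search looks for the phrase anywhere in that combined string.
via_tags = [
    tag
    for tag in soup.find_all("div", {"class": "info"})
    if re.search(phrase, tag.text)
]

print(len(exact), len(contains), len(via_tags))  # 0 1 1
```

Note that the lambda returns the matching text node itself, while the tag loop returns the enclosing `div`; which one you want depends on what "Do Something" needs.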
If you check the documentation, you can see that the string argument also accepts a function, and the following two lines appear to be equivalent:
soup.find_all(string='Results of Operations and Financial Condition')
soup.find_all(string=lambda x: x == 'Results of Operations and Financial Condition')
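A quick way to check that equivalence is to run both forms against a small made-up document (the HTML below is hypothetical) in which one text node equals the phrase exactly and another merely contains it:

```python
from bs4 import BeautifulSoup

phrase = "Results of Operations and Financial Condition"
# Hypothetical markup: the first <p> equals the phrase, the second only contains it.
html = f"<p>{phrase}</p><p>Item 2.02 {phrase}.</p>"
soup = BeautifulSoup(html, "html.parser")

by_string = soup.find_all(string=phrase)
by_function = soup.find_all(string=lambda x: x == phrase)

# Both searches return only the text node whose whole value equals the phrase.
print(len(by_string), len(by_function))  # 1 1
print(by_string == by_function)          # True
```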
Consider this code:
import re
import urllib.request
import bs4

page = urllib.request.urlopen('https://en.wikipedia.org/wiki/Alloxylon_pinnatum')
sp = bs4.BeautifulSoup(page, 'html.parser')
print(sp.find_all(string=re.compile('The pinkish-red compound flowerheads')))  # a regex matches any text node that contains the pattern
print(sp.find_all(string='The pinkish-red compound flowerheads, known as'))  # no trailing space, so no exact match
print(sp.find_all(string='The pinkish-red compound flowerheads, known as '))  # note the trailing space
The results are:
['The pinkish-red compound flowerheads, known as ']
[]
['The pinkish-red compound flowerheads, known as ']
It looks like the string argument requires an exact match against the full value of a text node; it does not check whether a text node contains the given string. You can, however, use a regular expression to find text nodes that contain a string, as shown in the code above.
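The same behaviour can be reproduced offline. This is a sketch assuming a hypothetical snippet that mimics the relevant part of the page: a text node ending in a space, followed by a link (the link text is a placeholder, not taken from the article).

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the text node ends with a space before the <a> tag.
html = '<p>The pinkish-red compound flowerheads, known as <a href="#">linked term</a></p>'
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern is applied with .search(), so a partial match is enough.
regex_hits = soup.find_all(string=re.compile("The pinkish-red compound flowerheads"))
# A plain string must equal the entire node, trailing space included.
no_space = soup.find_all(string="The pinkish-red compound flowerheads, known as")
with_space = soup.find_all(string="The pinkish-red compound flowerheads, known as ")

print(len(regex_hits), len(no_space), len(with_space))  # 1 0 1
```

If the phrase you search for contains regex metacharacters such as `.` or `(`, wrapping it in `re.escape()` before compiling keeps it a literal substring search.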