BeautifulSoup: Can't find Tag with text it contains

I am having trouble finding a tag by the text it contains on the following page: Link to web page

I am trying to find the Bloomberg and Reuters codes. Using a CSS selector, I tried:

css_selector = 'tr:has(> td:contains("Bloomberg Code"))'
my_tag: Tag = my_soup.select_one(css_selector)

Using find, I tried:

my_tag = my_soup.find(lambda t: t.Tag == 'td' and re.findall('Bloomberg Code', t.text, flags=re.I))

Both return a massive amount of HTML, which does start with a "tr" tag but is not the row I was expecting:

<tr>
    <td style="padding-top:5px">- Bloomberg Code : </td>
    <td style="padding-left:10px;padding-top:5px" align="left">&nbsp;FLTR:ID</td>
</tr>

I think the issue might be that BeautifulSoup sees it as a NavigableString, but when I check the type of the result in my_tag, it says: class 'bs4.element.Tag'

Thanks for the help. Best

You need to send a User-Agent header with the request, and what you want to select is the td that is the adjacent sibling of the td containing the search term, not the enclosing tr.

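from bs4 import BeautifulSoup as bs
import requests

search_strings = ['Bloomberg Code :', 'Reuters Code :']
url = 'https://www.marketscreener.com/FLUTTER-ENTERTAINMENT-PLC-59029817/company/'

# The site does not serve the page to the default requests User-Agent
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
r.raise_for_status()
soup = bs(r.content, 'lxml')

for search_string in search_strings:
    # "+ td" selects the cell immediately after the label cell.
    # :-soup-contains is the current spelling (soupsieve 2.1+);
    # on older versions use the deprecated :contains instead.
    node = soup.select_one(f'td:-soup-contains("{search_string}") + td')
    if node is None:
        print(f'{search_string} not found')
    else:
        # strip=True also drops the leading non-breaking space (&nbsp;)
        print(node.get_text(strip=True))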
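
As for why the original selector returned so much HTML: one likely explanation, assuming the page nests its tables, is that an outer td also "contains" the text through its descendants, so the outer tr matches first. The sketch below reproduces this offline with made-up markup (the nested layout is an assumption, not copied from the real page) and assumes soupsieve 2.1 or later for :-soup-contains and :-soup-contains-own.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the target row sits in a table nested inside another row.
html = """
<table><tr><td>
  <table>
    <tr><td>- Bloomberg Code : </td><td>&nbsp;FLTR:ID</td></tr>
    <tr><td>- Some Other Label : </td><td>&nbsp;placeholder</td></tr>
  </table>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The outer <td> contains the text via its descendants, so the OUTER <tr>
# is the first match in document order.
outer = soup.select_one('tr:has(> td:-soup-contains("Bloomberg Code"))')

# :-soup-contains-own only looks at the cell's own text nodes,
# so this matches the inner row.
inner = soup.select_one('tr:has(> td:-soup-contains-own("Bloomberg Code"))')

# Adjacent-sibling selector: the cell right after the label cell.
value = soup.select_one('td:-soup-contains("Bloomberg Code") + td')

# Equivalent lookup with find(): string= only matches tags whose sole
# child is a string, which excludes the outer <td>.
label = soup.find("td", string=re.compile("Bloomberg Code"))
value_via_find = label.find_next_sibling("td")

print(outer.find("table") is not None)  # outer row wraps the nested table
print(inner.find("table") is None)      # inner row is the one we wanted
print(value.get_text(strip=True))
print(value_via_find.get_text(strip=True))
```

Either the adjacent-sibling selector or find_next_sibling avoids the problem, because both start from the label cell itself rather than from any ancestor that happens to contain the same text.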
