
Finding a string within a list

So very stuck

First, my HTML is hard to work with. Sometimes it has missing data, like below. My goal is to get the text after the strong tag (so GOOD, 1:56:5, 1:56:5, etc.).

Since the data is jumbled, I think I want nested if statements so that when I construct my list the data is correct (see the code below).

Missing data HTML

<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>

Normal HTML

<td><strong>Track Rating:</strong> GOOD</td>
<td><strong>Gross Time:</strong> 2:29:6</td>
<td><strong>Mile Rate:</strong> 1:58:6</td>
<td><strong>Lead Time:</strong> 30.3</td>

My code is below. I want to extract the data inside my if statement, but I'm stuck. Any help appreciated. What I'm trying to do is collect GOOD here and store it in trackrating, and do that for every track rating I scrape. If it doesn't exist, I want to store it as blank.

tableoftimes = race.find('table', class_='raceTimes')
for row in tableoftimes.find_all('tr'):
    string23 = [td.get_text() for td in row.find_all('td')]
    matching = [s for s in string23 if "Track Rating: " in s]
    if matching:
        # want to split to get the part after ':' but that won't work on a list
        trackrating = matching
    else:
        trackrating = ''


Try this:

from bs4 import BeautifulSoup

html = """<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>"""

soup = BeautifulSoup(html, 'html.parser')
for td in soup.find_all('td'):
    if td.find('strong'):  # skip cells without a <strong> tag
        if td.strong.text == 'Track Rating:':
            # first direct text node of the <td>, i.e. the text after </strong>
            print(td.find(string=True, recursive=False).strip())

Output:

GOOD
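
To cover the case in the question where a cell may be missing, the same check could be wrapped in a small helper that returns an empty string when no matching cell is found. This is only a sketch: the helper name get_field and the sample rows are assumptions made for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

def get_field(row, label):
    """Return the text after <strong>label</strong> in row, or '' if absent."""
    for td in row.find_all('td'):
        strong = td.find('strong')
        if strong and strong.get_text(strip=True) == label:
            # Direct text node of the <td>, i.e. the text after </strong>
            value = td.find(string=True, recursive=False)
            return value.strip() if value else ''
    return ''

# Hypothetical sample rows: the second one has no Track Rating cell
rows_html = """<table>
<tr><td><strong>Track Rating:</strong> GOOD</td><td></td></tr>
<tr><td></td><td><strong>Gross Time:</strong> 1:56:5</td></tr>
</table>"""

soup = BeautifulSoup(rows_html, 'html.parser')
trackratings = [get_field(tr, 'Track Rating:') for tr in soup.find_all('tr')]
print(trackratings)
```

Each row contributes exactly one entry, so the list stays aligned with the rows even when a rating is missing.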

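If you have BS4 4.7.1 or above, you can try the following code.

The CSS selector below matches every strong tag that contains ":" and is a direct child of a td tag. For each match, take the parent td and use contents[-1] to get the value. Note that newer Soup Sieve releases deprecate :contains in favour of :-soup-contains, so the selector may need that spelling there.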

Code:

from bs4 import BeautifulSoup

html='''<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>'''

soup=BeautifulSoup(html,'html.parser')

for item in soup.select('td>strong:contains(":")'):
    print(item.parent.contents[-1].strip())

Output:

GOOD
1:56:5
1:56:5
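
If every labelled value is needed rather than just one, the contents[-1] idea could also be used to build a dictionary keyed by label, so that a missing field falls back to a blank. The sketch below is an assumption about how that might look; the names fields, trackrating and leadtime are illustrative, and it uses find_all instead of the CSS selector so it does not depend on the Soup Sieve version.

```python
from bs4 import BeautifulSoup

html = '''<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>'''

soup = BeautifulSoup(html, 'html.parser')

fields = {}
for strong in soup.find_all('strong'):
    label = strong.get_text(strip=True).rstrip(':')
    last = strong.parent.contents[-1]
    # contents[-1] is the <strong> tag itself when no text follows it,
    # so only text nodes (which are str subclasses) are kept as values
    fields[label] = last.strip() if isinstance(last, str) else ''

# dict.get supplies the blank default for labels that never appeared
trackrating = fields.get('Track Rating', '')
leadtime = fields.get('Lead Time', '')
print(fields)
```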

Alternatively, you can use next_element after finding the strong tag. The first next_element is the text inside the strong tag, and the second next_element is the text that follows the strong tag, which is the value to print.

html='''<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>'''

soup=BeautifulSoup(html,'html.parser')

for item in soup.select('td>strong:contains(":")'):
    print(item.next_element.next_element.strip())

Output:

GOOD
1:56:5
1:56:5
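
Staying closer to the list-based code in the question, the value can also be taken from the matching string itself rather than from the list that holds it. The following is a minimal sketch in plain Python; the helper name value_after is hypothetical, and the sample list assumes the cell texts look the way td.get_text() would return them for the "missing data" row.

```python
def value_after(cells, label):
    """Return the text following label in the first matching cell, else ''."""
    match = next((s for s in cells if label in s), None)
    if match is None:
        return ''
    # partition splits on the first occurrence of label only,
    # so colons inside values such as 1:56:5 are left alone
    return match.partition(label)[2].strip()

# Assumed cell texts for the "missing data" row
string23 = ['Track Rating: GOOD', '', 'Gross Time: 1:56:5', 'Mile Rate: 1:56:5']

trackrating = value_after(string23, 'Track Rating:')
leadtime = value_after(string23, 'Lead Time:')
print(repr(trackrating), repr(leadtime))
```

Splitting on the whole label instead of on ':' avoids cutting time values apart, since they contain colons themselves.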
