So very stuck
My HTML is awkward to parse: sometimes it has missing data, as shown below. My goal is to get the text after the strong tag (so GOOD, 1:56:5, 1:56:5, etc.).
Because the data is inconsistent, I think I need nested if statements so that the list I build stays correct (see the code below).
HTML with missing data:
<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>
Normal HTML:
<td><strong>Track Rating:</strong> GOOD</td>
<td><strong>Gross Time:</strong> 2:29:6</td>
<td><strong>Mile Rate:</strong> 1:58:6</td>
<td><strong>Lead Time:</strong> 30.3</td>
My code is below. I want to extract the data inside my if statement, but I'm stuck. Any help is appreciated. What I'm trying to do is collect GOOD here and store it in trackrating, and do that for every track rating I scrape. If it doesn't exist, I want to store it as blank.
tableoftimes = race.find('table', class_='raceTimes')
for row in tableoftimes.find_all('tr'):
    string23 = [td.get_text() for td in row.find_all('td')]
    matching = [s for s in string23 if "Track Rating: " in s]
    if matching:
        trackrating = matching  # want to split to get the text after ':', but split() won't work on a list
    else:
        trackrating = ''
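The split in the question fails because matching is a list; splitting its first element works. A minimal sketch of that fix, assuming the page wraps the cells in a table with class raceTimes (the table and tr wrapper below are hypothetical, since the question only shows the td cells):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the <table class="raceTimes"> wrapper
# and the single <tr> are assumptions; only the <td> cells come from the question.
html = """<table class="raceTimes"><tr>
<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>
</tr></table>"""

race = BeautifulSoup(html, "html.parser")
tableoftimes = race.find("table", class_="raceTimes")

trackratings = []
for row in tableoftimes.find_all("tr"):
    string23 = [td.get_text() for td in row.find_all("td")]
    # get_text() joins the <strong> text and the value: "Track Rating: GOOD"
    matching = [s for s in string23 if "Track Rating:" in s]
    if matching:
        # matching is a list, so split its first element, not the list itself;
        # maxsplit=1 keeps values that contain colons (like "1:56:5") intact
        trackrating = matching[0].split(":", 1)[1].strip()
    else:
        trackrating = ""  # no Track Rating cell in this row
    trackratings.append(trackrating)

print(trackratings)
```

Splitting only on the first ':' matters for the time fields, whose values themselves contain colons.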
Try using this:
from bs4 import BeautifulSoup
html = """<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>"""
soup = BeautifulSoup(html, 'html.parser')
for td in soup.find_all('td'):
    if td.find('strong'):  # Check for a strong tag (skips empty cells)
        if td.strong.text == 'Track Rating:':
            print(td.find(string=True, recursive=False).strip())  # Get the direct text of the td
Output:
GOOD
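To keep every field rather than just Track Rating, and store a blank string when a field is missing, the same strong check can fill a dictionary. A minimal sketch, assuming each label appears at most once per row; the FIELDS list is hypothetical and taken from the normal HTML in the question:

```python
from bs4 import BeautifulSoup

html = """<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>"""

# Hypothetical list of the labels we care about (from the "normal" HTML)
FIELDS = ["Track Rating:", "Gross Time:", "Mile Rate:", "Lead Time:"]

soup = BeautifulSoup(html, "html.parser")
found = {}
for td in soup.find_all("td"):
    if td.strong:  # skip empty cells such as <td></td>
        label = td.strong.get_text(strip=True)
        # the value is the text node that sits directly inside the <td>
        value = td.find(string=True, recursive=False)
        found[label] = value.strip() if value else ""

# Labels that never appeared default to a blank string
record = {field: found.get(field, "") for field in FIELDS}
print(record)
```

With the missing-data HTML, Lead Time: comes back as an empty string instead of raising an error.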
If you have BS4 4.7.1 or above, you can try the following code.
This CSS selector matches every strong tag directly under a td tag that contains ":", then takes the parent td tag and uses contents[-1] to get the value.
Code:
from bs4 import BeautifulSoup
html = '''<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>'''
soup = BeautifulSoup(html, 'html.parser')
for item in soup.select('td>strong:contains(":")'):
    print(item.parent.contents[-1].strip())
Output:
GOOD
1:56:5
1:56:5
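To get rows that line up whether or not a cell is empty, the contents[-1] idea can be mapped onto a fixed column order. Newer soupsieve releases (2.1 and later) deprecate :contains in favor of :-soup-contains; this sketch sidesteps that by selecting td > strong and reading the label in Python. The COLUMNS list and the parse_row helper are hypothetical names for illustration:

```python
from bs4 import BeautifulSoup, NavigableString

missing_html = """<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>"""

normal_html = """<td><strong>Track Rating:</strong> GOOD</td>
<td><strong>Gross Time:</strong> 2:29:6</td>
<td><strong>Mile Rate:</strong> 1:58:6</td>
<td><strong>Lead Time:</strong> 30.3</td>"""

# Hypothetical fixed column order, so every row has the same layout
COLUMNS = ["Track Rating", "Gross Time", "Mile Rate", "Lead Time"]


def parse_row(html):
    """Return one list of values in COLUMNS order, '' for missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    values = {}
    for item in soup.select("td > strong"):
        label = item.get_text(strip=True).rstrip(":")
        # contents[-1] is the last child of the parent <td>: normally the value
        last = item.parent.contents[-1]
        # guard against a <td> that holds only the <strong> label
        values[label] = last.strip() if isinstance(last, NavigableString) else ""
    return [values.get(col, "") for col in COLUMNS]


rows = [parse_row(missing_html), parse_row(normal_html)]
for row in rows:
    print(row)
```

Each row has four entries in the same order, so the lists can go straight into a CSV writer or a DataFrame.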
Alternatively, you can use next_element after finding the strong tag. The first next_element is the text inside the strong tag, and the second next_element is the value after the strong tag.
from bs4 import BeautifulSoup
html = '''<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>'''
soup = BeautifulSoup(html, 'html.parser')
for item in soup.select('td>strong:contains(":")'):
    print(item.next_element.next_element.strip())
Output:
GOOD
1:56:5
1:56:5
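Since the value is the node immediately after the closing strong tag inside the same td, next_sibling reaches it in one step. This is a little sturdier than chaining next_element twice: if the label ever had nested markup, such as <strong><em>Track Rating:</em></strong>, the second next_element would land on the label text instead of the value. A minimal sketch, again using td > strong rather than :contains:

```python
from bs4 import BeautifulSoup

html = """<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>"""

soup = BeautifulSoup(html, "html.parser")
values = []
for item in soup.select("td > strong"):
    # next_sibling skips the text inside <strong> and goes straight to " GOOD"
    sibling = item.next_sibling
    # NavigableString subclasses str; None means nothing follows the label
    values.append(sibling.strip() if isinstance(sibling, str) else "")
print(values)
```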