I'm trying to scrape data from this URL and I get an error on this part of my scraper (the full chunk of code is below):
    if table.find_all('tr'):
Note that I had previously built it without the if/elif/else logic, using just find_all('tr'), but it produces the same error:
Traceback (most recent call last):
  File "statbunker.py", line 217, in <module>
    if table.find_all('tr'):
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/bs4/element.py", line 921, in __getattr__
    raise AttributeError(
AttributeError: 'NavigableString' object has no attribute 'find_all'
link = 'https://rugby.statbunker.com/competitions/MatchDetails/World-Cup-2019/Japan-VS-Russia?comp_id=606&match_id=39737&date=20-Sep-2019'
response = requests.get(link)
html_loop = response.content
soup_loop = BeautifulSoup(html_loop, 'html.parser')
home_substititions = soup_loop.find('table', {'id': 'homeSubs'})
for table in home_substititions.find('tbody'):
    if table.find_all('tr'):
        for row in table.find_all('tr'):
            substitutionEvent = {}
            substitutionEvent['uuid'] = uuid.uuid1()
            substitutionEvent['playerIn'] = row.find_all('td')[2].text
            substitutionEvent['playerOut'] = row.find_all('td')[4].text
            if int(row.find_all('td')[0].text.split('`')[0]):
                substitutionEvent['subTime'] = game['gameTime'] + timedelta.Timedelta(minutes=int(row.find_all('td')[0].text.split('`')[0]))
            else:
                substitutionEvent['subTime'] = ''
            homeSubstitutionEvents.append(substitutionEvent)
    elif table.find('tr'):
        for row in table.find('tr'):
            substitutionEvent = {}
            substitutionEvent['uuid'] = uuid.uuid1()
            substitutionEvent['playerIn'] = row.find_all('td')[2].text
            substitutionEvent['playerOut'] = row.find_all('td')[4].text
            if int(row.find_all('td')[0].text.split('`')[0]):
                substitutionEvent['subTime'] = game['gameTime'] + timedelta.Timedelta(minutes=int(row.find_all('td')[0].text.split('`')[0]))
            else:
                substitutionEvent['subTime'] = ''
            homeSubstitutionEvents.append(substitutionEvent)
    else:
        continue
Here, .find() returns a single tag, and looping over that tag walks its children, which include NavigableString text nodes as well as tags. To iterate over the matching elements, you have to use .find_all():
home_substititions = soup_loop.find_all('table', {'id': 'homeSubs'})
for table in home_substititions:
    # ....
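To make the difference concrete, here is a minimal sketch that runs offline. SAMPLE_HTML is a hypothetical, cut-down stand-in for the real homeSubs table, not the actual page markup. It shows what looping over the result of .find('tbody') yields, and one way to keep only the element children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the real table (assumed layout, not the live page).
SAMPLE_HTML = """
<table id="homeSubs">
  <tbody>
    <tr><td>55`</td><td></td><td>Player A</td><td></td><td>Player B</td></tr>
    <tr><td>60`</td><td></td><td>Player C</td><td></td><td>Player D</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tbody = soup.find('table', {'id': 'homeSubs'}).find('tbody')

# Looping over a Tag walks its direct children, whitespace text nodes included.
child_types = [type(child).__name__ for child in tbody]
print(child_types)

# Keep only the element children; text nodes are not Tag instances.
rows = [child for child in tbody if isinstance(child, Tag)]
print(len(rows))
```

The whitespace between the rows is parsed into text nodes, which is why the loop in the question reaches a NavigableString before it reaches a row.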
I made a few small changes. The problem was that you wrote for table in yourstuff.find('sth'), but find returns just one element, so there is no need for a loop:
import uuid
import requests
from bs4 import BeautifulSoup

# game, timedelta and homeSubstitutionEvents come from the rest of the asker's script (not shown)
link = 'https://rugby.statbunker.com/competitions/MatchDetails/World-Cup-2019/Japan-VS-Russia?comp_id=606&match_id=39737&date=20-Sep-2019'
response = requests.get(link)
html_loop = response.content
soup_loop = BeautifulSoup(html_loop, 'html.parser')
home_substititions = soup_loop.find('table', {'id': 'homeSubs'})
# find returns a single tag, so use it directly instead of looping over it
table = home_substititions.find('tbody')
print(table)
# find_all returns an empty list when there are no rows, so the elif/else fallback is not needed
for row in table.find_all('tr'):
    substitutionEvent = {}
    substitutionEvent['uuid'] = uuid.uuid1()
    substitutionEvent['playerIn'] = row.find_all('td')[2].text
    substitutionEvent['playerOut'] = row.find_all('td')[4].text
    if int(row.find_all('td')[0].text.split('`')[0]):
        substitutionEvent['subTime'] = game['gameTime'] + timedelta.Timedelta(minutes=int(row.find_all('td')[0].text.split('`')[0]))
    else:
        substitutionEvent['subTime'] = ''
    homeSubstitutionEvents.append(substitutionEvent)
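The same idea can be sketched as a small function that runs without the network. Everything named here is an assumption for illustration: parse_substitutions and SAMPLE_HTML are hypothetical, the column positions are taken from the indices used in the question, the kickoff value is made up, and the standard library's datetime.timedelta stands in for the timedelta.Timedelta call in the original script. The sketch also guards against an empty minute cell, where int('') would raise a ValueError.

```python
import uuid
from datetime import datetime, timedelta

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the column layout is assumed
# from the indices used in the question (0 = minute, 2 = in, 4 = out).
SAMPLE_HTML = """
<table id="homeSubs">
  <tbody>
    <tr><td>55`</td><td></td><td>Player A</td><td></td><td>Player B</td></tr>
    <tr><td></td><td></td><td>Player C</td><td></td><td>Player D</td></tr>
  </tbody>
</table>
"""


def parse_substitutions(tbody, kickoff):
    """Return one dict per <tr> inside tbody (hypothetical helper)."""
    events = []
    if tbody is None:  # table or tbody missing from the page
        return events
    for row in tbody.find_all('tr'):  # empty list when there are no rows
        cells = row.find_all('td')
        if len(cells) < 5:
            continue  # skip rows that lack the expected columns
        minute = cells[0].get_text(strip=True).split('`')[0]
        events.append({
            'uuid': uuid.uuid1(),
            'playerIn': cells[2].get_text(strip=True),
            'playerOut': cells[4].get_text(strip=True),
            # isdigit() avoids the ValueError that int('') would raise
            'subTime': kickoff + timedelta(minutes=int(minute)) if minute.isdigit() else '',
        })
    return events


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tbody = soup.find('table', {'id': 'homeSubs'}).find('tbody')
kickoff = datetime(2019, 9, 20, 19, 45)  # assumed kickoff time, for illustration only
events = parse_substitutions(tbody, kickoff)
```

Calling find_all once per row and reusing the cells list also avoids re-searching the row for every field.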
home_substititions is not a BeautifulSoup object like soup_loop; it is a Tag:
print(type(home_substititions))
print(type(soup_loop))
The output will be
<class 'bs4.element.Tag'>
<class 'bs4.BeautifulSoup'>
Both types support find and find_all. For your code to work, call them on the soup or on a Tag, not on the NavigableString children you get when you loop over the result of find('tbody').
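As a quick check of the two types, here is a small sketch on a hypothetical one-row table (the markup is made up for illustration). It also shows that find_all can be called on either object, since BeautifulSoup is a subclass of Tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical one-row table, only used to inspect the object types.
html = '<table id="homeSubs"><tbody><tr><td>x</td></tr></tbody></table>'
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'id': 'homeSubs'})

print(type(table).__name__)  # class of the object returned by find
print(type(soup).__name__)   # class of the parsed document

# BeautifulSoup inherits from Tag, so both expose find and find_all.
soup_is_tag_subclass = issubclass(BeautifulSoup, Tag)
rows_from_tag = table.find_all('tr')
rows_from_soup = soup.find_all('tr')
```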