
BeautifulSoup AttributeError: 'NavigableString' object has no attribute 'find_all'

I'm trying to scrape data from this URL and I'm getting an error in this part of my scraper (the full chunk of code is below):

if table.find_all('tr'):

Note that I previously built it without the if/elif/else logic, using just find_all('tr'), but it produced the same error:

Traceback (most recent call last):
  File "statbunker.py", line 217, in <module>
    if table.find_all('tr'):        
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/bs4/element.py", line 921, in __getattr__
    raise AttributeError(
AttributeError: 'NavigableString' object has no attribute 'find_all'

The full code:

    link = 'https://rugby.statbunker.com/competitions/MatchDetails/World-Cup-2019/Japan-VS-Russia?comp_id=606&match_id=39737&date=20-Sep-2019'
    response = requests.get(link)
    html_loop = response.content
    soup_loop = BeautifulSoup(html_loop, 'html.parser')

    home_substititions = soup_loop.find('table', {'id': 'homeSubs'})
    for table in home_substititions.find('tbody'):
        if table.find_all('tr'):        
            for row in table.find_all('tr'):
                substitutionEvent = {}
                substitutionEvent['uuid'] = uuid.uuid1()
                substitutionEvent['playerIn'] = row.find_all('td')[2].text
                substitutionEvent['playerOut'] = row.find_all('td')[4].text
                if int(row.find_all('td')[0].text.split('`')[0]):
                    substitutionEvent['subTime'] = game['gameTime'] + timedelta.Timedelta(minutes=int(row.find_all('td')[0].text.split('`')[0]))
                else:
                    substitutionEvent['subTime'] = ''
                homeSubstitutionEvents.append(substitutionEvent)
        elif table.find('tr'):
            for row in table.find('tr'):
                substitutionEvent = {}
                substitutionEvent['uuid'] = uuid.uuid1()
                substitutionEvent['playerIn'] = row.find_all('td')[2].text
                substitutionEvent['playerOut'] = row.find_all('td')[4].text
                if int(row.find_all('td')[0].text.split('`')[0]):
                    substitutionEvent['subTime'] = game['gameTime'] + timedelta.Timedelta(minutes=int(row.find_all('td')[0].text.split('`')[0]))
                else:
                    substitutionEvent['subTime'] = ''
                homeSubstitutionEvents.append(substitutionEvent)
        else:
            continue

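Here, .find() returns a single Tag, and iterating over a Tag yields its children, which can be NavigableString objects (such as the whitespace between tags) as well as tags. Use .find_all() to get a list of tags to iterate over: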

home_substititions = soup_loop.find_all('table', {'id': 'homeSubs'})
for table in home_substititions:
    # ....
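
To make the cause of the error concrete, here is a minimal sketch that runs against a small made-up HTML snippet rather than the real page (the player names and table contents are placeholders, not data from the site). It shows that looping over the result of find('tbody') walks the children of that tag, text nodes included, while find_all('tr') returns only tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the real page; the whitespace between tags matters.
html = """
<table id="homeSubs">
  <tbody>
    <tr><td>46`</td><td></td><td>Player A</td><td></td><td>Player B</td></tr>
    <tr><td>60`</td><td></td><td>Player C</td><td></td><td>Player D</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("table", {"id": "homeSubs"}).find("tbody")

# Iterating over a Tag walks its direct children, text nodes included.
child_types = [type(child).__name__ for child in tbody]

# find_all() returns only Tag objects, so every item supports find_all() too.
rows = tbody.find_all("tr")

print(child_types)
print(len(rows))
```

The whitespace-only text nodes between the rows are what the original loop hit when it called find_all on each item.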

I made a few small changes. The problem was that you wrote for table in home_substititions.find('tbody'), but find() returns just one element, so there is no need for a loop:

import uuid

import requests
from bs4 import BeautifulSoup

link = 'https://rugby.statbunker.com/competitions/MatchDetails/World-Cup-2019/Japan-VS-Russia?comp_id=606&match_id=39737&date=20-Sep-2019'
response = requests.get(link)
html_loop = response.content
soup_loop = BeautifulSoup(html_loop, 'html.parser')

# game and timedelta come from the rest of the asker's script (not shown here);
# homeSubstitutionEvents is initialised here so the loop has a list to append to
homeSubstitutionEvents = []

home_substititions = soup_loop.find('table', {'id': 'homeSubs'})
# find() returns a single element, so assign it instead of looping over it
table = home_substititions.find('tbody')
print(table)
# find_all() returns an empty list when there are no rows, so the separate
# elif table.find('tr') branch from the question is unnecessary
for row in table.find_all('tr'):
    cells = row.find_all('td')
    substitutionEvent = {}
    substitutionEvent['uuid'] = uuid.uuid1()
    substitutionEvent['playerIn'] = cells[2].text
    substitutionEvent['playerOut'] = cells[4].text
    if int(cells[0].text.split('`')[0]):
        substitutionEvent['subTime'] = game['gameTime'] + timedelta.Timedelta(minutes=int(cells[0].text.split('`')[0]))
    else:
        substitutionEvent['subTime'] = ''
    homeSubstitutionEvents.append(substitutionEvent)
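
If it helps, the same row-by-row extraction can be wrapped in a function and tried on a small inline table without hitting the network. This is only a sketch under assumptions: extract_substitutions, parse_minute and the sample HTML are hypothetical names and data introduced for illustration, the column positions (0, 2, 4) are taken from the question's code, and the minute is stored as a plain number because game['gameTime'] is not available here.

```python
import uuid

from bs4 import BeautifulSoup


def parse_minute(text):
    """Return the minute before the backtick as an int, or None if it is not numeric."""
    try:
        return int(text.split("`")[0].strip())
    except ValueError:
        return None


def extract_substitutions(html, table_id="homeSubs"):
    """Collect one dict per substitution row; returns [] if the table is missing."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []
    events = []
    # find_all() searches recursively, so rows inside <tbody> are found as well.
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 5:
            continue  # skip header rows or rows with too few cells
        events.append({
            "uuid": uuid.uuid1(),
            "playerIn": cells[2].get_text(strip=True),
            "playerOut": cells[4].get_text(strip=True),
            "minute": parse_minute(cells[0].get_text()),
        })
    return events


# Hypothetical sample data, not taken from the real page.
sample_html = """
<table id="homeSubs">
  <thead><tr><th>Time</th><th></th><th>In</th><th></th><th>Out</th></tr></thead>
  <tbody>
    <tr><td>46`</td><td></td><td>Player A</td><td></td><td>Player B</td></tr>
    <tr><td></td><td></td><td>Player C</td><td></td><td>Player D</td></tr>
  </tbody>
</table>
"""

events = extract_substitutions(sample_html)
for event in events:
    print(event["minute"], event["playerIn"], event["playerOut"])
```

Parsing the minute in a try/except means an empty or non-numeric time cell gives None instead of raising ValueError, which the bare int(...) call in the original loop would do.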

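Note that home_substititions is not a BeautifulSoup object like soup_loop; it is a Tag:

print(type(home_substititions))
print(type(soup_loop))

The output will be

<class 'bs4.element.Tag'>
<class 'bs4.BeautifulSoup'>

Both types support find and find_all. The error comes from the loop: iterating over the Tag returned by find('tbody') yields its children, and some of those are NavigableString objects, which have no find_all. Call find_all on the Tag itself (or on the original soup) instead of on the items you get by iterating over it.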
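
As a rough check of how these types relate, the sketch below uses a tiny made-up HTML string (an assumption for illustration, not the real page). It relies on the fact that BeautifulSoup is itself a subclass of Tag, so find_all is available on both, and it separates the children of the tbody into text nodes and tags with isinstance.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical snippet; the newlines inside <tbody> become text nodes.
html = "<table id='homeSubs'><tbody>\n<tr><td>x</td></tr>\n</tbody></table>"
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "homeSubs"})

# The soup object is a Tag too, so both objects expose find() and find_all().
soup_is_tag = isinstance(soup, Tag)
table_rows = table.find_all("tr")

# Children of <tbody> are a mix of text nodes and tags.
text_children = [c for c in table.tbody.children if isinstance(c, NavigableString)]
tag_children = [c for c in table.tbody.children if isinstance(c, Tag)]

print(soup_is_tag, len(table_rows), len(text_children), len(tag_children))
```

Filtering with isinstance(child, Tag) is one way to keep an explicit loop over children if that is preferred to find_all.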
