I can't access the text in the span using BeautifulSoup

Hi everyone, I receive an error message when executing this code:

from bs4 import BeautifulSoup
import requests
import html.parser
from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://www.imdb.com/chart/boxoffice/?ref_=nv_ch_cht")
soup = BeautifulSoup(response.content, 'html.parser')
tables = soup.find_all("tr")

for table in tables:
    movie_name = table.find("span", class_ = "secondaryInfo").text
    print(movie_name)

Output:

    movie_name = table.find("span", class_ = "secondaryInfo").text
AttributeError: 'NoneType' object has no attribute 'text'

Here is a solution that produces the desired output:

from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://www.imdb.com/chart/boxoffice/?ref_=nv_ch_cht")
soup = BeautifulSoup(response.content, 'html.parser')
# Collect every row of the chart table; the first row is the header.
rows = soup.find("table", class_="chart full-width").find_all('tr')

for row in rows:
    # select_one returns None when a cell is missing, so guard each lookup.
    title_cell = row.select_one('td.titleColumn a')
    title = title_cell.get_text(strip=True) if title_cell else None

    weekend_cell = row.select_one('td.ratingColumn')
    weekend = weekend_cell.get_text(strip=True) if weekend_cell else None

    gross_cell = row.select_one('td span.secondaryInfo')
    gross = gross_cell.get_text(strip=True) if gross_cell else None

    weeks_cell = row.select_one('td.weeksColumn')
    weeks = weeks_cell.get_text(strip=True) if weeks_cell else None

    print('title:' + str(title), 'weekend:' + str(weekend), 'Gross:' + str(gross), 'weeks:' + str(weeks))

Output:

title:None weekend:None Gross:None weeks:None
title:Eternals weekend:$71.0M Gross:$71.0M weeks:1
title:Dune: Part One weekend:$7.6M Gross:$83.9M weeks:3
title:No Time to Die weekend:$6.2M Gross:$143.2M weeks:5
title:Venom: Let There Be Carnage weekend:$4.5M Gross:$197.0M weeks:6
title:Ron's Gone Wrong weekend:$3.6M Gross:$17.6M weeks:3  
title:The French Dispatch weekend:$2.6M Gross:$8.5M weeks:3
title:Halloween Kills weekend:$2.4M Gross:$89.7M weeks:4   
title:Spencer weekend:$2.1M Gross:$2.1M weeks:1
title:Antlers weekend:$2.0M Gross:$7.6M weeks:2
title:Last Night in Soho weekend:$1.8M Gross:$7.6M weeks:2
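
The first output line is all None because that row matched none of the cell selectors. If that line is unwanted, one option is to skip any row without a title cell. Below is a minimal sketch of that idea. It runs on a small inline HTML sample rather than the live page, and the markup (including the use of th cells in the header) and the function name parse_chart_rows are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the IMDb chart table.
SAMPLE_HTML = """
<table class="chart full-width">
  <tr><th>Title</th><th>Weekend</th><th>Gross</th><th>Weeks</th></tr>
  <tr>
    <td class="titleColumn"><a href="#">Eternals</a></td>
    <td class="ratingColumn">$71.0M</td>
    <td class="ratingColumn"><span class="secondaryInfo">$71.0M</span></td>
    <td class="weeksColumn">1</td>
  </tr>
  <tr>
    <td class="titleColumn"><a href="#">Dune: Part One</a></td>
    <td class="ratingColumn">$7.6M</td>
    <td class="ratingColumn"><span class="secondaryInfo">$83.9M</span></td>
    <td class="weeksColumn">3</td>
  </tr>
</table>
"""


def text_or_none(row, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    cell = row.select_one(selector)
    return cell.get_text(strip=True) if cell is not None else None


def parse_chart_rows(html):
    """Return one dict per data row, skipping rows that have no title cell."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("table.chart tr"):
        title = text_or_none(row, "td.titleColumn a")
        if title is None:
            # Header (or otherwise empty) row: nothing to report.
            continue
        results.append({
            "title": title,
            "weekend": text_or_none(row, "td.ratingColumn"),
            "gross": text_or_none(row, "td span.secondaryInfo"),
            "weeks": text_or_none(row, "td.weeksColumn"),
        })
    return results


if __name__ == "__main__":
    for movie in parse_chart_rows(SAMPLE_HTML):
        print(movie)
```

Returning dictionaries instead of printing inside the loop keeps the parsing step separate from the output step, which makes it easier to check the result or write it to a file later.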

Your loop also visits the first row, which is the header. The header has no span with class secondaryInfo because it lists no prices, so find() returns None and calling .text on it raises the AttributeError. An alternative is to exclude the header with the CSS selector tr:nth-child(n+2). You also only need requests.

from bs4 import BeautifulSoup
import requests

response = requests.get("https://www.imdb.com/chart/boxoffice/?ref_=nv_ch_cht")
soup = BeautifulSoup(response.content, 'html.parser')

for row in soup.select('tr:nth-child(n+2)'):
    movie_name = row.find("span", class_ = "secondaryInfo")
    print(movie_name.text)
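
One thing worth checking with this selector: nth-child counts a row's position inside its parent element. If the rows are direct children of the table, tr:nth-child(n+2) drops only the header. If the page wraps rows in thead and tbody, the first row of each section is dropped. The sketch below compares it with tr:has(span.secondaryInfo), which selects rows by content instead of position. Both HTML samples and the helper name gross_values are hypothetical and stand in for the real page, whose exact markup is not shown in this thread.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: rows are direct children of <table>.
FLAT_HTML = """
<table>
  <tr><th>Title</th><th>Gross</th></tr>
  <tr><td>Eternals</td><td><span class="secondaryInfo">$71.0M</span></td></tr>
  <tr><td>Dune: Part One</td><td><span class="secondaryInfo">$83.9M</span></td></tr>
</table>
"""

# Hypothetical sample: same rows, but wrapped in <thead> and <tbody>.
SECTIONED_HTML = """
<table>
  <thead>
    <tr><th>Title</th><th>Gross</th></tr>
  </thead>
  <tbody>
    <tr><td>Eternals</td><td><span class="secondaryInfo">$71.0M</span></td></tr>
    <tr><td>Dune: Part One</td><td><span class="secondaryInfo">$83.9M</span></td></tr>
  </tbody>
</table>
"""


def gross_values(html, row_selector):
    """Return the secondaryInfo text of every row matched by row_selector."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for row in soup.select(row_selector):
        span = row.find("span", class_="secondaryInfo")
        if span is not None:  # guard against rows without the span
            values.append(span.get_text(strip=True))
    return values


if __name__ == "__main__":
    for name, html in (("flat", FLAT_HTML), ("sectioned", SECTIONED_HTML)):
        print(name, "nth-child:", gross_values(html, "tr:nth-child(n+2)"))
        print(name, "has:      ", gross_values(html, "tr:has(span.secondaryInfo)"))
```

On the flat sample both selectors return the two gross values. On the sectioned sample the positional selector returns only the second one, while the content-based selector still returns both.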

