I can't access the text in the span using BeautifulSoup

Hi everyone, I receive an error message when executing this code:

from bs4 import BeautifulSoup
import requests
import html.parser
from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://www.imdb.com/chart/boxoffice/?ref_=nv_ch_cht")
soup = BeautifulSoup(response.content, 'html.parser')
tables = soup.find_all("tr")

for table in tables:
    movie_name = table.find("span", class_ = "secondaryInfo").text
    print(movie_name)

Output:

movie_name = table.find("span", class_ = "secondaryInfo").text
AttributeError: 'NoneType' object has no attribute 'text'

Here is a solution that works so far. Each lookup is guarded, so a missing element yields None instead of raising an error.

from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
response = session.get("https://www.imdb.com/chart/boxoffice/?ref_=nv_ch_cht")
soup = BeautifulSoup(response.content, 'html.parser')
# Restrict the search to the rows of the chart table.
rows = soup.find("table", class_="chart full-width").find_all('tr')

for row in rows:
    # Each lookup falls back to None when the element is missing (e.g. in the header row).
    t = row.select_one('td.titleColumn a')
    title = t.get_text(strip=True) if t else None

    p = row.select_one('td.ratingColumn')
    weekend = p.get_text(strip=True) if p else None

    q = row.select_one('td span.secondaryInfo')
    gross = q.get_text(strip=True) if q else None

    k = row.select_one('td.weeksColumn')
    weeks = k.get_text(strip=True) if k else None

    print('title:' + str(title), 'weekend:' + str(weekend), 'Gross:' + str(gross), 'weeks:' + str(weeks))

Output:

title:None weekend:None Gross:None weeks:None
title:Eternals weekend:$71.0M Gross:$71.0M weeks:1
title:Dune: Part One weekend:$7.6M Gross:$83.9M weeks:3
title:No Time to Die weekend:$6.2M Gross:$143.2M weeks:5
title:Venom: Let There Be Carnage weekend:$4.5M Gross:$197.0M weeks:6
title:Ron's Gone Wrong weekend:$3.6M Gross:$17.6M weeks:3  
title:The French Dispatch weekend:$2.6M Gross:$8.5M weeks:3
title:Halloween Kills weekend:$2.4M Gross:$89.7M weeks:4   
title:Spencer weekend:$2.1M Gross:$2.1M weeks:1
title:Antlers weekend:$2.0M Gross:$7.6M weeks:2
title:Last Night in Soho weekend:$1.8M Gross:$7.6M weeks:2
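
The first output line is all None because the header row has none of those cells. One possible way to drop it is to skip any row without a title. This is only a sketch: SAMPLE_HTML is made-up markup that reuses the class names from the code above (it is not IMDb's real page), and text_or_none is a hypothetical helper. It runs offline, with no request to the site.

```python
from bs4 import BeautifulSoup

# Made-up markup reusing the class names from the answer; NOT IMDb's real page.
SAMPLE_HTML = """
<table class="chart full-width">
  <tr><th>Title</th><th>Weekend</th><th>Gross</th><th>Weeks</th></tr>
  <tr>
    <td class="titleColumn"><a href="#">Dune: Part One</a></td>
    <td class="ratingColumn">$7.6M</td>
    <td class="ratingColumn"><span class="secondaryInfo">$83.9M</span></td>
    <td class="weeksColumn">3</td>
  </tr>
</table>
"""


def text_or_none(row, selector):
    """Hypothetical helper: stripped text of the first match, or None if nothing matches."""
    node = row.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
movies = []
for row in soup.select_one("table.chart.full-width").find_all("tr"):
    title = text_or_none(row, "td.titleColumn a")
    if title is None:
        continue  # header row: it only has <th> cells, so there is no title link
    movies.append({
        "title": title,
        "weekend": text_or_none(row, "td.ratingColumn"),  # first ratingColumn cell
        "gross": text_or_none(row, "td span.secondaryInfo"),
        "weeks": text_or_none(row, "td.weeksColumn"),
    })

print(movies)
```

Collecting dictionaries instead of printing strings also makes the rows easier to reuse later, for example when writing them to a CSV file.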

Your loop also processes the first row, which is the header. It has no span with that class because it doesn't list any amounts, so find() returns None and .text fails. An alternative is to simply exclude the header with the CSS selector nth-child(n+2). You also only need requests.

from bs4 import BeautifulSoup
import requests

response = requests.get("https://www.imdb.com/chart/boxoffice/?ref_=nv_ch_cht")
soup = BeautifulSoup(response.content, 'html.parser')

for row in soup.select('tr:nth-child(n+2)'):
    movie_name = row.find("span", class_ = "secondaryInfo")
    print(movie_name.text)
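
To check the header-row explanation without hitting the site, here is a small offline sketch. SAMPLE_HTML is invented for illustration and is much simpler than the real page; only the span class name is taken from the question.

```python
from bs4 import BeautifulSoup

# Invented, simplified markup: one header row followed by two data rows.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Gross</th></tr>
  <tr><td>Eternals</td><td><span class="secondaryInfo">$71.0M</span></td></tr>
  <tr><td>Dune: Part One</td><td><span class="secondaryInfo">$83.9M</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The header row has no matching span, so find() returns None for it.
header_span = soup.find_all("tr")[0].find("span", class_="secondaryInfo")
print(header_span)

# nth-child(n+2) keeps every <tr> that is the second or later child of its parent.
data_rows = soup.select("tr:nth-child(n+2)")
gross = [row.find("span", class_="secondaryInfo").text for row in data_rows]
print(gross)
```

Note that nth-child counts a row's position among its siblings. If a page wraps rows in thead and tbody (not checked here for the real page), the first row of each section would be skipped, so it is worth comparing the number of rows you get against what the page shows.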
