BeautifulSoup "AttributeError: 'NoneType' object has no attribute 'text'"

I was scraping a Google weather search result with bs4, and Python can't find a <span> tag even though it is there. How can I solve this problem?

I tried to find this <span> by its class and by its id, but both failed.

<div id="wob_dcp">
    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    
</div>

Above is the HTML I was trying to scrape from the page. This is my code:

response = requests.get('https://www.google.com/search?hl=ja&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text

This code fails with the following error:

Traceback (most recent call last):
  File "C:\Users\sungn_000\Desktop\weather.py", line 23, in <module>
    tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Please help me solve this error.

This is because the weather section is rendered by the browser via JavaScript. So when you use requests, you only get the HTML content of the page, which doesn't have what you need. If you want to parse a page with elements rendered by the web browser, you should use, for example, selenium (or requests-html).

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://www.google.com/search?hl=en&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
print(tomorrow_weather)

Output:

pawel@pawel-XPS-15-9570:~$ python test.py
Clear with periodic clouds

>>> from bs4 import BeautifulSoup
>>> a = '<div id="wob_dcp">\n    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    \n</div>'
>>> soup = BeautifulSoup(a, 'html.parser')
>>> soup.find("span", id="wob_dc").text
'Clear with periodic clouds'

Try this out.

It's not rendered via JavaScript as pawelbylina mentioned, and you don't have to use requests-html or selenium, since everything you need is already in the HTML. Rendering the page would also slow down the scraping process a lot.

It could be because no user-agent is specified, so Google blocks your request and you receive different HTML with some sort of error. The default requests user-agent is python-requests. Google recognizes it and blocks the request, since it's not a "real" user visit. Check what your user-agent is.
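To see this for yourself, here is a minimal sketch that builds two requests without sending them and prints the user-agent each one would carry. The `session`, `plain`, and `custom` names are only for illustration, and the browser string is the one used in this answer.

```python
import requests

url = "https://www.google.com/search"

# Browser-like header (same string as in this answer).
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

session = requests.Session()

# Prepare (but do not send) a request with no custom headers,
# and one with the browser-like User-Agent.
plain = session.prepare_request(requests.Request("GET", url))
custom = session.prepare_request(requests.Request("GET", url, headers=headers))

print(plain.headers["User-Agent"])   # starts with "python-requests/"
print(custom.headers["User-Agent"])  # the browser-like string above
```

Nothing goes over the network here, so it is safe to run while debugging.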

Pass the user-agent in the request headers:

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get("YOUR_URL", headers=headers)

This is what you're looking for. Use select_one() to grab just one element:

soup.select_one('#wob_dc').text

Have a look at the SelectorGadget Chrome extension to grab CSS selectors by clicking on the desired element in your browser.
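Keep in mind that select_one() (like find()) returns None when nothing matches, and calling .text on None is what raises the AttributeError from the question. A minimal sketch of a guard, using the HTML snippet from the question; the helper name `text_or_none` is made up for illustration:

```python
from bs4 import BeautifulSoup

# HTML snippet from the question, inlined so no request is needed.
html = '<div id="wob_dcp"><span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span></div>'
soup = BeautifulSoup(html, "html.parser")


def text_or_none(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


found = text_or_none(soup, "#wob_dc")    # element exists in the snippet
missing = text_or_none(soup, "#wob_tm")  # element is not in the snippet

print(found)
print(missing)
```

If you get None for an element you can see in the browser, the HTML you downloaded probably differs from what the browser shows, so print or save the response and inspect it.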


Code and a full example that scrapes more, in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "일본 桜川市真壁町古城 내일 날씨",
  "hl": "en",
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

location = soup.select_one('#wob_loc').text
weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Location: {location}\n'
      f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')

------
'''
Location: Makabecho Furushiro, Sakuragawa, Ibaraki, Japan
Weather condition: Cloudy
Temperature: 79°F
Precipitation: 40%
Humidity: 81%
Wind speed: 7 mph
Current time: Saturday
'''
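The params dict replaces the long hand-encoded URL from the question. As a rough illustration using only the standard library (requests is expected to apply a comparable encoding, but that is an assumption here), this is how such a dict turns into a query string:

```python
from urllib.parse import urlencode, parse_qs

# Same search parameters as in the example above.
params = {
    "q": "일본 桜川市真壁町古城 내일 날씨",
    "hl": "en",
}

# Percent-encode the values as UTF-8; spaces become "+".
query_string = urlencode(params)
print("https://www.google.com/search?" + query_string)

# Decoding the string gives the original values back.
decoded = parse_qs(query_string)
print(decoded["q"][0])
```

This way you keep the query readable in your code and leave the encoding to the library.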

Alternatively, you can achieve the same thing by using the Direct Answer Box API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't have to think about how to bypass blocks from Google or figure out why data from certain elements isn't being extracted as it should, since that's already done for the end user. The only thing left to do is iterate over the structured JSON and grab the data you want.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "일본 桜川市真壁町古城 내일 날씨",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')

--------
'''
Makabecho Furushiro, Sakuragawa, Ibaraki, Japan
Saturday
Cloudy
79°F
40%
81%
7 mph
'''
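If a query returns no weather box, indexing results['answer_box'] directly would raise a KeyError. A minimal sketch of a more defensive lookup; the `results` dict below is a hand-written stand-in shaped like the keys used above, not real API output, and `weather_fields` is a hypothetical helper:

```python
# Hand-written stand-in for a response; only some keys are present on purpose.
results = {
    "answer_box": {
        "location": "Makabecho Furushiro, Sakuragawa, Ibaraki, Japan",
        "weather": "Cloudy",
        "temperature": "79",
    }
}

FIELDS = ("location", "date", "weather", "temperature",
          "precipitation", "humidity", "wind")


def weather_fields(results, keys=FIELDS):
    """Collect the weather fields, using None for anything that is missing."""
    box = results.get("answer_box") or {}
    return {key: box.get(key) for key in keys}


print(weather_fields(results))
print(weather_fields({}))  # no answer box at all: every field is None
```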

Disclaimer: I work for SerpApi.

I also had this problem. You should not import like this:

from bs4 import BeautifulSoup

You should import like this:

from bs4 import * 

This should work.
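A quick check, assuming bs4 is installed: both import forms appear to bind the same BeautifulSoup class, so a None result from find() more likely comes from the HTML that was fetched than from the import style. The `star_namespace` name is only for illustration.

```python
from bs4 import BeautifulSoup

# Run the star import in a separate namespace so the two forms can be compared.
star_namespace = {}
exec("from bs4 import *", star_namespace)

# True if both import styles give the very same class object.
same_class = star_namespace["BeautifulSoup"] is BeautifulSoup
print(same_class)
```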


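Notice: Technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original source. For any questions, contact yoyou2525@163.com.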
