
(Python) Beautiful Soup and encoding (utf-8, cp1252, ascii…)

Please help, I am losing my nerve. I have had this problem ever since I started learning Python. I always run into the same issue, and no one online has been able to give a valid answer.

My code:

from bs4 import BeautifulSoup
import requests

page = requests.get(
    'https://forecast.weather.gov/MapClick.php?lat=34.05349000000007&lon=-118.24531999999999#.XswiwMCxWUk')
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(id='seven-day-forecast-body')
items = week.find_all(class_='forecast-tombstone')

print(items[0].find(class_='period-name').get_text())
print(items[0].find(class_='short-desc').get_text())
print(items[0].find(class_='temp temp-high').get_text())

period_names = [item.find(class_='period-name').get_text() for item in items]
short_descp = [item.find(class_='short-desc').get_text() for item in items]
temp = [item.find(class_='temp temp-high').get_text() for item in items]
print(period_names)
print(short_descp)
print(temp)

Output:

[Running] python -u "c:\Users\dukasu\Documents\Python\test.py"
ThisAfternoon
Partly Sunny
High: 76 �F
Traceback (most recent call last):
  File "c:\Users\dukasu\Documents\Python\test.py", line 20, in <module>
    temp = [item.find(class_='temp temp-high').get_text() for item in items]
  File "c:\Users\dukasu\Documents\Python\test.py", line 20, in <listcomp>
    temp = [item.find(class_='temp temp-high').get_text() for item in items]
AttributeError: 'NoneType' object has no attribute 'get_text'

[Done] exited with code=1 in 0.69 seconds

I think the issue is due to UTF-8 encoding (my PC uses cp1252), but how can I solve it for good? I suspect the problem is that it cannot handle the degree symbol. There is a simple fix in Python 2, but how do I solve it in Python 3.x? How can I set the encoding at the start of the code and forget about this issue? And please pardon my English; it is not my native language.

The error comes from the class lookup: find(class_='temp temp-high') returns None for the entries that report a low instead of a high (the night periods), and None has no get_text() method. Use only class_='temp', not class_='temp temp-high'.

Example

temp = [item.find(class_='temp').get_text() for item in items]

Full code

from bs4 import BeautifulSoup
import requests

page = requests.get(
    'https://forecast.weather.gov/MapClick.php?lat=34.05349000000007&lon=-118.24531999999999#.XswiwMCxWUk')
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(id='seven-day-forecast-body')
items = week.find_all(class_='forecast-tombstone')

print(items[0].find(class_='period-name').get_text())
print(items[0].find(class_='short-desc').get_text())
print(items[0].find(class_='temp').get_text())

period_names = [item.find(class_='period-name').get_text() for item in items]
short_descp = [item.find(class_='short-desc').get_text() for item in items]
temp = [item.find(class_='temp').get_text() for item in items]
print(period_names)
print(short_descp)
print(temp)

Prints out

ThisAfternoon
Partly Sunny
High: 76 °F
['ThisAfternoon', 'Tonight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight', 'Monday', 'MondayNight', 'Tuesday']
['Partly Sunny', 'Patchy Fog', 'Patchy Fogthen MostlySunny', 'Patchy Fog', 'Patchy Fogthen PartlySunny', 'Patchy Fog', 'Patchy Fogthen MostlyCloudy', 'Mostly Cloudy', 'Partly Sunny']
['High: 76 °F', 'Low: 58 °F', 'High: 75 °F', 'Low: 59 °F', 'High: 80 °F', 'Low: 61 °F', 'High: 78 °F', 'Low: 61 °F', 'High: 77 °F']

This turned out to be a simple issue.
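To make the cause of the AttributeError easier to see, here is a minimal sketch that runs without a network request. The markup in SAMPLE_HTML is hypothetical and only imitates two forecast entries (one with a high, one with a low); the helper text_or_none is also an illustrative name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two forecast entries: a day with a high
# temperature and a night with a low temperature.
SAMPLE_HTML = """
<div id="seven-day-forecast-body">
  <div class="forecast-tombstone">
    <p class="period-name">Today</p>
    <p class="temp temp-high">High: 76 °F</p>
  </div>
  <div class="forecast-tombstone">
    <p class="period-name">Tonight</p>
    <p class="temp temp-low">Low: 58 °F</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find_all(class_="forecast-tombstone")

# The string 'temp temp-high' only matches the first entry; find() returns
# None for the second one, which is what triggered the AttributeError.
exact = [item.find(class_="temp temp-high") for item in items]

# The single class 'temp' is present on both entries.
temps = [item.find(class_="temp").get_text() for item in items]


def text_or_none(item, class_name):
    """Return the text of the first match, or None if nothing matches."""
    tag = item.find(class_=class_name)
    return tag.get_text() if tag is not None else None


# Keeping the narrower lookup is possible if missing values are tolerated.
highs = [text_or_none(item, "temp temp-high") for item in items]

print(temps)
print(highs)
```

Searching on 'temp' alone is the simpler choice here; a guard like text_or_none is only needed when you really want the highs and lows separated.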

OK, I changed it, but here is the printout:

[Running] python -u "c:\Users\dukasu\Documents\Python\test.py"
ThisAfternoon
Partly Sunny
High: 76 �F
['ThisAfternoon', 'Tonight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight', 'Monday', 'MondayNight', 'Tuesday']
['Partly Sunny', 'Patchy Fog', 'Patchy Fogthen MostlySunny', 'Patchy Fog', 'Patchy Fogthen PartlySunny', 'Patchy Fog', 'Patchy Fogthen MostlyCloudy', 'Mostly Cloudy', 'Partly Sunny']
['High: 76 �F', 'Low: 58 �F', 'High: 75 �F', 'Low: 59 �F', 'High: 80 �F', 'Low: 61 �F', 'High: 78 �F', 'Low: 61 �F', 'High: 77 �F']

[Done] exited with code=0 in 0.619 seconds

How can I print the degree symbol °?

Later I added

import sys
sys.stdout.reconfigure(encoding='utf-8')

and it printed:

High: 76 °F
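A likely explanation for why this works, sketched below under an assumption: Python was writing cp1252 bytes to stdout while the output panel decoded them as UTF-8. The function shown_in_panel is a hypothetical helper that imitates that round trip; it is not part of any library. The reconfigure() call is guarded because it is only available on text streams from Python 3.7 onward.

```python
import sys


def shown_in_panel(text, written_as, read_as="utf-8"):
    """Encode text the way stdout would, then decode it the way a display would."""
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


line = "High: 76 °F"

# Assumed mismatch: Python writes cp1252, the panel reads UTF-8.
# In cp1252 the degree sign is the single byte 0xB0, which is not valid UTF-8.
mismatched = shown_in_panel(line, "cp1252")

# Once stdout is UTF-8 as well, both sides agree and the text survives.
matched = shown_in_panel(line, "utf-8")

# Guarded version of the fix: only reconfigure streams that support it.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

print(ascii(mismatched))  # the degree sign has become U+FFFD
print(ascii(matched))     # the degree sign is kept as \xb0
```

Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another documented way to get the same effect without touching the script.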
