UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
I'm working on scraping Oregon Teacher License data for a project I'm doing. Here's my code:
educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
print educ_employ
#[u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
I want to strip the "\xa0" characters. This is my code:
educ_employ = ([s.strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
>>>
I didn't get an error with the last one, but I also didn't get any output. I'm using Python 2.7. Does anyone know how to fix this?
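You are mixing up unicode objects and str objects. The elements of educ_employ are unicode, but '\xa0' is a str. When you pass a str argument to a unicode method such as .strip(), Python 2 first converts that str to unicode using the default ASCII codec, and byte 0xa0 is outside the ASCII range (0-127). That implicit conversion is what raises the UnicodeDecodeError. Use the unicode literal u'\xa0' instead.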
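The failing step can be reproduced explicitly. The sketch below is a minimal illustration, assuming it is run on either Python 2.7 or Python 3.x (it avoids version-specific syntax): it performs by hand the ASCII decode that Python 2 does implicitly for a str argument, then shows that a unicode argument avoids the conversion altogether.

```python
# Sketch: reproduce, on Python 2.7 or 3.x, the implicit step that fails.
# On Python 2, u'...'.strip('\xa0') first decodes the byte-string argument
# with the default ASCII codec, roughly equivalent to b'\xa0'.decode('ascii').
raw = b'\xa0'  # a single byte, 0xA0; the b'' prefix means a byte string on both versions

error_message = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Passing a unicode argument means no implicit conversion happens at all.
nbsp = u'\xa0'  # U+00A0 NO-BREAK SPACE as a unicode character
text = u'\xa0Jefferson Middle School\xa0'
trimmed = text.strip(nbsp)
print(repr(trimmed))
```

The printed message matches the one in the question, which points at the str argument rather than at the scraped data.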
Additionally, .strip() only removes characters from the beginning and end of the string, not from the middle. Try .replace() instead.
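To see the difference on the string from the question, here is a small sketch, assuming only the sample value printed earlier (it runs on Python 2.7 and 3.x). Because the two no-break spaces sit between words, .strip() has nothing to trim, while .replace() removes both.

```python
# Sketch: .strip() only trims characters at the two ends of the string,
# while .replace() rewrites every occurrence, including those in the middle.
s = u'Jefferson Middle School\xa0\xa0(2013 - 2014)'

stripped = s.strip(u'\xa0')          # unchanged: both \xa0 sit between words
replaced = s.replace(u'\xa0', u'')   # both \xa0 removed

print(stripped == s)  # True
print(repr(replaced))
```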
Try:
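educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
# Both arguments are unicode literals, so no implicit ASCII decode happens;
# replace() removes every no-break space, not just those at the ends
educ_employ = [s.replace(u'\xa0', u'') for s in educ_employ]
print educ_employ
#[u'Jefferson Middle School(2013 - 2014)']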
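Replacing with an empty string joins "School" directly to "(2013". If a separating space is wanted there instead, one variation (not part of the original answer; clean_cell is a hypothetical helper name) is to turn each no-break space into an ordinary space and collapse runs of whitespace. This sketch assumes the cells are unicode, as returned by the XPath query in the question, and runs on Python 2.7 and 3.x.

```python
# Hypothetical helper, not from the original answer: keep one ordinary space
# where the no-break spaces were, instead of deleting them outright.
def clean_cell(s):
    # Map each U+00A0 to a regular space, then split on whitespace and rejoin
    # with single spaces so runs like u'\xa0\xa0' collapse to one space.
    return u' '.join(s.replace(u'\xa0', u' ').split())

educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
educ_employ = [clean_cell(s) for s in educ_employ]
print(repr(educ_employ[0]))
```

Because split() with no arguments also discards leading and trailing whitespace, this covers what the original .strip() call was meant to do.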