UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
I am scraping Amazon customer reviews. The script runs for a while, but at a certain point I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "custreviewscrap.py", line 73, in <module>
    strcomment = str(k.getText())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 2937: ordinal not in range(128)
I tried the following, but neither attempt worked:
1) strcomment = str(k.getText()).encode('utf8')
2) strcomment = str(k.getText())
   strcomment = strcomment.encode('ascii', 'ignore')
Thank you very much!
for k in bsreview2.findAll('div',{"style":"margin-left:0.5em;"}):
    #next part is clean the comments. sorry, this part is really dirty, I should have written a function
    #the comment is surrounded by different stuff depends on what kind of review it is, video or pics or text
    strcomment = str(k.getText())
    patcomment = re.compile(r'(.*(\(Electronics\)|\(Health and Beauty\)))')
    patcomment2 = re.compile(r'Help other customers find.*')
    patcomment3 = re.compile(r'(Customer review from the Amazon Vine Program(.|\n)*Length::)|(\<\!(.|\n)*Length::)|(Customer review from the Amazon Vine Program\(What\'s this\?\)|(.*See all my reviews))')
    cleancomment = re.sub(patcomment, '', strcomment)
    cleancomment = re.sub(' ', '', cleancomment)
    cleancomment = re.sub(patcomment2, '', cleancomment)
    cleancomment = re.sub(',' ,'.', cleancomment)
    cleancomment = re.sub(patcomment3, '', cleancomment)
    strdate = str(k.nobr.getText())
    cleandate = re.sub(',','',strdate)
    print (k.span.getText())[0:1]+','+ cleandate +',' + cleancomment
    csvtext = csvtext + (k.span.getText())[0:1]+','+ cleandate +',' + a +','+ cleancomment + '\n'
Assuming k.getText() returns Unicode, the following should work (where s is the result of k.getText()):
>>> s = u'\xef'
>>> s.encode('utf-8')
'\xc3\xaf'
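Note that the str() call is no longer needed. In Python 2, calling str() on a Unicode string implicitly encodes it with the ASCII codec, which is what raises the error as soon as a review contains a non-ASCII character.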
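As a rough sketch of how that advice might carry over to the scraping loop: keep the text as Unicode while cleaning it, and encode to UTF-8 only when producing output. The sample string, the temporary file, and the name reviews.csv are assumptions made for illustration and are not part of the original script. The explicit encode('ascii') call stands in for what str() does implicitly in Python 2.

```python
import io
import os
import re
import tempfile

# Hypothetical stand-in for the Unicode value returned by k.getText().
text = u"na\xefve review, five stars"

# Encoding to ASCII is what str() does implicitly on Unicode in Python 2;
# it fails as soon as the text contains a non-ASCII character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The regex cleanup can operate on the Unicode text directly, without str().
cleaned = re.sub(u",", u".", text)

# Encode explicitly only where bytes are actually required.
encoded = cleaned.encode("utf-8")

# Alternatively, let the file object do the encoding on write.
# The path below is a throwaway location chosen for this sketch.
path = os.path.join(tempfile.mkdtemp(), "reviews.csv")
with io.open(path, "w", encoding="utf-8") as fh:
    fh.write(cleaned + u"\n")

# Read the file back to confirm the character survived the round trip.
with io.open(path, "r", encoding="utf-8") as fh:
    round_trip = fh.read()
```

Using io.open with an explicit encoding keeps the sketch valid on both Python 2 and Python 3, so the same pattern should still apply if the script is ported later.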