
Scrapy: 'ascii' codec can't encode characters

I am having a problem running my crawler:

UnicodeEncodeError: 'ascii' codec can't encode characters in position

I am using this code:

author = str(info.css(".author::text").extract_first())

But I am still getting that error. Any idea how I can solve it? Thank you!

Here's the error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/middlewares.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/__main__.egg/teslamotorsclub_spider/spiders/teslamotorsclub.py", line 40, in parse
    author = str(info.css(".author::text").extract_first())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

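Try:

author = info.css(".author::text").extract_first()

The reason for the error is that extract_first() already returns a unicode string (or None when nothing matches), not a raw bytes object. On Python 2, wrapping that value in str() makes Python encode it implicitly with the ASCII codec, which fails on the first non-ASCII character. Calling .decode('utf-8') on it would fail the same way, because Python 2 first encodes the unicode string with ASCII before decoding. Python makes no guesses about the encoding, so when you do need a byte string (for example, to write it to a file), make the encoding explicit with .encode('utf-8'). UTF-8 can encode any Unicode character.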
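As a rough illustration of making the encoding explicit, here is a minimal sketch. It assumes the extracted value is a unicode string, which is what Scrapy selectors return. The sample `author` value and the `to_utf8_bytes` helper are hypothetical and only stand in for the spider's real data, so no Scrapy install is needed to run it.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical stand-in for the value returned by
# info.css(".author::text").extract_first(): a unicode string
# containing non-ASCII characters.
author = u"J\u00f6rg \u7530\u4e2d"


def to_utf8_bytes(text):
    """Encode a unicode string as UTF-8 bytes.

    None is passed through unchanged, since extract_first()
    returns None when the selector matches nothing.
    """
    if text is None:
        return None
    return text.encode("utf-8")


# Encoding with the ASCII codec reproduces the failure in the traceback.
try:
    author.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and round-trips losslessly.
encoded = to_utf8_bytes(author)
round_trips = encoded.decode("utf-8") == author

print("ASCII encoding failed:", ascii_failed)
print("UTF-8 round trip OK:", round_trips)
```

In a spider, it is usually simplest to keep the unicode value as-is in the item and encode only at the point where bytes are actually required.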
