Scrapy: how do I convert unicode in my output?

Question

I'm scraping a website, and the titles contain Latin accented characters, e.g. É, não, etc.

This is my code:

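    def parse(self, response):
        for tank in response.xpath('//html/body/div/div[4]/div/div/div/table[1]/tr/td/div'):
            item = VapeItem()
            # extract() returns a list of unicode strings, one per matched text node
            item["title"] = tank.xpath("h3/a/text()").extract()
            yield item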

And an example of the JSON output:

    {"title": "HALO CAF\u00c9 MOCHA"},

The question is: how do I convert this so it shows up like this?

    {"title": "HALO CAFÉ MOCHA"},

I've tried encode("utf8") without success.

Answer 1

You probably just need to print it:

    >>> import json
    >>> txt = '{"title": "HALO CAF\u00c9 MOCHA"}'
    >>> print json.loads(txt)['title']
    HALO CAFÉ MOCHA
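The session above is Python 2 (print is a statement there). A minimal Python 3 equivalent, assuming txt holds one raw line of the exported feed, shows that the \u00c9 escape is only JSON notation and disappears as soon as the text is parsed:

```python
import json

# Raw JSON text as it appears in the exported feed; the r-prefix keeps \u00c9
# as a JSON escape instead of letting Python interpret it
txt = r'{"title": "HALO CAF\u00c9 MOCHA"}'

# json.loads turns the escape back into the real character
title = json.loads(txt)["title"]
print(title)  # HALO CAFÉ MOCHA
```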

Writing to a file works just as well; I don't really see the problem here.

    >>> import json
    >>> parsed_data = json.loads('{"title": "HALO CAF\u00c9 MOCHA"}')
    >>> with open('foo.txt', 'w') as fout:
    ...     fout.write(parsed_data['title'].encode('utf-8'))
    ...
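In Python 3, str is already text, so instead of calling encode() by hand you would normally let open() do the encoding. A minimal sketch, assuming Python 3; the temporary directory only stands in for the answer's foo.txt so the example cleans up after itself, and the file is read back in binary mode to show how É is actually stored:

```python
import json
import os
import tempfile

parsed_data = json.loads(r'{"title": "HALO CAF\u00c9 MOCHA"}')

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo.txt")  # stand-in for the answer's foo.txt
    # Text mode with an explicit encoding: open() converts str -> UTF-8 bytes
    with open(path, "w", encoding="utf-8") as fout:
        fout.write(parsed_data["title"])
    # Read the raw bytes back to see what landed on disk
    with open(path, "rb") as fin:
        raw = fin.read()

print(raw)  # b'HALO CAF\xc3\x89 MOCHA'
```

"É" ends up as the two UTF-8 bytes C3 89; any UTF-8-aware viewer will display it as the accented letter.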

Answer 2

You've got it backwards. You need to decode as UTF-8 (to convert from bytes-like str data to unicode).
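A short Python 3 sketch of the two directions; the byte string is a made-up example of UTF-8 input, not data from the question:

```python
# Made-up UTF-8 bytes for "HALO CAFÉ MOCHA" ("É" is the two bytes C3 89)
raw = b"HALO CAF\xc3\x89 MOCHA"

# decode: bytes -> text (str in Python 3, unicode in Python 2)
title = raw.decode("utf-8")

# encode goes the other way: text -> bytes
round_trip = title.encode("utf-8")

print(len(raw), len(title))  # 16 15
print(round_trip == raw)     # True
```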

But that's not the real problem: json.dump ensures ASCII-compatible output by default (using escapes) to avoid problems with protocols that only handle ASCII (or that can't rely on any specific encoding beyond "ASCII-compatible").
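A minimal Python 3 sketch of that default; the item dict is a hypothetical stand-in for a scraped item:

```python
import json

item = {"title": "HALO CAF\u00c9 MOCHA"}

# Default ensure_ascii=True: every non-ASCII character becomes a \uXXXX escape
default_out = json.dumps(item)
print(default_out)            # {"title": "HALO CAF\u00c9 MOCHA"}
print(default_out.isascii())  # True
```

The escaped output still decodes to the same string, which is why the exported JSON in the question is not actually wrong.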

Pass ensure_ascii=False to the dump / dumps call to allow it to output non-ASCII characters. Note the warnings in the docs: this can make some calls return str and others unicode, which may cause problems (on Python 3 the issue doesn't exist; the result is always str).
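A sketch of the fix in Python 3, where dumps always returns str. The items list and the items.json file name are hypothetical; when writing with dump, the file also has to be opened with an encoding that can represent the characters, here UTF-8:

```python
import json
import os
import tempfile

items = [{"title": "HALO CAF\u00c9 MOCHA"}]

# ensure_ascii=False keeps "É" as a literal character in the output text
text = json.dumps(items, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "items.json")  # hypothetical output file
    # The file's encoding must cover every character dump() writes
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(items, fout, ensure_ascii=False)
    with open(path, encoding="utf-8") as fin:
        written = fin.read()

print(text == written)       # True
print("\\u00c9" in written)  # False: no escape sequence in the file
```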

2 answers

Answer 1: score 1, posted 2015-10-21 02:05:57

Answer 2: score 0, posted 2015-10-21 02:11:01
