How to convert Unicode text to normal text

Question

I am learning Beautiful Soup in Python.

I am trying to parse a simple webpage with a list of books.

E.g.

<a href="https://www.nostarch.com/carhacking">The Car Hacker’s Handbook</a>

I use the code below.

import requests, bs4
res = requests.get('http://nostarch.com')
res.raise_for_status()
nSoup = bs4.BeautifulSoup(res.text,"html.parser")
elems = nSoup.select('.product-body a')

#elems[0] gives
<a href="https://www.nostarch.com/carhacking">The Car Hacker\u2019s Handbook</a>

And

#elems[0].getText() gives
u'The Car Hacker\u2019s Handbook'

But I want the proper text, which is what I get from:

s = elems[0].getText()
print s
>>>The Car Hacker’s Handbook

How can I modify my code so that it gives "The Car Hacker’s Handbook" instead of "u'The Car Hacker\u2019s Handbook'"?

Kindly help.

Answer 1

Have you tried using the encode method?

elems[0].getText().encode('utf-8')
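To make the difference concrete, here is a minimal sketch. It assumes a string literal standing in for elems[0].getText(), and the variable names are only illustrative. The \u2019 you see is just the escaped representation of the string; the value itself already holds the real character, and encode only turns it into bytes for output. It should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: this literal stands in for elems[0].getText().
title = u'The Car Hacker\u2019s Handbook'

# The escaped form is only a *representation* of the text; the string
# itself contains the right single quotation mark (U+2019).
escaped_view = title.encode('unicode_escape')

# Encoding converts the text to UTF-8 bytes, e.g. for a file or terminal.
utf8_bytes = title.encode('utf-8')

# Decoding those bytes gives back exactly the same text.
round_trip = utf8_bytes.decode('utf-8')

print(round_trip == title)
```

So nothing is wrong with the text itself; the escape only shows up when the string is displayed with repr rather than printed.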

More info about Unicode and Python can be found at https://docs.python.org/2/howto/unicode.html

Moreover, to check whether your string is really UTF-8 encoded, you can use chardet and run the following:

>>> import chardet
>>> chardet.detect(elems[0].getText()) 
{'confidence': 0.5, 'encoding': 'utf-8'}

Answer 2

You can try:

import unicodedata

def normText(unicodeText):
    return unicodedata.normalize('NFKD', unicodeText).encode('ascii','ignore')

This will convert the Unicode text to plain ASCII text, which you can then write to a file. Note that characters with no ASCII equivalent are dropped.
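One thing to watch for with this approach: the curly apostrophe (U+2019) has no ASCII decomposition, so 'ignore' removes it entirely. The sketch below shows that, plus one possible workaround. QUOTE_MAP and to_ascii_keep_quotes are hypothetical names introduced here for illustration, not part of the answer above.

```python
import unicodedata

def norm_text(unicode_text):
    """Same approach as normText above: decompose, then drop non-ASCII."""
    return unicodedata.normalize('NFKD', unicode_text).encode('ascii', 'ignore')

# Hypothetical helper data: map curly quotes to their ASCII look-alikes.
QUOTE_MAP = {0x2018: u"'", 0x2019: u"'", 0x201C: u'"', 0x201D: u'"'}

def to_ascii_keep_quotes(unicode_text):
    """Replace curly quotes first, then apply the NFKD/ASCII conversion."""
    return norm_text(unicode_text.translate(QUOTE_MAP))

title = u'The Car Hacker\u2019s Handbook'
dropped = norm_text(title)           # apostrophe is lost
kept = to_ascii_keep_quotes(title)   # apostrophe becomes a plain '
accent = norm_text(u'Caf\u00e9')     # accented letter keeps its base letter
```

Here dropped comes out as "The Car Hackers Handbook", while kept is "The Car Hacker's Handbook". Whether replacing the quote is acceptable depends on what you need the text for.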

Answer 1: score 3, accepted, 2016-04-14 13:07:55

Answer 2: score -2, 2016-04-14 14:29:11
