
Characters showing up in string, how to remove them with Python

I have two strings that look the same but aren't.

$ more /tmp/1
'[FORM-15801]

$ more /tmp/2
'[FORM‑15801]

I can see the extra characters with sed:

$ sed -n l /tmp/1
'[FORM-15801]$


$ sed -n l /tmp/2
'[FORM\342\200\22115801]$
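
For reference, the three octal escapes in that output look like a single UTF-8 encoded character rather than three stray bytes. A minimal sketch for identifying it, assuming the file is UTF-8 (the names `raw` and `ch` are just illustrative):

```python
import unicodedata

# The three bytes sed printed as \342\200\221 (octal) are 0xE2 0x80 0x91.
raw = b"\342\200\221"  # Python bytes literals accept octal escapes

# Assumption: the file is UTF-8 encoded.
ch = raw.decode("utf-8")

# Print the code point and its official Unicode name.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# prints: U+2011 NON-BREAKING HYPHEN
```

So /tmp/2 contains a non-breaking hyphen where /tmp/1 has a plain ASCII hyphen-minus.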

In Python, how can I convert the contents of /tmp/2 to look like /tmp/1?

You could use the unidecode module.

From the PyPI page:

What Unidecode provides is a middle road: the function unidecode() takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), where the compromises taken when mapping between two character sets are chosen to be near what a human with a US keyboard would choose.
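
As a rough sketch of how this could look for the string from the question: the snippet below first does a targeted replacement using only the standard library, then tries unidecode if it is installed. The names `raw`, `HYPHEN_MAP`, and `fixed` are made up for the example, and the bytes are typed in by hand instead of being read from /tmp/2.

```python
# Contents of /tmp/2 as shown by `sed -n l`, typed in by hand for this sketch
# (octal \342\200\221 is 0xE2 0x80 0x91 in hex).
raw = b"'[FORM\xe2\x80\x9115801]"
text = raw.decode("utf-8")  # assumption: the file is UTF-8

# Targeted stdlib fix: map U+2011 NON-BREAKING HYPHEN to ASCII hyphen-minus.
HYPHEN_MAP = str.maketrans({"\u2011": "-"})
fixed = text.translate(HYPHEN_MAP)
print(fixed)  # prints: '[FORM-15801]

# Broader fix with the third-party package (pip install Unidecode).
# It transliterates every non-ASCII character, not only this hyphen.
try:
    from unidecode import unidecode
except ImportError:
    unidecode = None  # package not installed; skip this part

if unidecode is not None:
    print(unidecode(text))
```

The str.translate version only touches the one character you know about, which is safer if the rest of the text contains non-ASCII characters you want to keep. unidecode is more convenient when you want everything reduced to ASCII.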

