Characters showing up in a string: how do I remove them with Python?

Question

I have two strings that look the same but aren't.

$ more /tmp/1
'[FORM-15801]

$ more /tmp/2
'[FORM‑15801]

I can see the difference here:

$ sed -n l /tmp/1
'[FORM-15801]$


$ sed -n l /tmp/2
'[FORM\342\200\22115801]$
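The octal escapes in the `sed -n l` output are the UTF-8 bytes E2 80 91, which together encode a single character. A minimal sketch using the standard `unicodedata` module decodes those bytes and names the character; the `non_ascii` helper is hypothetical, written only to illustrate how to locate such characters in a string.

```python
import unicodedata

# The three octal escapes printed by `sed -n l` for /tmp/2: \342 \200 \221
raw = bytes([0o342, 0o200, 0o221])  # same as b"\xe2\x80\x91"
ch = raw.decode("utf-8")            # decodes to ONE character, not three

print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+2011 NON-BREAKING HYPHEN


def non_ascii(text):
    """Hypothetical helper: list (index, code point, name) for each non-ASCII char."""
    return [
        (i, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for i, c in enumerate(text)
        if ord(c) > 0x7F
    ]


print(non_ascii("'[FORM-15801]"))       # []  (the /tmp/1 text is pure ASCII)
print(non_ascii("'[FORM\u201115801]"))  # [(6, 'U+2011', 'NON-BREAKING HYPHEN')]
```

To inspect the real file, pass `open("/tmp/2", encoding="utf-8").read()` to `non_ascii`.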

In Python, how can I convert the contents of /tmp/2 to look like /tmp/1?

Answer 1

You could use the unidecode module.

From the PyPI page:

What Unidecode provides is a middle road: the function unidecode() takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F), where the compromises taken when mapping between two character sets are chosen to be near what a human with a US keyboard would choose.
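Unidecode is a third-party package (`pip install unidecode`); with it installed, the conversion is the single call `unidecode(text)` after `from unidecode import unidecode`. If only the dash needs fixing, a narrower standard-library sketch is to map known hyphen and dash look-alikes to ASCII `-` with `str.translate`. This is a swapped-in technique, not Unidecode: the list of dash names and the `convert_file` helper are assumptions for illustration.

```python
import unicodedata
from pathlib import Path

# Assumption: only dash/hyphen look-alikes need fixing. This hand-picked list is
# a stdlib stand-in for unidecode, not a general Unicode-to-ASCII transliterator.
DASH_NAMES = [
    "HYPHEN",               # U+2010
    "NON-BREAKING HYPHEN",  # U+2011, the character found in /tmp/2
    "FIGURE DASH",          # U+2012
    "EN DASH",              # U+2013
    "EM DASH",              # U+2014
    "MINUS SIGN",           # U+2212
]
# Map each look-alike character to ASCII HYPHEN-MINUS (U+002D).
DASH_TABLE = str.maketrans({unicodedata.lookup(name): "-" for name in DASH_NAMES})


def ascii_dashes(text: str) -> str:
    """Replace every character named in DASH_NAMES with ASCII '-'."""
    return text.translate(DASH_TABLE)


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src as UTF-8, fix the dashes, write dst as UTF-8."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(ascii_dashes(text), encoding="utf-8")


print(ascii_dashes("'[FORM\u201115801]"))  # '[FORM-15801]
```

For the files in the question, `convert_file("/tmp/2", "/tmp/2.ascii")` would write a copy with the dash replaced. Unicode normalization alone is not enough here: NFKC maps U+2011 to U+2010 HYPHEN, which is still outside ASCII, so an explicit mapping (or Unidecode) is needed.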

Answer 1: 0, 2020-04-26 18:36:08