How can I compare these two strings in Python?

Question

I have a file with the following two strings:

25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0
25_\xD1\x80\xD0\xB0\xD1\x88\xD3\x99\xD0\xB0\xD1\x80\xD0\xB0

They both represent the same URL path, and therefore should be equal. I would like to apply the same "cleaning function" to both of them and obtain the same string.

After reading these strings from the file I have:

>> s0
'25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
>> s1
'25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'

(note the escaped backslashes in s1). If I unquote s0 I get the following:

>> import urllib
>> t0 = urllib.unquote(s0)
>> t0
'25_\xd1\x80\xd0\xb0\xd1\x88\xd3\x99\xd0\xb0\xd1\x80\xd0\xb0'
>> print t0
25_рашәара

which is good. However, the only thing I know to do with s1 is the following:

>> t1 = s1.decode("unicode_escape")
>> t1
u'25_\xd1\x80\xd0\xb0\xd1\x88\xd3\x99\xd0\xb0\xd1\x80\xd0\xb0'
>> print t1
25_ÑÐ°ÑÓÐ°Ñ

which looks broken. My question is: what clean(s) function could be written to normalize these two strings, so that they are either both <type 'str'> or both <type 'unicode'>, and so that they both print identically (and compare equal as well)?

Answer 1

Consider:

>>> s0 = '25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
>>> s1 = '25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'
>>> import urllib
>>> t0 = urllib.unquote(s0).decode('utf8')
>>> t1 = s1.decode('string_escape').decode('utf8')
>>> print t0
25_рашәара
>>> print t1
25_рашәара
>>> t0 == t1
True
>>>
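
The session above is Python 2 (`urllib.unquote`, `str.decode`, the `print` statement). As a rough sketch of the `clean(s)` function the question asks for, assuming Python 3 and assuming the only escapes in the input are `%XX` and literal `\xXX` sequences, something like the following should work. The helper name `_HEX_ESCAPE` is illustrative and does not come from the answer.

```python
import re
from urllib.parse import unquote_to_bytes

# Matches a literal backslash-x escape such as \xD1 (four characters in the text).
_HEX_ESCAPE = re.compile(r'\\x([0-9A-Fa-f]{2})')


def clean(s):
    """Normalize a percent-encoded or backslash-x-escaped UTF-8 string to str."""
    # Step 1: rewrite \xD1 as %D1 so both input forms share one encoding.
    percent_encoded = _HEX_ESCAPE.sub(r'%\1', s)
    # Step 2: turn every %XX into a raw byte, then decode the bytes as UTF-8.
    return unquote_to_bytes(percent_encoded).decode('utf-8')


s0 = '25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
s1 = '25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'
print(clean(s0) == clean(s1))  # True
```

The key point is the same as in the Python 2 session: the escapes describe UTF-8 bytes, not code points, so they have to become bytes first and be decoded as UTF-8 afterwards. `unicode_escape` instead treats each `\xD1` as the code point U+00D1, which is why the attempt in the question prints mojibake.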

Answer 1
2 Accepted 2015-07-29 15:01:59
