How to compare these two strings in Python?

I have a file with the following two strings:

25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0
25_\xD1\x80\xD0\xB0\xD1\x88\xD3\x99\xD0\xB0\xD1\x80\xD0\xB0

They both represent the same URL path and should therefore compare equal. I would like to apply the same "cleaning function" to both of them and obtain the same string.

After reading these strings from the file I have:

>> s0
'25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
>> s1
'25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'

(Note the escaped backslashes in s1.) If I unquote s0, I get the following:

>> import urllib
>> t0 = urllib.unquote(s0)
'25_\xd1\x80\xd0\xb0\xd1\x88\xd3\x99\xd0\xb0\xd1\x80\xd0\xb0'
>> print t0
25_рашәара

which is good. However, the only thing I know to do with s1 is the following:

>> t1 = s1.decode("unicode_escape")
u'25_\xd1\x80\xd0\xb0\xd1\x88\xd3\x99\xd0\xb0\xd1\x80\xd0\xb0'
>> print t1
25_ÑаÑÓаÑ

which looks broken. My question is: what clean(s) function could be written to normalize these two strings, so that they are either both <type 'str'> or both <type 'unicode'>, and they both print identically (and compare equal as well)?

Consider:

>>> s0 = '25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
>>> s1 = '25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'
>>> import urllib
>>> t0 = urllib.unquote(s0).decode('utf8')
>>> t1 = s1.decode('string_escape').decode('utf8')
>>> print t0
25_рашәара
>>> print t1
25_рашәара
>>> t0 == t1
True
>>> 
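
The session above is Python 2. On Python 3, where urllib.unquote and the 'string_escape' codec no longer exist, the same idea might be sketched as a single function. This is only a minimal sketch and not part of the original answer: the name clean comes from the question, while _HEX_ESCAPE and the approach of rewriting each literal \xNN as %NN before percent-decoding are assumptions made for illustration. It also assumes that every \xNN or %NN sequence in the input stands for one byte of UTF-8 text.

```python
import re
from urllib.parse import unquote

# Hypothetical helper: matches a literal backslash, "x", and two hex digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def clean(s):
    """Normalize a percent-encoded or \\xNN-escaped string to decoded text."""
    # Rewrite each literal \xNN as %NN so both input forms share one decoder.
    percent_form = _HEX_ESCAPE.sub(r"%\1", s)
    # Percent-decode to bytes and interpret those bytes as UTF-8.
    return unquote(percent_form, encoding="utf-8", errors="strict")


s0 = '25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
s1 = '25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'

print(clean(s0))               # 25_рашәара
print(clean(s0) == clean(s1))  # True
```

Note that this treats any %NN or literal \xNN in the input as an escape, so a path that legitimately contains those character sequences would be altered as well.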
