
Decode an ASCII string with Python 2.7.10

I'm fairly new to Python so I'm probably still making a lot of rookie mistakes.

I was comparing two seemingly matching strings in Python, but the comparison always returned False. When I checked the representation of each object, I found that one of the strings was encoded in ASCII.

The representation (repr) of the first string is:

'\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'

While the representation of the second string is:

"itinerary_options_search_button" = "Launch the search";

I'm trying to figure out how to decode the first string to get the second string, so that my comparison of the two will match. When I decode the first string with

string.decode('ascii')

I get a unicode object. I'm not sure what to do to get the decoded string.

Your first string seems to have some issues. I'm not entirely sure why there are so many null characters (\x00), but either way, we could write a function to clean those up:

s_1 = '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'
s_2 = '"itinerary_options_search_button" = "Launch the search";'

def null_cleaner(string):
    new_string = ""
    for char in string:
        if char != "\x00":
            new_string += char
    return new_string

print(null_cleaner(s_1) == null_cleaner(s_2))
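The same cleanup could probably be done with the built-in str.replace instead of a character loop. This is only a minimal sketch: s_1 is rebuilt here from s_2 (a NUL before every character plus one trailing NUL) as a stand-in for the literal in the question, and it assumes the data never contains a NUL that should be kept.

```python
s_2 = '"itinerary_options_search_button" = "Launch the search";'

# Reconstruction of the first string from the question (assumption:
# one NUL before every character, plus one trailing NUL).
s_1 = ''.join('\x00' + c for c in s_2) + '\x00'

# str.replace removes every NUL in one call.
cleaned = s_1.replace('\x00', '')

print(cleaned == s_2)
```

The single-argument print call runs on Python 2.7 as well as Python 3.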

A slightly less robust way of doing this is to simply slice the string, keeping every other character (the ones dropped all happen to be \x00):

s_1 = '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y\x00_\x00o\x00p\x00t\x00i\x00o\x00n\x00s\x00_\x00s\x00e\x00a\x00r\x00c\x00h\x00_\x00b\x00u\x00t\x00t\x00o\x00n\x00"\x00 \x00=\x00 \x00"\x00L\x00a\x00u\x00n\x00c\x00h\x00 \x00t\x00h\x00e\x00 \x00s\x00e\x00a\x00r\x00c\x00h\x00"\x00;\x00'
s_2 = '"itinerary_options_search_button" = "Launch the search";'

print(s_1[1::2] == s_2)
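The slice silently assumes that every even position holds a NUL. If that might not always be true, a guarded variant could check the dropped characters first. The helper name strip_alternating_nuls is hypothetical, and s_1 is again rebuilt from s_2 rather than copied from the question.

```python
def strip_alternating_nuls(data):
    """Return every second character, but only if the dropped ones are all NUL."""
    dropped = data[0::2]
    if dropped != '\x00' * len(dropped):
        raise ValueError('expected a NUL at every even position')
    return data[1::2]

s_2 = '"itinerary_options_search_button" = "Launch the search";'
# Reconstruction of the first string (assumption: NUL before each character, plus a trailing NUL).
s_1 = ''.join('\x00' + c for c in s_2) + '\x00'

print(strip_alternating_nuls(s_1) == s_2)
```

On a string that does not follow the pattern, such as s_2 itself, the helper raises ValueError instead of returning mangled text.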

... encoded in ASCII.

[lots of NULs]

Nope. That looks like UTF-16 (big-endian in this example), not ASCII:

>>> '\x00"\x00i\x00t\x00i\x00n\x00e\x00r\x00a\x00r\x00y'.decode('utf-16be')
u'"itinerary'

Of course, your data has an extra NUL at the end, which leaves an odd number of bytes and will break the decode. Once you clean that up you should be able to decode it with no problem.
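As a rough sketch of that cleanup: drop the stray trailing NUL when the length is odd, then decode as UTF-16 big-endian. The helper name decode_utf16be_dropping_stray_nul is made up for illustration, and raw is a reconstruction of the question's first string rather than the original literal. The data could equally be UTF-16 little-endian with a stray leading byte; that reading gives the same text.

```python
expected = u'"itinerary_options_search_button" = "Launch the search";'

# Reconstruction of the question's bytes (assumption): a NUL before every
# character and one extra NUL at the end.
raw = b'\x00' + expected.encode('utf-16-le')

def decode_utf16be_dropping_stray_nul(data):
    """Decode UTF-16BE bytes, first removing one trailing NUL if the length is odd."""
    if len(data) % 2 == 1 and data[-1:] == b'\x00':
        data = data[:-1]
    return data.decode('utf-16-be')

decoded = decode_utf16be_dropping_stray_nul(raw)
print(decoded == expected)
```

On Python 2.7 the result is a unicode object, which still compares equal to a plain str holding the same ASCII-only text.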

