Why does printing these \x values give different values in different OS and versions?
Example:
print("A"*20+"\xef\xbe\xad\xde")
This gives different output in Python 2 and Python 3, and on different platforms.
On Microsoft Windows:
Python2: AAAAAAAAAAAAAAAAAAAAï¾Þ
Python3: AAAAAAAAAAAAAAAAAAAAï¾Þ
On Kali Linux:
Python2: AAAAAAAAAAAAAAAAAAAAᆳ
Python3: AAAAAAAAAAAAAAAAAAAAï¾Þ
UPDATE: What I want is the exact Python 2 output, but produced by Python 3. I tried many things (encoding, decoding, byte conversion) but realised that \xde can't be decoded. Is there any other way to achieve what I want?
It is a question of encoding.
In Latin1 or Windows 1252 encoding, you have:
0xef -> ï (LATIN SMALL LETTER I WITH DIAERESIS)
0xbe -> ¾ (VULGAR FRACTION THREE QUARTERS)
0xad -> SOFT HYPHEN (U+00AD), normally invisible, which is why nothing is printed for it in your examples
0xde -> Þ (LATIN CAPITAL LETTER THORN)
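The mapping above can be checked with a few lines of Python 3. This is only a minimal sketch; the variable names are illustrative, not from the question.

```python
import unicodedata

raw = b"\xef\xbe\xad\xde"

# Latin-1 maps every byte 0x00-0xff to the code point with the same value,
# so this decode can never fail.
text = raw.decode("latin-1")

for byte, char in zip(raw, text):
    # unicodedata.name raises ValueError for unnamed characters,
    # so a default is passed as the second argument.
    print(hex(byte), "->", repr(char), unicodedata.name(char, "<unnamed>"))
```

For these four bytes, decoding with "cp1252" gives the same string, since none of them fall in the 0x80-0x9f range.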
In utf-8 encoding, you have:
'\xef\xbe\xad' -> u'\uffad' or 'ᆳ' (HALFWIDTH HANGUL LETTER RIEUL-SIOS)
'\xde' -> should raise a UnicodeDecodeError...
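The same bytes under UTF-8, as a small Python 3 sketch: the first three bytes form one complete sequence, and the trailing 0xde is a lead byte with nothing after it, so a strict decode of the whole string fails.

```python
import unicodedata

raw = b"\xef\xbe\xad\xde"

# The first three bytes are a complete three-byte UTF-8 sequence.
char = raw[:3].decode("utf-8")
print(hex(ord(char)), unicodedata.name(char))

# 0xde starts a two-byte sequence, but no continuation byte follows it,
# so strict decoding of all four bytes raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("decode failed:", exc.reason)
```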
On Windows, both Python 2 and Python 3 use the Windows 1252 code page (in your example). On Kali, Python 2 sees the string as a byte string and the terminal displays those bytes as UTF-8, while Python 3 assumes the string already contains Unicode character values and displays them directly.
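To see what Python 3 actually holds for that literal, here is a hedged sketch (names are illustrative). The str contains four code points, not four bytes. Encoding it as Latin-1 gives back the byte string Python 2 would have had, while encoding it as UTF-8 (what a UTF-8 terminal receives from print) spends two bytes on each of those characters.

```python
s = "A" * 20 + "\xef\xbe\xad\xde"

# In Python 3 each \xNN escape in a str literal is a code point U+00NN.
print([hex(ord(c)) for c in s[-4:]])

# Latin-1 turns each code point below 0x100 back into the byte of the same value.
as_bytes = s.encode("latin-1")
print(as_bytes[-4:])

# UTF-8 needs two bytes for each code point in the range U+0080-U+07FF.
as_utf8 = s.encode("utf-8")
print(len(as_bytes), len(as_utf8))
```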
Since in Latin1 (and in Windows 1252 for all characters outside 0x80-0x9f) the byte value is the Unicode code point, that is enough to explain your outputs.
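That claim about byte values and code points can be verified directly. A minimal sketch, assuming Python 3 and its built-in "latin-1" and "cp1252" codecs:

```python
# Latin-1: every byte decodes to the code point with the same numeric value.
latin1_is_identity = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

# cp1252: only bytes outside 0x80-0x9f are compared here; inside that range
# the code page maps to other characters or leaves some bytes undefined.
cp1252_matches_outside = all(
    ord(bytes([b]).decode("cp1252")) == b
    for b in range(256)
    if not 0x80 <= b <= 0x9F
)

# One example from inside the range: 0x80 is the euro sign in cp1252.
euro = b"\x80".decode("cp1252")
print(latin1_is_identity, cp1252_matches_outside, hex(ord(euro)))
```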
What to learn: be explicit about whether strings contain unicode or bytes, and beware of encodings!
To get consistent behavior on both Python 2 and Python 3, you'll need to be explicit about your intended output. If you want AAAAAAAAAAAAAAAAAAAAᆳ, then the \xde is garbage; if you want AAAAAAAAAAAAAAAAAAAAï¾Þ, the \xad is garbage. Either way, the "solution" to printing what you've got is to explicitly use bytes literals and decode them with the desired encoding, ignoring errors. So to get AAAAAAAAAAAAAAAAAAAAᆳ (interpreting as UTF-8), you'd do:
print((b"A"*20+b"\xef\xbe\xad\xde").decode('utf-8', errors='ignore'))
while to get AAAAAAAAAAAAAAAAAAAAï¾Þ you'd do:
# cp1252 can be used instead of latin-1, depending on intent; they overlap in this case
print((b"A"*20+b"\xef\xbe\xad\xde").decode('latin-1', errors='ignore'))
Importantly, note the leading b on the literals. On Python 2.7 it is recognized and ignored (unless from __future__ import unicode_literals is in effect, in which case it is needed just like in Python 3). On Python 3 it makes the literals bytes literals (no special encoding assumed) rather than str literals, so you can decode in your desired encoding. Either way, you end up with raw bytes, which can then be decoded in the preferred encoding, with errors ignored.
Note that ignoring errors is usually going to be wrong; you're dropping data on the floor. 0xDEADBEEF isn't guaranteed to produce a useful byte string in any given encoding, and if that's not your real data, you're probably still risking errors by wanting to silently ignore undecodable data.
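To make the data loss visible, here is a small comparison of decode error handlers. It is a sketch rather than a recommendation, and it assumes Python 3.5 or later, where 'backslashreplace' also works for decoding.

```python
raw = b"\xef\xbe\xad\xde"

# 'ignore' silently drops the undecodable 0xde byte.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD, so the loss is at least visible.
replaced = raw.decode("utf-8", errors="replace")

# 'backslashreplace' keeps the byte value as a literal \xde escape in the text.
escaped = raw.decode("utf-8", errors="backslashreplace")

print(ascii(ignored), ascii(replaced), ascii(escaped))
```

With 'ignore' the result is one character long and nothing hints that a byte was thrown away; the other two handlers leave a marker behind.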
If you want to write the raw bytes and let whatever is consuming stdout interpret them however it wants, you need to drop below the print level, since print on Python 3 is purely str based. To write the raw bytes on Python 3, you'd use sys.stdout.buffer (sys.stdout is text based, sys.stdout.buffer is the underlying buffered byte-oriented stream it wraps); you'd also need to manually add the newline (if desired):
sys.stdout.buffer.write(b"A"*20+b"\xef\xbe\xad\xde\n")
vs. on Python 2, where stdout isn't an encoding wrapper:
sys.stdout.write(b"A"*20+b"\xef\xbe\xad\xde\n")
For portable code, you can get a "raw stdout" ahead of time and use that:
import sys

# Put this at the top of your file so you don't have to constantly recheck/reacquire
# Gets sys.stdout.buffer if it exists, sys.stdout otherwise
bstdout = getattr(sys.stdout, 'buffer', sys.stdout)
# Works on both Py2 and Py3
bstdout.write(b"A"*20+b"\xef\xbe\xad\xde\n")
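If you want to convince yourself that the getattr fallback hands the bytes through untouched, the sketch below uses an in-memory stand-in for Python 3's sys.stdout. The helper name raw_stream is hypothetical and exists only for this illustration.

```python
import io

def raw_stream(stream):
    """Return the byte-oriented stream behind `stream`, or `stream` itself."""
    return getattr(stream, "buffer", stream)

# Stand-in for Python 3's sys.stdout: a text wrapper over a byte buffer.
backing = io.BytesIO()
text_stdout = io.TextIOWrapper(backing, encoding="utf-8")

payload = b"A" * 20 + b"\xef\xbe\xad\xde\n"
raw_stream(text_stdout).write(payload)

# The bytes arrive unmodified; no encoding step sits in between.
print(backing.getvalue() == payload)
```

A stream without a buffer attribute, such as a plain io.BytesIO or Python 2's sys.stdout, is returned as-is by the same helper.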