Why does printing these values give different results on different operating systems and versions?

Question

Why does printing these \x values give different results on different operating systems and Python versions? Example:

print("A"*20+"\xef\xbe\xad\xde")

This gives different output in Python 2 and Python 3, and on different platforms.

On Microsoft Windows:

Python2: AAAAAAAAAAAAAAAAAAAAï¾Þ

Python3: AAAAAAAAAAAAAAAAAAAAï¾Þ

On Kali:

Python2: AAAAAAAAAAAAAAAAAAAAﾭ

Python3: AAAAAAAAAAAAAAAAAAAAï¾Þ

UPDATE: What I want is the exact Python 2 output, but from Python 3. I tried many things (encoding, decoding, byte conversion) but realised that \xde can't be decoded. Is there any other way to achieve what I want?

Answer 1

It is a question of encoding.

In Latin1 or Windows 1252 encoding, you have:

0xef -> ï (LATIN SMALL LETTER I WITH DIAERESIS)
0xbe -> ¾ (VULGAR FRACTION THREE QUARTERS)
0xad -> soft hyphen (SOFT HYPHEN), normally invisible, hence not printed in your examples
0xde -> Þ (LATIN CAPITAL LETTER THORN)

In UTF-8 encoding, you have:

'\xef\xbe\xad' -> u'\uffad' or 'ﾭ' (HALFWIDTH HANGUL LETTER RIEUL-SIOS)
'\xde' -> should raise a UnicodeDecodeError...

On Windows, Python 2 and Python 3 both use the Windows 1252 code page (in your example). On Kali, Python 2 sees the string as a byte string and the terminal displays it as UTF-8, while Python 3 assumes it already contains Unicode character values and displays them directly.

Since in Latin1 (and in Windows 1252 for all characters outside 0x80-0x9f) the byte value is the Unicode code point, that is enough to explain your outputs.
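The mappings above can be checked from Python 3 itself. This is only a minimal sketch; the variable names are made up for illustration and are not part of the question.

```python
import unicodedata

raw = b"\xef\xbe\xad\xde"

# Latin-1 maps each byte to the code point with the same numeric value.
latin1_text = raw.decode("latin-1")
latin1_names = [unicodedata.name(ch) for ch in latin1_text]

# The first three bytes form one complete UTF-8 sequence.
utf8_char = raw[:3].decode("utf-8")
utf8_name = unicodedata.name(utf8_char)

# The trailing 0xde opens a two-byte sequence that never finishes,
# so a strict UTF-8 decode of all four bytes fails.
try:
    raw.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

print(latin1_names)
print(hex(ord(utf8_char)), utf8_name)
print(strict_utf8_ok)
```

The names are printed rather than the characters themselves, so the result does not depend on how the terminal decodes the output.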

What to learn: be explicit about whether strings contain Unicode or bytes, and beware of encodings!

Answer 2

To get consistent behavior on both Python 2 and Python 3, you'll need to be explicit about your intended output. If you want AAAAAAAAAAAAAAAAAAAAﾭ, then the \xde is garbage; if you want AAAAAAAAAAAAAAAAAAAAï¾Þ, the \xad is garbage. Either way, the "solution" to printing what you've got is to explicitly use bytes literals and decode them with the desired encoding, ignoring errors. So to get AAAAAAAAAAAAAAAAAAAAﾭ (interpreting as UTF-8), you'd do:

print((b"A"*20+b"\xef\xbe\xad\xde").decode('utf-8', errors='ignore'))

while to get AAAAAAAAAAAAAAAAAAAAï¾Þ you'd do:

# cp1252 can be used instead of latin-1, depending on intent; they overlap in this case
print((b"A"*20+b"\xef\xbe\xad\xde").decode('latin-1', errors='ignore'))

Importantly, note the leading b on the literals. On Python 2.7 it is recognized and ignored (unless from __future__ import unicode_literals is in effect, in which case it is needed just as in Python 3); on Python 3 it makes the literals bytes literals (no special encoding assumed) rather than str literals, so you can decode them in your desired encoding. Either way, you end up with raw bytes, which can then be decoded in the preferred encoding, with errors ignored.

Note that ignoring errors is usually going to be wrong; you're dropping data on the floor. 0xDEADBEEF isn't guaranteed to produce a useful byte string in any given encoding, and if that's not your real data, you're probably still risking errors by wanting to silently ignore undecodable data.
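To see what "dropping data on the floor" means in practice, here is a small Python 3 sketch comparing the ignore and replace error handlers on the same bytes. The variable names are illustrative only.

```python
raw = b"A" * 20 + b"\xef\xbe\xad\xde"

ignored = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")

# 'ignore' silently discards the dangling 0xde byte:
# 24 bytes in, 20 'A's plus one Hangul character out.
print(len(raw), len(ignored))

# 'replace' keeps a visible U+FFFD marker where the bad byte was.
print(replaced.endswith("\ufffd"))

# The original bytes cannot be recovered from the 'ignore' result.
round_trips = ignored.encode("utf-8") == raw
print(round_trips)
```

If losing bytes matters, errors='replace' (or letting the UnicodeDecodeError propagate) at least makes the problem visible.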

If you want to write the raw bytes and let whatever is consuming stdout interpret them however it wants, you need to drop below the print level, since print on Python 3 is purely str based. To write the raw bytes on Python 3, you'd use sys.stdout.buffer (sys.stdout is text based; sys.stdout.buffer is the underlying buffered byte-oriented stream it wraps). You'd also need to add the newline manually (if desired):

import sys
sys.stdout.buffer.write(b"A"*20+b"\xef\xbe\xad\xde\n")

vs. on Python 2, where stdout isn't an encoding wrapper:

import sys
sys.stdout.write(b"A"*20+b"\xef\xbe\xad\xde\n")

For portable code, you can get a "raw stdout" ahead of time and use that:

import sys

# Put this at the top of your file so you don't have to constantly recheck/reacquire
# Gets sys.stdout.buffer if it exists, sys.stdout otherwise
bstdout = getattr(sys.stdout, 'buffer', sys.stdout)

# Works on both Py2 and Py3
bstdout.write(b"A"*20+b"\xef\xbe\xad\xde\n")
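If you want to check that the bytes really go out untouched, one option is to wrap the same getattr lookup in a small helper that also accepts an explicit stream. This is only a sketch: write_raw and its stream parameter are hypothetical names, not anything from the standard library.

```python
import io
import sys


def write_raw(data, stream=None):
    """Write bytes unchanged to a byte stream (hypothetical helper)."""
    if stream is None:
        # Py3: the byte-oriented buffer under sys.stdout; Py2: sys.stdout itself
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(data)
    stream.flush()


payload = b"A" * 20 + b"\xef\xbe\xad\xde\n"

# Capture into an in-memory byte stream instead of the terminal.
capture = io.BytesIO()
write_raw(payload, capture)
print(capture.getvalue() == payload)
```

Calling write_raw(payload) with no stream sends the same bytes to stdout, and the terminal then decides how to display them.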
