
Printing escaped Unicode in Python

>>> s = 'auszuschließen'
>>> print(s.encode('ascii', errors='xmlcharrefreplace'))
b'auszuschlie&#223;en'
>>> print(str(s.encode('ascii', errors='xmlcharrefreplace'), 'ascii'))
auszuschlie&#223;en

Is there a prettier way to print any string without the b''?

EDIT:

I'm just trying to print escaped characters from Python, and my only gripe is that Python adds "b''" when I do that.

If I try to print the actual character in a dumb terminal like Windows 7's, I get this:

Traceback (most recent call last):
  File "Mailgen.py", line 378, in <module>
    marked_copy = mark_markup(language_column, item_row)
  File "Mailgen.py", line 210, in mark_markup
    print("TP: %r" % "".join(to_print))
  File "c:\python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 29: character maps to <undefined>

To see the ASCII representation (like repr() on Python 2) for debugging:

print(ascii('auszuschließen…'))
# -> 'auszuschlie\xdfen\u2026'
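
If the surrounding quotes from ascii() are unwanted, one possible variation is the standard 'backslashreplace' error handler, which produces the same escapes as a plain string. This is only a sketch; the variable names are illustrative and not from the original answer.

```python
s = 'auszuschließen…'

# ascii() behaves like repr(), but escapes every non-ASCII character.
quoted = ascii(s)

# 'backslashreplace' yields the same escapes without the surrounding quotes.
unquoted = s.encode('ascii', errors='backslashreplace').decode('ascii')

print(quoted)
print(unquoted)
```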

To print bytes:

import sys

# Write the encoded bytes straight to the underlying binary stream.
sys.stdout.buffer.write('auszuschließen…'.encode('ascii', 'xmlcharrefreplace'))
# -> auszuschlie&#223;en&#8230;

>>> s='auszuschließen…'
>>> s
'auszuschließen…'
>>> print(s)
auszuschließen…
>>> b=s.encode('ascii','xmlcharrefreplace')
>>> b
b'auszuschlie&#223;en&#8230;'
>>> print(b)
b'auszuschlie&#223;en&#8230;'
>>> b.decode()
'auszuschlie&#223;en&#8230;'
>>> print(b.decode())
auszuschlie&#223;en&#8230;

You start out with a Unicode string. Encoding it to ASCII creates a bytes object with the characters you want. Python won't print it without converting it back into a string, and the default conversion puts in the b and the quotes. Using decode explicitly converts it back to a string; the default encoding is UTF-8, and since your bytes consist only of ASCII, which is a subset of UTF-8, it is guaranteed to work.
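
The encode-then-decode round trip described above can be wrapped in a small helper. This is a minimal sketch; the function name to_xml_escaped is made up for illustration.

```python
def to_xml_escaped(text):
    """Return text with each non-ASCII character replaced by &#NNN;."""
    # encode() yields bytes that contain only ASCII characters;
    # decode() turns those bytes back into an ordinary str.
    return text.encode('ascii', errors='xmlcharrefreplace').decode('ascii')


escaped = to_xml_escaped('auszuschließen…')
print(escaped)  # printed without b'' because escaped is a str
```

Passing 'ascii' to decode explicitly is optional here, since the UTF-8 default decodes pure ASCII bytes identically.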

Not all terminals can handle more than some sort of 8-bit character set, that's true. But they won't handle that no matter what you do, really.

Printing a Unicode string will, assuming that your OS sets up the terminal properly, give the best result possible, which means that characters the terminal cannot print will be replaced with some other character, such as a question mark. Doing that translation yourself will not really improve things.
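
In the traceback from the question, print raised UnicodeEncodeError instead of substituting a character, so the replacement may have to be requested explicitly. A hedged sketch of one way to do that follows; the helper name printable is hypothetical, and 'cp437' is used only because it is the codec named in the traceback.

```python
import sys


def printable(text, encoding=None):
    """Replace characters that the given encoding cannot represent with '?'."""
    # Fall back to ASCII if stdout reports no encoding (for example when redirected).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors='replace').decode(encoding)


# cp437 has a slot for 'ß' but none for the ellipsis character.
sample = printable('auszuschließen…', 'cp437')
print(printable(sample))
```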

Update:

Since you want to know what characters are in the string, you actually want to know their Unicode code points, or the XML equivalent in this case. That's more inspecting than printing, and for inspection the b'' part usually isn't a problem per se.

But you can get rid of it easily and hackily like so:

print(repr(s.encode('ascii', errors='xmlcharrefreplace'))[2:-1])
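
One caveat with slicing the repr: repr also escapes backslashes and other special bytes, so the result can differ from the decoded text. The short comparison below is an illustrative sketch with a made-up input string.

```python
s = 'back\\slash ß'  # contains one literal backslash

encoded = s.encode('ascii', errors='xmlcharrefreplace')

sliced = repr(encoded)[2:-1]       # the slicing trick: keeps repr's escaping
decoded = encoded.decode('ascii')  # explicit decode: the text as is

print(sliced)
print(decoded)
```

The sliced version shows the backslash doubled, while the decoded version keeps it single.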

Since you're using Python 3, you can simply write print(s) to print the string to the console.

I agree that, depending on the console, it may not print properly, but I would imagine that most modern OSes since 2006 can handle Unicode strings without too much of an issue. I'd encourage you to give it a try and see if it works.

Alternatively, you can declare an encoding by placing this on the first or second line of a file (similar to a shebang):

# -*- coding: utf-8 -*-

This tells the interpreter to read the source file as UTF-8. It does not change the encoding used for console output.

