
Why does this string get printed out like this?

I am playing around with string formatting, and I am trying to understand the following piece of code:

mystring = "\x80" * 50
print mystring

Output:

>>> 
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
>>>

The output is a string of euro signs. But why is that? This is not ASCII as far as I know, and the other question I am asking myself is why it does not print out the hex \x80. Thanks in advance.

As for the first question, \x80 is interpreted as the character with hex value 0x80. A nice explanation can be found at "Bytes in a unicode Python string".

Edit: @Joran Besley is right, so let me rephrase it:

u'\x80' is equal to u'\u0080'.

In fact:

unicode(u'\u0080')
>>> u'\x80'

That is because Python < 3 prefers \x as the escaped representation of Unicode characters when possible, that is, as long as the code point is less than 256. Above that it uses the normal \u:

unicode(u'\u2019')
>>> u'\u2019' # curved quotes in windows-1252
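
The snippets above are Python 2. As a rough Python 3 counterpart (an assumption on my part, not something from the original answer), the built-in ascii() appears to follow the same rule: \x for code points below 256 and \u above that.

```python
# Python 3 sketch: ascii() escapes every non-ASCII character in a repr.
low = ascii("\u0080")    # code point 128, below 256
high = ascii("\u2019")   # code point 8217, above 255

print(low)   # '\x80'
print(high)  # '\u2019'

# Both spellings name the same one-character string.
print("\x80" == "\u0080")  # True
```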

Where the character is then mapped depends on your terminal encoding. As Joran said, you are probably using Windows-1252 or something close to it, where the euro symbol is the hex byte 0x80. In iso-8859-15, for example, the hex value is 0xa4:

"\xa4".decode("iso-8859-15") == "\x80".decode('windows-1252')
>>> True

If you are curious about your terminal encoding, you can get it from sys:

import sys
sys.stdin.encoding
>>> 'UTF-8' # my terminal
sys.stdout.encoding
>>> 'UTF-8' # same as above

I hope it makes up for my mistake.

It depends on your terminal encoding. In the Windows terminal, for example, that byte is shown as a bunch of C-cedillas.
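
A small Python 3 sketch of that point, decoding the same bytes under three code pages. The choice of cp437 and cp850 as typical Windows console code pages is my assumption; the actual one depends on the system locale.

```python
# Python 3 sketch: the same byte means different characters per code page.
raw = b"\x80" * 5

# Hypothetical selection of code pages for illustration.
decoded = {codec: raw.decode(codec) for codec in ("cp1252", "cp437", "cp850")}

for codec, text in decoded.items():
    # ascii() keeps the output printable on any terminal.
    print(codec, ascii(text))
```

Under cp1252 the byte decodes to U+20AC (the euro sign), while under cp437 and cp850 it decodes to U+00C7 (capital C with cedilla).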

If you want to see the "\x80", you can print repr(mystring).
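
In Python 3 the same idea might look like the sketch below, where the string is written as a bytes literal (my adaptation, since a Python 2 str is closest to Python 3 bytes):

```python
# Python 3 sketch: repr() shows the escape sequence instead of a glyph.
mystring = b"\x80" * 3

escaped = repr(mystring)
print(escaped)  # b'\x80\x80\x80'
```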

Furthermore, 0x80 = 128, which is the value of the euro sign (not ASCII, since ASCII technically only goes up to 0x7f).

Specifically, that is how Windows-1252 encodes the euro sign (apparently almost all of the Windows-125x code pages encode it that way).
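
One way to check the "almost all" claim is to encode the euro sign with each Windows-125x codec that ships with Python. This is a Python 3 sketch; the `positions` dictionary is just an illustrative name.

```python
# Python 3 sketch: where does each Windows-125x code page put the euro sign?
euro = "\u20ac"

positions = {}
for n in range(1250, 1259):
    codec = "cp%d" % n
    # .hex() renders the single encoded byte as two hex digits.
    positions[codec] = euro.encode(codec).hex()
    print(codec, positions[codec])
```

As far as I can tell, cp1251 (Cyrillic) is the odd one out and uses 0x88, while the others use 0x80.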

This answer has a lot more info: "Hex representation of Euro Symbol €"

Furthermore, you can convert it to unicode:

unicode_ch = "\x80".decode("Windows-1252")  #it is now decoded into unicode
print repr(unicode_ch) # \u20AC  the unicode equivalent of Euro
print unicode_ch #as long as your terminal can handle it
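
A Python 3 version of the same conversion, as a sketch. The unicodedata lookup and the re-encoding to UTF-8 are additions of mine to show what the decoded character is.

```python
# Python 3 sketch: decode the Windows-1252 byte, then inspect the result.
import unicodedata

euro = b"\x80".decode("windows-1252")

print(ascii(euro))             # '\u20ac'
print(unicodedata.name(euro))  # EURO SIGN
print(euro.encode("utf-8"))    # b'\xe2\x82\xac'
```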

A little tinkering in IDLE produced this output.

>>> a = "\x80"
>>> a
'\x80'
>>> print a * 50
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
>>> print a
€
>>> 

The first thing that stands out is the '\' character. This character is used for escaping characters in strings. You can learn about escape characters at the link below.

http://en.wikipedia.org/wiki/Escape_character

Changing the string slightly tells us that escaping is occurring.

>>> print '\x8'
ValueError: invalid \x escape

What I think is happening is that the escape causes the value to be looked up in the ASCII (or a similar) table.
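
To make that guess a bit more concrete, here is a Python 3 sketch of what the \x escape does. Note that in Python 3 the result is a code point rather than a byte as in Python 2. The escape is resolved when the literal is parsed, and which glyph is shown is decided later by the encoding.

```python
# Python 3 sketch: \xhh becomes one character with hex value hh.
ch = "\x41"
print(ch, len(ch), ord(ch))  # A 1 65

high = "\x80"
print(len(high), ord(high), hex(ord(high)))  # 1 128 0x80

# A raw string keeps the backslash, so nothing is escaped.
raw = r"\x80"
print(len(raw))  # 4
```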
