

What's the difference between u'somestring' and unicode('somestring') in Python 2.7?

I was getting "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)" when concatenating a Python string with the value of a Django CharField, like this:

some_variable = unicode("jotain älähti") + self.some_charfield

After I switched to this:

some_variable = u"jotain älähti" + self.some_charfield

It no longer raised the error. What is the difference between the u prefix and the unicode function in Python, and why does the error go away? I'm using Python 2.7.5 and Django 1.7.1.

I'm not sure why it would have to decode in the first place. Isn't decoding the process of forming human-readable letters and words from bytes? I would understand decoding here if I needed to print the string, but I never printed it. Could the decoding be related to the concatenation somehow? That is, in order to concatenate, does the program need to decode both strings, concatenate them, and then encode the result back to bytes? I had the encoding declaration at the beginning of the file: # -*- coding: utf-8 -*-

u"ä" is Unicode text: a sequence of Unicode code points. It may correspond to different byte sequences depending on the character encoding:

>>> u"ä".encode('utf-8')
'\xc3\xa4'
>>> u"ä".encode('cp1252')
'\xe4'
>>> u"ä".encode('utf-16le')
'\xe4\x00'
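
As a small sketch of the same idea that runs on both Python 2.7 and Python 3, the snippet below encodes one character with each of the three codecs and checks that decoding with the matching codec gives the text back. The names text, byte_lengths and mojibake are illustrative only, and the character is written as an escape so the example does not depend on the source file encoding.

```python
# -*- coding: utf-8 -*-
text = u"\xe4"  # the character ä, written as an escape sequence

byte_lengths = {}
for codec in ("utf-8", "cp1252", "utf-16le"):
    encoded = text.encode(codec)
    # Decoding with the same codec that produced the bytes restores the text.
    assert encoded.decode(codec) == text
    byte_lengths[codec] = len(encoded)
    print("%s: %d byte(s)" % (codec, len(encoded)))

# Decoding with a different codec does not fail here, but yields the wrong
# text: the two UTF-8 bytes are read as two separate cp1252 characters.
mojibake = text.encode("utf-8").decode("cp1252")
print(mojibake == u"\xc3\xa4")
```

The bytes alone do not say which encoding produced them; the codec has to be known from somewhere else.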

The encoding declaration # -*- coding: utf-8 -*- specifies the encoding of your source code. It just makes sure that the bytestring literal b"ä" is interpreted as the byte sequence b'\xc3\xa4'.

The encoding of your source code has nothing to do with the encodings that are used at runtime.

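A u"..." literal is decoded once, by the parser, using the declared source encoding, so nothing is left to decode at runtime. unicode(bytestring), on the other hand, is equivalent to bytestring.decode('ascii') here, because Python 2 falls back to the default encoding (sys.getdefaultencoding(), normally 'ascii') when no encoding is given: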

>>> b'\xc3\xa4'.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
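
If the bytestring form has to stay, one way around the error is to name the encoding explicitly instead of relying on the ASCII default. The following is a minimal sketch, assuming the bytes really are UTF-8; raw, failed_at and text are hypothetical names, and the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
raw = b"jotain \xc3\xa4l\xc3\xa4hti"  # UTF-8 bytes of "jotain älähti"

failed_at = None
try:
    raw.decode("ascii")  # what unicode(raw) does implicitly on Python 2
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first byte that is not ASCII

# Naming the real encoding decodes the bytes without an error.
# On Python 2, unicode(raw, "utf-8") is another spelling of the same call.
text = raw.decode("utf-8")

print(failed_at)
print(len(raw), len(text))
```

The result is Unicode text, so concatenating it with another Unicode value does not trigger any implicit ASCII decoding.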

Non-ASCII characters in a bytestring literal (such as b"ä") are not allowed in Python 3 (they are a SyntaxError), and the unicode type is called str there. You could add from __future__ import unicode_literals at the top of the file to have "ä" interpreted as Unicode text on both Python 2 and 3.
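
To check how a particular interpreter treats such a literal, the built-in compile function can be pointed at the source text of b"ä". This is only a sketch: the names source and accepted are illustrative, and the source text uses an escape so the check itself stays pure ASCII.

```python
import sys

# Source text of the bytes literal b"ä", built with an escape for the ä.
source = u'b"\xe4"'

try:
    compile(source, "<example>", "eval")
    accepted = True
except SyntaxError:
    accepted = False

# Expected: accepted on Python 2, rejected on Python 3.
print(sys.version_info[0], accepted)
```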
