What is the difference between u'somestring' and unicode('somestring') in Python 2.7?

Question

I was getting "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)" when concatenating Python strings with a Django model CharField like this:

some_variable = unicode("jotain älähti") + self.some_charfield

After I switched to this:

some_variable = u"jotain älähti" + self.some_charfield

It no longer raised the error. What is the difference between the u prefix and the unicode function in Python, and why does the error go away? I'm using Python 2.7.5 and Django 1.7.1.

I'm not sure why anything would have to be decoded in the first place. Isn't decoding the process of forming human-readable letters and words from bytes? I would understand decoding here if I needed to print the string, but I never printed it. Could the decoding somehow relate to the concatenation? That is, in order to concatenate, does the program need to decode both strings, join them, and then encode the result back to bytes? I had the encoding declaration at the beginning of the file: # -*- coding: utf-8 -*-

Answer 1

u"ä" is Unicode text: a sequence of Unicode code points. It may correspond to different byte sequences depending on the character encoding:

>>> u"ä".encode('utf-8')
'\xc3\xa4'
>>> u"ä".encode('cp1252')
'\xe4'
>>> u"ä".encode('utf-16le')
'\xe4\x00'
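As a rough sketch of the same point, the snippet below encodes one code point with three codecs, checks that each byte sequence decodes back to the original text with the matching codec, and shows what happens when the wrong codec is used. The variable names (text, encoded, mojibake) are illustrative only, and the code is written with escapes so that it runs the same way on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: one Unicode code point, several byte representations.
text = u"\xe4"  # the single code point U+00E4 (ä)

# Encode the same text with three different character encodings.
encoded = {name: text.encode(name) for name in ("utf-8", "cp1252", "utf-16le")}

# Decoding with the matching codec always gives the original text back.
for name, data in sorted(encoded.items()):
    assert data.decode(name) == text

# Decoding UTF-8 bytes with a different codec does not fail here,
# but it yields two unrelated characters instead of one.
mojibake = encoded["utf-8"].decode("cp1252")
```

The bytes only mean "ä" once a codec is chosen; the Unicode text itself carries no encoding.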

The encoding declaration # -*- coding: utf-8 -*- specifies your source code encoding. It just makes sure that the b"ä" bytestring literal is interpreted as the b'\xc3\xa4' byte sequence.

The encoding of your source code has nothing to do with the encodings that are used at runtime.

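unicode(bytestring) is equivalent to bytestring.decode('ascii') here, because ascii is the default encoding in Python 2 (see sys.getdefaultencoding()). In a UTF-8 source file, "jotain älähti" without the u prefix is a bytestring that contains the bytes \xc3\xa4, and the ascii codec cannot decode them: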

>>> b'\xc3\xa4'.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
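One way to see why the u"" literal works while unicode("...") fails is to do the decoding step by hand. The sketch below assumes the bytestring holds UTF-8 data (as it would in a file with a utf-8 coding declaration) and uses charfield_value as a hypothetical stand-in for self.some_charfield, which holds Unicode text. It runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of "jotain älähti", written with escapes to be explicit.
raw = b"jotain \xc3\xa4l\xc3\xa4hti"

# Hypothetical stand-in for self.some_charfield (Unicode text).
charfield_value = u"!"

# Decoding with the ascii codec fails on the first non-ASCII byte,
# which is what unicode(bytestring) does on Python 2 by default.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the real encoding makes the decode succeed, so the
# concatenation is between two pieces of Unicode text.
text = raw.decode("utf-8")
some_variable = text + charfield_value
```

On Python 2, unicode("jotain älähti", "utf-8") performs the same explicit decode. A u"" literal needs no runtime decoding at all, because the source file is decoded with the declared source encoding when it is compiled.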

Non-ASCII characters in a bytestring literal (such as b"ä") are not allowed in Python 3, where they are a SyntaxError, and the unicode type is called str there. You could add from __future__ import unicode_literals at the top of the file so that "ä" is interpreted as Unicode text on both Python 2 and 3.

Answer 1 (0, accepted): 2015-04-16 09:11:21
