(unicode error) 'unicodeescape' codec can't decode bytes - string with '\u'

Writing my code for Python 2.6, but with Python 3 in mind, I thought it was a good idea to put

from __future__ import unicode_literals

at the top of some modules. In other words, I am asking for trouble now (to avoid it in the future), but I might be missing some important knowledge here. I want to be able to pass a string representing a filepath and instantiate an object as simply as

MyObject('H:\unittests')

In Python 2.6, this works just fine, with no need to use double backslashes or a raw string, even for a directory starting with '\u..', which is exactly what I want. In the __init__ method I make sure all single \ occurrences are interpreted as '\\', including those before special characters as in \a, \b, \f, \n, \r, \t and \v (only \x remains a problem). Decoding the given string into unicode using the (local) encoding also works as expected.

Preparing for Python 3.x, simulating my actual problem in an editor (starting with a clean console in Python 2.6), the following happens:

>>> '\u'
'\\u'
>>> r'\u'
'\\u'

(OK until here: '\u' is encoded by the console using the local encoding)

>>> from __future__ import unicode_literals
>>> '\u'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: end of string in escape sequence

In other words, the (unicode) string is not interpreted as unicode at all, nor does it get decoded automatically with the local encoding. The same happens for a raw string:

>>> r'\u'
SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX

The same goes for u'\u':

>>> u'\u'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: end of string in escape sequence

Also, I would expect isinstance(str(''), unicode) to return True (which it does not), because importing unicode_literals should make all string types unicode. (edit:) Because in Python 3 all strings are sequences of Unicode characters, I would expect str('') to return such a unicode string, and type(str('')) to be both <type 'unicode'> and <type 'str'> (because all strings are unicode), but I also realise that <type 'unicode'> is not <type 'str'>. Confusion all around...

Questions

  • how can I best pass strings containing '\u'? (without writing '\\u')
  • does from __future__ import unicode_literals really implement all Python 3-related unicode changes, so that I get a complete Python 3 string environment?

edit: In Python 3, <type 'str'> is a Unicode object and <type 'unicode'> simply does not exist. In my case I want to write code for Python 2(.6) that will work in Python 3. But when I import unicode_literals, I cannot check if a string is of <type 'unicode'> because:

  • I assume unicode is not part of the namespace
  • if unicode is part of the namespace, a literal of <type 'str'> is still unicode when it is created in the same module
  • type(mystring) will always return <type 'str'> for unicode literals in Python 3

My modules are encoded in 'utf-8', declared by a # coding: UTF-8 comment at the top, while my locale.getdefaultlocale()[1] returns 'cp1252'. So if I call MyObject('çça') from my console, it is encoded as 'cp1252' in Python 2, and as 'utf-8' when calling MyObject('çça') from the module. In Python 3, it will not be encoded at all, but will be a unicode literal.
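To make the difference between the two encodings concrete, here is a minimal sketch, assuming Python 3. The variable names are illustrative only, and the text is written with \u00e7 escapes so the snippet does not depend on the encoding of the file it is saved in.

```python
text = '\u00e7\u00e7a'  # the text 'çça' as a Python 3 (unicode) string

# The same text becomes different bytes depending on the codec used.
as_cp1252 = text.encode('cp1252')  # one byte per character
as_utf8 = text.encode('utf-8')     # two bytes for each 'ç'

print(as_cp1252)  # b'\xe7\xe7a'
print(as_utf8)    # b'\xc3\xa7\xc3\xa7a'

# Decoding with the matching codec recovers the same text either way.
assert as_cp1252.decode('cp1252') == as_utf8.decode('utf-8') == text
```

This is why a Python 2 byte string received from a cp1252 console and one created in a UTF-8 module can differ even though they look the same on screen.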

edit:

I gave up hope of being allowed to avoid an escaping '\' before a u (or x for that matter). I also understand the limitations of importing unicode_literals. However, the many possible combinations of passing a string from a module to the console and vice versa, each with a different encoding, and on top of that importing unicode_literals or not, and Python 2 vs Python 3, made me want to create an overview by actual testing. Hence the table below. [image]

In other words, type(str('')) does not return <type 'str'> in Python 3, but <class 'str'>, and all of the Python 2 problems seem to be avoided.

AFAIK, all that from __future__ import unicode_literals does is make all string literals of unicode type instead of str type. That is:

>>> type('')
<type 'str'>
>>> from __future__ import unicode_literals
>>> type('')
<type 'unicode'>

But str and unicode are still different types, and they behave just like before.

>>> type(str(''))
<type 'str'>

The result of str() is always of str type.
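Since unicode only exists as a name in Python 2, one common way to check for text on both versions is to pick the text type once. The following is a minimal sketch, not something from the original answer; text_type and is_text are hypothetical names introduced for illustration.

```python
import sys

# On Python 2 the text type is `unicode`; on Python 3 that name is gone
# and `str` is the text type. `text_type` is a hypothetical alias.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)


def is_text(value):
    """Return True if value is a text (unicode) string on either version."""
    return isinstance(value, text_type)


print(is_text(u'abc'))  # True
print(is_text(b'abc'))  # False
```

With this in place the rest of the module never needs to mention unicode directly, so the same check runs unchanged on both versions.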

About your r'\u' issue, it is by design, as it is equivalent to ur'\u' without unicode_literals. From the docs:

When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U' prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed while all other backslashes are left in the string.

This probably comes from the way the lexical analyzer worked in the Python 2 series. In Python 3 it works as you (and I) would expect.

You can type the backslash twice, and then the \u will not be interpreted, but you'll get two backslashes!

Backslashes can be escaped with a preceding backslash; however, both remain in the string

>>> ur'\\u'
u'\\\\u'
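For comparison, the Python 3 behaviour can be checked without crashing the interpreter by handing the literals to compile(). This is a minimal sketch assuming Python 3; literal_compiles is a hypothetical helper name, and the source texts are written with doubled backslashes so that the snippet itself stays valid.

```python
def literal_compiles(source):
    """Return True if `source` (Python source text) compiles without a SyntaxError."""
    try:
        compile(source, '<demo>', 'eval')
        return True
    except SyntaxError:
        return False


# Each comment shows the literal that is actually being compiled.
print(literal_compiles("'\\u'"))    # '\u'  -> False: truncated \uXXXX escape
print(literal_compiles("r'\\u'"))   # r'\u' -> True: Python 3 raw strings leave \u alone
print(literal_compiles("'\\\\u'"))  # '\\u' -> True: the backslash is escaped

# In a raw string, both backslashes of an escaped backslash are kept.
print(len(eval("r'\\\\u'")))        # r'\\u' -> 3
```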

So IMHO, you have two simple options:

  • Do not use raw strings, and escape your backslashes (compatible with python3):

    'H:\\unittests'

  • Be too smart and take advantage of unicode codepoints (not compatible with python3):

    r'H:\u005cunittests'
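As a quick check of how the path literal behaves on Python 3, here is a minimal sketch (variable names are illustrative). It relies on the fact that Python 3 raw strings no longer process \u escapes, so a plain raw string is safe there.

```python
escaped = 'H:\\unittests'  # escaped backslash: works on Python 2 and 3
raw = r'H:\unittests'      # raw string: fine on Python 3

# Both spell the same 12-character path.
assert escaped == raw
print(escaped)       # H:\unittests
print(len(escaped))  # 12

# A \uXXXX sequence inside a Python 3 raw string stays literal text.
print(len(r'\u005c'))  # 6
```

The escaped form is the one that gives the same result on both versions, which is why it is the portable choice.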

For me, this issue was caused by a package that was not up to date, in this case numpy.

To fix:

conda install -f numpy

I tried this on Python 3:

import os

os.path.abspath("yourPath")

and it worked!

When you're writing string literals which contain backslashes, such as paths (on Windows) or regexes, use raw strings. That's what they're for.
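A short illustration of that advice, as a sketch assuming Python 3. The pattern and the path are made up for the example. It also shows one caveat: a raw string literal cannot end in a single backslash.

```python
import re

# Raw string: the regex engine receives \\ (a literal backslash) and \w+ as written.
pattern = re.compile(r'^[A-Za-z]:\\(\w+)$')

match = pattern.match(r'H:\unittests')
print(match.group(1))  # unittests

# Caveat: a raw string literal cannot end in a single backslash,
# so a trailing separator has to be appended as a normal string.
folder = r'H:\unittests' + '\\'
print(folder.endswith('\\'))  # True
```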
