
Get warning for Python string literals not prefixed with 'u'

To follow best practices for Unicode in Python, you should prefix all string literals of characters with 'u'. Is there any tool available (preferably PyDev compatible) that warns if you forget it?

"you should prefix all string literals with 'u'"

No, not really.

You should prefix literals for strings of characters with u. But not all strings are strings of characters. When you are talking to byte-based components, like network services or binary files, you need to be using byte strings.

For example: want to write a Unicode string into a PNG file? Not sensible. Want to base64-decode the string Y2Fm6Q==? You can't reasonably use a Unicode string here; base64 is explicitly bytes.
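A minimal Python 3 sketch of the base64 point. It assumes, purely for illustration, that the decoded bytes are ISO-8859-1 (Latin-1) text; the answer does not say which encoding produced them. Decoding base64 yields raw bytes, and turning those into characters is a separate step that needs an explicit choice of encoding.

```python
import base64

# base64 operates on bytes: the decoded result is raw bytes, not text.
raw = base64.b64decode(b'Y2Fm6Q==')
print(raw)  # b'caf\xe9'

# Assumption for illustration: these bytes are ISO-8859-1 (Latin-1) text.
text = raw.decode('latin-1')
print(text)  # café

# The same bytes are not valid UTF-8, so guessing the wrong encoding fails.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False
```

The bytes themselves carry no encoding information, which is why base64 output belongs in a byte string rather than a Unicode one.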

Sure, Python 2 will often let you get away with passing a unicode string where a byte string is expected, but only by automatically encoding it to ASCII. If the string contains non-ASCII characters, you are going to get a UnicodeError just as surely as if you had used bytes where unicode was expected. “Unicode is right, bytes are wrong” is a damaging myth: both kinds of string are needed, and each has to be handled appropriately.
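Python 3 no longer performs this conversion silently, but the step Python 2 took behind the scenes (encoding with its default 'ascii' codec) can be reproduced explicitly. The helper name coerce_like_py2 is hypothetical and only for illustration.

```python
def coerce_like_py2(text):
    """Hypothetical helper: mimic Python 2's implicit unicode -> byte string
    conversion, which used the default 'ascii' codec."""
    return text.encode('ascii')

# Pure-ASCII text survives the conversion.
print(coerce_like_py2(u'cafe'))  # b'cafe'

# Non-ASCII text fails; UnicodeEncodeError is a subclass of UnicodeError.
try:
    coerce_like_py2(u'caf\u00e9')
except UnicodeError as exc:
    print(type(exc).__name__)  # UnicodeEncodeError
```

This is why code that "works" with unicode in byte contexts often breaks only when the first non-ASCII character shows up.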

If you are concerned about the transition to Python 3, you should certainly mark up your character strings as u'', but you should then also mark up your explicitly-bytes strings as b''. Strings where it doesn't matter you can leave as '' and let them be converted from byte strings to unicode strings on Python 3. There are lots of cases where Python 2 used bytes and Python 3 uses Unicode, and where that switch is appropriate. But there are still plenty of cases where you really do need to be working with bytes, and having those literals turned into unicode on Python 3 will cause problems.
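A small sketch of this marking scheme. It assumes Python 3.3 or later for the u'' prefix (PEP 414 restored it; Python 3.0 to 3.2 reject it), and the literal values are made up for illustration.

```python
# Marked literals keep a fixed meaning on Python 2.6+ and Python 3.3+.
label = u'caf\u00e9'  # text: unicode on Python 2, str on Python 3
magic = b'\x89PNG'    # binary (start of the PNG signature): str on Python 2, bytes on Python 3
plain = 'GIF89a'      # unmarked: a byte string on Python 2, a text string on Python 3

print(type(label).__name__, type(magic).__name__, type(plain).__name__)
# On Python 3 this prints: str bytes str

# If an unmarked literal was really meant as bytes (here, a GIF header),
# mixing it with genuine bytes works on Python 2 (both are str) but raises
# TypeError on Python 3.
try:
    plain + b'\x00\x01'
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False on Python 3
```

The failure in the last step is the kind of problem the answer warns about: a literal that should have been b'' silently became text.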

(The only problem with this is that the b'' syntax requires Python 2.6 or later, so using it will make you incompatible with earlier versions.)

You might want to write such a warning-generator tool by parsing Python source code with the built-in parser or dis modules. You could also consider adding such a feature to pylint.
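As a starting point, here is a minimal sketch of such a checker. It swaps in the standard tokenize module instead of parser or dis: the parser module was deprecated in Python 3.9 and removed in 3.10, and tokenize keeps each literal's original spelling, prefix included. The function name find_unprefixed_strings is hypothetical, not a pylint or PyDev API, and the sketch only inspects prefixes, so it will also flag docstrings and literals that are legitimately left unmarked.

```python
import io
import re
import tokenize


def find_unprefixed_strings(source):
    """Hypothetical checker: return (line, column, literal) for every string
    literal in `source` that has neither a u nor a b prefix."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only plain string tokens are checked (f-strings on Python 3.12+
        # arrive as FSTRING_* tokens and are not examined here).
        if tok.type != tokenize.STRING:
            continue
        # The prefix is the run of letters before the opening quote.
        prefix = re.match(r'[A-Za-z]*', tok.string).group().lower()
        if 'u' not in prefix and 'b' not in prefix:
            line, col = tok.start
            hits.append((line, col, tok.string))
    return hits


sample = "x = 'abc'\ny = u'def'\nz = b'ghi'\nw = r'jkl'\n"
for line, col, literal in find_unprefixed_strings(sample):
    print('line %d, col %d: %s has no u or b prefix' % (line, col, literal))
```

For the sample it reports 'abc' on line 1 and r'jkl' on line 4. Wiring this into pylint or PyDev would need that tool's own plugin interface, which this sketch does not attempt.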

KennyTM's comment should be posted as an answer:

from __future__ import unicode_literals

This future declaration can be used in Python 2.6 and 2.7 and enables Python 3's string syntax, so that unprefixed string literals are Unicode strings and byte strings require a b prefix.
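A minimal sketch of the effect, written to run on Python 2.6/2.7 as well as on Python 3 (where the future import is accepted but changes nothing). It also imports print_function so the print calls behave the same on both. The \u escape shows the difference: with unicode_literals it is interpreted on Python 2, whereas in a plain Python 2 byte-string literal it would remain a literal backslash sequence.

```python
from __future__ import print_function, unicode_literals

# Unprefixed literal: unicode on Python 2.6/2.7 (because of the future
# import) and str on Python 3, so the \u escape yields a single character.
text = 'caf\u00e9'
# Bytes now need an explicit b prefix; this is the UTF-8 encoding of text.
data = b'caf\xc3\xa9'

print(len(text), len(data))          # 4 5
print(data.decode('utf-8') == text)  # True
```

Note that the declaration only affects literals in the module that contains it, so each file has to opt in.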

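Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM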