
What's the u prefix in a Python string?

Like in:

u'Hello'

My guess is that it indicates "Unicode", is that correct?

If so, since when has it been available?

You're right, see 3.1.3. Unicode Strings.

It's been the syntax since Python 2.0.

Python 3 made them redundant, as the default string type is Unicode. Versions 3.0 through 3.2 removed them, but they were re-added in 3.3+ for compatibility with Python 2, to aid the 2 to 3 transition.
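
A quick way to see the redundancy, as a minimal sketch that assumes Python 3.3 or later (the variable names are only illustrative):

```python
# Runs on Python 3.3+; on Python 2 the two literals would differ in type.
plain = 'Hello'
prefixed = u'Hello'

# In Python 3 both literals produce the same str type and compare equal.
same_type = type(plain) is type(prefixed)
same_value = plain == prefixed
print(same_type, same_value)
```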

The u in u'Some String' means that your string is a Unicode string.

Q: I'm in a terrible, awful hurry and I landed here from Google Search. I'm trying to write this data to a file, I'm getting an error, and I need the dead simplest, probably flawed, solution this second.

A: You should really read Joel's Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) essay on character sets.

Q: sry no time code pls

A: Fine. Try str('Some String') or 'Some String'.encode('ascii', 'ignore'). But you should really read some of the answers and discussion on Converting a Unicode string and this excellent, excellent, primer on character encoding.
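
To show why that quick fix is "probably flawed", here is a rough sketch for Python 3. It assumes the goal is simply to write text to a file; the file name out.txt and the sample text are made up for illustration.

```python
import io
import os
import tempfile

text = 'schöne Umlaute'

# The quick fix: non-ASCII characters are silently discarded.
lossy = text.encode('ascii', 'ignore')

# Alternative: keep the text intact and choose the encoding when opening the file.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')  # hypothetical file name
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

with io.open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(lossy)
print(round_trip == text)
```

The first print shows the text with the ö missing; the second confirms that the UTF-8 file round-trips unchanged.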

My guess is that it indicates "Unicode", is it correct?

Yes.

If so, since when is it available?

Python 2.x.

In Python 3.x the strings use Unicode by default and there's no need for the u prefix. Note: in Python 3.0-3.2, the u is a syntax error. In Python 3.3+ it's legal again to make it easier to write 2/3 compatible apps.

I came here because I had funny-char-syndrome on my requests output. I thought response.text would give me a properly decoded string, but in the output I found funny double-chars where German umlauts should have been.

Turns out response.encoding was empty somehow and so response did not know how to properly decode the content and just treated it as ASCII (I guess).

My solution was to get the raw bytes with response.content and manually apply decode('utf_8') to it. The result was schöne Umlaute.

The correctly decoded

für

vs. the improperly decoded

fĂźr
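
The doubled characters are what UTF-8 bytes look like when they are decoded with a single-byte codec. Below is a small sketch without any network call: the bytes stand in for response.content, and ISO-8859-2 is my assumption for the wrong codec, chosen only because it reproduces the garbled output shown above.

```python
# Stand-in for response.content: the UTF-8 bytes a server might send.
raw = 'für'.encode('utf-8')

# Decoding with the codec the bytes were written in gives the text back.
correct = raw.decode('utf-8')

# Decoding with a single-byte codec turns the two bytes of 'ü' into two characters.
garbled = raw.decode('iso-8859-2')

print(correct, len(correct))
print(garbled, len(garbled))
```

With requests itself, response.content.decode('utf-8') does the same job, and assigning response.encoding = 'utf-8' before reading response.text should have the same effect.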

The following should help:

http://docs.python.org/library/functions.html#unicode

http://www.amk.ca/python/howto/unicode (skip down to "Python's Unicode Support" if you're already familiar with Unicode principles)

All strings meant for humans should use u"".

I found that the following mindset helps a lot when dealing with Python strings: All Python manifest strings should use the u"" syntax. The "" syntax is for byte arrays, only.

Before the bashing begins, let me explain. Most Python programs start out with using "" for strings. But then they need to support documentation off the Internet, so they start using "".decode and all of a sudden they are getting exceptions everywhere about decoding this and that - all because of the use of "" for strings. In this case, Unicode does act like a virus and will wreak havoc.

But, if you follow my rule, you won't have this infection (because you will already be infected).
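
In Python 3 terms the same rule would read: keep text as str, keep raw data as bytes, and convert only at the edges. A short sketch under that assumption (the sample bytes are made up):

```python
# Bytes arrive from outside (file, socket, HTTP body).
incoming = b'caf\xc3\xa9'

# Decode once at the boundary, then work with text only.
text = incoming.decode('utf-8')
shout = text.upper()

# Encode once on the way out.
outgoing = shout.encode('utf-8')

# The byte count and the character count differ for non-ASCII text.
print(len(incoming), len(text), shout)
```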

It's Unicode.

Just wrap the variable in str(), and it will work fine.

But in case you have two lists like the following:

a = ['co32','co36']
b = [u'co32',u'co36']

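For ASCII-only values like these, set(a)==set(b) is already True, because 'co32' and u'co32' compare equal and hash alike. Note that str(b) turns the whole list into one string, so to get plain strings, convert each element instead:

b = [str(s) for s in b]
set(a)==set(b)

The result is still True.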
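
As a sanity check, here is the comparison spelled out, assuming Python 3.3+ (the answer does not state a version; the helper names are only illustrative):

```python
a = ['co32', 'co36']
b = [u'co32', u'co36']

# Element by element, the two lists hold equal strings.
sets_equal = set(a) == set(b)

# str(b) is the repr of the whole list, so set() of it is a set of characters.
as_text = str(b)
chars_equal = set(a) == set(as_text)

# Converting each element keeps the list structure intact.
converted = [str(item) for item in b]
converted_equal = set(a) == set(converted)

print(sets_equal, chars_equal, converted_equal)
```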


 