u'Too' u'much' u'unicode' u'returned'

I have an API that I'm putting data into and pulling data out of, using JSON, in a natural language processing context.

Everything is coming out as unicode. For example, if I retrieve a list of words from my API, every single word is u''. This is what the JSON output looks like after printing to a file:

{u'words': [u'every', u'single', u'word']}

I should clarify that in the terminal everything looks fine; it only looks like this when I print the output to a file.

I haven't figured out yet whether this is the preferred default behavior or whether I need to do something along the way to make it plain. The outputs are going to be used with languages other than Python, and in other contexts where they need to be readable and/or parseable.

So clearly I don't have a grasp on Python and Unicode, or on how and where this is happening.

  1. Is this preferable when dealing with JSON? Should I just not worry about it?

  2. How do I turn this off, or what extra step should I take (I've already tried, but can't figure out exactly where this is happening) to make this less of a nuisance?

I have much to learn, so any input is appreciated.

EDIT: All the input has been useful, thank you.

I guess I was under the mistaken notion that jsonify did more than it actually does. If I call json.dumps earlier in my task chain, I get actual JSON on the other end.

There is nothing wrong with this, and you don't need to do anything about it.

In Python 2, a str is similar to a C string: it's just a sequence of bytes, sometimes incorrectly assumed to be ASCII text. It can contain encoded text, e.g. as UTF-8 or ASCII.

The unicode type represents an actual string of text, similar to a Java String. It is text in the abstract sense, not tied to a particular encoding. You can decode a str into unicode, or encode a unicode into a str.
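A minimal sketch of that round trip, written so it runs on Python 3 (where `str` is the text type and `bytes` holds encoded data; on Python 2 the same calls are `unicode.encode` and `str.decode`). The sample word is illustrative. It shows that one piece of text has different byte sequences depending on the encoding chosen.

```python
# A text value; the u'' prefix is accepted on Python 2 and Python 3.3+.
text = u'caf\u00e9'  # "café": 4 characters

# Encoding turns abstract text into bytes in one specific encoding.
utf8_bytes = text.encode('utf-8')      # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode('latin-1')  # é becomes one byte: 0xE9

# Same text, different byte sequences depending on the encoding.
assert utf8_bytes != latin1_bytes

# Decoding with the matching encoding recovers the original text.
assert utf8_bytes.decode('utf-8') == text
assert latin1_bytes.decode('latin-1') == text
```

Decoding with the wrong encoding (for example `utf8_bytes.decode('latin-1')`) does not raise here; it silently produces different text, which is why the encoding has to be agreed on at every boundary.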

JSON keys and string values are text, not byte arrays, so Python represents them as unicode objects.
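A quick check of this, as a sketch: `json.loads` hands back text objects for every key and string value. The `text_type = type(u'')` trick names the text type portably (`unicode` on Python 2, `str` on Python 3); the JSON document itself is made up for illustration.

```python
import json

# The text type: `unicode` on Python 2, `str` on Python 3.
text_type = type(u'')

# Raw JSON as it might arrive from an API (illustrative document).
raw = '{"words": ["every", "single", "word"]}'
decoded = json.loads(raw)

# Every key and every string value comes back as the text type.
keys_are_text = all(isinstance(k, text_type) for k in decoded)
values_are_text = all(isinstance(w, text_type) for w in decoded[u'words'])
```

On Python 2, `repr(decoded)` is exactly the `{u'words': [...]}` form from the question; on Python 3 the same objects print without the `u` prefix.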

If you need JSON output for use in another language, use the json module to produce it from your dictionary:

>>> import json
>>> print(json.dumps({u'words': [u'every', u'single', u'word']}))
{"words": ["every", "single", "word"]}
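One way to see why the file in the question contained `u'...'` reprs rather than JSON is to compare `str()` of the dict with `json.dumps`. This is a sketch that assumes the dict was being written with something like `str()` or `print` (the question doesn't show the exact call). It runs on Python 3, where the repr uses plain `'...'` quotes instead of `u'...'`, but it is still not JSON.

```python
import json

# Illustrative payload with the same shape as the one in the question.
payload = {u'words': [u'every', u'single', u'word']}

# str() produces a Python repr: single quotes, plus u'' prefixes on Python 2.
# That is a Python literal, not JSON.
as_repr = str(payload)

# json.dumps produces a JSON document that other languages can parse.
as_json = json.dumps(payload)

# A JSON parser rejects the repr but round-trips the json.dumps output.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
    repr_is_valid_json = False

round_tripped = json.loads(as_json)
```

When writing to a file, `json.dump(payload, f)` writes the same JSON directly to an open file object.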

Yes, it is preferable, since JSON is defined in terms of Unicode text.
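Because JSON text is Unicode, non-ASCII characters survive the trip through `json`. A brief sketch, assuming Python 3 (the dict is illustrative): by default `json.dumps` escapes such characters as `\uXXXX` sequences so the output is pure ASCII, while `ensure_ascii=False` keeps the characters themselves. Both forms parse back to the same value.

```python
import json

doc = {u'word': u'caf\u00e9'}  # illustrative data containing "é"

# Default: non-ASCII characters are written as \uXXXX escapes (ASCII-only output).
escaped = json.dumps(doc)

# ensure_ascii=False keeps the characters themselves in the output text.
literal = json.dumps(doc, ensure_ascii=False)

# Both forms decode to the same Python data.
assert json.loads(escaped) == json.loads(literal) == doc
```

The escaped form is the safer choice when the consumer's encoding is unknown; the literal form is more readable but must then be written out with an explicit encoding such as UTF-8.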

If you have more specific things that are causing you trouble, you should share them; otherwise, if you're just generally uncomfortable with Unicode (in Python in particular), I'd recommend watching Ned Batchelder's intro. I don't know what makes this a nuisance for you, since I don't know what you're doing with this dict.

You should keep all text inside Python as unicode if there's any chance you will need non-ASCII characters. Where Python talks to other programs, use s.encode('UTF-8') to make a regular (byte) string that you can write to a file, a socket, or whatever else. Use s.decode('UTF-8') to convert a string read from a file or socket back to unicode. (UTF-8 seems like a reasonable default, but use whatever your protocol specifies.)
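A minimal sketch of encoding at the boundary, assuming a plain file stands in for "other programs" (the temporary file and word list are illustrative): text is encoded to UTF-8 bytes just before writing and decoded right after reading, so everything in between stays unicode.

```python
import os
import tempfile

words = [u'every', u'single', u'na\u00efve']  # illustrative text, includes "ï"

# Create a temporary file path to act as the outside world.
fd, path = tempfile.mkstemp()
os.close(fd)

# Outbound: encode text to UTF-8 bytes at the boundary, then write the bytes.
with open(path, 'wb') as f:
    f.write(u'\n'.join(words).encode('utf-8'))

# Inbound: read raw bytes and decode them back to text immediately.
with open(path, 'rb') as f:
    raw = f.read()
restored = raw.decode('utf-8').split(u'\n')

os.remove(path)
```

On Python 3 the built-in `open(path, 'w', encoding='utf-8')` (or `io.open` on Python 2) performs the same encode/decode step for you.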

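Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.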

 