
Python convert mixed ASCII code to String

I am retrieving a value that is set by another application from memcached using the python-memcached library. Unfortunately, this is the value that I am getting:

>>> mc.get("key")
'\x04\x08"\nHello'

Is it possible to parse this mixed ASCII code into a plain string using a Python function?

Thanks heaps for your help

It is a "plain string", to the extent that such a thing exists. I have no idea what kind of output you're expecting, but:

There ain't no such thing as plain text.

The Python (in 2.x, anyway) str type is really a container for bytes, not characters. So it isn't really text in the first place :) It displays the bytes assuming a very simple encoding, using escape sequences to represent every byte that's even slightly "weird". It will be formatted differently again if you print the string (what you're seeing right now is the syntax for creating such a literal string in your code).
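To make the "container for bytes" point concrete, here is a minimal sketch that looks at the value from the question byte by byte. It is written for Python 3, where the same data would be a bytes object rather than a 2.x str; the names data and values are just illustrative.

```python
# The value from the question, written as a Python 3 bytes literal.
data = b'\x04\x08"\nHello'

# repr() shows the literal syntax, with escapes for non-printable bytes.
print(repr(data))        # b'\x04\x08"\nHello'

# Iterating over bytes yields the underlying integer values.
values = list(data)
print(values)            # [4, 8, 34, 10, 72, 101, 108, 108, 111]

# Nine bytes in total: four "weird" ones followed by the ASCII codes for Hello.
print(len(data))         # 9
```

Seen this way, the value is four bytes of something else (4, 8, a double quote, and a newline) followed by five bytes that happen to read as "Hello" in ASCII.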

In simpler times, we naively assumed that we could just map bytes to these symbols we call "characters", and that would be that. Then it turned out that there were approximately a zillion different mappings that people wanted to use, and lots of them needed more symbols than a byte could represent. Which is why we have Unicode now: it represents every symbol you could conceivably need for any real-world language (and several for fake languages and other purposes), and it abstractly assigns numbers to those symbols but does not say how to collect and interpret the bytes as numbers. (That is the purpose of the encoding.)

If you know that the string data is encoded in a particular way, you can decode it to a Unicode string. It could either be an encoding of actual Unicode data, or it could be in some other format (for example, Japanese text is often found in something called "Shift-JIS", which plays approximately the same role for Japanese that "Latin-1", a common extension of ASCII, plays for Western European languages). Either way, you get an in-memory representation of a series of Unicode code points (the numbers referred to in the previous paragraph). This, for all intents and purposes, is really "text", but it isn't really "plain" :)
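As a rough illustration of decoding with a known encoding, here is a short Python 3 sketch. The sample text and variable names are made up for the example; shift_jis, latin-1, and ascii are codec names that ship with Python.

```python
# Some Japanese text, encoded to bytes the way a Shift-JIS application would store it.
original = "日本語"
raw = original.encode("shift_jis")

# Decoding with the right encoding recovers the Unicode code points.
decoded = raw.decode("shift_jis")

# Decoding with the wrong encoding does not fail here, it just yields the wrong text:
# Latin-1 maps every byte to some character, so nothing signals the mistake.
misread = raw.decode("latin-1")

# The readable tail of the value in the question is plain ASCII.
greeting = b"Hello".decode("ascii")

print(decoded == original)   # True
print(misread == original)   # False
print(greeting)              # Hello
```

The bytes themselves never say which encoding they are in, so the choice of codec has to come from knowledge about the application that wrote them.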

But it looks like the data you have is really a binary blob of bytes that simply happens to consist mostly of "readable text" if interpreted as ASCII.

What you really need to do is figure out why the first byte has a value of 4 and the next byte has a value of 8, and proceed accordingly.
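One possible explanation, which nothing in the question confirms, is that the other application is written in Ruby and stored the value with Marshal.dump. Ruby's Marshal format starts with the version bytes 4 and 8, uses a double quote as the type tag for a string, and stores small lengths as the value plus 5, so a 5-byte string gets the length byte 10, which is "\n". If that guess is right, a minimal Python 3 sketch for this one shape of data could look like the following. The function name parse_marshal_short_string is hypothetical, and the sketch deliberately ignores every other Marshal type and the longer length encodings.

```python
def parse_marshal_short_string(blob):
    """Extract the payload from a blob shaped like Ruby's Marshal dump of a short string.

    Assumption: the blob is <04 08> <'"'> <length byte> <payload>. Anything else is rejected.
    """
    if blob[:2] != b"\x04\x08":
        raise ValueError("does not start with Marshal version bytes 4, 8")
    if blob[2:3] != b'"':
        raise ValueError("type tag is not a plain string")
    if len(blob) < 4:
        raise ValueError("missing length byte")

    length_byte = blob[3]
    if length_byte == 0:
        length = 0                      # zero is stored as a single zero byte
    elif 6 <= length_byte <= 127:
        length = length_byte - 5        # small positive integers are stored as value + 5
    else:
        raise ValueError("length encoding not handled by this sketch")

    payload = blob[4:4 + length]
    if len(payload) != length:
        raise ValueError("blob is shorter than its declared length")
    return payload


print(parse_marshal_short_string(b'\x04\x08"\nHello'))   # b'Hello'
```

If the values really do come from Ruby, the more robust fix is probably to have the writing application store raw strings (or some language-neutral format such as JSON) instead of parsing its serialization on the Python side.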

If you just need to trim the '\x04\x08"\n' prefix, and it's always the same (you haven't put your question very clearly, so I'm not certain whether that's what you want), do something like this:

to_trim = '\x04\x08"\n'
string = mc.get('key')
if string.startswith(to_trim):
    string = string[len(to_trim):]
