Converting binary (bytes) to str and vice versa

Suppose I have binary data in a DataFrame column, such as:

b'x\\x9c\\xd4\\x14Q\\xd3\\xf7\\x92\\x8b\\x89 \\x01\\xc3)B\\x8c\\x80\\x91#\\x86\\xfb\\xa6\\x9b\\x10\\xce\\x00\\x86p\\x85Cr\\x11\\xd8p\\x84\\xcc\\x12<A\\x17!'

I need a way to convert only the values of that binary column to strings, because the DataFrame has to be converted to JSON and returned as the result of a REST API endpoint.

Here is how I return the DataFrame as JSON (doing this on a DataFrame that contains binary values raises an exception):
return json.loads(df.to_json(orient='table'))
where df is a DataFrame.

I would also like to know how to convert the string representation back into the original bytes.

You need to know the encoding that was used to create those bytes. In Python 3, str.encode() and bytes.decode() default to UTF-8 on every platform, which you can confirm with:

import sys
sys.getdefaultencoding() # 'utf-8' on Python 3
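As a quick check of that default, here is a minimal sketch (the variable names are only illustrative). It also prints the locale's preferred encoding, which is a separate setting and is the one that can differ between platforms:

```python
import locale
import sys

text = 'höy'

# With no argument, str.encode() and bytes.decode() use UTF-8 in Python 3.
default_bytes = text.encode()
explicit_bytes = text.encode('utf-8')
round_tripped = default_bytes.decode()

print(default_bytes == explicit_bytes)  # True
print(sys.getdefaultencoding())         # utf-8

# The locale's preferred encoding is platform-dependent, so no output is shown here.
print(locale.getpreferredencoding())
```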

pickle.dumps stores a str as UTF-8 (in the binary protocols, which are the default). To pickle text in another encoding, encode the string to bytes yourself before passing it to pickle.dumps (as an example). The outputs below come from Python 3.7, where the default pickle protocol is 3; Python 3.8 and later default to protocol 4 and produce different framing bytes.

In [2]: pickle.dumps('höy'.encode('utf-8')) # pickling the UTF-8 bytes
Out[2]: b'\x80\x03C\x04h\xc3\xb6yq\x00.'

In [3]: 'höy'.encode('utf-8')
Out[3]: b'h\xc3\xb6y' # compare with the previous output

In [4]: pickle.dumps('höy'.encode('latin1'))
Out[4]: b'\x80\x03C\x03h\xf6yq\x00.'

In [5]: 'höy'.encode('latin1')
Out[5]: b'h\xf6y' # compare with the previous output
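To get the text back out of such a pickle, unpickle first and decode second. The following is a small sketch under that assumption; protocol=3 is passed explicitly only so that the bytes match the Python 3.7 output above, and pickle.loads should only ever be used on data you trust:

```python
import pickle

text = 'höy'

# Pickle the Latin-1 encoded bytes, pinning protocol 3 (the Python 3.7 default).
pickled = pickle.dumps(text.encode('latin1'), protocol=3)
print(pickled)  # b'\x80\x03C\x03h\xf6yq\x00.'

# Unpickling returns the original bytes object, not a str.
payload = pickle.loads(pickled)

# Decoding with the same codec that produced the bytes recovers the text.
recovered = payload.decode('latin1')
print(recovered)  # höy
```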

If you know the encoding, you can decode the bytes back to a string:

In [1]: 'höy'.encode('utf-8').decode('utf-8')
Out[1]: 'höy'

In [2]: 'höy'.encode('latin-1').decode('latin-1')
Out[2]: 'höy'

Using the wrong encoding results in either an error or the wrong text:

In [3]: 'höy'.encode('utf-8').decode('latin-1')
Out[3]: 'hÃ¶y'
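The opposite mismatch fails outright instead of producing garbled text. A minimal sketch of both outcomes, plus the errors='replace' option of bytes.decode (variable names are only illustrative):

```python
text = 'höy'

# UTF-8 bytes read as Latin-1: every byte maps to some character,
# so decoding succeeds but yields the wrong text.
mojibake = text.encode('utf-8').decode('latin-1')
print(mojibake)  # hÃ¶y

# The damage is reversible if you know both encodings involved.
repaired = mojibake.encode('latin-1').decode('utf-8')

# Latin-1 bytes read as UTF-8: 0xf6 is not a valid UTF-8 start byte,
# so strict decoding raises UnicodeDecodeError.
try:
    text.encode('latin-1').decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lossy = text.encode('latin-1').decode('utf-8', errors='replace')
```

Note that errors='replace' loses information, so it is only suitable for display, not for round-tripping binary data.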

And not every sequence of bytes is an encoded string:

In [6]: pickle.dumps('höy').decode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-2b1872a5aa1a> in <module>
----> 1 pickle.dumps('höy').decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
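For bytes that are not encoded text at all (pickles, compressed data and so on), one common option is to carry them through JSON as Base64, which maps any byte sequence to ASCII and back without loss. This is a sketch of that swapped-in technique, not something the question's code already does; the helper names, the 'payload' key and the sample rows are hypothetical stand-ins for the DataFrame column:

```python
import base64
import json


def bytes_to_text(raw: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only Base64 string."""
    return base64.b64encode(raw).decode('ascii')


def text_to_bytes(encoded: str) -> bytes:
    """Reverse bytes_to_text: decode a Base64 string back to bytes."""
    return base64.b64decode(encoded)


# Hypothetical rows standing in for a DataFrame with one binary column.
rows = [
    {'id': 1, 'payload': b'x\x9c\xd4\x14Q'},
    {'id': 2, 'payload': b'\x80\x03C\x04h\xc3\xb6yq\x00.'},
]

# Make the rows JSON-serializable by converting only the binary field.
json_ready = [{**row, 'payload': bytes_to_text(row['payload'])} for row in rows]
body = json.dumps(json_ready)

# On the receiving side, parse the JSON and restore the original bytes.
restored = [
    {**row, 'payload': text_to_bytes(row['payload'])}
    for row in json.loads(body)
]
print(restored == rows)  # True
```

With pandas, the same helper could presumably be applied to the binary column before calling to_json, for example df['payload'].map(bytes_to_text), where 'payload' is again an assumed column name.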
