
How to convert a Python byte string containing a mix of hex characters?

Specifically, I am receiving a stream of bytes from a TCP socket that looks something like this:

inc_tcp_data = b'\x02hello\x1cthisisthedata'

The stream uses hex values to denote different parts of the incoming data. However, I want to use inc_tcp_data in the following format:

converted_data = '\x02hello\x1cthisisthedata'

Essentially, I want to get rid of the b and just literally spit out what came in.

I've tried various struct.unpack methods as well as .decode("encoding"). I could not get the former to work at all, and the latter would strip out the hex values if there was no visual way to encode them, or convert them to characters if it could. Any ideas?

Update:

I was able to get my desired result with the following code:

inc_tcp_data = b'\x02hello\x3Fthisisthedata'.decode("ascii")


d = repr(inc_tcp_data)

print(d)
print(len(d))
print(len(inc_tcp_data))

The output is:

'\x02hello?thisisthedata'
25
20

However, this still doesn't help me, because I actually need the regular expression that follows to see \x02 as a single hex value and not as a four-character string.

What am I doing wrong?

UPDATE

I've solved this issue by not solving it. The reason I wanted the hex characters to remain unchanged was so that a regular expression could detect them further down the road. However, what I should have done (and did) was simply change the regular expression to analyze the bytes without decoding them. Once I had separated out all the parts via the regular expression, I decoded the parts with .decode("ascii") and everything worked out great.

I'm just updating this in case it happens to help someone else.
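
For anyone landing here, a minimal sketch of that approach. It assumes the frame layout suggested by the sample data above (a \x02 start byte, a header field, a \x1c separator, then the payload). The pattern and the names FRAME_RE and parse_frame are hypothetical illustrations, not the original code:

```python
import re
from typing import Tuple

# Hypothetical frame layout, assumed from the sample data:
#   \x02 <header> \x1c <payload>
# The pattern is a bytes pattern (rb'...'), so it matches raw bytes directly.
FRAME_RE = re.compile(rb'\x02(?P<header>[^\x1c]*)\x1c(?P<payload>.*)', re.DOTALL)


def parse_frame(raw: bytes) -> Tuple[str, str]:
    """Split a raw frame with a bytes regex, then decode each part."""
    match = FRAME_RE.fullmatch(raw)
    if match is None:
        raise ValueError(f"unrecognized frame: {raw!r}")
    # Decode only after the control bytes have served as delimiters.
    header = match.group("header").decode("ascii")
    payload = match.group("payload").decode("ascii")
    return header, payload


inc_tcp_data = b'\x02hello\x1cthisisthedata'
header, payload = parse_frame(inc_tcp_data)
print(header)   # hello
print(payload)  # thisisthedata
```

The key point is that the re module accepts bytes patterns as long as the subject is also bytes, so no conversion is needed before matching.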

Assuming you are using Python 3:

>>> inc_tcp_data.decode('ascii')
'\x02hello\x1cthisisthedata'
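
A short check, offered as a sketch rather than part of the original answer, that the decoded string still holds each control character as a single character. The b prefix and the \x.. escapes are only how repr() displays the value. The variable name text is an assumption introduced here:

```python
inc_tcp_data = b'\x02hello\x1cthisisthedata'
text = inc_tcp_data.decode('ascii')

# The control characters survive decoding as single characters.
print(len(inc_tcp_data), len(text))  # 20 20
print(text[0] == '\x02')             # True
print(text.index('\x1c'))            # 6

# repr() adds the quotes and spells each control character as a
# four-character escape; that is display formatting, not the data.
print(len(repr(text)))               # 28
```

So a regular expression applied to text (a str pattern) or to inc_tcp_data (a bytes pattern) can match \x02 and \x1c directly; calling repr() first is what turns them into literal backslash sequences.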
