
Remove the b prefix in encoded text in Python

text = "hello world what is happening"
encodedText = text.encode('utf-16') #Encoding the input text
textReplaced = encodedText.replace('h'.encode('utf-16'), 'Q'.encode('utf-16')) #Doing the replacement of an encoded character by another encoded character

print('Input : ', text)
print('Expected Output : Qello world wQat is Qappening')
print('Actual Output : ', textReplaced.decode('utf-16'))
print('Encoded h : ', 'h'.encode('utf-16'))
print('Encoded Q : ', 'Q'.encode('utf-16'))
print('Encoded Actual Output : ', textReplaced)

Output:

Input :  hello world what is happening
Expected Output : Qello world wQat is Qappening
Actual Output :  Qello world what is happening
Encoded h :  b'\xff\xfeh\x00'
Encoded Q :  b'\xff\xfeQ\x00'
Encoded Actual Output :  b'\xff\xfeQ\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x00w\x00h\x00a\x00t\x00 \x00i\x00s\x00 \x00h\x00a\x00p\x00p\x00e\x00n\x00i\x00n\x00g\x00'

The problem with the code seems to be that, because every encoded string or character has the prefix b', the replacement is done only on the first occurrence in the encoded input.

The b' prefix is only how Python displays a bytes object; it is not part of the data. The actual problem is that the replacement bytes include the byte order mark (b'\xff\xfe'), which is present only at the beginning of the bytestring, so only the first 'h' matches. If you are obliged to do the replacing in bytes rather than in str, you need to encode the replacement bytes without a BOM, using the UTF-16 variant that matches the endianness of your system (or of the bytes, which might not be the same).
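To make the BOM explanation concrete, here is a small sketch that counts how often each search pattern occurs in the encoded text. It assumes little-endian bytes and builds them explicitly with codecs.BOM_UTF16_LE so the result does not depend on the machine; the variable names are illustrative only.

```python
import codecs

text = "hello world what is happening"

# Build the bytes explicitly as BOM + little-endian body, which is what
# text.encode('utf-16') produces on a little-endian machine.
encoded = codecs.BOM_UTF16_LE + text.encode('utf-16-le')

# Search pattern as produced by 'h'.encode('utf-16'): BOM included.
pattern_with_bom = codecs.BOM_UTF16_LE + 'h'.encode('utf-16-le')
# Search pattern without the BOM.
pattern_no_bom = 'h'.encode('utf-16-le')

with_bom_hits = encoded.count(pattern_with_bom)
no_bom_hits = encoded.count(pattern_no_bom)

print(with_bom_hits)  # 1: the BOM exists only at the very start
print(no_bom_hits)    # 3: one match per 'h' in the text
```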

Assuming the endianness of the bytes is that of your system, this will work:

>>> import sys
>>> enc = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'
>>> textReplaced = encodedText.replace('h'.encode(enc), 'Q'.encode(enc))
>>> textReplaced.decode('utf-16')
'Qello world wQat is Qappening'
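If the bytes might not use your system's endianness, one option is to read the byte order from the BOM itself. The following is a minimal sketch under that assumption; replace_in_utf16 is a hypothetical helper, not part of the original answer, and it expects the input to start with a BOM.

```python
import codecs


def replace_in_utf16(data: bytes, old: str, new: str) -> bytes:
    """Replace old with new inside BOM-prefixed UTF-16 bytes (hypothetical helper)."""
    # Pick the BOM-free codec that matches the byte order mark of the data.
    if data.startswith(codecs.BOM_UTF16_LE):
        enc, bom = 'utf-16-le', codecs.BOM_UTF16_LE
    elif data.startswith(codecs.BOM_UTF16_BE):
        enc, bom = 'utf-16-be', codecs.BOM_UTF16_BE
    else:
        raise ValueError('expected a UTF-16 byte order mark')

    # Replace only in the body, then put the original BOM back in front.
    body = data[len(bom):]
    return bom + body.replace(old.encode(enc), new.encode(enc))


encoded = "hello world what is happening".encode('utf-16')
print(replace_in_utf16(encoded, 'h', 'Q').decode('utf-16'))
```

Like the snippet above, this still replaces at the byte level, so in text with non-ASCII characters a match could straddle two adjacent code units. Decoding to str, replacing, and re-encoding avoids that.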

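A simpler approach for this ASCII-only text would be to use the bytes.translate method. It maps single bytes, though, so it would also rewrite a 0x68 byte that belongs to a different character's UTF-16 encoding.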

>>> trans_table = bytes.maketrans('h'.encode('utf-16'), 'Q'.encode('utf-16'))
>>> print(encodedText.translate(trans_table).decode('utf-16'))
Qello world wQat is Qappening
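A short sketch of where the translate approach breaks down: the sample string below is an assumption chosen for illustration, pairing 'h' with 'Ũ' (U+0168), whose UTF-16 encoding contains the byte 0x68. Doing the replacement on the decoded str is shown alongside for comparison.

```python
# 'h' followed by 'Ũ' (U+0168); one of the two bytes of 'Ũ' is 0x68, same as 'h'.
text = 'h\u0168'
encoded = text.encode('utf-16')

# Same translation table as above: every 0x68 byte becomes 0x51.
table = bytes.maketrans('h'.encode('utf-16'), 'Q'.encode('utf-16'))
translated = encoded.translate(table).decode('utf-16')

# Alternative: decode first, replace on str, so only whole characters match.
via_str = encoded.decode('utf-16').replace('h', 'Q')

print(translated == 'Q\u0151')  # True: 'Ũ' was silently turned into 'ő'
print(via_str == 'Q\u0168')     # True: 'Ũ' is left untouched
```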
