
How to convert bits into bytes in Python?

I'm using the following code to turn characters into bits and I don't know how to convert the bits back into their characters.

I tried retracing my steps to reverse the process. I know that the opposite of ord() is chr(), but how do I reverse format(ord(char), "b")? Any help is appreciated.

temp = format(ord(char), 'b')

You can convert the string back to an integer with int(), passing a base of 2, and then back to a character with chr():

temp = format(ord('a'), 'b')
print(temp)
# 1100001

c = chr(int(temp, 2))
print(c)
# a
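
The same pair of calls extends to a whole string, as long as each character's bits are kept apart, for example with a space separator. This is only a minimal sketch; the helper names text_to_bits and bits_to_text are made up for illustration and are not part of the original answer:

```python
def text_to_bits(text):
    # One binary group per character (code point), separated by spaces.
    return ' '.join(format(ord(ch), 'b') for ch in text)


def bits_to_text(bits):
    # Reverse: parse each group as a base-2 integer, then map it back with chr().
    return ''.join(chr(int(group, 2)) for group in bits.split())


encoded = text_to_bits('hi')
print(encoded)                # 1101000 1101001
print(bits_to_text(encoded))  # hi
```

The separator matters because format(..., 'b') produces groups of varying length, so without it the boundaries between characters would be lost.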

Mark Meyer's answer is spot on, and works for any character:

>>> char = '😎'
>>> bits = format(ord(char), 'b')
>>> bits
'11111011000001110'
>>> char = chr(int(bits, 2))
>>> char
'😎'

But it only works for characters, not for grapheme clusters. Suppose you had the woman scientist emoji:

>>> char = '👩‍🔬'
>>> bits = format(ord(char), 'b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 3 found

This does not work because the woman scientist emoji is not a single character, but rather a grapheme cluster made up of three characters:

  • WOMAN
  • ZERO WIDTH JOINER
  • MICROSCOPE

So the string has three characters, and you cannot call ord() on a three-character string.
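
To see this for yourself, the standard unicodedata module can name each code point in the cluster. A small sketch; the variable name cluster is just for illustration, and the emoji is written with escapes so the three parts are visible:

```python
import unicodedata

# Woman scientist emoji, spelled out as its three code points.
cluster = '\U0001F469\u200D\U0001F52C'

print(len(cluster))  # 3
for ch in cluster:
    # Print the code point in hex alongside its official Unicode name.
    print(f'U+{ord(ch):04X}', unicodedata.name(ch))
```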

I think it's important to note here that turning a single character into a bit string for its code point is unusual and rarely done in practice (unless you are using the UTF-32BE encoding, in which case you should pad the bit string with leading zeros to 32 places). IMHO, what you should be doing here is NOT using ord and chr, but rather encoding and decoding using UTF-8. Turning characters into bits or bytes should be done with a well-known character encoding scheme, and UTF-8 is the most widely used one.
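
As a quick check of the UTF-32BE remark, the sketch below pads a code point to 32 binary digits and compares it with the bits of the same character encoded with Python's 'utf-32-be' codec. The variable names are illustrative only:

```python
char = '\U0001F60E'  # smiling face with sunglasses

# Code point written as 32 binary digits with leading zeros.
padded = format(ord(char), '032b')

# The same character encoded as UTF-32BE: 4 bytes, big-endian, no byte order mark.
utf32_bits = ''.join(f'{b:08b}' for b in char.encode('utf-32-be'))

print(padded)
print(padded == utf32_bits)  # True
```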

Here is how I would suggest converting between characters and bytes:

>>> char = '👩‍🔬'
>>> data = char.encode('utf-8')  # named data to avoid shadowing the built-in bytes type
>>> data
b'\xf0\x9f\x91\xa9\xe2\x80\x8d\xf0\x9f\x94\xac'
>>> char = data.decode('utf-8')
>>> char
'👩‍🔬'

If you want bits and not bytes, then:

>>> char = '👩‍🔬'
>>> data = char.encode('utf-8')  # named data to avoid shadowing the built-in bytes type
>>> bits = ''.join(f'{b:08b}' for b in data)
>>> bits
'1111000010011111100100011010100111100010100000001000110111110000100111111001010010101100'
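
Going back from such a bit string to text means cutting it into 8-bit groups, turning each group into a byte, and decoding the result as UTF-8. A minimal sketch, assuming the bits were produced as above with exactly 8 digits per byte; the function name utf8_bits_to_text is hypothetical:

```python
def utf8_bits_to_text(bits):
    # Each byte was written as exactly 8 binary digits, so the length must divide evenly.
    if len(bits) % 8:
        raise ValueError('bit string length must be a multiple of 8')
    # Parse every 8-digit group as a base-2 integer and collect them into a bytes object.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode('utf-8')


text = '\U0001F469\u200D\U0001F52C'  # woman scientist emoji
bits = ''.join(f'{b:08b}' for b in text.encode('utf-8'))
print(utf8_bits_to_text(bits) == text)  # True
```

Because every byte has a fixed width of 8 digits, no separator is needed, and grapheme clusters survive the round trip unchanged.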

To and from bits using Python 3.6+ f-strings:

>>> char = 'a'
>>> bits = f'{ord(char):08b}'  # 08b means binary, zero-padded to at least 8 digits.
>>> bits
'01100001'
>>> chr(int(bits, 2))  # convert string to integer using base 2.
'a'
