I'm using the following code to turn characters into bits and I don't know how to convert the bits back into their characters.
I tried to reverse the process by following the same steps backwards. I know that the opposite of ord() is chr(), but how do I reverse format(ord(char), "b")? Any help is appreciated.
temp = format(ord(char), 'b')
You can convert the string back to an integer with int(), passing a base of 2, and then back to a character with chr():
temp = format(ord('a'), 'b')
print(temp)
# 1100001
c = chr(int(temp, 2))
print(c)
# a
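The same idea can be applied to a whole string, one character at a time. This is only a minimal sketch: the helper names text_to_bits and bits_to_text are hypothetical, and a space is assumed as the separator because format(..., 'b') produces bit strings of varying length.

```python
def text_to_bits(text):
    # One variable-length bit string per character, separated by spaces.
    return ' '.join(format(ord(ch), 'b') for ch in text)


def bits_to_text(bits):
    # Split on whitespace, parse each chunk as base 2, map back with chr().
    return ''.join(chr(int(chunk, 2)) for chunk in bits.split())


encoded = text_to_bits('hi')
decoded = bits_to_text(encoded)
print(encoded)
print(decoded)
```

Without a separator (or a fixed width) there would be no way to tell where one character's bits end and the next begin.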
Mark Meyer's answer is spot on, and works for any character:
>>> char = '😎'
>>> bits = format(ord(char), 'b')
>>> bits
'11111011000001110'
>>> char = chr(int(bits, 2))
>>> char
'😎'
But it only works for characters, not for grapheme clusters. Suppose you had the woman scientist emoji:
>>> char = '👩‍🔬'
>>> bits = format(ord(char), 'b')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 3 found
This does not work because the woman scientist emoji is not a single character, but rather a grapheme cluster made up of three characters: WOMAN (U+1F469), ZERO WIDTH JOINER (U+200D), and MICROSCOPE (U+1F52C). So the string has three characters, and you cannot call ord on a three-character string.
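One way to see what a cluster is made of is to list its code points with the standard unicodedata module. The sketch below builds the emoji from escape sequences so that the invisible joiner cannot get lost in copy and paste; the variable names are just for illustration.

```python
import unicodedata

# Woman scientist: WOMAN + ZERO WIDTH JOINER + MICROSCOPE.
cluster = '\U0001F469\u200D\U0001F52C'

names = [unicodedata.name(ch) for ch in cluster]
for ch, name in zip(cluster, names):
    # Print each code point in U+XXXX form next to its Unicode name.
    print(f'U+{ord(ch):04X} {name}')

print(len(cluster))
```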
I think it's important to note here that turning a single character into a bit string for its code point is highly unusual and is rarely done in practice (unless you are using the UTF-32BE encoding, in which case you should pad the bit string with zeros to 32 places). IMHO, what you should be doing here is NOT using ord and chr, but rather encoding and decoding using UTF-8. Turning characters into bits or bytes should be done with a well-known character encoding scheme, and UTF-8 is the most widely used.
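To illustrate the UTF-32BE remark, here is a small sketch comparing the code point padded to 32 bits with the bits of the 'utf-32-be' encoding. The variable names are made up for the example.

```python
char = '\U0001F60E'  # smiling face with sunglasses

# Code point as a bit string, zero-padded to 32 places.
padded = format(ord(char), '032b')

# The same character encoded as UTF-32 big-endian (no BOM), byte by byte.
utf32_bits = ''.join(f'{b:08b}' for b in char.encode('utf-32-be'))

print(padded)
print(padded == utf32_bits)
```

For a single character the two bit strings should be identical, which is why the padded code point only makes sense as an encoding in that one case.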
Here is how I would suggest you do the character and bit thing:
>>> char = '👩‍🔬'
>>> data = char.encode('utf-8')  # named data to avoid shadowing the built-in bytes
>>> data
b'\xf0\x9f\x91\xa9\xe2\x80\x8d\xf0\x9f\x94\xac'
>>> char = data.decode('utf-8')
>>> char
'👩‍🔬'
If you want bits and not bytes, then:
>>> char = '👩‍🔬'
>>> data = char.encode('utf-8')  # named data to avoid shadowing the built-in bytes
>>> bits = ''.join(f'{b:08b}' for b in data)
>>> bits
'1111000010011111100100011010100111100010100000001000110111110000100111111001010010101100'
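Going back from such a bit string to text is the reverse: cut the string into 8-bit chunks, turn each chunk into a byte, and decode the bytes as UTF-8. A minimal sketch, assuming the bit string's length is a multiple of 8 and that it came from valid UTF-8:

```python
original = '\U0001F469\u200D\U0001F52C'  # woman scientist emoji

# Text -> UTF-8 bytes -> bit string, 8 bits per byte.
bits = ''.join(f'{b:08b}' for b in original.encode('utf-8'))

# Bit string -> bytes: parse every 8-character chunk as a base-2 integer.
data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Bytes -> text.
restored = data.decode('utf-8')
print(restored == original)
```

Because every byte is padded to exactly 8 bits, no separator is needed to find the chunk boundaries.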
To and from bits using Python 3.6+ f-strings:
>>> char = 'a'
>>> bits = f'{ord(char):08b}' # 08b means 8 binary digits with leading zeros.
>>> bits
'01100001'
>>> chr(int(bits,2)) # convert string to integer using base 2.
'a'