Why is sys.stdin.read(4).encode('utf-8') returning more than 4 bytes?

I am passing a JSON object from Chrome to my Python app's stdin via the Chrome/JavaScript sendNativeMessage function.

Sometimes the code below works. Other times (I believe with larger messages) it fails. I'm not sure what I'm doing wrong, but sys.stdin.read(4).encode('utf-8') sometimes seems to return 7 bytes instead of the specified 4, and that's when it breaks with a "struct.error: unpack requires a byte object of length 4" message.

Can someone let me know what I'm doing wrong here?

# On Windows, the default I/O mode is O_TEXT. Set this to O_BINARY
# to avoid unwanted modifications of the input/output streams.
import json, logging, struct, sys
import os, msvcrt
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# Read the message length (first 4 bytes).
#for line in sys.stdin:
text_length_bytes = sys.stdin.read(4).encode('utf-8')

logging.info( text_length_bytes )

# Unpack message length as 4 byte integer.
text_length = struct.unpack('i', text_length_bytes)[0]

logging.info( text_length )

# Read the text of the message.
text = json.loads( sys.stdin.read(text_length) )

In text mode, sys.stdin.read(4) returns 4 characters, not 4 bytes, and a single Unicode character can encode to more than one byte in UTF-8:

In [4]: len('ü'.encode('utf-8'))
Out[4]: 2
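
To illustrate how this could show up with a length prefix, here is a minimal sketch. It assumes stdin is decoded with a single-byte codec such as cp1252 and that the prefix is little-endian; neither is stated in the question, so treat both as assumptions. The helper name text_read_then_encode is hypothetical.

```python
import io
import struct


def text_read_then_encode(raw, encoding="cp1252"):
    """Mimic sys.stdin.read(4).encode('utf-8') on a text-mode stream.

    `raw` stands in for the bytes Chrome would send; the cp1252 default
    is an assumption about how stdin is decoded, not a known fact.
    """
    text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding)
    return text_stream.read(4).encode("utf-8")


# Length 50 is stored as ASCII-range bytes, so the round trip is harmless.
small = text_read_then_encode(struct.pack("<I", 50) + b"x" * 50)

# Length 200 puts the byte 0xC8 in the prefix; it decodes to a single
# character that UTF-8 re-encodes as two bytes.
large = text_read_then_encode(struct.pack("<I", 200) + b"x" * 200)

print(len(small), len(large))
```

Under these assumptions the result only grows when a prefix byte is 0x80 or above, which cannot happen until the message length reaches 128. That would be consistent with the failures appearing on larger messages.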

Since you want to decode those 4 bytes as an integer, read them from stdin as bytes (instead of str) in the first place, via sys.stdin.buffer:

In [8]: type(sys.stdin.read(4))
aoeu
Out[8]: str

In [9]: type(sys.stdin.buffer.read(4))
aoeu
Out[9]: bytes
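
Putting this together, a reader for one message might look like the sketch below. It assumes the framing described in the question (a 4-byte length followed by that many bytes of UTF-8 JSON) and takes any binary stream, so it can be tried without Chrome. The function name read_native_message is hypothetical, and the "=I" format (unsigned, native byte order) is a choice made here in place of the question's 'i'.

```python
import io
import json
import struct


def read_native_message(stream):
    """Read one length-prefixed JSON message from a binary stream.

    Returns the decoded object, or None if the stream ends before a
    full 4-byte length prefix is available.
    """
    header = stream.read(4)
    if len(header) < 4:
        return None
    # "=I": unsigned 32-bit integer in native byte order, no padding.
    (length,) = struct.unpack("=I", header)
    # The length counts bytes, so read bytes and decode afterwards.
    payload = stream.read(length)
    return json.loads(payload.decode("utf-8"))


# Stand-in for stdin: a length prefix followed by a UTF-8 JSON body.
body = '{"text": "ü"}'.encode("utf-8")
fake_stdin = io.BytesIO(struct.pack("=I", len(body)) + body)
print(read_native_message(fake_stdin))
```

In the real program the call would be read_native_message(sys.stdin.buffer), with no .encode('utf-8') on the length prefix.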
