
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 1023: unexpected end of data

Basically I've made an IRC Twitch bot in Python that does nothing but join the channel for now. The ping-pong cycle works properly for a while but then it gets halted with the error in the title. What did I do wrong? Thanks in advance.

import re
import socket

HOST = "irc.twitch.tv"
PORT = 6667
NICK = "asdsad"
PASS = "oauth:asdasdasdasd"
channel = "#coolperson"

def send_message(sock, msg):
    # channel already starts with "#"; IRC lines must be bytes ending in CRLF
    sock.send("PRIVMSG {} :{}\r\n".format(channel, msg).encode("utf-8"))

s = socket.socket()
s.connect((HOST, PORT))
s.send("PASS {}\r\n".format(PASS).encode("utf-8"))
s.send("NICK {}\r\n".format(NICK).encode("utf-8"))
s.send("JOIN {}\r\n".format(channel).encode("utf-8"))

while True:
    response = s.recv(1024).decode("utf-8")
    if response == "PING :tmi.twitch.tv\r\n":
        s.send("PONG :tmi.twitch.tv\r\n".encode("utf-8"))
        print("answered the call")

You can suppress that error by telling decode() to ignore bytes it cannot decode. Instead of this line:

response = s.recv(1024).decode("utf-8")

Use this one:

response = s.recv(1024).decode('utf-8', 'ignore')

You're reading part of a multi-byte character from the network. Your buffer is 1024 bytes, and the byte 0xe2 is the last one in that buffer, at index 1023. Characters with code points above 127 take more than one byte in UTF-8, and you don't control where the data is split when reading from the network, so if you're unlucky a character will be split across two calls to recv(). If you pass the 'ignore' option to decode(), the incomplete bytes are silently discarded, so that character is dropped.
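The split can be reproduced without a network connection. The sketch below is only an illustration: the payload and the cut position are made up, and slicing a byte string stands in for two consecutive recv() calls.

```python
# U+2764 (a heart symbol) encodes to three bytes in UTF-8: e2 9d a4.
payload = "hi \u2764".encode("utf-8")

# Pretend recv() returned the data in two pieces, cut inside the character.
first, second = payload[:4], payload[4:]   # first ends with the lone byte 0xe2

# Strict decoding of the first piece fails, as in the question.
try:
    first.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc.reason

# With 'ignore', both halves of the split character are discarded.
lossy = first.decode("utf-8", "ignore") + second.decode("utf-8", "ignore")
print(repr(strict_error), repr(lossy))
```

Note that the leftover continuation bytes at the start of the second piece are dropped as well, so the character is lost entirely rather than garbled.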

If you're just looking for the "ping", dropping other text is probably OK, because the text you're looking for is pure ASCII. If you need all of the text (for example, to display it to a user), you have to deal with incomplete characters before decoding the byte string from the network. Python's codecs module provides the IncrementalEncoder/IncrementalDecoder interface for this: you feed an incremental decoder bytes, and it returns whatever characters it can decode while keeping any incomplete trailing bytes as state until the next call. See https://docs.python.org/3/library/codecs.html#incremental-encoding-and-decoding for the docs, and the question titled "python decode partial utf-8 byte array" for an example.
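A minimal sketch of the incremental approach, assuming the same kind of split as in the question. The chunks list is made up and stands in for successive s.recv(1024) results; codecs.getincrementaldecoder() is the standard-library call that returns the decoder class.

```python
import codecs

# One decoder per connection; it remembers incomplete bytes between calls.
decoder = codecs.getincrementaldecoder("utf-8")()

payload = "PING \u2764\r\n".encode("utf-8")
# Hypothetical chunks: the cut falls after 0xe2, the first byte of U+2764.
chunks = [payload[:6], payload[6:]]

pieces = [decoder.decode(chunk) for chunk in chunks]
decoded = "".join(pieces)
print(pieces, repr(decoded))
```

In the bot's loop this would mean replacing s.recv(1024).decode("utf-8") with decoder.decode(s.recv(1024)). A chunk can still end in the middle of an IRC line, so comparing the whole response with a fixed PING string remains fragile even once decoding is fixed.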
