
UnicodeEncodeError with Twitch.tv IRC bot

So I'm trying to program a simple Twitch.tv IRC bot. The bot reads incoming messages in the channel, and if a message matches certain patterns, the bot performs certain tasks. The problem I'm running into is that if a user enters certain Unicode characters (e.g. "¯_(ツ)_/¯"), the program throws this error and crashes:

UnicodeEncodeError was unhandled by user code

'charmap' codec can't encode character '\xaf' in position 13: character maps to <undefined>

Now, I want my program to be able to handle these inputs, but I have no idea what to change or add to my code to enable this. This is my code:

http://pastebin.com/EBTaqpbZ (I couldn't figure out how to use Stackoverflow code paste)

The main part of the code that I'm receiving the error on is:

while True:                                                     #Main Loop
    response = s.recv(1024).decode("utf-8")
    if response == "PING :tmi.twitch.tv\r\n":                   #If Ping, return Pong
        s.send("PONG :tmi.twitch.tv\r\n".encode("utf-8"))
        print("Pong Successful")
    else:                                                       #Else, Decode User Message
        username = re.search(r"\w+", response).group(0)         #Gets User
        message = CHAT_MSG.sub("", response)                    #Gets Message
        print (username + ": " + message)                       #Prints User Message
        if message.find("!hello") != -1:                        #Simple Test command to see if Reading Chat Input
            chat ("Hello! I'm speaking!\r\n")
    time.sleep(1 / cfg.RATE)

The error always seems to happen on the line of code: print (username + ": " + message)

Does anyone know how I should go about handling these Unicode characters?

(Would comment with a link to an answer but I do not have enough reputation yet.)

So, I assume you are using Windows? What happens is that the encoding your console uses cannot represent these Unicode characters, so the print call raises the error and the program crashes.

So the problem is not so much in the code itself as in the tools used. For example, the code runs fine when run from a Linux console. One way to overcome this problem seems to be using win-unicode-console to enable Unicode input and output in the Windows console. See this answer for a broader description of the problem and solution.
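To see why the console encoding matters, here is a small sketch that tries to encode the example string with two codecs. It assumes your console uses code page 437 (a common default for English-language Windows consoles); I don't know your actual code page, so check what sys.stdout.encoding reports on your machine. The helper name can_encode is made up for this illustration.

```python
import sys

SHRUG = "¯_(ツ)_/¯"  # the example string from the question


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# print() has to encode the text with the console's encoding,
# so this is the codec that decides whether printing succeeds.
print("stdout encoding:", sys.stdout.encoding)

# cp437 is an assumed Windows console code page; it has no mapping for '¯' or 'ツ'.
print("cp437 can encode it:", can_encode(SHRUG, "cp437"))

# UTF-8 can represent every Unicode character, which is why a UTF-8 console works.
print("utf-8 can encode it:", can_encode(SHRUG, "utf-8"))
```

If the first line reports something like cp437 or cp850, that would explain why the same code works in a UTF-8 Linux terminal but not in your console.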

You can also just work around the problem if you only need the print for debugging purposes:

msg = username + ": " + message
print (msg.encode("utf-8")) 

However, that is not a real solution, and the output will be something like

b'\xc2\xaf_(\xe3\x83\x84)_/\xc2\xaf\r\n'

for your example string, so not very convenient. I recommend reading the answer I linked.
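If the debug output should stay mostly readable, another possible workaround is to escape only the characters the console cannot show, instead of turning the whole message into bytes. The sketch below uses Python's built-in "backslashreplace" error handler; the helper names to_console_safe and safe_print are hypothetical, not part of your code or of any library.

```python
import sys


def to_console_safe(text, encoding=None):
    """Replace characters the target encoding lacks with backslash escapes."""
    # Fall back to UTF-8 if stdout has no encoding (e.g. when it is redirected oddly).
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # Unencodable characters become escapes such as \xaf or \u30c4;
    # everything else round-trips unchanged.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def safe_print(text):
    """print() variant that should not raise UnicodeEncodeError on console output."""
    print(to_console_safe(text))


# Example with an assumed cp437 console: only the two problem characters are escaped.
print(to_console_safe("¯_(ツ)_/¯", "cp437"))
```

In the main loop you would then call safe_print(username + ": " + message) instead of print. On Python 3.7 or newer, calling sys.stdout.reconfigure(errors="backslashreplace") once at startup should give a similar effect for every print call, though I have not tried that with your bot.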
