Python: UnicodeEncodeError 'ascii' codec

Question

I just want this Python code to work, but I don't understand these conversion errors (I always get some kind of 'ascii' encoding or decoding error). I went crazy and added a decode and an encode to every part of the line, and it's still giving me trouble. The code is available via Git at https://github.com/TBOpen/papercut if you would be so kind as to correct it. (I also solved a similar error, not checked in, on line 885 using self.wfile.write(message.decode('cp1250', 'replace').encode('ascii', 'replace') + "\r\n").)

However, here's the traceback for the one I can't solve (where I gave up).

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/SocketServer.py", line 535, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python2.6/SocketServer.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python2.6/SocketServer.py", line 615, in __init__
    self.handle()
  File "./papercut.py", line 221, in handle
    getattr(self, "do_%s" % (command))()
  File "./papercut.py", line 410, in do_ARTICLE
    self.send_response("%s\r\n%s\r\n\r\n%s\r\n.".decode('cp1250', 'replace').encode('ascii', 'replace') % (response.decode('cp1250', 'replace').encode('ascii', 'replace'), result[0].decode('cp1250', 'replace').encode('ascii', 'replace'), result[1].decode('cp1250', 'replace').encode('ascii', 'replace')))
  File "/usr/local/lib/python2.6/encodings/cp1250.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 20: ordinal not in range(128)

TIA!!

Answer 1

The root problem is that one of response, result[0], or result[1] is actually a unicode string, not an encoded str string.

So, when you call (picking one arbitrarily) response.decode('cp1250', 'replace'), you're asking to decode something that's already decoded to Unicode. What Python 2.x does with this is to first encode it to your default encoding (ASCII) so that it can decode it as you requested. And that's why you're getting a UnicodeEncodeError from trying to call decode.*
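To make that hidden step visible, here is a rough sketch that imitates it. It is written for Python 3, where the implicit encode no longer happens, so py2_style_decode is a hypothetical helper that only mimics what Python 2 does behind your back, and the sample string is made up.

```python
def py2_style_decode(value, encoding):
    """Imitate Python 2's unicode.decode(). Hypothetical helper, not a real API."""
    if isinstance(value, str):
        # Python 2 silently did this first, using the default codec (ASCII).
        value = value.encode("ascii")
    return value.decode(encoding, "replace")


already_text = u"Papercut\u2122 news server"  # made-up sample, already Unicode

error = None
try:
    py2_style_decode(already_text, "cp1250")
except UnicodeEncodeError as exc:
    # The failure comes from the hidden encode, even though we asked to decode.
    error = exc
    print(type(exc).__name__, exc.reason)
```

Passing real bytes to the same helper decodes without trouble, which is why the error only shows up for the value that is already Unicode.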

To fix this, you're going to have to figure out which one of the three is wrong, and why. That's not possible with a giant mess of a statement with 4 decode calls in it, but it's easy if you break it up into separate statements, or just add some print debugging to see what's in those variables right before they get used.
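A minimal sketch of that print debugging might look like the following. The values here are invented stand-ins, and describe is a hypothetical helper; in papercut.py you would pass the real response, result[0], and result[1] just before the send_response call. The sketch runs on Python 3, where the two types print as str and bytes; on Python 2.6 the same idea applies but they show up as unicode and str.

```python
def describe(name, value):
    """Return a one-line summary of a value's type and contents."""
    return "%s: %s %r" % (name, type(value).__name__, value)


# Stand-ins for the real values (assumptions for illustration only).
response = b"220 1 <1@example> article"
result = (u"Subject: Acme\u2122", b"Body text")

for name, value in [("response", response),
                    ("result[0]", result[0]),
                    ("result[1]", result[1])]:
    # The odd one out (text instead of bytes) is the one to chase down.
    print(describe(name, value))
```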

However, it would make your life a whole lot easier to reorganize your code completely. Instead of converting everything back and forth all over the place, giving yourself dozens of places to make a simple mistake that ends up causing an un-debuggable error halfway across your program, just decode all of your input at input time, process everything as Unicode, then encode everything at output time.
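As a rough illustration of that layout, assuming cp1250 input and ASCII output as in the question, the response could be built like this. The names to_text and build_article_response are hypothetical and not part of papercut; the isinstance check is only there to tolerate values that are already Unicode while you clean up the callers.

```python
def to_text(value, encoding="cp1250"):
    """Decode bytes once, at the input edge; pass Unicode text through untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def build_article_response(response, head, body):
    """Assemble the reply as Unicode, then encode exactly once at the output edge."""
    text = u"%s\r\n%s\r\n\r\n%s\r\n." % (
        to_text(response), to_text(head), to_text(body))
    return text.encode("ascii", "replace")


# Mixed input: two byte strings and one value that is already Unicode.
wire_bytes = build_article_response(
    b"220 1 <1@example> article", u"Subject: Acme\u2122", b"Body text")
print(repr(wire_bytes))
```

With this shape there is exactly one decode per input and one encode per output, so a value of the wrong type has only one place to cause trouble.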

By the way, if you haven't read Python's Unicode HOWTO, and the blog post The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), go read them before going any further.

* If you think this is a silly design for a language… well, that's the main reason Python 3 exists. In Python 3, you can't decode a Unicode string (str) or encode a bytes, so the error shows up as early as possible, and tells you exactly what's wrong, instead of making you try to hunt down where you called the wrong method on the wrong type and got an error that makes no sense. So if you want to use Python 2 instead of 3, you don't get to complain that Python 2's design is sillier than 3's.
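For comparison, a small Python 3 check of that claim; the sample values are made up, and the exact wording of the error message may vary between versions.

```python
text = u"\u2122"   # Unicode text (str in Python 3)
data = b"\x99"     # the same character as a cp1250 byte

print(hasattr(text, "decode"))  # str has no decode() method in Python 3
print(hasattr(data, "encode"))  # bytes has no encode() method

message = None
try:
    text.decode("cp1250")
except AttributeError as exc:
    # Fails right at the mistaken call and names the missing method.
    message = str(exc)
    print(message)
```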

solution1
1 ACCEPTED 2014-01-13 09:15:59
