UnicodeDecodeError when editing HTML code with Python

Question

I'm using mitmproxy to manipulate the HTML returned by web pages. When I run string operations on that HTML, I get a UnicodeDecodeError.

I have tried everything I could think of and read the related posts here, but nothing has worked for me.

Two examples of many things I already tried:

msg.response.content = unicode(msg.response.content, errors='ignore')
msg.response.content = msg.response.content.decode('utf8').encode('ascii', errors='ignore')

How can I deal with that?

Answer 1

Try using the mitmproxy.flow.decoded context manager, like so:

from mitmproxy.flow import decoded

def response(context, flow):
    with decoded(flow.response):
        flow.response.content = flow.response.content.replace("Google", "Noogle")

From the source:

A context manager that decodes a request, response or error, and then re-encodes it with the same encoding after execution of the block.

Example:
    with decoded(request):
        request.content = request.content.replace("foo", "bar")
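
To illustrate the decode-then-re-encode pattern the docstring describes, here is a minimal sketch of such a context manager. It is not mitmproxy's implementation: FakeMessage and decoded_sketch are hypothetical names, the sketch is written for Python 3, and it assumes "encoding" means an HTTP Content-Encoding such as gzip rather than a character set.

```python
import gzip
from contextlib import contextmanager


class FakeMessage:
    """Hypothetical stand-in for a proxy response object (not a mitmproxy class)."""

    def __init__(self, content, content_encoding=None):
        self.content = content
        self.content_encoding = content_encoding


@contextmanager
def decoded_sketch(message):
    """Decompress message.content for the duration of the block, then recompress it."""
    was_gzipped = message.content_encoding == "gzip"
    if was_gzipped:
        message.content = gzip.decompress(message.content)
        message.content_encoding = None
    try:
        yield message
    finally:
        # Restore the original encoding so the client still receives gzip data.
        if was_gzipped:
            message.content = gzip.compress(message.content)
            message.content_encoding = "gzip"
```

Inside the with block the body is plain bytes, so a replace works on readable content; on exit the body is compressed again.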

Note: I used mitmproxy on Ubuntu 14.04.

Answer 2

To be sure you are decoding correctly, look in the HTML source of the page for something like <meta charset="utf-8"> or <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">. The charset value is the encoding the page declares it is using.
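
As a rough sketch of that lookup, assuming Python 3 and raw bytes as input: sniff_charset is a hypothetical helper (not part of mitmproxy), and the regular expression only covers the two meta forms mentioned here, so a real HTML parser would be more robust.

```python
import re

# Matches charset=... inside a <meta ...> tag. Covers both
# <meta charset="utf-8"> and the http-equiv Content-Type form.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_charset(html_bytes, default="utf-8"):
    """Return the charset declared in a <meta> tag, or `default` if none is found."""
    # The declaration normally sits near the top, so only scan the first few KB.
    match = _META_CHARSET.search(html_bytes[:4096])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

The returned name can then be passed to bytes.decode. The fallback of "utf-8" is an assumption, and the HTTP Content-Type header may also carry a charset that this sketch does not look at.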

If running type(msg.response.content) returns str (a byte string in Python 2), then you need to run msg.response.content = msg.response.content.decode(u'utf-8'), where "utf-8" is the encoding the page says it is using. This could also be something like ISO-8859-1, windows-1251, or ASCII.
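
The decode step is usually paired with an encode step before the body is handed back to the proxy. A minimal sketch of that round trip, assuming Python 3 (where the body is bytes rather than str); replace_in_html is a hypothetical helper and the error handlers are one possible choice, not the only one.

```python
def replace_in_html(raw, old, new, encoding="utf-8"):
    """Decode raw HTML bytes, replace text, and encode back to the same charset."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    # UnicodeDecodeError, which is handy when the declared charset is slightly wrong.
    text = raw.decode(encoding, errors="replace")
    text = text.replace(old, new)
    # xmlcharrefreplace writes characters the charset cannot represent as
    # numeric HTML entities (for example &#8364;) instead of raising an error.
    return text.encode(encoding, errors="xmlcharrefreplace")
```

For example, replace_in_html(body, "Google", "Noogle", "iso-8859-1") edits a Latin-1 page without touching its non-ASCII characters.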

2 answers

solution1
0 2016-02-24 17:20:04

solution2
0 2016-03-20 20:31:57
