
UnicodeDecodeError when editing HTML code with Python

I'm using mitmproxy to manipulate the HTML returned by web pages. When I run string operations on that HTML, I get a UnicodeDecodeError.

I tried everything I could think of and read every related post here, but nothing worked for me.

Two examples of many things I already tried:

msg.response.content = unicode(msg.response.content, errors='ignore')
msg.response.content = msg.response.content.decode('utf8').encode('ascii', errors='ignore')

How can I deal with that?

Try using the mitmproxy.flow.decoded context manager, like so:

from mitmproxy.flow import decoded

def response(context, flow):
    with decoded(flow.response):
        flow.response.content = flow.response.content.replace("Google", "Noogle")

From the source:

A context manager that decodes a request, response or error, and then re-encodes it with the same encoding after execution of the block.

Example:

  with decoded(request):
      request.content = request.content.replace("foo", "bar")

Note: I used mitmproxy on Ubuntu 14.04.

To be sure you are decoding correctly, look in the source of the HTML page for something like <meta charset="utf-8"> or <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">. The charset value is the encoding the page declares it is using.
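Rather than reading the page source by hand, the declared charset can be pulled out of the raw bytes. The following is a rough sketch: declared_charset is a hypothetical helper, and the regular expression only covers the two meta forms mentioned above, not every way a page can declare its encoding (the HTTP Content-Type header, for instance, is not consulted).

```python
import re

# Matches the charset value in <meta charset="..."> and in the
# http-equiv form's content="text/html;charset=..." attribute.
_CHARSET_RE = re.compile(
    br'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def declared_charset(html_bytes, default=None):
    """Return the charset named in a <meta> tag, lowercased, or default."""
    # Declarations normally sit near the top, so only scan the first 4 KiB.
    match = _CHARSET_RE.search(html_bytes[:4096])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

The result can then be passed to decode(), with a default such as "utf-8" for pages that declare nothing.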

If running type(msg.response.content) reports that the type is str, the content is a byte string and you need to decode it with msg.response.content = msg.response.content.decode(u'utf-8'), where "utf-8" is the encoding the page says it is using. This could also be something like ISO-8859-1, windows-1251, or ASCII.
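The decode step can be wrapped so that a wrong or unknown charset does not raise. This is a minimal sketch under stated assumptions: to_text is a hypothetical helper, and falling back to UTF-8 with replacement characters is one possible policy, not something mitmproxy prescribes.

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string to text, tolerating a wrong or unknown charset."""
    if not isinstance(raw, bytes):
        return raw  # already text, nothing to decode
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers codec names Python does not know.
        # Undecodable bytes become U+FFFD instead of raising.
        return raw.decode("utf-8", errors="replace")
```

After editing the text, encode it back to bytes with the same charset before assigning it to the response, so that the body matches what the page declares.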
