繁体 English 中英

如何使用Python 2处理恶意编码的字符？

[英]How can I handle mal-encoded character with Python 2?

原文 2014-11-27 09:28:41 0 1 python/ unicode/ internationalization/ shift-jis

我要提取的HTML文件包含一些HTML标头中指定的编码不支持的字符：

我发现Shift_JIS编码不支持以下功能，但实际使用了以下功能。 我的浏览器可以正确显示这些字符。

∑ n进制求和U + 2211
ﾟ半角片假名半清音标记U + FF9F
西里尔字母大写字母de U + 414

当我尝试读取此HTML文件并解码以进行处理时，出现UnicodeDecodeError。

url = 'http://matsucon.net/material/dic/kao09.html'
response = urllib2.urlopen(url)
response.read().decode('shift_jis_2004')

有什么好的方法可以处理包含错误编码字符的HTML，而不会出现错误？

1 个解决方案

尝试这个：

response.read().decode('shift_jis_2004',errors='ignore')

处理 Python unicode 字符串中错误编码的字符

[英]Handle wrongly encoded character in Python unicode string

如何在python中打印由GBK编码的单词？

[英]How can I print word encoded by GBK in python?

如何在python中处理boto异常？

[英]How can I handle a boto exception in python?

如何绕过Python的字符数限制？

[英]How can I bypass the character limit of Python?

我如何使用 tensorflow 处理 python 代码

[英]How can i handle python code with tensorflow

如何处理python中的属性？

[英]How can I handle attributes in python?

如何在 Python 中逐个字符地删除 Tkinter StringVar？

[英]How can I delete a Tkinter StringVar character by character in Python?

在Debian上显示编码字符python 3

[英]Display encoded character python 3 on debian

如何在Python 2.7中正确处理字符编码？

[英]How to handle character encodings properly in Python 2.7?

Python-Python 3.1似乎无法处理UTF-16编码的文件？

[英]Python - Python 3.1 can't seem to handle UTF-16 encoded files?

暂无

暂无

声明:本站的技术帖子网页，遵循CC BY-SA 4.0协议，如果您需要转载，请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 处理 Python unicode 字符串中错误编码的字符如何在python中打印由GBK编码的单词？如何在python中处理boto异常？如何绕过Python的字符数限制？我如何使用 tensorflow 处理 python 代码如何处理python中的属性？如何在 Python 中逐个字符地删除 Tkinter StringVar？在Debian上显示编码字符python 3 如何在Python 2.7中正确处理字符编码？ Python-Python 3.1似乎无法处理UTF-16编码的文件？

相关标签

粤ICP备18138465号 © 2020-2024 STACKOOM.COM