Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

Question

I want to build a search engine and am following a tutorial from a website. I want to test parsing the HTML:

from bs4 import BeautifulSoup

def parse_html(filename):
    """Extract the Author, Title and Text from a HTML file
    which was produced by pdftotext with the option -htmlmeta."""
    with open(filename) as infile:
        html = BeautifulSoup(infile, "html.parser", from_encoding='utf-8')
        d = {'text': html.pre.text}
        if html.title is not None:
            d['title'] = html.title.text
        for meta in html.findAll('meta'):
            try:
                if meta['name'] in ('Author', 'Title'):
                    d[meta['name'].lower()] = meta['content']
            except KeyError:
                continue
        return d

parse_html("C:\\pdf\\pydf\\data\\muellner2011.html")

It raises this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 867: character maps to <undefined>

I've seen some solutions online that use encode(), but I don't know where to put encode() in my code. Can anyone help?

Answer 1

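In Python 3, files are opened in text mode by default, which means their contents are decoded to Unicode (str) as they are read; you don't need to tell BeautifulSoup which codec to decode with.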
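A minimal stdlib-only sketch of that difference, assuming a throwaway temporary file with hypothetical contents: reading in text mode returns an already-decoded `str`, while mode `'rb'` returns raw `bytes`. As far as I know, BeautifulSoup's `from_encoding` argument only has an effect in the `bytes` case; given a `str`, it ignores the argument (and bs4 warns about it).

```python
import os
import tempfile

# Hypothetical sample: text containing U+201D (right double quotation mark).
# In UTF-8 that character is the three bytes E2 80 9D.
sample = "\u201dquoted\u201d".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Text mode: Python decodes the bytes for you and returns str.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Binary mode: no decoding happens; you get the raw bytes back.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(type(text).__name__, ascii(text))
print(type(raw).__name__, raw[:3])
```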

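If decoding the data fails, it is because you didn't tell the open() call which codec to use when reading the file; pass the correct codec with the encoding argument: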

with open(filename, encoding='utf8') as infile:
    html = BeautifulSoup(infile, "html.parser")
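To see where byte 0x9d comes from, here is a small sketch that assumes the HTML file really is UTF-8 and that the system default is Windows code page 1252 (both are assumptions; the snippet forces cp1252 to simulate it). The UTF-8 encoding of the right double quotation mark U+201D is `E2 80 9D`, and cp1252, which Python implements as a 'charmap' codec, has no character for 0x9d, so decoding with the wrong codec fails the same way as in the question. Naming the real encoding, as the answer suggests, fixes it.

```python
import os
import tempfile

# Hypothetical HTML snippet with curly quotes; U+201D encodes to E2 80 9D.
html_bytes = "<pre>He said \u201chi\u201d</pre>".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write(html_bytes)
    path = tmp.name

try:
    # Simulate a Windows default: force cp1252 (a 'charmap' codec).
    error = None
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
    except UnicodeDecodeError as exc:
        error = exc

    # The fix from the answer: name the file's real encoding explicitly.
    with open(path, encoding="utf8") as f:
        text = f.read()
finally:
    os.remove(path)

print(error)
print(ascii(text))
```

The first `print` shows the same kind of message as the question; the second shows the file decoded correctly once `encoding="utf8"` is given.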

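Otherwise the file is opened with your system's default codec, which depends on the operating system. On Windows this is typically a legacy code page such as cp1252, which shows up as the 'charmap' codec in the error and has no character for byte 0x9d.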
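To check which codec `open()` falls back to on a given machine, you can ask the `locale` module. This is a sketch; the value depends on the OS, the locale settings and whether Python's UTF-8 mode is enabled.

```python
import locale
import sys

# The encoding open() uses in text mode when no encoding= argument is given.
default = locale.getpreferredencoding(False)

print("default text encoding:", default)
# UTF-8 mode (python -X utf8 or PYTHONUTF8=1) makes the default UTF-8.
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```

On a Western-locale Windows install this typically prints `cp1252`. Passing `encoding=` explicitly, as in the answer, avoids depending on this value at all.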

Answer 1 (score 79, accepted, 2015-06-10 08:36:33)
