
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in Python

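I'm getting a UnicodeDecodeError: the 'utf-8' codec can't decode byte 0x96 in position 68455. I tried a few solutions that involve adding an encoding line, but still no luck. How can I fix this error?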


def create_HTML(config):

    in_doc = Path(config.doc)
    if config.o:
        out_file = Path(config.o)
    else:
        out_file = Path.cwd() / f'{in_doc.stem}_email.html'
    template_file = config.t

    # Read in the entire file as a list
    # This can be problematic if the file is really large
    with open(in_doc) as f:
        all_content = f.readlines()

    # Get the title line and clean it up
    title_line = all_content[0]
    title = f'My Newsletter - {title_line[7:].strip()}'

    # Parse out the body from the meta data content at the top of the file
    body_content = all_content[6:]

    # Create a markdown object and convert the list of file lines to HTML
    markdowner = Markdown()
    markdown_content = markdowner.convert(''.join(body_content))

    # Set up jinja templates
    env = Environment(loader=FileSystemLoader('.'))
    template = env.get_template(template_file)

    # Define the template variables and render
    template_vars = {'email_content': markdown_content, 'title': title}
    raw_html = template.render(template_vars)

    # Generate the final output string
    # Inline all the CSS using premailer.transform
    # Use BeautifulSoup to make the formatting nicer
    soup = BeautifulSoup(transform(raw_html), 'html.parser').prettify(formatter="html")

    # The unsubscribe tag gets mangled. Clean it up.
    final_HTML = str(soup).replace('%7B%7BUnsubscribeURL%7D%7D', '{{UnsubscribeURL}}')
    out_file.write_text(final_HTML)

File "C:\apps\python\3.7.9\lib\site-packages\jinja2\loaders.py", line 201, in get_source        
    contents = f.read().decode(self.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 68455: invalid start byte

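0x96 is 10010110 in binary, and any byte matching the pattern 10XXXXXX (0x80 to 0xBF) can only be a second or subsequent (continuation) byte in a UTF-8 sequence, never the first. So the stream is either not UTF-8 or it is corrupted. Since the encoding isn't stated anywhere, you should try ISO-8859-1 (a.k.a. "Latin 1").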
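
To make the byte-level reasoning concrete, here is a minimal sketch using only the standard library. The sample bytes and the helper names (is_continuation_byte, reencode_to_utf8) are made up for illustration, and Windows-1252 is only a guess at the real encoding: it is a common source of a stray 0x96, because that byte is an en dash there, whereas in ISO-8859-1 it decodes to an invisible control character. Note also that the traceback is raised inside jinja2/loaders.py, so the file that fails to decode is probably the template rather than the Markdown document.

```python
from pathlib import Path


def is_continuation_byte(value: int) -> bool:
    """Return True for bytes of the form 10xxxxxx (0x80 to 0xBF)."""
    return (value & 0b11000000) == 0b10000000


# Hypothetical sample: a date range whose dash was saved as the single byte 0x96.
raw = b"2019 \x96 2020"

print(format(0x96, "08b"))         # 10010110
print(is_continuation_byte(0x96))  # True

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason, "at position", exc.start)  # invalid start byte at position 5

# ISO-8859-1 maps every byte to a code point, so decoding never raises,
# but 0x96 becomes the control character U+0096.
latin1_text = raw.decode("iso-8859-1")

# Assumption: the file came from a Windows editor. cp1252 maps 0x96 to U+2013 (en dash).
cp1252_text = raw.decode("cp1252")
print(ascii(cp1252_text))          # '2019 \u2013 2020'


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file at `path` as UTF-8, decoding it from `source_encoding`."""
    path = Path(path)
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
```

If the guess about the encoding is right, running reencode_to_utf8 once on the template file (after making a backup) lets Jinja's default UTF-8 loader read it. Alternatively, jinja2's FileSystemLoader accepts an encoding argument, so FileSystemLoader('.', encoding='cp1252') should work without touching the file.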
