
Python Unicode error, 'ascii' codec can't encode character

I am getting the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 587: ordinal not in range(128)

My code:

import os
from bs4 import BeautifulSoup

do = dir_with_original_files = r'C:\Users\Me\Directory'   # raw strings keep the backslashes literal
dm = dir_with_modified_files = r'C:\Users\Me\Directory\New'
for root, dirs, files in os.walk(do):
    for f in files:
        if f.endswith('~'): #you don't want to process backups
            continue
        original_file = os.path.join(root, f)
        mf = f.split('.')
        mf = ''.join(mf[:-1])+'_mod.'+mf[-1] # you can keep the same name 
                                             # if you omit the last two lines.
                                             # They are in separate directories
                                             # anyway. In that case, mf = f
        modified_file = os.path.join(dm, mf)
        with open(original_file, 'r') as orig_f, \
             open(modified_file, 'w') as modi_f:
            soup = BeautifulSoup(orig_f.read())
            for t in soup.find_all('td', class_='test'):
                t.string.wrap(soup.new_tag('h2'))
            # This is where you create your new modified file.
            modi_f.write(soup.prettify())

This code iterates over a directory and, for each file, finds every td of class test and wraps the text inside the td in an h2 tag. So previously, it would have been:

<td class="test"> text </td>

After running this program, a new file will be created with:

<td class="test"> <h2>text</h2> </td>

At least, this is how I would like it to work. Unfortunately, I am currently getting the error described above. I believe this is because some of the text I am parsing is written in Spanish and includes accented and other special Spanish characters.

What can I do to fix my issue?

soup.prettify() returns a Unicode string, but your file expects a byte string. Python tries to help here and automatically encodes the result with the default ASCII codec, but your Unicode string contains code points outside the ASCII range, so the encoding fails.
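To see the failure in isolation, here is a minimal sketch. The sample string is made up for illustration; it contains á (U+00E1), the character named in the traceback. The explicit encode to ASCII stands in for the implicit conversion that happens when a Unicode string is written to a byte-oriented file.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; u'\xe1' is the accented "a" from the traceback.
text = u'<h2>est\xe1</h2>'

try:
    # This mirrors what the implicit conversion attempts.
    text.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# An explicit encode with a codec that covers the character succeeds.
encoded = text.encode('utf-8')
print(repr(encoded))
```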

You'll have to either encode manually with a different codec, or use a different file object type that does this for you automatically.
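For the second option, one file object type that encodes automatically is the one returned by io.open, which takes an encoding argument. The sketch below is only an illustration: the temporary path, the sample markup, and the choice of UTF-8 are assumptions, not something taken from your files.

```python
import io
import os
import tempfile

# Hypothetical output path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example_mod.html')
markup = u'<td class="test"><h2>ma\xf1ana</h2></td>'

# io.open accepts an encoding and encodes Unicode text on write.
with io.open(path, 'w', encoding='utf-8') as out_f:
    out_f.write(markup)

# Reading back with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf-8') as in_f:
    round_tripped = in_f.read()
```

With a file opened this way, a Unicode string such as the result of soup.prettify() can be passed to write() directly, without an explicit encode call.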

In this case, I'd encode to the original encoding that BeautifulSoup detected for you:

modi_f.write(soup.prettify().encode(soup.original_encoding))

soup.original_encoding reflects the encoding BeautifulSoup used to decode the unmodified HTML. It is based on the encoding the HTML itself declared, if one is available, or otherwise on an educated guess from statistical analysis of the bytes of the original data.
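One caveat: as far as I know, soup.original_encoding is None when BeautifulSoup is handed an already-decoded Unicode string, because there was nothing to detect. A small helper can guard against that. Note that encode_markup and its fallback parameter are hypothetical names for this sketch, not part of BeautifulSoup, and UTF-8 is only an assumed default.

```python
def encode_markup(markup, detected_encoding, fallback='utf-8'):
    """Encode Unicode markup, using `fallback` when no encoding was detected."""
    return markup.encode(detected_encoding or fallback)


# Hypothetical sample text standing in for the output of soup.prettify().
sample = u'<h2>est\xe1</h2>'

# With a detected encoding, the text is written back in that encoding.
as_latin1 = encode_markup(sample, 'iso-8859-1')

# With no detected encoding (None), the fallback codec is used instead.
as_default = encode_markup(sample, None)
```

In the loop from the question, this would be called as modi_f.write(encode_markup(soup.prettify(), soup.original_encoding)).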
