
'utf8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

I have the following text in an HTML document:

<a href="#">�'ам интересна информация</a>

and I'm using the following expression to extract the text:

row.xpath("string(./td[@class='col2 td-tags']/h3/a/text())")

This expression works fine for plain English text, but for the above string it throws this error:

'utf8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
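
The error can be reproduced without any HTML parsing. The following is a minimal sketch in Python 3 (the question itself appears to use Python 2); the byte string `raw` is a made-up sample, assumed to resemble the start of the link text, where a lone 0xd0 lead byte is followed by an ASCII apostrophe instead of a continuation byte.

```python
# Hypothetical sample: 0xd0 starts a two-byte UTF-8 sequence, but the next
# byte is an ASCII apostrophe (0x27), which is not a valid continuation byte.
raw = b"\xd0'"

message = None
bad_offset = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)       # human-readable reason
    bad_offset = exc.start   # index of the offending byte

print(message)
print(bad_offset)
```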

In HTML, a numeric character reference such as &#xxx; does NOT specify a byte in the document encoding; it's ALWAYS a Unicode code point.

Thus, you can't put UTF-8 bytes into an HTML document like that.
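
A short Python 3 sketch of the difference between a code point and a byte, using only the standard library. The choice of "В" (U+0412) as the intended first letter is an assumption based on the Russian text in the question; the original reference in the page is not recoverable from what is shown here.

```python
import html

# A numeric character reference names a Unicode code point, not a byte.
# "&#xD0;" is therefore U+00D0 (LATIN CAPITAL LETTER ETH), not the byte 0xd0.
eth = html.unescape("&#xD0;")
eth_utf8 = eth.encode("utf-8")   # two bytes, neither of which is 0xd0

# Assumed intended letter: Cyrillic "В" (U+0412). Its UTF-8 encoding
# happens to begin with the byte 0xd0.
ve = html.unescape("&#x412;")
ve_utf8 = ve.encode("utf-8")

print(repr(eth), eth_utf8)
print(repr(ve), ve_utf8)
```

Writing the character reference for the code point itself (here `&#x412;`), or the literal character in a correctly declared encoding, avoids mixing up bytes and code points.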

What encoding is the document in? What character starts the text in the <a>? It might be invalid UTF-8.

I first decoded the page contents (which included the string <a href="#"> 'ам интересна информация</a> ), replacing any bytes that could not be decoded with a replacement character (displayed as a question mark), and it worked!

i.e. page_contents_string = page_contents_string.decode("utf-8", "replace")
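
For reference, a rough Python 3 equivalent of that workaround, where the raw page is a `bytes` object. The sample bytes in `page_bytes` are invented for illustration. Note that this hides the problem rather than fixing it: each undecodable byte becomes U+FFFD, so the original character is lost.

```python
# Hypothetical page fragment: a stray 0xd0 byte followed by valid UTF-8 text.
page_bytes = b"<a href=\"#\">\xd0'\xd0\xb0\xd0\xbc</a>"

# errors="replace" substitutes U+FFFD (the replacement character) for every
# byte sequence that is not valid UTF-8, instead of raising an exception.
page_text = page_bytes.decode("utf-8", errors="replace")

print(page_text)
print(page_text.count("\ufffd"))  # number of substitutions made
```

The resulting `str` can then be passed to the HTML parser in place of the raw bytes.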


