Remove non-ASCII characters from a string? (in Python)
I am trying to parse a string from an HTML file in which many lines contain a mix of ASCII and non-ASCII characters, such as this:
"industrial light & \u003cbr\u003emagic, lucasarts"
I have tried to encode the string into ASCII using the encode function, but it only returns the same value that was passed in.
text = "industrial light & \u003cbr\u003emagic, lucasarts"  # renamed from str to avoid shadowing the built-in
text.encode('ascii', errors='ignore')
returns "industrial light & \u003cbr\u003emagic, lucasarts"
Any help would be greatly appreciated.
I found the problem. I was trying to decode it in Python 2. Python 2 and Python 3 handle this kind of conversion differently. Once I tried it in Python 3, everything worked fine. Thank you all for your help!
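For anyone hitting the same thing, here is a minimal Python 3 sketch of what is probably going on. In Python 2 a plain "..." literal is a byte string, so \u003c stays as six literal characters; in Python 3 the same literal is already the character "<". The helper name strip_non_ascii and the variable names are my own, and the second half assumes the HTML file contains literal backslash-u sequences, which the question does not confirm.

```python
def strip_non_ascii(s):
    """Drop every character that cannot be encoded as ASCII."""
    return s.encode("ascii", errors="ignore").decode("ascii")


# In Python 3, \u003c and \u003e in a normal literal are just "<" and ">",
# which are ASCII, so nothing is removed here.
text = "industrial light & \u003cbr\u003emagic, lucasarts"
cleaned = strip_non_ascii(text)

# Assumption: the file holds literal backslash-u sequences (as a raw string does).
# Decode the escapes first, then strip whatever is still non-ASCII.
raw = r"industrial light & \u003cbr\u003emagic, lucasarts"
decoded = raw.encode("ascii").decode("unicode_escape")
cleaned_raw = strip_non_ascii(decoded)

print(cleaned)
print(cleaned_raw)
```

Note that "<" and ">" are ASCII, so if the goal is to get rid of the br tag itself, that needs an HTML parser or a replace step, not an ASCII filter.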