
Remove non-ASCII characters from a string? (in Python)

I am trying to parse a string from an HTML file that has multiple lines containing a mix of ASCII and non-ASCII characters, such as this:

"industrial light & \u003cbr\u003emagic, lucasarts"

I have tried to encode the string into ASCII using the encode function, but it only returns the same value that was put into it.

s = "industrial light & \u003cbr\u003emagic, lucasarts"
s.encode('ascii', errors='ignore')
# returns "industrial light & \u003cbr\u003emagic, lucasarts"

Any help would be greatly appreciated.

I found the problem. I was trying to decode it in Python 2. Python 2 and Python 3 handle this kind of conversion differently. Once I tried it in Python 3, everything worked fine. Thank you all for your help!
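For anyone landing here later, this is a rough sketch of what the difference likely comes down to, assuming Python 3. In Python 2 a plain "..." literal is a byte string, so \u003c stays as six literal characters and encode() has nothing to drop. In Python 3 every string literal is Unicode, so \u003c is already the single character "<". The helper names strip_non_ascii and unescape_u are made up for illustration and are not part of any library. unescape_u covers the case where the text read from the HTML file contains literal backslash-u sequences; it does not handle surrogate pairs.

```python
import re

# Python 3: the \u003c and \u003e escapes are interpreted when the literal is parsed.
text = "industrial light & \u003cbr\u003emagic, lucasarts"
print(text)


def strip_non_ascii(s: str) -> str:
    """Drop every character outside the ASCII range (hypothetical helper)."""
    # encode() returns bytes, so decode back to get a str again.
    return s.encode("ascii", errors="ignore").decode("ascii")


def unescape_u(s: str) -> str:
    r"""Replace literal \uXXXX sequences with the character they name (hypothetical helper)."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        s,
    )


# A raw string keeps the backslashes, like text read verbatim from a file.
raw = r"industrial light & \u003cbr\u003emagic, lucasarts"
print(unescape_u(raw))

# A string that really does contain a non-ASCII character (U+00E9).
print(strip_non_ascii("caf\u00e9 lucasarts"))
```

Note that the example string from the question contains only ASCII characters once the escapes are interpreted, so strip_non_ascii leaves it unchanged; the escapes were the actual issue, not the encoding step.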
