Remove non-ASCII characters from a string (in Python)?

I am trying to parse a string from an HTML file that has multiple lines containing a mix of ASCII and non-ASCII characters, such as this:

"industrial light & \u003cbr\u003emagic, lucasarts"

I have tried to encode the string into ASCII using the encode function, but it only returns the same value that was put into it.

# "text" instead of "str", so the built-in str type is not shadowed
text = "industrial light & \u003cbr\u003emagic, lucasarts"
text.encode('ascii', errors='ignore')
# returns "industrial light & \u003cbr\u003emagic, lucasarts"

Any help would be greatly appreciated.

I found the problem. I was trying to decode it in Python 2. Python 2 and Python 3 handle this kind of conversion differently. Once I tried it in Python 3, everything worked fine. Thank you all for your help!
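
For anyone landing here later, this is a minimal Python 3 sketch of what probably made the difference. In Python 3 a \uXXXX escape inside a normal string literal is turned into the character itself, while a Python 2 byte string keeps the backslash sequence as plain ASCII text, so encoding to ASCII changes nothing. The helper names strip_non_ascii and decode_escapes are made up for illustration, and the raw string only imitates text read from a file; your actual input may differ.

```python
# Python 3 sketch. strip_non_ascii and decode_escapes are hypothetical
# helper names, not part of any library.

def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def decode_escapes(raw: str) -> str:
    """Turn literal backslash-u sequences into the characters they name."""
    # Assumption: raw contains only ASCII, as in the sample from the question.
    return raw.encode("ascii").decode("unicode_escape")


# In Python 3 source code, \u003c and \u003e are already '<' and '>'.
literal = "industrial light & \u003cbr\u003emagic, lucasarts"

# Text read from a file keeps the backslashes; a raw string imitates that.
from_file = r"industrial light & \u003cbr\u003emagic, lucasarts"

print(literal)                                # industrial light & <br>magic, lucasarts
print(decode_escapes(from_file))              # industrial light & <br>magic, lucasarts
print(strip_non_ascii("caf\u00e9 & magic"))   # caf & magic
```

Note that the escapes in the sample decode to "<" and ">", which are ASCII, so nothing is dropped from that particular string; strip_non_ascii only removes characters such as "é".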

