How to get rid of weird characters in a Python string?
I have lines that contain some pesky control characters:
When I read the file and then call str.replace(), these control characters don't get replaced. I've tried this, but they are still sticking around:
import io

with io.open('infile', 'r', encoding='utf8') as fin:
    for line in fin:
        # Try to swap each control character for a plain ASCII quote
        line = line.replace(u'\u0094', '"').replace(u'\u0093', '"').replace(u'\u0092', "'").replace(u'\u0096', '"').replace(u'\u0084', '"')
How do I get these strings replaced? Is there a canonical way to replace them (they look like quotation marks / whitespace of various kinds)?
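For illustration, one possible way to replace all of these code points in a single pass is str.translate with a mapping table. This is only a sketch: the names C1_TO_ASCII and clean_line are made up for this example, and the choice of ASCII stand-ins (in particular '-' for u'\u0096') is an assumption rather than something stated in the question. Note also that str.replace and str.translate both return a new string, so the result has to be written out or collected somewhere for the change to be visible.

```python
# Hypothetical mapping from the code points in the question to ASCII stand-ins.
# Keys are integer code points, values are the replacement text.
C1_TO_ASCII = {
    0x0084: u'"',
    0x0092: u"'",
    0x0093: u'"',
    0x0094: u'"',
    0x0096: u'-',  # assumption: treat this one as a dash, not a quote
}


def clean_line(line):
    """Return a copy of line with every mapped code point replaced."""
    return line.translate(C1_TO_ASCII)


sample = u'\u0093quoted\u0094 and it\u0092s'
print(clean_line(sample))
```

Inside the reading loop, the cleaned line would then be appended to a list or written to an output file instead of being discarded.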
What are these characters anyway? What is u'\'?
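As a rough way to inspect such characters: the code points used in the replace() call above (U+0084, U+0092, U+0093, U+0094, U+0096) fall in the C1 control range U+0080 to U+009F. A common cause, though only an assumption here, is text that was originally Windows-1252 (cp1252) being decoded as Latin-1 somewhere upstream. The sketch below prints each character's Unicode category and what it would be under that assumption; reinterpret is a hypothetical helper name.

```python
import unicodedata


def reinterpret(ch):
    """Assumption: ch is a cp1252 byte that was mis-decoded as Latin-1."""
    return ch.encode('latin-1').decode('cp1252')


for ch in u'\u0084\u0092\u0093\u0094\u0096':
    category = unicodedata.category(ch)  # 'Cc' means control character
    guess = reinterpret(ch)
    print('U+%04X category=%s cp1252 guess=U+%04X' % (ord(ch), category, ord(guess)))
```

If the guessed characters turn out to be curly quotes and dashes, that supports the mis-decoding assumption, and the same encode/decode round trip could be applied to whole lines rather than replacing characters one by one.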
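The last time I ran into this problem, it was because I was getting characters from outside the ASCII range, hence the out-of-bounds error.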