
How to get rid of weird characters in a Python string?

I have lines that contain some pesky control characters:

[image]

When I read the file and then call str.replace(), these control characters don't get replaced. I've tried this, but they are still sticking around.

import io

with io.open('infile', 'r', encoding='utf8') as fin:
    for line in fin:
        line = (line.replace(u'\u0094', '"').replace(u'\u0093', '"')
                    .replace(u'\u0092', "'").replace(u'\u0096', '"').replace(u'\u0084', '"'))

How do I get these characters replaced? Is there a canonical way to replace them (they look like quotation marks / whitespace of various kinds)?

What are these characters anyway? What is u'\„' ?
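One way to find out what such characters are is to ask Python directly. Code points U+0080 to U+009F are C1 control characters (Unicode category Cc), and they often show up when Windows-1252 punctuation has been decoded with the wrong codec. The sketch below assumes that is what happened here; the helper name describe is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode category, cp1252 reinterpretation or None)."""
    try:
        # Treat the code point as a single byte and decode it as Windows-1252.
        fixed = ch.encode('latin-1').decode('cp1252')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a single-byte value, or a byte that cp1252 leaves undefined.
        fixed = None
    return 'U+%04X' % ord(ch), unicodedata.category(ch), fixed

# The five code points from the question.
for ch in u'\u0084\u0092\u0093\u0094\u0096':
    print(describe(ch))
```

Under that assumption, U+0093 and U+0094 come out as the curly double quotes U+201C and U+201D, U+0092 as the right single quote U+2019, U+0084 as the low double quote U+201E, and U+0096 as an en dash U+2013.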

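The last time I ran into this problem, it was because I was getting characters from outside the ASCII range, which caused an out-of-range error.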
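If the culprits really are code points outside the ASCII range, a translation table may be tidier than a chain of replace() calls. This is only a sketch: the names C1_TO_ASCII, clean and clean_file are hypothetical, the mapping assumes the characters stand for Windows-1252 punctuation, and U+0096 is mapped to a hyphen here rather than to a double quote as in the question's code.

```python
import io

# Hypothetical mapping (an assumption, not from the question):
# C1 code point -> ASCII stand-in for the cp1252 character it likely represents.
C1_TO_ASCII = {
    0x0084: u'"',  # low double quote
    0x0092: u"'",  # right single quote
    0x0093: u'"',  # left double quote
    0x0094: u'"',  # right double quote
    0x0096: u'-',  # en dash
}

def clean(line):
    """Return a copy of line with the mapped code points replaced."""
    return line.translate(C1_TO_ASCII)

def clean_file(path):
    """Read a UTF-8 file and return its lines with the mapping applied."""
    with io.open(path, 'r', encoding='utf8') as fin:
        # Collect the cleaned lines so the result is not thrown away.
        return [clean(line) for line in fin]
```

For example, clean(u'\u0093quoted\u0094') gives u'"quoted"'. Strings are immutable, so the cleaned value has to be kept or written somewhere; rebinding the loop variable alone leaves the file contents unchanged.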


 