
How to remove escape characters from a string in Python?

I have a string that looks like this: text = u'\xd7\nRecord has been added successfully, record id: 92'. I tried to remove the escape characters \xd7 and \n from the string so that I could use it for another purpose.

I tried str(text), but it could not remove the character \xd7 and failed with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xd7' in position 0: ordinal not in range(128)

Is there any way to remove escape characters like these from a string? Thanks.

You can try the following, using replace:

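text = u'\xd7\nRecord has been added successfully, record id: 92'
# Use unicode literals so replace() also works on a Python 2 unicode string.
bad_chars = [u'\xd7', u'\n', u'\x99m', u'\xf0']
for ch in bad_chars:
    # Remove every occurrence of the unwanted character.
    text = text.replace(ch, u'')
print(text)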
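
The same idea can also be written with `str.translate`, which deletes every listed character in a single pass. This is only a minimal sketch, assuming Python 3 (where every string is Unicode); the helper name `remove_chars` is hypothetical and not part of the answer above.

```python
def remove_chars(text, bad_chars):
    """Return text with every character in bad_chars deleted."""
    # Mapping a code point to None tells translate() to drop that character.
    table = {ord(ch): None for ch in bad_chars}
    return text.translate(table)


text = '\xd7\nRecord has been added successfully, record id: 92'
cleaned = remove_chars(text, '\xd7\n')
print(cleaned)
```

Unlike the loop, this builds the lookup table once, which may matter if many strings have to be cleaned.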

You could do it by 'slicing' the string:

string = '\xd7\nRecord has been added successfully, record id: 92'
text = string[2:]
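
Slicing only works when the unwanted prefix is always exactly two characters long. If that is not guaranteed, `lstrip` with the characters to drop might be a safer variant. A small sketch, assuming Python 3 and that the unwanted characters only appear at the start of the string:

```python
text = '\xd7\nRecord has been added successfully, record id: 92'
# lstrip() removes any leading run of the listed characters, however long,
# so it does not depend on the prefix having a fixed length.
cleaned = text.lstrip('\xd7\n')
print(cleaned)
```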

It seems you have a unicode string. In Python 2.x, unicode strings look like this:

inp_str = u'\xd7\nRecord has been added successfully, record id: 92'

If you want to remove the escape characters, which here essentially means the special (non-ASCII) characters, this is one way to keep only the ASCII characters without using a regex or hardcoding anything.

inp_str = u'\xd7\nRecord has been added successfully, record id: 92'
# Drop non-ASCII characters, decode back to text, then strip the newline.
print(inp_str.encode('ascii', errors='ignore').decode('ascii').strip('\n'))

Results :  'Record has been added successfully, record id: 92'

I encoded first because the string is already unicode. While encoding to ASCII, any character outside the ASCII range is ignored. After that, you just strip the '\n'.

Hope this helps you :)
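
For Python 3, the encode-and-ignore approach could be wrapped in a small helper. This is a sketch under assumptions: the name `to_ascii` is hypothetical, and it uses a bare `strip()`, which trims all surrounding whitespace rather than only '\n'.

```python
def to_ascii(text):
    """Drop every non-ASCII character, then trim surrounding whitespace."""
    # encode() returns bytes in Python 3, so decode back to str afterwards.
    ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
    return ascii_only.strip()


inp_str = '\xd7\nRecord has been added successfully, record id: 92'
result = to_ascii(inp_str)
print(result)
```

Note that this discards every non-ASCII character, including legitimate ones: `to_ascii('café')` returns `'caf'`. It is only appropriate when the text is expected to be plain ASCII.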

I believe regex can help:

import re
text = u'\xd7\nRecord has been added successfully, record id: 92'
res = re.sub('[^A-Za-z0-9]+', ' ', text).strip()

Result:

'Record has been added successfully record id 92'

You could use the built-in regex library.

import re
text = u'\xd7\nRecord has been added successfully, record id: 92'
result = re.sub('[^A-Za-z0-9]+', ' ', text)

print(result)

That prints Record has been added successfully record id 92 (with a leading space, which .strip() would remove).

This seems to pass your test case, if you can live without the punctuation.
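
If the punctuation should be kept, one possible variation is to remove only the characters outside the printable ASCII range instead of everything non-alphanumeric. A minimal sketch, assuming Python 3 and that the text is otherwise plain ASCII:

```python
import re

text = '\xd7\nRecord has been added successfully, record id: 92'
# Keep only printable ASCII (space through tilde); this drops \xd7 and \n
# but leaves commas and colons in place.
result = re.sub(r'[^\x20-\x7e]+', '', text)
print(result)
```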

Try regex.


import re
def escape_ansi(line):
    # Match either of the two unwanted characters, \xd7 or a newline.
    ansi_escape = re.compile(r'(\xd7|\n)')
    return ansi_escape.sub('', line)

text = u'\xd7\nRecord has been added successfully, record id: 92'
print(escape_ansi(text))

