
Removing a backslash from a string

I have a string that is a sentence like I don't want it, there'll be others

So the text looks like this: I don\\'t want it, there\\'ll be other

For some reason a \\ comes with the text next to the '. It was read in from another source. I want to remove it, but can't. I've tried:

sentence.replace("\\'","'")

sentence.replace(r"\\'","'")

sentence.replace("\\\\","")

sentence.replace(r"\\\\","")

sentence.replace(r"\\\\\\\\","")

I know the \\ is there to escape something, so I'm not sure how to do it with the quotes.

The \\ is just there to escape the ' character. It is only visible in the representation (repr) of the string; it's not actually a character in the string. See the following demo:

>>> repr("I don't want it, there'll be others")
'"I don\'t want it, there\'ll be others"'

>>> print("I don't want it, there'll be others")
I don't want it, there'll be others
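One way to check whether a backslash is really part of the data, rather than something repr added for display, is to test for the character directly. The sketch below is only an illustration; the names plain, escaped and has_backslash are made up here and do not come from the question.

```python
# Illustrative names only; none of these appear in the original question.
plain = "I don't want it, there'll be others"        # no backslash in the data
escaped = "I don\\'t want it, there\\'ll be others"  # two real backslashes

def has_backslash(s):
    """Return True if s contains a literal backslash character."""
    return "\\" in s

print(has_backslash(plain))       # False
print(has_backslash(escaped))     # True
print(len(escaped) - len(plain))  # 2, one per real backslash
```

If the check returns False, there is nothing to remove and the backslash was only ever part of how the string was displayed.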

Try using:

sentence.replace("\\", "")

You need two backslashes because the first one acts as the escape character and the second is the character you actually want to replace.
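Note that str.replace returns a new string and leaves the original untouched, so the result has to be assigned to something. A minimal sketch, assuming the string really does contain backslash characters (the sample value and the name cleaned are hypothetical):

```python
# Hypothetical sample value: the doubled backslash in the literal
# stores one real backslash before each apostrophe.
sentence = "I don\\'t want it, there\\'ll be other"

# Calling replace without using its return value changes nothing.
sentence.replace("\\", "")
still_escaped = "\\" in sentence  # True: sentence is unchanged

# Keep the returned string instead.
cleaned = sentence.replace("\\", "")
print(cleaned)
```

Strings in Python are immutable, which is why the first call has no visible effect on sentence.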

It is better to use a regular expression to remove the backslash:

>>> import re
>>> re.sub(r"\\'", "'", r"I don\'t want it, there\'ll be other")
"I don't want it, there'll be other"

If your text comes from crawled text and you didn't clean it up by unescaping it before processing it with NLP tools, you can easily unescape the HTML markup, e.g.:

In Python 2.x:

>>> import sys; sys.version
'2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'
>>> import HTMLParser
>>> txt = """I don\'t want it, there\'ll be other"""
>>> HTMLParser.HTMLParser().unescape(txt)
"I don't want it, there'll be other"

In Python 3:

>>> import sys; sys.version
'3.4.0 (default, Jun 19 2015, 14:20:21) \n[GCC 4.8.2]'
>>> import html
>>> txt = """I don\'t want it, there\'ll be other"""
>>> html.unescape(txt)
"I don't want it, there'll be other"

See also: How do I unescape HTML entities in a string in Python 3.1?

