Removing characters from Python strings

Question

I have several Python strings from which I want to remove unwanted characters.

Examples:

"This is '-' a test" 
     should be "This is a test"
"This is a test L)[_U_O-Y OH : l’J1.l'}/"
     should be "This is a test"
"> FOO < BAR" 
     should be "FOO BAR"
"I<<W5§!‘1“¢!°\" I" 
     should be "" 
     (because if only words are extracted then it returns I W I and none of them form words)
"l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m"
     should be ""
"2|'J]B"
     should be ""

This is what I have so far; however, it does not keep the original spaces between words.

>>> import re
>>> line = re.sub(r"\W+","","This is '-' a test")
>>> line
'Thisisatest'
>>> line = re.sub(r"\W+","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
>>> line
'ThisisatestL_U_OYOHlJ1l' 
# Although I would prefer this to be "This is a test", if that is not possible
# I would prefer "This is a test L_U_OYOHlJ1l"
>>> line = re.sub(r"\W+","","> FOO < BAR")
>>> line
'FOOBAR'
>>> line = re.sub(r"\W+","","I<<W5§!‘1“¢!°\" I")
>>> line
'IW51I'
>>> line = re.sub(r"\W+","","l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m")  
>>> line
'llnbiasiIve_rswinqm'
>>> line = re.sub(r"\W+","","2|'J]B")
>>> line
'2JB'

I will later filter the regex-cleaned words through a list of predefined words.

Answer 1

I'd go with a split and filter, like this:

' '.join(word for word in line.split() if word.isalpha() and word.lower() in words)

This removes every word that is not purely alphabetic, as well as any alphabetic word that is not in your list of predefined words.

Examples:

def myfilter(string):
    # Example word list; 'is' is deliberately absent, so it gets dropped below.
    words = {'this', 'test', 'i', 'a', 'foo', 'bar'}
    return ' '.join(word for word in string.split() if word.isalpha() and word.lower() in words)

>>> myfilter("This is '-' a test")
'This a test'
>>> myfilter("This is a test L)[_U_O-Y OH : l’J1.l'}/")
'This a test'
>>> myfilter("> FOO < BAR")
'FOO BAR'
>>> myfilter("I<<W5§!‘1“¢!°\" I")
'I'
>>> myfilter("l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m")
''
>>> myfilter("2|'J]B")
''
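The first two results read 'This a test' only because 'is' is missing from the example set. As a minimal sketch of the same split-and-filter idea, the word list can be passed in as an argument instead of being hard-coded. The names `filter_words` and `vocabulary` are hypothetical and chosen for illustration; they are not part of the original answer.

```python
def filter_words(text, vocabulary):
    """Keep only whitespace-separated tokens that are purely alphabetic
    and appear (case-insensitively) in `vocabulary`."""
    return ' '.join(
        word for word in text.split()
        if word.isalpha() and word.lower() in vocabulary
    )

# Hypothetical vocabulary: it includes 'is' and leaves out the single letter 'i'.
vocabulary = {'this', 'is', 'a', 'test', 'foo', 'bar'}

print(filter_words("This is '-' a test", vocabulary))  # This is a test
print(filter_words("> FOO < BAR", vocabulary))         # FOO BAR
```

With this vocabulary the lone "I" in the fourth example would be dropped as well, because 'i' is not in the set. Whether single letters count as words depends on your predefined list.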

Answer 2

This one removes any group of non-space characters that contains at least one non-alphabetic character. It will still leave some unwanted groups of letters, though:

re.sub(r"\w*[^a-zA-Z ]+\w*","","This is a test L)[_U_O-Y OH : l’J1.l'}/")

gives:

'This is a test  OH  '

It will also leave runs of more than one space:

re.sub(r"[^a-zA-Z ]+\w*","","This is '-' a test")
'This is  a test'  # two spaces
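One possible way to deal with the leftover double spaces is to split the result on whitespace and join it again with single spaces. The sketch below assumes that approach. The helper name `strip_noise` and the constant `NOISE` are hypothetical, and the pattern is the first one shown above.

```python
import re

# Optional word characters, then one or more characters that are neither
# ASCII letters nor spaces, then optional word characters.
NOISE = re.compile(r"\w*[^a-zA-Z ]+\w*")

def strip_noise(text):
    """Remove the noisy groups, then collapse leftover runs of whitespace."""
    cleaned = NOISE.sub("", text)
    # str.split() with no argument drops empty fields, so joining the
    # pieces with a single space also trims leading and trailing spaces.
    return " ".join(cleaned.split())

print(strip_noise("This is '-' a test"))  # This is a test
print(strip_noise("> FOO < BAR"))         # FOO BAR
```

Purely alphabetic leftovers such as "OH" still survive this step, so a word-list filter would still be needed afterwards.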
