Remove characters from a Python string
I have several Python strings from which I want to remove unwanted characters.
Examples:
"This is '-' a test"
should be "This is a test"
"This is a test L)[_U_O-Y OH : l’J1.l'}/"
should be "This is a test"
"> FOO < BAR"
should be "FOO BAR"
"I<<W5§!‘1“¢!°\" I"
should be ""
(because if only words are extracted then it returns I W I and none of them form words)
"l‘?£§l%nbia ;‘\\~siI.ve_rswinq m"
should be ""
"2|'J]B"
should be ""
This is what I have so far, but it does not preserve the original spaces between words.
>>> import re
>>> line = re.sub(r"\W+","","This is '-' a test")
>>> line
'Thisisatest'
>>> line = re.sub(r"\W+","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
>>> line
'ThisisatestL_U_OYOHlJ1l'
# Although I would prefer this to be "This is a test", if that is not possible I would prefer "This is a test L_U_OYOHlJ1l".
>>> line = re.sub(r"\W+","","> FOO < BAR")
>>> line
'FOOBAR'
>>> line = re.sub(r"\W+","","I<<W5§!‘1“¢!°\" I")
>>> line
'IW51I'
>>> line = re.sub(r"\W+","","l‘?£§l%nbia ;‘\\~siI.ve_rswinq m")
>>> line
'llnbiasiIve_rswinqm'
>>> line = re.sub(r"\W+","","2|'J]B")
>>> line
'2JB'
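One possible way to keep the spacing between words is to replace each run of non-word characters with a single space instead of an empty string. This is only a sketch: it assumes one space is an acceptable substitute for whatever was removed, and the helper name strip_nonword is hypothetical.

```python
import re

def strip_nonword(text):
    # Replace each run of non-word characters (spaces included) with one
    # space instead of deleting it, then trim the leading/trailing space.
    return re.sub(r"\W+", " ", text).strip()

print(strip_nonword("This is '-' a test"))  # This is a test
print(strip_nonword("> FOO < BAR"))         # FOO BAR
```

Stray fragments such as "L", "_U_O" or "J1" still survive this step, so a later word-list filter is still needed to drop them.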
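Later, I will filter the words left by the regex cleanup against a predefined word list.
I would use split and a filter, as follows:
' '.join(word for word in line.split() if word.isalpha() and word.lower() in words)
This removes every non-alphabetic word, and every alphabetic word that is not in the list.
Example: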
def myfilter(string):
    # Keep only purely alphabetic tokens that appear in the word list.
    words = {'this', 'test', 'i', 'a', 'foo', 'bar'}
    return ' '.join(word for word in string.split() if word.isalpha() and word.lower() in words)
>>> myfilter("This is '-' a test")
'This a test'
>>> myfilter("This is a test L)[_U_O-Y OH : l’J1.l'}/")
'This a test'
>>> myfilter("> FOO < BAR")
'FOO BAR'
>>> myfilter("I<<W5§!‘1“¢!°\" I")
'I'
>>> myfilter("l‘?£§l%nbia ;‘\\~siI.ve_rswinq m")
''
>>> myfilter("2|'J]B")
''
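This removes any group of non-space characters that contains at least one non-letter character. It still leaves some unwanted letters behind: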
re.sub(r"\w*[^a-zA-Z ]+\w*","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
which gives:
'This is a test  OH  '
It also leaves runs of more than one space:
re.sub(r"[^a-zA-Z ]+\w*","","This is '-' a test")
'This is  a test'  # two spaces between "is" and "a"
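The leftover runs of spaces can be collapsed in a second step. A minimal sketch, reusing the first pattern above and assuming that normalising all whitespace to single spaces is acceptable; the helper name remove_noisy_tokens is hypothetical.

```python
import re

def remove_noisy_tokens(text):
    # Drop every token that contains at least one character other than
    # an ASCII letter or a space.
    cleaned = re.sub(r"\w*[^a-zA-Z ]+\w*", "", text)
    # split() with no arguments discards empty fields, so joining with a
    # single space collapses repeated spaces and trims both ends.
    return " ".join(cleaned.split())

print(remove_noisy_tokens("This is '-' a test"))  # This is a test
print(remove_noisy_tokens("> FOO < BAR"))         # FOO BAR
```

Purely alphabetic fragments such as "OH" are untouched by this step, so they would still need to be removed by a word-list check.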