
Remove a list of words from a string using Python

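I'm trying to remove a list of words from a string using Python. I tried the code below, but instead of removing only those words it also cuts the same letters out of other words ("how" becomes "h", "below" becomes "bel") and leaves extra spaces and commas behind. Is there a way to remove only the whole words that appear in the list? Any suggestions are appreciated.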

words_to_remove=['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""

# Remove words
for word in words_to_remove:
    test_data = test_data.replace(word, '')

test_data
Out[46]: 'RegExr  was created by gskinner.com.\nyippe, , ouch,   , ,  ,  , h can I do wonders in this world. , it is out of the world.\nEdit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\nExplore results with the Tools bel. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n'
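
The fragments in that output ("h can I", "Tools bel.") come from str.replace() matching substrings anywhere in the text, not whole words. A small illustration, using a made-up sample sentence rather than the full test_data:

```python
import re

sample = "oh no, how can I do this? See the Tools below."

# str.replace removes every occurrence of the substring, even inside other words.
replaced = sample.replace("ow", "")
print(replaced)  # → oh no, h can I do this? See the Tools bel.

# With \b word boundaries only a standalone "ow" matches, so this sample is unchanged.
bounded = re.sub(r"\bow\b", "", sample)
print(bounded)  # → oh no, how can I do this? See the Tools below.
```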

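If you only want to remove the listed words themselves, you can use a regular expression: compile a pattern from your word list and wrap it in \b word boundaries, so that matches inside other words (such as the "ow" in "how") are left alone.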

>>> import re
>>> r = re.compile(rf"\b(?:{'|'.join(map(re.escape, words_to_remove))})\b")
>>> r.sub('', test_data)
'RegExr  was created by gskinner.com.\nyippe, , ouch,   , ,  ,  , how can I do wonders in this world. , it is out of the world.\nEdit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\nExplore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n'
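
The same idea can be wrapped in a reusable function. This is a sketch, not part of the original answer; the name remove_whole_words and the ignore_case parameter are hypothetical. Note that \b only behaves as expected when every listed word starts and ends with a letter, digit, or underscore.

```python
import re

def remove_whole_words(text, words, ignore_case=False):
    """Delete whole-word occurrences of any item in `words` from `text`."""
    # re.escape keeps characters such as "." or "+" in a listed word literal.
    alternatives = "|".join(re.escape(w) for w in words)
    flags = re.IGNORECASE if ignore_case else 0
    # \b on both sides means "ow" matches on its own but not inside "how".
    pattern = re.compile(rf"\b(?:{alternatives})\b", flags)
    return pattern.sub("", text)

words_to_remove = ['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']
result = remove_whole_words("Oh no, how now? YEAH, well.", words_to_remove, ignore_case=True)
print(repr(result))  # → ' , how now? , .'
```

With ignore_case=True, "Oh" and "YEAH" are removed as well; the leftover commas are what the punctuation cleanup in this answer deals with.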

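This obviously doesn't fix the leftover punctuation, but you can clean that up with another regular expression. Here is a first attempt that you can probably improve on: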

>>> re.sub(r'([,.:;?]\s?)[\s,.:;?]*', r'\1', r.sub('', test_data))
'RegExr  was created by gskinner.com.\nyippe, ouch, how can I do wonders in this world. it is out of the world.\nEdit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\nExplore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n'
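
The result above still contains a double space ("RegExr  was"), because the punctuation rule only looks at text that follows a punctuation mark. One possible extension, sketched here as a hypothetical helper tidy_punctuation(), runs the answer's rule and then collapses runs of spaces and tabs while leaving newlines alone:

```python
import re

def tidy_punctuation(text):
    # The answer's rule: keep one punctuation mark plus at most one whitespace
    # character, and drop any whitespace/punctuation run that follows it.
    text = re.sub(r'([,.:;?]\s?)[\s,.:;?]*', r'\1', text)
    # Extra step: collapse two or more spaces/tabs into a single space.
    return re.sub(r'[ \t]{2,}', ' ', text)

leftover = "RegExr  was created.\nyippe, , ouch,   , ,  ,  , how"
print(tidy_punctuation(leftover))
# prints:
# RegExr was created.
# yippe, ouch, how
```

Be aware that the first rule also turns an ellipsis ("...") into a single period and removes blank lines that directly follow punctuation.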
Another answer splits the text on spaces and drops every token that exactly matches a listed word:

words_to_remove = ['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""

# Split on single spaces and keep only tokens that are not in the list.
# Tokens with punctuation attached (e.g. "ow," or "Yep,") do not match and are kept.
splitted = test_data.split(' ')
filtered = list(filter(lambda word: word not in words_to_remove, splitted))
print(' '.join(filtered))

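Strings are immutable, so every call to replace() builds a whole new string. Instead, split the text into words once, drop the unwanted ones, and join the rest; storing the words in a set makes each membership check fast: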

words_to_remove=set(['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah'])

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""
# split() with no arguments never yields empty strings, so only the membership test is needed
new_data = ' '.join(i for i in test_data.split() if i not in words_to_remove)
print(new_data)

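Output (line breaks are collapsed into spaces, and words with punctuation attached, such as "ow," or "Yep,", are not removed):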

RegExr was created by gskinner.com. yippe, ow, ouch, oh, goodness, well, no, how can I do wonders in this world. Yep, it is out of the world. Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode. The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns. Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
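
Because split() with no arguments splits on any whitespace, this version joins every line into one. If the line breaks matter, a minimal variant (the function name remove_words_per_line is hypothetical) filters each line separately:

```python
words_to_remove = {'gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah'}

def remove_words_per_line(text, words):
    # splitlines() keeps the line structure; split() then tokenizes each line.
    # Tokens are compared exactly, so "Yep," (with its comma) is still kept.
    return "\n".join(
        " ".join(tok for tok in line.split() if tok not in words)
        for line in text.splitlines()
    )

sample = "RegExr Yeah was created by gskinner.com.\nYep, it is out of the world."
print(remove_words_per_line(sample, words_to_remove))
# prints:
# RegExr was created by gskinner.com.
# Yep, it is out of the world.
```

Note that splitlines() drops a trailing newline, so the result does not end with one.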

You can also call strip(',') on each word before checking whether it is in words_to_remove, so that words followed by a comma are removed too:

words_to_remove=['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""

# Remove words
test_data = ' '.join(filter(lambda i: i.strip(',') not in words_to_remove, test_data.split(' ')))

print(test_data)

Output:

RegExr was created by gskinner.com.
yippe, ouch, how can I do wonders in this world. it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
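
strip(',') only handles commas, and the comparison is case-sensitive, so a token such as "Yeah." or "yep" would survive. A possible generalization, sketched under the assumption that ASCII punctuation (string.punctuation) is what you want to ignore; the function name remove_listed_words is hypothetical:

```python
import string

def remove_listed_words(text, words):
    # casefold() makes the comparison case-insensitive ("yep" matches "Yep").
    targets = {w.casefold() for w in words}
    # strip(string.punctuation) removes leading/trailing ASCII punctuation,
    # so "oh," "yep." and "(well)" all compare equal to their bare word.
    kept = [
        tok for tok in text.split(' ')
        if tok.strip(string.punctuation).casefold() not in targets
    ]
    return ' '.join(kept)

print(remove_listed_words("Oh, yep. How (well) are you?", ['oh', 'Yep', 'well']))
# → How are you?
```

Like the answer above, it splits on single spaces, so a listed word that starts a new line stays glued to the previous token (for example "world.\nYep,") and is not removed.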


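Disclaimer: technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original post's URL. For any questions, contact yoyou2525@163.com.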

 
Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM