
Remove all words from a string that exist in a list


I need to write a function that goes through a string and checks whether each word exists in a list (the removal list). If the word is in that list, the function should remove it; otherwise it should leave the word alone.

I wrote this:

def remove_make(x):
    a = x.split()
    for word in a: 
        if word in remove: # True
            a = a.remove(word)  
        else:
            pass
        return a

But it returns the string with the word from the removal list still in it. Any idea how I can achieve this?

A more terse way of doing this is to build a regex alternation from the list of words to remove and then do a single regex substitution:

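import re

inp = "one two three four"
remove = ['two', 'four']
# Escape each word so regex metacharacters are matched literally;
# \b keeps the pattern from matching inside longer words such as 'twofold'.
regex = r'\s*\b(?:' + '|'.join(map(re.escape, remove)) + r')\b\s*'
out = re.sub(regex, ' ', inp).strip()
print(out)   # prints 'one three'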

You can try something simpler:

import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])

And the result would be:

' is walking with , wishing good luck to .'

The important part is the last line:

''.join([x for x in re.split(r'(\W+)', string) if x not in remove_list])

What it does:

  • It converts the string to a list of words with re.split(r'(\W+)', string). The capturing group preserves all the whitespace and punctuation as list items.
  • It builds another list with a list comprehension, keeping only the items that are not in remove_list.
  • It converts the resulting list back to a string with str.join().

The BNF notation for list comprehensions and a little more information on them can be found in the Python documentation.

PS: Of course, you can make this a little more readable if you break the one-liner into pieces: assign the result of re.split(r'(\W+)', string) to a variable and decouple the join from the comprehension.
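The decoupling suggested in the PS could look roughly like this. It is only a sketch of the same one-liner spread over three steps; the variable names tokens, kept and result are illustrative and do not come from the answer.

```python
import re

remove_list = ['abc', 'cde', 'edf']
string = 'abc is walking with cde, wishing good luck to edf.'

# Step 1: split into words and separators; the capturing group keeps the separators.
tokens = re.split(r'(\W+)', string)

# Step 2: keep every token that is not in the removal list.
kept = [token for token in tokens if token not in remove_list]

# Step 3: glue the surviving tokens back together.
result = ''.join(kept)
print(repr(result))
```

Note that the separators on both sides of a removed word survive, so the result can contain doubled spaces or a space before punctuation.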

list.remove(x) returns None and modifies the list in place by removing the first occurrence of x (it raises ValueError if x is not in the list). When you do

a = a.remove(word)

you are effectively storing None in a. That would raise an exception in the next iteration when you call a.remove(word) again (None.remove(word) is invalid), but you never get that far because you return immediately after the conditional, inside the loop. That is also wrong: you need to return after the loop has finished, outside its body. This is how your function should look (without modifying a list while iterating over it):

remove_words = ["abc", ...] # your list of words to be removed

def remove_make(x):
    a = x.split()
    temp = a[:]
    for word in temp: 
        if word in remove_words: # True
            a.remove(word)
    # no need of 'else' also, 'return' outside the loop's scope
    return " ".join(a)
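The two pitfalls described above can be reproduced in isolation. The following is a small illustrative sketch with made-up sample data, not part of the original function:

```python
words = "one two three".split()

# list.remove mutates the list in place and returns None.
result = words.remove("two")
print(result)   # None
print(words)    # ['one', 'three']

# Removing from the list being iterated makes the loop skip elements.
items = ["two", "two", "one"]
for item in items:
    if item == "two":
        items.remove(item)
print(items)    # ['two', 'one'] (the second 'two' was skipped)
```

This is why the function above iterates over the copy a[:] while removing from a.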

You can create a new list without the words you want to remove and then use str.join() to concatenate all the words in that list. Try

def remove_words(string, rmlist):
    final_list = []
    word_list = string.split()
    for word in word_list:
        if word not in rmlist:
            final_list.append(word)
    
    return ' '.join(final_list)
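The same idea can be written more compactly with a generator expression. This is a sketch of an equivalent variant, not the answer's own code; the name blocked and the conversion to a set are assumptions made here to speed up membership tests.

```python
def remove_words(string, rmlist):
    # Assumption: a set is optional here; it only makes lookups faster.
    blocked = set(rmlist)
    # Keep every whitespace-separated word that is not in the removal set.
    return ' '.join(word for word in string.split() if word not in blocked)

print(remove_words("one two three four", ["two", "four"]))  # one three
```

Because the string is split on whitespace only, punctuation stays attached to words, so "two," would not match "two" in the removal list.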
