How to remove strings from a list which contain a sub-string?

I have a list of strings that's structured as follows:

['3M', 'Saint Paul, Minnesota', 'A. O. Smith', 'Milwaukee, Wisconsin', 'Abbott Laboratories',...]

I want to remove the strings corresponding to cities, which all contain a comma (,).

So far my code is:

for name in names:
    if '</a>' in name:
        names.remove(name)
    if re.search(', Inc.',name) != None:
        name = name.replace(',',"")
        names.append(name)
    if ',' in name:
        names.remove(name)

But I get the error ValueError: list.remove(x): x not in list at names.remove(name).

I can't seem to understand why the first block, which drops the string if it contains </a>, works fine, but the one with commas does not.

You can use a list comprehension to filter out invalid elements (i.e. the ones containing a comma).

>>> l = ['3M', 'Saint Paul, Minnesota', 'A. O. Smith', 'Milwaukee, Wisconsin', 'Abbott Laboratories']
>>> result = [e for e in l if "," not in e]
>>> result
['3M', 'A. O. Smith', 'Abbott Laboratories']
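If other code holds a reference to the same list and needs to see the filtered result, the comprehension can be assigned back through a slice. This is only a sketch: the alias variable and the extra </a> check (borrowed from the question's code) are assumptions added for illustration.

```python
names = ['3M', 'Saint Paul, Minnesota', 'A. O. Smith',
         'Milwaukee, Wisconsin', 'Abbott Laboratories']
alias = names  # hypothetical second reference to the same list object

# Slice assignment replaces the contents of the existing list rather than
# rebinding the name, so every reference sees the filtered items.
names[:] = [e for e in names if "," not in e and "</a>" not in e]

print(alias)
```

The comprehension builds the complete new list before the slice assignment happens, so the list is never resized while it is being iterated.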

Going off my comment: in Python, we generally want to "retain things we like" rather than "purge things we don't like". This is preferable because we avoid changing the size of a list while we're iterating over it, which is never a good idea. We achieve this by filtering the original list based on a predicate (isDesirable in this case). A predicate is a function/callable that accepts a single parameter and returns a boolean. When used in conjunction with filter, we can create an iterator that yields only those items that satisfy the predicate. We then consume the contents of that iterator to build up a new list:

names = [
    '3M',
    'Saint Paul, Minnesota',
    'A. O. Smith',
    'Milwaukee, Wisconsin',
    'Abbott Laboratories'
]

def isDesirable(name):
    return "</a>" not in name and "," not in name

print(list(filter(isDesirable, names)))

Output:

['3M', 'A. O. Smith', 'Abbott Laboratories']
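To make the warning about resizing a list during iteration concrete, here is a small sketch of two ways the original loop can misbehave. The sample data is hypothetical, chosen only to trigger each problem. The second case is one plausible source of the ValueError in the question, not a confirmed diagnosis.

```python
# Hypothetical data: two city strings in a row, then a company name.
names = ['Saint Paul, Minnesota', 'Milwaukee, Wisconsin', '3M']

# Problem 1: removing an item shifts the later items left, so the loop
# skips whichever element slides into the slot that was just vacated.
for name in names:
    if ',' in name:
        names.remove(name)

print(names)  # 'Milwaukee, Wisconsin' was skipped and is still present

# Problem 2: a string that matches two conditions gets removed twice.
# Hypothetical scraped entry containing both '</a>' and a comma.
scraped = ['<a href="#">Saint Paul, Minnesota</a>']
raised = False
try:
    for name in scraped:
        if '</a>' in name:
            scraped.remove(name)
        if ',' in name:
            scraped.remove(name)  # the item is already gone at this point
except ValueError:
    raised = True

print(raised)
```

Building a new list, as the filter example does, avoids both problems because the list being iterated is never modified.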

However, this does not take into account the other operation you're performing: if the current name contains the substring ", Inc.", we still want to retain this name, but with the comma removed.

In this situation, I would define a generator that only yields the items we want to retain. If we come across the substring ", Inc.", we modify the current name and yield it. The generator's contents are then consumed to build up a new list:

def filtered_names(names):
    for name in names:
        if ", Inc." in name:
            name = name.replace(",", "")
        if "</a>" in name or "," in name:
            continue
        yield name

print(list(filtered_names(names)))
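The sample list above has no ", Inc." entries, so the renaming branch never runs. The sketch below repeats the generator with a hypothetical list (the 'Acme, Inc.' entry and the anchor-tag string are invented for illustration) so that all three branches are exercised.

```python
def filtered_names(names):
    for name in names:
        # Keep ", Inc." companies, but drop their commas first.
        if ", Inc." in name:
            name = name.replace(",", "")
        # Skip leftover HTML fragments and anything still containing a comma.
        if "</a>" in name or "," in name:
            continue
        yield name

# Hypothetical input covering each case.
sample = [
    '3M',
    'Saint Paul, Minnesota',
    'Acme, Inc.',
    '<a href="#">link</a>',
    'Abbott Laboratories',
]

cleaned = list(filtered_names(sample))
print(cleaned)  # ['3M', 'Acme Inc.', 'Abbott Laboratories']
```

Note that replace(",", "") strips every comma in the name, not just the one before "Inc.", so a name with additional commas would lose those as well.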

This is by no means the only way of doing this; it is my personal preference.
