
How to remove specific substrings from a set of strings in Python?

I have a set of strings, and every string contains one of two specific substrings that I want to remove:

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

I want the ".good" and ".bad" substrings removed from all the strings. I tried this:

for x in set1:
    x.replace('.good', '')
    x.replace('.bad', '')

but it doesn't seem to work: set1 stays exactly the same. I tried using for x in list(set1) instead, but that doesn't change anything.

Strings are immutable. str.replace creates a new string. This is stated in the documentation:

str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old replaced by new. [...]

This means you have to reassign the set or repopulate it (reassigning is easier with a set comprehension):

new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
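If other code holds a reference to the original set, repopulating it in place may be preferable to rebinding the name. A minimal sketch, assuming the same set1 as in the question; the names alias and cleaned are only illustrative:

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
alias = set1  # a second reference to the same set object

# Build the cleaned strings first: changing a set's size while
# iterating over it raises RuntimeError.
cleaned = {x.replace('.good', '').replace('.bad', '') for x in set1}

# Repopulate the existing object in place.
set1.clear()
set1.update(cleaned)

print(sorted(set1))   # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
print(alias is set1)  # True
```

Because the object is mutated rather than replaced, alias sees the cleaned contents as well.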

P.S. If you're using Python 3.9 or newer, see DineshKumar's answer.

>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'

.replace doesn't change the string; it returns a copy of the string with the replacement applied. You can't change the string directly because strings are immutable.

You need to take the return values from x.replace and put them in a new set.
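As a rough illustration of that point, the sketch below first shows that rebinding the loop variable leaves the set untouched, then collects the return values in a new set. The names original and new_set are assumptions made for the example:

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
original = set(set1)  # copy kept only for comparison

for x in set1:
    # Rebinding x changes what the name refers to; the set's elements stay as they were.
    x = x.replace('.good', '').replace('.bad', '')

print(set1 == original)  # True

new_set = set()
for x in set1:
    # Keep the returned string by adding it to the new set.
    new_set.add(x.replace('.good', '').replace('.bad', ''))

print(sorted(new_set))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
```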

In Python 3.9+ you could remove the suffix using str.removesuffix('mysuffix'). From the docs:

If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.

So you can either create a new empty set and add each element without the suffix to it:

set1  = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

set2 = set()
for s in set1:
   set2.add(s.removesuffix(".good").removesuffix(".bad"))

Or create the new set using a set comprehension:

set2 = {s.removesuffix(".good").removesuffix(".bad") for s in set1}
   
print(set2)

Output:

{'Orange', 'Pear', 'Apple', 'Banana', 'Potato'}
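On Python versions older than 3.9, the behavior quoted from the docs can be approximated with endswith and slicing. This is only a sketch; strip_suffix is a hypothetical helper name, not part of the standard library:

```python
def strip_suffix(s, suffix):
    """Return s without a trailing suffix (stand-in for str.removesuffix)."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
set2 = {strip_suffix(strip_suffix(s, '.good'), '.bad') for s in set1}
print(sorted(set2))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']

# Unlike str.replace, only a trailing occurrence is removed.
print(strip_suffix('Very.good.Apple.good', '.good'))  # Very.good.Apple
```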

All you need is a bit of black magic!

>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']
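The lambda can also be written as a named helper, which may be easier to extend when the list of substrings grows. A small sketch; remove_all is a hypothetical name chosen for illustration:

```python
def remove_all(s, substrings):
    """Return s with every occurrence of each substring removed."""
    for sub in substrings:
        s = s.replace(sub, '')
    return s

a = ["cherry.bad", "pear.good", "apple.good"]
a = [remove_all(x, ['.good', '.bad']) for x in a]
print(a)  # ['cherry', 'pear', 'apple']
```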

When there are multiple substrings to remove, one simple and effective option is to use re.sub with a compiled pattern that joins all the substrings to remove with the regex OR (|) operator.

import re

to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']

p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
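The pattern above removes the substrings wherever they occur. If they should only be stripped from the end of each string, one possible variant (an assumption about the intent, not part of the original answer) wraps the alternation in a non-capturing group and anchors it with $:

```python
import re

to_remove = ['.good', '.bad']
strings = ['Apple.good', 'Orange.good', 'Pear.bad', 'Not.bad.Plum.good']

# (?:...) groups the alternatives so that $ applies to all of them.
p = re.compile('(?:' + '|'.join(map(re.escape, to_remove)) + ')$')
result = [p.sub('', s) for s in strings]
print(result)
# ['Apple', 'Orange', 'Pear', 'Not.bad.Plum']
```

In the last string the inner ".bad" is kept because it is not at the end.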

You could do this:

import re

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

# Prints each cleaned string; set1 itself is not modified.
for x in set1:
    # Strings are immutable, so bind the result of re.sub back to x.
    x = re.sub(r'\.good$', '', x)
    x = re.sub(r'\.bad$', '', x)
    print(x)

# Practice 2
text = "Amin Is A Good Programmer"
new_text = text.replace('Good', '')
print(new_text)

Output: Amin Is A  Programmer

I ran a test (not with your example) and found that a set comprehension returns the data neither in order nor complete, because duplicates are dropped:

>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}

I confirmed that a list comprehension works:

>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']

or

>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
...     newind.append(x.replace('p',''))
...
>>> newind
['5', '1', '8', '4', '2', '8']
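If the duplicates should go but the original order should stay, one option is dict.fromkeys, since dictionaries preserve insertion order in Python 3.7 and later. A minimal sketch using the same test data; unique_in_order is an illustrative name:

```python
ind = ['p5', 'p1', 'p8', 'p4', 'p2', 'p8']

# dict keys are unique and keep the order in which they were first inserted.
unique_in_order = list(dict.fromkeys(x.replace('p', '') for x in ind))
print(unique_in_order)  # ['5', '1', '8', '4', '2']
```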

For a list

If you have a list of strings and want to remove every line that contains a certain substring, you can do this:

import re

def RemoveInList(sub, LinSplitUnOr):
    # Keep only the lines in which the pattern sub does not occur.
    return [x for x in LinSplitUnOr if not re.search(sub, x)]

where sub is a pattern that you do not want to appear in the list of lines LinSplitUnOr.

For example:

A=['Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad']
sub = 'good'
A=RemoveInList(sub,A)

Then A will be:

['Pear.bad', 'Banana.bad', 'Potato.bad']
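Because re.search treats its first argument as a regular expression, a substring that contains metacharacters such as "." may match more than intended. The sketch below, with the hypothetical helper remove_matching and made-up sample data, compares a raw pattern, an escaped pattern, and a plain substring test:

```python
import re

def remove_matching(pattern, lines):
    """Return the lines in which the regex pattern does not occur."""
    return [line for line in lines if not re.search(pattern, line)]

lines = ['Apple.good', 'Orange_good', 'Pear.bad']

raw = remove_matching('.good', lines)                 # '.' matches any character
escaped = remove_matching(re.escape('.good'), lines)  # '.' matched literally
plain = [line for line in lines if '.good' not in line]

print(raw)      # ['Pear.bad']
print(escaped)  # ['Orange_good', 'Pear.bad']
print(plain)    # ['Orange_good', 'Pear.bad']
```

For literal substrings, the plain "in" test avoids regular expressions altogether.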

