Python code to remove duplicates or substrings of elements from a list

Question

I have the following list as input:

['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']

In the output I want to exclude 'temp/date=22-07-2019/' since it is a part of 'temp/date=22-07-2019/temp=22-07-2019/'. Hence the output should be:

['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']

I have tried several ways but was not able to achieve this. Please suggest. Thanks

Answer 1

You can use any with a list comprehension:

r = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
result = [i for i in r if not any(i in c and len(c) > len(i) for c in r)]

Output:

['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
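
Note that the `in` test above matches a substring anywhere inside a longer string, not only at its start. If only leading path prefixes should count (an assumption about the intent, since the question's example does not distinguish the two cases), a minimal sketch could use str.startswith instead. The helper name drop_prefixes is hypothetical:

```python
def drop_prefixes(paths):
    """Keep only the paths that are not a strict prefix of another path."""
    return [p for p in paths
            if not any(q != p and q.startswith(p) for q in paths)]


r = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
     'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
result = drop_prefixes(r)
print(result)

# 'a/' occurs inside 'x/a/' but is not a prefix of it, so both are kept here,
# whereas the substring test would drop 'a/'.
print(drop_prefixes(['a/', 'x/a/']))
```

Like the substring version, this compares every pair of items, and it keeps identical duplicates rather than collapsing them.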

Answer 2

In case your items have a specific format (temp/date=DD-MM-YYYY/):

d = {}
lst = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
       'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']

for s in lst:
    k = s[:21]
    if k not in d or len(s) > len(d[k]):
        d[k] = s

print(list(d.values()))

The output:

['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
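
The slice s[:21] only works while every key is exactly 21 characters long. As a sketch under the assumption that the grouping key is always the first two path segments, the key can be derived by splitting on '/' instead. The function name longest_per_key and the depth parameter are hypothetical:

```python
def longest_per_key(paths, depth=2):
    """For each key (the first `depth` path segments), keep the longest path."""
    longest = {}
    for s in paths:
        key = "/".join(s.split("/")[:depth])
        if key not in longest or len(s) > len(longest[key]):
            longest[key] = s
    # dicts preserve insertion order (Python 3.7+), so keys appear in first-seen order
    return list(longest.values())


lst = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
       'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
print(longest_per_key(lst))
```

As with the fixed-length slice, only one path survives per key, so two different longer paths under the same date would be reduced to one.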

Answer 3

You can use a dictionary:

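import re

lst = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
# Key each string by its first two path segments; sorting by length means the
# longest string for a key is assigned last and overwrites the shorter ones
dct = {re.match(r'(temp/.+?)/.*', i).group(1): i for i in sorted(lst, key=len)}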
# {'temp/date=20-07-2019': 'temp/date=20-07-2019/', 'temp/date=21-07-2019': 'temp/date=21-07-2019/', 'temp/date=22-07-2019': 'temp/date=22-07-2019/temp=22-07-2019/'}

print(list(dct.values()))
# ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
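
The sorted(lst, key=len) step matters because a dict comprehension keeps the last value assigned to each key. A small sketch of that behaviour, with the input order deliberately reversed (by_first_two_segments is a hypothetical helper name):

```python
import re


def by_first_two_segments(paths):
    """Map 'temp/<segment>' to the last path in `paths` that has that prefix."""
    return {re.match(r'(temp/.+?)/.*', p).group(1): p for p in paths}


# Longer path first, so without sorting the shorter one overwrites it
lst = ['temp/date=22-07-2019/temp=22-07-2019/', 'temp/date=22-07-2019/']

unsorted_result = list(by_first_two_segments(lst).values())
sorted_result = list(by_first_two_segments(sorted(lst, key=len)).values())

print(unsorted_result)  # ['temp/date=22-07-2019/']
print(sorted_result)    # ['temp/date=22-07-2019/temp=22-07-2019/']
```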

Answer 4

This solution also takes care of identical duplicates by creating a set:

example_data = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/', 'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
# Creating a set gets rid of all identical entries
unique_data = set(example_data)

# First keep all the long strings (anything longer than a bare
# 21-character 'temp/date=DD-MM-YYYY/' entry)
result = [item for item in unique_data if len(item) > 21]

# Collect the 21-character prefix of every long string once, so that each
# short string is checked against all of them and not only the first one
long_prefixes = {element[:21] for element in result}

# Then add all the short strings that are not already contained in a long string
for item in unique_data:
    if len(item) == 21 and item not in long_prefixes:
        result.append(item)

# I am not sure you need to sort by date but this can be easily achieved with sorted
print(sorted(result))
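
Since the result is sorted anyway, the sort can also do the filtering work. This is a sketch under the assumption that only leading prefixes matter: once the unique strings are sorted, every string that extends a given entry sorts directly after it, so comparing each entry with its immediate successor is enough. The function name drop_prefixes_sorted is hypothetical:

```python
def drop_prefixes_sorted(paths):
    """Deduplicate, sort, and drop every path that is a prefix of its successor."""
    ordered = sorted(set(paths))
    kept = []
    # Pair each entry with the one that follows it; the last entry gets None
    for current, following in zip(ordered, ordered[1:] + [None]):
        if following is None or not following.startswith(current):
            kept.append(current)
    return kept


example_data = ['temp/date=20-07-2019/', 'temp/date=21-07-2019/',
                'temp/date=22-07-2019/', 'temp/date=22-07-2019/temp=22-07-2019/']
print(drop_prefixes_sorted(example_data))
```

This avoids the hard-coded length 21 and needs one pass after the sort, at the cost of returning the strings in lexicographic order, which is not the same as chronological order for DD-MM-YYYY dates spanning several months.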

Answer 5

I would suggest using a list comprehension approach because your data is already in a list. Here is some basic information related to using list comprehensions and other comprehension techniques. In the code example below, I added a couple of extra data items to your original list.

from pprint import pprint

input_data = ['temp/date=20-07-2019/',
            'temp/date=20-07-2019/',
            'temp/date=21-07-2019/',
            'temp/date=22-07-2019/',
            'temp/date=23-07-2019/',
            'temp/date=21-07-2019/temp=22-07-2019/',
            'temp/date=22-07-2019/temp=22-07-2019/']

##################################################################################
# This list comprehension does several things:
# 1. Creates a set() of the unique strings contained in the input_data list
# 2. Sorts the set, which produces an ordered output
# 3. Uses not any(iterable) to drop every string contained in a longer string
##################################################################################
result = [x for x in sorted(set(input_data)) if not any(x in y and len(y) > len(x) for y in input_data)]

# The output no longer has the duplicate string 'temp/date=20-07-2019/'.
# The strings 'temp/date=21-07-2019/' and 'temp/date=22-07-2019/', which
# were contained in other strings in the list, are removed as well.
pprint(result)
# ['temp/date=20-07-2019/',
#  'temp/date=21-07-2019/temp=22-07-2019/',
#  'temp/date=22-07-2019/temp=22-07-2019/',
#  'temp/date=23-07-2019/']
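
sorted(set(...)) returns the strings in lexicographic order and discards the order of the input. If the original order should be kept instead (an assumption, since the question does not say), a minimal sketch can deduplicate with dict.fromkeys, which preserves first-seen order:

```python
input_data = ['temp/date=20-07-2019/',
              'temp/date=20-07-2019/',
              'temp/date=21-07-2019/',
              'temp/date=22-07-2019/',
              'temp/date=23-07-2019/',
              'temp/date=21-07-2019/temp=22-07-2019/',
              'temp/date=22-07-2019/temp=22-07-2019/']

# dict keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(input_data))

# Same filter as above: drop any string contained in a longer string
result = [x for x in unique_in_order
          if not any(x in y and len(y) > len(x) for y in unique_in_order)]
print(result)
```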

Answer 1
2 2019-07-22 14:20:06

Answer 2
1 2019-07-22 14:35:18

Answer 3
0 2019-07-22 14:37:02

Answer 4
0 2019-07-22 14:57:20

Answer 5
0 2019-08-05 16:46:50
