Finding the index value of an item in order to append to it

Question

dupes is a list of duplicate items found in a list. clipb is the original list.

I now search clipb for items that contain one of the strings in dupes. The aim is to append the word "duplicate" to each item in the original list that matches a duplicate.

dupes = ['0138', '0243']
clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU', 'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

def Filter(clipb, dupes):
    return [str for str in clipb if
            any(sub in str for sub in dupes)]
            #index = clipb.index(clipb)  <<--- no idea how to add it in here 
    
rs_found = (Filter(clipb, dupes))
print ("found dupicates from the original list are: ","\n", rs_found)

The current output is only the list of duplicates found. The duplicates found in the original list are:

['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0138H_M1_LOBR-BRMU', 'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

My problem is that I have no idea how to change Filter so that it also outputs the index of each duplicate found, so that I can actually change the items.

Answer 1

Since you want the duplicate items with a tab and 'DUPLICATE' appended to them, do that when you find a duplicate instead of filtering it out:

clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU',
         'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

seen = set()
final = []
for item in clipb:
    tag = item[6:10]  # assuming tags are always at this index
    if tag in seen:
        item += '\tDUPLICATE'  # or ' DUPLICATE', as needed
    else:
        seen.add(tag)
    final.append(item)

print(final)
# Output:
['ABC2b_0243D_K6_LOPA-PAST',
 'ABC2b_0016G_M1_LOPA-PABR',
 'ABC2b_0138H_M1_LOBR-BRMU',
 'ABC2b_0138G_J1_LOPA-PAST\tDUPLICATE',
 'ABC2b_0243A_O§_STMA-MACV\tDUPLICATE']

Note that you don't need to pre-create a list of the duplicate tags; that's done in the code. This is loosely adapted from the unique_everseen recipe at https://docs.python.org/3/library/itertools.html .
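The loop above tags only the second and later occurrences of each tag, while the question's sample dupes list matches every occurrence. If every item that shares a tag should be marked, one possible variation is to count the tags first with collections.Counter. This is only a sketch: the helper name tag_of is made up for illustration, and it keeps the same assumption that the tag sits at positions 6 to 9 of each item.

```python
from collections import Counter

clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU',
         'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']


def tag_of(item):
    # Hypothetical helper; assumes the tag is always at this index.
    return item[6:10]


# First pass: count how often each tag occurs.
counts = Counter(tag_of(item) for item in clipb)

# Second pass: mark every item whose tag occurs more than once.
final = [item + '\tDUPLICATE' if counts[tag_of(item)] > 1 else item
         for item in clipb]

print(final)
```

With the sample data, every item except the one tagged 0016 gets the suffix, including the first occurrence of each repeated tag.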

Answer 2

Your current direction is quite good. You don't really need the index here at all! You are using any(sub in str for sub in dupes) to check whether any of the duplicated patterns is in the string, which is good. You only need a small logical refinement.

What should happen when the condition above is true? You want to add the "duplicate" string. What should happen if it is not true? Add the original string as is. So just modify the list comprehension to be:

def Filter(clipb, dupes):
    return [s + " duplicate" if any(sub in s for sub in dupes) 
            else s
            for s in clipb]

* Note that I changed the name of the str variable because str is the name of the built-in type.

The output with your sample data is:

found dupicates from the original list are:  
 ['ABC2b_0243D_K6_LOPA-PAST duplicate', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU duplicate', 'ABC2b_0138G_J1_LOPA-PAST duplicate', 'ABC2b_0243A_O§_STMA-MACV duplicate']

If you want to change the original list in place, you can use the built-in enumerate() function to iterate over both index and item:

for i, s in enumerate(clipb):
    if any(sub in s for sub in dupes):
        clipb[i] = s + " duplicate"
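If the indices themselves are wanted, as the question originally asked, the same enumerate() idea can collect them first. The following is a minimal sketch; the function name find_indices is not from the question and is only used for illustration.

```python
dupes = ['0138', '0243']
clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU',
         'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']


def find_indices(items, patterns):
    """Return the indices of the items that contain any of the patterns."""
    return [i for i, s in enumerate(items)
            if any(p in s for p in patterns)]


indices = find_indices(clipb, dupes)
print(indices)  # [0, 2, 3, 4]

# Use the collected indices to modify the original list in place.
for i in indices:
    clipb[i] += " duplicate"

print(clipb)
```

Collecting the indices separately is mainly useful when they are needed for something else as well; for the in-place update alone, the loop above is enough.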
