
Index of duplicate items in a Python list

Does anyone know how I can get the index positions of duplicate items in a Python list? I have tried doing this, and it keeps giving me only the index of the first occurrence of the item in the list.

List = ['A', 'B', 'A', 'C', 'E']

I want it to give me:

index 0: A   
index 2: A

You want to pass the optional second parameter to index, the location where you want index to start looking. After you find each match, reset this parameter to the location just after the match that was found.

def list_duplicates_of(seq,item):
    start_at = -1
    locs = []
    while True:
        try:
            loc = seq.index(item,start_at+1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start_at = loc
    return locs

source = "ABABDBAAEDSBQEWBAFLSAFB"
print(list_duplicates_of(source, 'B'))

Prints:

[1, 3, 5, 11, 15, 22]

You can find all the duplicates at once in a single pass through source, by using a defaultdict to keep a list of all seen locations for any item, and returning those items that were seen more than once.

from collections import defaultdict

def list_duplicates(seq):
    tally = defaultdict(list)
    for i,item in enumerate(seq):
        tally[item].append(i)
    return ((key,locs) for key,locs in tally.items() 
                            if len(locs)>1)

for dup in sorted(list_duplicates(source)):
    print(dup)

Prints:

('A', [0, 2, 6, 7, 16, 20])
('B', [1, 3, 5, 11, 15, 22])
('D', [4, 9])
('E', [8, 13])
('F', [17, 21])
('S', [10, 19])

If you want to do repeated testing for various keys against the same source, you can use functools.partial to create a new function variable, using a "partially complete" argument list, that is, specifying the seq but omitting the item to search for:

from functools import partial
dups_in_source = partial(list_duplicates_of, source)

for c in "ABDEFS":
    print(c, dups_in_source(c))

Prints:

A [0, 2, 6, 7, 16, 20]
B [1, 3, 5, 11, 15, 22]
D [4, 9]
E [8, 13]
F [17, 21]
S [10, 19]
>>> def duplicates(lst, item):
...   return [i for i, x in enumerate(lst) if x == item]
... 
>>> duplicates(List, "A")
[0, 2]

To get all duplicates, you can use the method below, but it is not very efficient. If efficiency is important, you should consider Ignacio's solution instead.

>>> dict((x, duplicates(List, x)) for x in set(List) if List.count(x) > 1)
{'A': [0, 2]}

As for solving it using the index method of list instead, that method takes an optional second argument indicating where to start, so you could just repeatedly call it with the previous index plus 1.

>>> List.index("A")
0
>>> List.index("A", 1)
2

EDIT: Fixed the issue raised in comments.

I made a benchmark of all solutions suggested here and also added another solution to this problem (described at the end of the answer).

Benchmarks

First, the benchmarks. I initialize a list of n random ints within the range [1, n/2] and then call timeit over all algorithms.
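The setup described above can be sketched roughly as follows. This is not the author's benchmark code (that was linked separately); `make_data`, `bench` and the stand-in algorithm `dupl_defaultdict` are hypothetical names chosen for illustration, and the fixed seed is an assumption added so that runs are repeatable.

```python
import random
import timeit
from collections import defaultdict

def make_data(n, seed=0):
    """Return n random ints drawn from [1, n // 2] (seed is an assumption)."""
    rng = random.Random(seed)
    return [rng.randint(1, n // 2) for _ in range(n)]

def dupl_defaultdict(seq):
    """Stand-in algorithm: one pass, collecting the indices of each value."""
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    return {key: locs for key, locs in tally.items() if len(locs) > 1}

def bench(func, n, loops):
    """Time `loops` calls of func on a list of n items; returns seconds."""
    data = make_data(n)
    return timeit.timeit(lambda: func(data), number=loops)

print("Timing:", bench(dupl_defaultdict, 100, 1000))
```

Building the list outside the timed lambda keeps data generation out of the measurement, so only the algorithm itself is timed.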

The solutions of @Paul McGuire and @Ignacio Vazquez-Abrams work about twice as fast as the rest on the list of 100 ints:

Testing algorithm on the list of 100 items using 10000 loops
Algorithm: dupl_eat
Timing: 1.46247477189
####################
Algorithm: dupl_utdemir
Timing: 2.93324529055
####################
Algorithm: dupl_lthaulow
Timing: 3.89198786645
####################
Algorithm: dupl_pmcguire
Timing: 0.583058259784
####################
Algorithm: dupl_ivazques_abrams
Timing: 0.645062989076
####################
Algorithm: dupl_rbespal
Timing: 1.06523873786
####################

If you change the number of items to 1000, the difference becomes much bigger (BTW, I'll be happy if someone could explain why):

Testing algorithm on the list of 1000 items using 1000 loops
Algorithm: dupl_eat
Timing: 5.46171654555
####################
Algorithm: dupl_utdemir
Timing: 25.5582547323
####################
Algorithm: dupl_lthaulow
Timing: 39.284285326
####################
Algorithm: dupl_pmcguire
Timing: 0.56558489513
####################
Algorithm: dupl_ivazques_abrams
Timing: 0.615980005148
####################
Algorithm: dupl_rbespal
Timing: 1.21610942322
####################
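One plausible reading of the jump between the 100-item and 1000-item runs above: the item count grew tenfold while the loop count dropped tenfold. For a single-pass, O(n) algorithm the total work stays the same, which fits the nearly unchanged timings of the two fastest entries. For an algorithm that rescans the list once per item or per distinct value, roughly O(n²), the total work grows about tenfold, which fits the two slowest entries. That those entries really are quadratic is an assumption here, since their code is not shown in the benchmark output. The arithmetic, with a hypothetical helper `total_work`:

```python
def total_work(n, loops, exponent):
    """Rough operation count for `loops` runs of an O(n**exponent) algorithm."""
    return loops * n ** exponent

# 100 items x 10000 loops versus 1000 items x 1000 loops
linear_ratio = total_work(1000, 1000, 1) / total_work(100, 10000, 1)
quadratic_ratio = total_work(1000, 1000, 2) / total_work(100, 10000, 2)

print(linear_ratio)     # 1.0
print(quadratic_ratio)  # 10.0
```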

On bigger lists, the solution of @Paul McGuire continues to be the most efficient, and my algorithm begins having problems.

Testing algorithm on the list of 1000000 items using 1 loops
Algorithm: dupl_pmcguire
Timing: 1.5019953958
####################
Algorithm: dupl_ivazques_abrams
Timing: 1.70856155898
####################
Algorithm: dupl_rbespal
Timing: 3.95820421595
####################

The full code of the benchmark is here

Another algorithm

Here is my solution to the same problem:

def dupl_rbespal(c):
    alreadyAdded = False
    dupl_c = dict()
    sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x]) # sort incoming list but save the indexes of sorted items

    for i in range(len(c) - 1): # loop over indexes of sorted items (xrange in Python 2)
        if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]: # if two consecutive indexes point to the same value, add it to the duplicates
            if not alreadyAdded:
                dupl_c[c[sorted_ind_c[i]]] = [sorted_ind_c[i], sorted_ind_c[i+1]]
                alreadyAdded = True
            else:
                dupl_c[c[sorted_ind_c[i]]].append( sorted_ind_c[i+1] )
        else:
            alreadyAdded = False
    return dupl_c

Although it's not the best, it allowed me to generate a slightly different structure needed for my problem (I needed something like a linked list of indexes of the same value).
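The same sort-then-compare-neighbours idea can be written more compactly with itertools.groupby. This is a sketch of the technique rather than the answer's own code; the function name `duplicate_indices_sorted` is made up for illustration. It relies on sorted being stable, so the indices within each group stay in ascending order.

```python
from itertools import groupby

def duplicate_indices_sorted(seq):
    """Map each repeated value to its indices by sorting the indices by value."""
    # Sort index positions by the value they point at (stable sort).
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    result = {}
    # Consecutive indices in `order` that point at equal values form one group.
    for value, group in groupby(order, key=lambda i: seq[i]):
        indices = list(group)
        if len(indices) > 1:
            result[value] = indices
    return result

print(duplicate_indices_sorted(['A', 'B', 'A', 'C', 'E']))  # {'A': [0, 2]}
```

Like the original, this costs O(n log n) because of the sort, so it is not expected to beat the single-pass dictionary approaches.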

import collections

dups = collections.defaultdict(list)
for i, e in enumerate(L):  # L is the input list
  dups[e].append(i)
for k, v in sorted(dups.items()):  # iteritems() in Python 2
  if len(v) >= 2:
    print('%s: %r' % (k, v))

And extrapolate from there.

I think I found a simple solution after a lot of irritation:

if elem in string_list:
    counter = 0
    elem_pos = []
    for i in string_list:
        if i == elem:
            elem_pos.append(counter)
        counter = counter + 1
    print(elem_pos)

This prints a list giving you the indexes of a specific element ("elem").

Using the new Counter class in the collections module, based on lazyr's answer:

>>> import collections
>>> def duplicates(n): #n="123123123"
...     counter=collections.Counter(n) #{'1': 3, '3': 3, '2': 3}
...     dups=[i for i in counter if counter[i]!=1] #['1','3','2']
...     result={}
...     for item in dups:
...             result[item]=[i for i,j in enumerate(n) if j==item] 
...     return result
... 
>>> duplicates("123123123")
{'1': [0, 3, 6], '3': [2, 5, 8], '2': [1, 4, 7]}
from collections import Counter, defaultdict

def duplicates(lst):
    cnt= Counter(lst)
    return [key for key in cnt.keys() if cnt[key]> 1]

def duplicates_indices(lst):
    dup, ind= duplicates(lst), defaultdict(list)
    for i, v in enumerate(lst):
        if v in dup: ind[v].append(i)
    return ind

lst= ['a', 'b', 'a', 'c', 'b', 'a', 'e']
print(duplicates(lst)) # ['a', 'b']
print(duplicates_indices(lst)) # ..., {'a': [0, 2, 5], 'b': [1, 4]})

A slightly more orthogonal (and thus more useful) implementation would be:

from collections import Counter, defaultdict

def duplicates(lst):
    cnt= Counter(lst)
    return [key for key in cnt.keys() if cnt[key]> 1]

def indices(lst, items= None):
    items, ind= set(lst) if items is None else items, defaultdict(list)
    for i, v in enumerate(lst):
        if v in items: ind[v].append(i)
    return ind

lst= ['a', 'b', 'a', 'c', 'b', 'a', 'e']
print(indices(lst, duplicates(lst))) # ..., {'a': [0, 2, 5], 'b': [1, 4]})

Wow, everyone's answer is so long. I simply used a pandas DataFrame, masking, and the duplicated function (keep=False marks all duplicates as True, not just the first or last):

import pandas as pd
import numpy as np
np.random.seed(42)  # make results reproducible

int_df = pd.DataFrame({'int_list': np.random.randint(1, 20, size=10)})
dupes = int_df['int_list'].duplicated(keep=False)
print(int_df['int_list'][dupes].index)

This should return Int64Index([0, 2, 3, 4, 6, 7, 9], dtype='int64').

def index(arr, num):
    for i, x in enumerate(arr):
        if x == num:
            print(x, i)

#index(List, 'A')

string_list = ['A', 'B', 'C', 'B', 'D', 'B']
pos_list = []
for i in range(len(string_list)):
    if string_list[i] == 'B':
        pos_list.append(i)
print(pos_list)

In a single line with pandas 1.2.2 and numpy:

import numpy as np
import pandas as pd

idx = np.where(pd.DataFrame(List).duplicated(keep=False))

The argument keep=False will mark every duplicate as True, and np.where() will return the indices (as a tuple of arrays) where the element in the array was True.

I'll mention the more obvious way of dealing with duplicates in lists. In terms of complexity, dictionaries are the way to go because each lookup is O(1). You can be more clever if you're only interested in duplicates...

my_list = [1,1,2,3,4,5,5]
my_dict = {}
for (ind,elem) in enumerate(my_list):
    if elem in my_dict:
        my_dict[elem].append(ind)
    else:
        my_dict.update({elem:[ind]})

for key,value in my_dict.items():  # iteritems() in Python 2
    if len(value) > 1:
        print("key(%s) has indices (%s)" %(key,value))

which prints the following:

key(1) has indices ([0, 1])
key(5) has indices ([5, 6])
a= [2,3,4,5,6,2,3,2,4,2]
search=2
pos=0
positions=[]

while (search in a):
    pos+=a.index(search)
    positions.append(pos)
    a=a[a.index(search)+1:]
    pos+=1

print("search found at:", positions)
def find_duplicate(list_):
    duplicate_list=[""]

    for k in range(len(list_)):
        if duplicate_list.__contains__(list_[k]):
            continue
        for j in range(len(list_)):
            if k == j:
                continue
            if list_[k] == list_[j]:
                duplicate_list.append(list_[j])
                print("duplicate", k, j)  # positions of the two matching items; index() would return the first occurrence for both

Here is one that works for multiple duplicates, and you don't need to specify any values:

List = ['A', 'B', 'A', 'C', 'E', 'B'] # duplicate two 'A's two 'B's

ix_list = []
for i in range(len(List)):
    try:
        dup_ix = List[(i+1):].index(List[i]) + (i + 1) # dup onwards + (i + 1)
        ix_list.extend([i, dup_ix]) # if found no error, add i also
    except ValueError:  # raised by index() when no later occurrence exists
        pass
    
ix_list.sort()

print(ix_list)
[0, 1, 2, 5]
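
One caveat with the loop above: when a value occurs three or more times, the middle index is appended twice (for ['A', 'A', 'A'] the result is [0, 1, 1, 2]). A minimal sketch that avoids this by collecting indices in a set; the function name `all_duplicate_indices` is hypothetical, and the sketch uses the start argument of index instead of slicing:

```python
def all_duplicate_indices(items):
    """Return the sorted indices of every item that occurs more than once."""
    found = set()
    for i, item in enumerate(items):
        try:
            j = items.index(item, i + 1)  # next occurrence after position i
        except ValueError:
            continue  # no later occurrence of this item
        found.update((i, j))  # the set drops repeated indices
    return sorted(found)

print(all_duplicate_indices(['A', 'B', 'A', 'C', 'E', 'B']))  # [0, 1, 2, 5]
print(all_duplicate_indices(['A', 'A', 'A']))                 # [0, 1, 2]
```
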
def dup_list(my_list, value):
    '''
    dup_list(list,value)
        This function finds the indices of values in a list including duplicated values.

        list: the list you are working on

        value: the item of the list you want to find the index of

            NB: if a value is duplicated, its indices are stored in a list
            If only one occurrence of the value, the index is stored as an integer.

            Therefore use isinstance method to know how to handle the returned value
    '''
    value_list = []
    index_list = []
    index_of_duped = []

    if my_list.count(value) == 1:
        return my_list.index(value)  
        
    elif my_list.count(value) < 1:
        return 'Your argument is not in the list'

    else:
        for item in my_list:
            value_list.append(item)
            length = len(value_list)
            index = length - 1
            index_list.append(index)

            if item == value:
                index_of_duped.append(max(index_list))

        return index_of_duped

# function call eg dup_list(my_list, 'john')
def duplicates(list,dup):
  a=[list.index(dup)]
  for i in list:
     try: 
        a.append(list.index(dup,a[-1]+1))
     except:
        for i in a:
           print(f'index {i}: '+dup)
        break
duplicates(['A', 'B', 'A', 'C', 'E'],'A')

I think this is the fastest among all (I checked the time taken, as a user above gave timings for some conditions).

      Output:
            index 0: A
            index 2: A

If you want to get the indices of all duplicate elements of different types, you can try this solution:

# note: below list has more than one kind of duplicates
List = ['A', 'B', 'A', 'C', 'E', 'E', 'A', 'B', 'A', 'A', 'C']
d1 = {item:List.count(item) for item in List}  # item and their counts
elems = list(filter(lambda x: d1[x] > 1, d1))  # get duplicate elements
d2 = dict(zip(range(0, len(List)), List))  # each item and their indices

# item and their list of duplicate indices
res = {item: list(filter(lambda x: d2[x] == item, d2)) for item in elems}

Now, if you print(res), you'll get to see this:

{'A': [0, 2, 6, 8, 9], 'B': [1, 7], 'C': [3, 10], 'E': [4, 5]}

I've decided to work with enumerate instead, so:

numlist = [1, 3, 12, 1, 3]
keylist = []
for key, n in enumerate(numlist): 
  if n < 10:
    keylist.append(key)

print(keylist)

The output should be:

[0, 1, 3, 4]

enumerate assigns an increasing key to each item in the list, and I use that key as the item's "index".
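To make the pairing visible, here is a small sketch of what enumerate yields for the list above, and how the same pairs can be filtered by value to answer the original question. The variable `target` is a hypothetical name introduced for illustration.

```python
numlist = [1, 3, 12, 1, 3]

# enumerate yields (index, item) pairs, counting from 0.
pairs = list(enumerate(numlist))
print(pairs)  # [(0, 1), (1, 3), (2, 12), (3, 1), (4, 3)]

# Keep the index of every item equal to the value being searched for.
target = 3
positions = [index for index, n in pairs if n == target]
print(positions)  # [1, 4]
```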

This is a good question and there are a lot of ways to do it.

The code below is one of the ways to do it:

letters = ["a", "b", "c", "d", "e", "a", "a", "b"] 

lettersIndexes = [i for i in range(len(letters))] # i created a list that contains the indexes of my previous list
counter = 0 
for item in letters: 
    if item == "a": 
        print(item, lettersIndexes[counter]) 
    counter += 1 # for each item it increases the counter which means the index 

Another way to get the indexes, but this time stored in a list:

letters = ["a", "b", "c", "d", "e", "a", "a", "b"] 
lettersIndexes = [i for i in range(len(letters)) if letters[i] == "a" ] 
print(lettersIndexes) # as you can see we get a list of the indexes that we want.

Good day

Using a dictionary approach based on the setdefault instance method.

List = ['A', 'B', 'A', 'C', 'B', 'E', 'B']

# keep track of all indices of every term
duplicates = {}
for i, key in enumerate(List):
    duplicates.setdefault(key, []).append(i)

# print only those terms with more than one index
template = 'index {}: {}'
for k, v in duplicates.items():
    if len(v) > 1:
        print(template.format(k, str(v).strip('][')))    

Remark: Counter, defaultdict and OrderedDict from collections are subclasses of dict and hence share the setdefault method as well.
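A quick check of that remark, as a sketch: the three classes named are dict subclasses, so the same setdefault pattern works on them unchanged. OrderedDict is used below only as an example; the variable names are illustrative.

```python
from collections import Counter, OrderedDict, defaultdict

# Each of these collections classes inherits from dict.
for cls in (Counter, OrderedDict, defaultdict):
    assert issubclass(cls, dict)

# The setdefault pattern from above, applied to an OrderedDict.
positions = OrderedDict()
for i, key in enumerate(['A', 'B', 'A', 'C', 'B', 'E', 'B']):
    positions.setdefault(key, []).append(i)

repeated = {k: v for k, v in positions.items() if len(v) > 1}
print(repeated)  # {'A': [0, 2], 'B': [1, 4, 6]}
```

Other collections types such as deque and namedtuple are not dict subclasses and have no setdefault method.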

You could also use list comprehensions as follows:

List = ['A', 'B', 'A', 'C', 'E']

## you could pass a variable instead of "A"
idx = [i for i in range(len(List)) if List[i] == "A"] 

print(idx)
[0, 2]

I just make it simple:

i = [1,2,1,3]
k = 0
for ii in i:
    if ii == 1:
        print("index of 1 = ", k)
    k = k+1

output:

 index of 1 =  0

 index of 1 =  2
