
Repeating elements in a list (Python)

Let's say I have a list of strings:

a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']

I want to make a list of the items that appear at least twice in a row:

result = ['a', 'c']

I know I need a for loop, but I can't figure out how to detect the items that repeat in a row. How can I do this?

EDIT: What if the same item repeats in a row more than once in a? Then set() would not work, because each run should be reported:

a = ['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']
result = ['a', 'a', 'd']

Try itertools.groupby() here:

>>> from itertools import groupby,islice
>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'b']

>>> [list(g) for k,g in groupby(a)]
[['a', 'a'], ['b'], ['c', 'c', 'c'], ['b']] 

>>> [k for k,g in groupby(a) if len(list(g))>=2]
['a', 'c']
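A Python 3 sketch of the same groupby() idea, assuming the input is any iterable whose items compare with ==. Counting with sum() avoids building a list for every group, and because groupby() yields one group per run, a value that forms several runs is reported once per run, which also covers the EDIT case. The function name and the min_len parameter are hypothetical, chosen for illustration.

```python
from itertools import groupby

def runs_at_least(items, min_len=2):
    """Return the value of every run of equal, adjacent items of length >= min_len."""
    result = []
    for value, group in groupby(items):
        # group is a lazy iterator over one run; count it without building a list
        run_length = sum(1 for _ in group)
        if run_length >= min_len:
            result.append(value)
    return result

print(runs_at_least(['a', 'a', 'b', 'c', 'c', 'c', 'd']))                    # ['a', 'c']
print(runs_at_least(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))     # ['a', 'a', 'd']
```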

Using islice(), which stops reading each group after two elements:

>>> [k for k,g in groupby(a) if len(list(islice(g,0,2)))==2]
['a', 'c']

Using zip() and izip() (izip exists only in Python 2):

In [198]: set(x[0] for x in izip(a,a[1:]) if x[0]==x[1])
Out[198]: set(['a', 'c'])

In [199]: set(x[0] for x in zip(a,a[1:]) if x[0]==x[1])
Out[199]: set(['a', 'c'])
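In Python 3, izip no longer exists and the built-in zip() is already lazy, so the two variants above collapse into one. The set() versions lose the original order and report a value only once, even if it forms several runs (the EDIT case). A sketch that keeps both, assuming a is a list: zip each element with its predecessor and successor and keep it only where a run of two or more starts. The function name run_starts is hypothetical.

```python
def run_starts(a):
    """Return the first element of each run of 2+ equal, adjacent items, in order."""
    sentinel = object()  # compares unequal to every list element
    # Triples (previous, current, next); the first triple uses the sentinel as "previous".
    triples = zip([sentinel] + a[:-1], a, a[1:])
    # Keep cur when it continues into the next item but did not continue from the previous one.
    return [cur for prev, cur, nxt in triples if cur == nxt and cur != prev]

print(run_starts(['a', 'a', 'b', 'c', 'c', 'c', 'd']))                   # ['a', 'c']
print(run_starts(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))    # ['a', 'a', 'd']
```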

timeit results (the benchmark is Python 2 code: print statements, izip and xrange):

from itertools import *

a='aaaabbbccccddddefgggghhhhhiiiiiijjjkkklllmnooooooppppppppqqqqqqsssstuuvv'

def grp_isl():
    [k for k,g in groupby(a) if len(list(islice(g,0,2)))==2]

def grpby():
    [k for k,g in groupby(a) if len(list(g))>=2]

def chn():
    set(x[1] for x in chain(izip(*([iter(a)] * 2)), izip(*([iter(a[1:])] * 2))) if x[0] == x[1])

def dread():
    set(a[i] for i in range(1, len(a)) if a[i] == a[i-1])

def xdread():
    set(a[i] for i in xrange(1, len(a)) if a[i] == a[i-1])

def inrow():
    inRow = []
    last = None
    for x in a:
        if last == x and (len(inRow) == 0 or inRow[-1] != x):
            inRow.append(last)
        last = x

def zipp():
    set(x[0] for x in zip(a,a[1:]) if x[0]==x[1])

def izipp():
    set(x[0] for x in izip(a,a[1:]) if x[0]==x[1])

if __name__=="__main__":
    import timeit
    print "islice",timeit.timeit("grp_isl()", setup="from __main__ import grp_isl")
    print "grpby",timeit.timeit("grpby()", setup="from __main__ import grpby")
    print "dread",timeit.timeit("dread()", setup="from __main__ import dread")
    print "xdread",timeit.timeit("xdread()", setup="from __main__ import xdread")
    print "chain",timeit.timeit("chn()", setup="from __main__ import chn")
    print "inrow",timeit.timeit("inrow()", setup="from __main__ import inrow")
    print "zip",timeit.timeit("zipp()", setup="from __main__ import zipp")
    print "izip",timeit.timeit("izipp()", setup="from __main__ import izipp")

Output:

islice 39.9123107277
grpby 30.1204478987
dread 17.8041124706
xdread 15.3691785568
chain 17.4777339702
inrow 11.8577565327           
zip 16.6348844045
izip 15.1468557105

Conclusion:

Poke's loop (inrow above) was the fastest of these alternatives on this test string.
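The benchmark above only runs on Python 2. A minimal Python 3 port of a subset of it might look like the sketch below; the reduced number=10_000 is an arbitrary choice to keep the run short, and no timings are shown because they depend on the machine and interpreter. The functions now return their results so they can be compared.

```python
import timeit
from itertools import groupby

# Same test string as the Python 2 benchmark above.
A = 'aaaabbbccccddddefgggghhhhhiiiiiijjjkkklllmnooooooppppppppqqqqqqsssstuuvv'

def grpby(a=A):
    return [k for k, g in groupby(a) if len(list(g)) >= 2]

def dread(a=A):
    # range() in Python 3 is lazy, like xrange() in Python 2
    return set(a[i] for i in range(1, len(a)) if a[i] == a[i - 1])

def zipp(a=A):
    # zip() in Python 3 is lazy, like izip() in Python 2
    return set(x for x, y in zip(a, a[1:]) if x == y)

def inrow(a=A):
    in_row = []
    last = None
    for x in a:
        if last == x and (len(in_row) == 0 or in_row[-1] != x):
            in_row.append(last)
        last = x
    return in_row

if __name__ == "__main__":
    for fn in (grpby, dread, zipp, inrow):
        # timeit.timeit accepts a callable directly in Python 3
        print(fn.__name__, timeit.timeit(fn, number=10_000))
```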

This sounds like homework, so I'll just outline what I would do:

  1. Iterate over a, keeping the index of each element in a variable. enumerate() will be useful.
  2. Inside your for loop, start a while loop from the current item's index.
  3. Repeat the loop as long as the next element is the same as the previous one (or the original). break will be useful here.
  4. Count the number of times the loop repeats (you'll need a counter variable for this).
  5. Append the item to your result if your counter variable is >= 2.
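The outline above could be sketched as follows, assuming a is a list. One adjustment to step 2: if the for loop restarted the while loop at every index, a run of three would be counted twice (once from its first item, once from its second), so this sketch records where each run ends and skips the indices it has already covered. The names repeated_in_row and skip_until are hypothetical.

```python
def repeated_in_row(items):
    result = []
    skip_until = 0  # index where the next unexamined run starts
    for index, item in enumerate(items):
        if index < skip_until:
            continue  # still inside a run that was already counted
        count = 1
        j = index + 1
        # Walk forward while the next element matches the original one.
        while j < len(items):
            if items[j] != item:
                break
            count += 1
            j += 1
        skip_until = j
        if count >= 2:
            result.append(item)
    return result

print(repeated_in_row(['a', 'a', 'b', 'c', 'c', 'c', 'd']))   # ['a', 'c']
```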

My take:

>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
>>> inRow = []
>>> last = None
>>> for x in a:
...     if last == x and (len(inRow) == 0 or inRow[-1] != x):
...         inRow.append(last)
...     last = x
...
>>> inRow
['a', 'c']
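Because this loop checks inRow[-1] != x, a value is reported only once when two of its runs are separated by a single item: for the EDIT input it returns ['a', 'd'] rather than ['a', 'a', 'd']. A variant that instead remembers whether the current run has already been reported handles that case; the function name in_row and the run_reported flag are hypothetical.

```python
def in_row(items):
    result = []
    last = object()       # sentinel that equals no list element
    run_reported = False  # has the current run already been appended?
    for x in items:
        if x == last:
            if not run_reported:
                result.append(x)
                run_reported = True
        else:
            run_reported = False  # a new run starts here
        last = x
    return result

print(in_row(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))   # ['a', 'a', 'd']
```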

How about:

set([a[i] for i in range(1, len(a)) if a[i] == a[i-1]])
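The set() here loses the original order and merges separate runs of the same value. A set-free sketch of the same index comparison keeps a[i] only when it equals its predecessor and that predecessor is the first element of its run, so each run is reported once and in order. The function name repeated_in_order is hypothetical.

```python
def repeated_in_order(a):
    """Report each run of 2+ equal, adjacent items once, in order, without set()."""
    # Keep a[i] when it equals a[i - 1] and a[i - 1] starts the run
    # (i == 1, or a[i - 2] is different).
    return [a[i] for i in range(1, len(a))
            if a[i] == a[i - 1] and (i == 1 or a[i - 2] != a[i])]

print(repeated_in_order(['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']))   # ['a', 'a', 'd']
```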

Here's a Python one-liner that I think does what you want. It uses the itertools module (izip is Python 2 only):

from itertools import chain, izip

a = "aabbbdeefggh" 

set(x[1] for x in chain(izip(*([iter(a)] * 2)), izip(*([iter(a[1:])] * 2))) if x[0] == x[1])
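The trick behind this one-liner: zip(*([iter(a)] * 2)) passes the same iterator to zip() twice, so zip() takes consecutive, non-overlapping pairs (a[0], a[1]), (a[2], a[3]), and so on. Doing the same on a[1:] yields the pairs that straddle those, and chain() joins both sets of pairs. A Python 3 spelling, where izip is gone and zip is lazy, might look like this; the function name repeated_pairs is hypothetical.

```python
from itertools import chain

def repeated_pairs(s):
    # zip(it, it) with one shared iterator is equivalent to zip(*([iter(s)] * 2)):
    # it yields (s[0], s[1]), (s[2], s[3]), ...
    it_even = iter(s)
    # The same on s[1:] yields (s[1], s[2]), (s[3], s[4]), ...
    it_odd = iter(s[1:])
    pairs = chain(zip(it_even, it_even), zip(it_odd, it_odd))
    return set(y for x, y in pairs if x == y)

print(sorted(repeated_pairs("aabbbdeefggh")))   # ['a', 'b', 'e', 'g']
```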

The edited question asks to avoid set(), which rules out most of the answers.

I thought I'd compare the fancy one-liner comprehensions with the good old loop from @poke and another loop I wrote:

from itertools import *

a = 'aaaabbbccccaaaaefgggghhhhhiiiiiijjjkkklllmnooooooaaaaaaaaqqqqqqsssstuuvv'

def izipp():
    return set(x[0] for x in izip(a, a[1:]) if x[0] == x[1])

def grpby():
    return [k for k,g in groupby(a) if len(list(g))>=2]

def poke():
    inRow = []
    last = None
    for x in a:
        if last == x and (len(inRow) == 0 or inRow[-1] != x):
            inRow.append(last)
        last = x
    return inRow    

def dread2():
    repeated_chars = []
    previous_char = ''
    for char in a:
        if repeated_chars and char == repeated_chars[-1]:
            continue
        if char == previous_char:
            repeated_chars.append(char)
        else:
            previous_char = char
    return repeated_chars

if __name__=="__main__":
    import timeit
    print "izip",timeit.timeit("izipp()", setup="from __main__ import izipp"),''.join(izipp())
    print "grpby",timeit.timeit("grpby()", setup="from __main__ import grpby"),''.join(grpby())
    print "poke",timeit.timeit("poke()", setup="from __main__ import poke"),''.join(poke())
    print "dread2",timeit.timeit("dread2()", setup="from __main__ import dread2"),''.join(dread2())

This gives me:

izip 13.2173779011 acbgihkjloqsuv
grpby 18.1190848351 abcaghijkloaqsuv
poke 11.8500328064 abcaghijkloaqsuv
dread2 9.0088801384 abcaghijkloaqsuv

So the basic loops were faster than all the comprehension-based versions, and dread2 took about half the time of groupby() (9.0 s vs 18.1 s). Note that izip returns a set, so its output is unordered and reports each character only once (the later runs of a are missing). However, basic loops are more complicated to read and write, so I'd probably stick with groupby() in most circumstances.

Here's a regex one-liner:

>>> import re
>>> mylist = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'a', 'a']
>>> results = [match[0][0] for match in re.findall(r'((\w)\2{1,})', ''.join(mylist))]
>>> results
['a', 'c', 'a']
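re.findall() returns a tuple of groups for each match here, which is why the one-liner needs match[0][0]. A sketch using re.finditer() and the backreference pattern (.)\1+ reads the repeated character straight from group 1. Like the original, it assumes every list element is a single character, because the list is joined into one string first. The function name repeated_runs_regex is hypothetical.

```python
import re

def repeated_runs_regex(items):
    """Return the character of each run of 2+ identical characters, in order."""
    text = ''.join(items)  # assumes every element is a one-character string
    # (.) captures one character; \1+ requires at least one more copy right after it.
    # re.DOTALL lets '.' match newline characters too.
    return [m.group(1) for m in re.finditer(r'(.)\1+', text, re.DOTALL)]

print(repeated_runs_regex(['a', 'a', 'b', 'c', 'c', 'c', 'd', 'a', 'a']))   # ['a', 'c', 'a']
```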

Sorry, too lazy to time it.

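a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
res = []
# Note: a.count(i) counts every occurrence of i in the list, not only
# consecutive ones, so ['a', 'b', 'a'] would also report 'a'.
for i in a:
    if a.count(i) > 1 and i not in res:
        res.append(i)
print(res)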

Using enumerate to check for two in a row:

def repetitives(long_list):
    repeaters = []
    for counter, item in enumerate(long_list):
        # Skip index 0: long_list[-1] would wrap around to the last element.
        if counter > 0 and item == long_list[counter - 1] and item not in repeaters:
            repeaters.append(item)
    return repeaters
