
Create a new list from a given list such that the new list can flag consecutive repetitions in the given list

I have a long list (several hundred thousand items) of numbers, and I want to create a new list of the same length that marks where numbers repeat consecutively. The new list should contain only 0s and 1s: 1 at every index that belongs to a run of consecutive repeated values, and 0 at every other index.

A solution that works on a pandas column would be helpful as well.

A sample input list and the expected result are shown below. The list can also contain float values.

given_array = [1, 2, 3, 5, 5, 5, 5, 0, -2, -4, -6, -8, 9, 9, 9]

result_array = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]

Below is a small example of my current attempt, followed by the output it produces.

import itertools    

def list_from_count(list_item):
    """
    Function takes an iterator and based on the length of the item
    returns 1 if length is 1 or list of 0 for length greater than 1
    """
    if len(list(list_item[1])) == 1:
        return 1
    else:
        return [0] * len(list(list_item[1]))

r0 = list(range(1,4))
r1 = [5]*4
r2 = list(range(0,-10,-2))
r3 = [9]*3
r = r0 + r1 + r2 + r3


gri = itertools.groupby(r)
res = list(map(list_from_count,gri))

print ("Result",'\n',res)

Result

[1, 1, 1, [], 1, 1, 1, 1, 1, []]

Thanks in advance!

You can use itertools.groupby and output a 1 for every element of a group whose length is greater than 1, and a 0 otherwise:

from itertools import groupby

result_array = []
for _, g in groupby(given_array):
    size = sum(1 for i in g)
    if size == 1:
        result_array.append(0)
    else:
        result_array.extend([1] * size)

or with a list comprehension:

result_array = [i for _, g in groupby(given_array) for s in (sum(1 for i in g),) for i in ([0] if s == 1 else [1] * s)]

result_array becomes:

[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]

You're calling len(list(list_item[1])) twice. The first call consumes every item in the iterator. By the time of the second call the iterator is exhausted, so list() produces an empty list and len() returns 0. That's why you get zero-element lists.
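The exhaustion described above can be seen in isolation with a short sketch. The variable names (groups, first_pass, second_pass) are illustrative only and do not come from the question's code.

```python
from itertools import groupby

# Take the first (and only) group from a list of four equal values.
groups = groupby([5, 5, 5, 5])
key, group = next(groups)

first_pass = len(list(group))   # consumes the group iterator
second_pass = len(list(group))  # the iterator is now empty

print(first_pass, second_pass)  # 4 0
```

This is why the length has to be stored the first time it is computed.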

You need to save the length in a variable the first time:

def list_from_count(list_item):
    l = len(list(list_item[1]))
    if l == 1:
        return [0]
    else:
        return [1] * l

The function also needs to return a list consistently. Then you can concatenate all the results, and you don't get a mix of numbers and sublists.

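# Recreate the groupby iterator; one that has already been consumed yields nothing.
gri = itertools.groupby(r)
res = []
for el in gri:
    res += list_from_count(el)
print(res)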

This situation is more akin to a run-length encoding problem. Consider more_itertools.run_length:

Given

import more_itertools as mit


iterable = [1, 2, 3, 5, 5, 5, 5, 0, -2, -4, -6, -8, 9, 9, 9]

Code

result = [[0] if n == 1 else [1] * n for _, n in mit.run_length.encode(iterable)]
result
# [[0], [0], [0], [1, 1, 1, 1], [0], [0], [0], [0], [0], [1, 1, 1]]

Now simply flatten the sub-lists (however you wish) into one list:

list(mit.flatten(result))
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]

Details

mit.run_length.encode compresses an iterable by yielding tuples of (value, number of repetitions), e.g.:

list(mit.run_length.encode("abaabbba"))
# [('a', 1), ('b', 1), ('a', 2), ('b', 3), ('a', 1)]

Our comprehension ignores the value, uses the repetition count n, and creates sub-lists of [0] or [1] * n.

Note: more_itertools is a third-party package. Install it via pip install more_itertools.
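If installing a third-party package is not an option, the same (value, count) pairs can be produced with the standard library. The following is a minimal sketch: run_length_encode is a hypothetical helper written for this illustration, not part of more_itertools, and it covers only the encoding direction.

```python
from itertools import chain, groupby


def run_length_encode(iterable):
    """Yield (value, run_length) pairs for consecutive equal items."""
    for value, group in groupby(iterable):
        yield value, sum(1 for _ in group)


iterable = [1, 2, 3, 5, 5, 5, 5, 0, -2, -4, -6, -8, 9, 9, 9]

# Build [0] for runs of length 1 and [1] * n for longer runs, then flatten.
flags = list(chain.from_iterable(
    [0] if n == 1 else [1] * n for _, n in run_length_encode(iterable)
))
print(flags)
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
```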

Use the pandas shift method to create a vector shifted by one element, and compare it to the original. This gives you a vector of True/False values showing where an element matches the previous one. The first element of each run is still False at this point, so walk down that vector and extend each run by one element at the front: change every [False, True] pair to [True, True]. Convert to int, and you have the list you specified.
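One way to sketch this in pandas is shown below. Note that it swaps the linear search for a second comparison against the next element (shift(-1)), which marks the front of each run without a loop. The variable names are illustrative, and the sketch assumes the data contains no NaN values, since NaN never compares equal to itself and a run of NaN would therefore not be flagged.

```python
import pandas as pd

given_array = [1, 2, 3, 5, 5, 5, 5, 0, -2, -4, -6, -8, 9, 9, 9]
s = pd.Series(given_array)

# True where an element equals the one before it
# (the first element of each run is still False here).
same_as_prev = s.eq(s.shift())

# True where an element equals the one after it;
# this covers the first element of each run.
same_as_next = s.eq(s.shift(-1))

result_array = (same_as_prev | same_as_next).astype(int).tolist()
print(result_array)
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
```

The same expression works on a DataFrame column, e.g. with s taken from a column of an existing frame.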

