
Remove only half of specific adjacent duplicates in python list

I have a tool that outputs some data. It is known that whenever '10' occurs in the data, an extra '10' is added after it, i.e. the data becomes ... '10', '10', .... Sometimes there are four consecutive '10's, which means that there are actually two '10's.

While reading the data I am trying to remove the duplicates. So far I have learned how to remove duplicates when only two adjacent duplicates are found, but when an even number of adjacent duplicates is found, I want to keep half of them.

x = ['10', '10', '00', 'DF', '20', '10', '10', '10', '10', ....]

Expected output

['10', '00', 'DF', '20', '10', '10', ..]

You could use groupby() from itertools:

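from itertools import groupby

X = ['10', '10', '00', 'DF', '20', '10', '10', '10', '10']

result = []
# groupby() yields each run of equal adjacent items as (key, group)
for k, g in groupby(X):
    group = list(g)
    if k == '10':
        # keep half of the run, rounded up (// is integer division)
        result.extend(group[:(len(group) + 1) // 2])
    else:
        result.extend(group)
print(result)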

gives:

['10', '00', 'DF', '20', '10', '10']
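
If you need this in more than one place, the same idea can be wrapped in a function. This is only a sketch: the name halve_runs and the target parameter are my own choices, not something from the question. It keeps half of each run of the target value, rounding up, so a run of three '10's keeps two, which may or may not be what you want for odd runs.

```python
from itertools import groupby

def halve_runs(items, target='10'):
    """Keep half (rounded up) of each consecutive run of `target`.

    `halve_runs` and `target` are hypothetical names used for illustration.
    """
    result = []
    for key, group in groupby(items):
        run = list(group)
        if key == target:
            # (n + 1) // 2 rounds up, so a run of 3 keeps 2 items
            result.extend(run[:(len(run) + 1) // 2])
        else:
            # runs of any other value are copied unchanged
            result.extend(run)
    return result

print(halve_runs(['10', '10', '00', 'DF', '20', '10', '10', '10', '10']))
# ['10', '00', 'DF', '20', '10', '10']
print(halve_runs(['00', '00', '10', '10']))
# ['00', '00', '10']
```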

A pure Python approach:

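x = ['10', '10', '00', 'DF', '20', '10', '10', '10', '10']

ls = []
dupe = True  # True when the next adjacent duplicate should be dropped
for item in x:
    # skip every second item in a run of equal adjacent values
    if ls and ls[-1] == item and dupe:
        dupe = False
        continue
    dupe = True
    ls.append(item)
print(ls)

gives:

['10', '00', 'DF', '20', '10', '10']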
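
Note that the loop above drops every second adjacent duplicate of any value, not only '10'. If the tool only doubles '10', a variant like the following might be closer to what is needed. It is a sketch, and the function name drop_every_second and the target parameter are made up for illustration.

```python
def drop_every_second(items, target='10'):
    """Drop every second `target` within a run of adjacent `target` values.

    The function and parameter names are hypothetical.
    """
    result = []
    skip_next = False  # True right after a `target` has been kept
    for item in items:
        if item == target and skip_next:
            # this is the extra copy: drop it and re-arm for the next pair
            skip_next = False
            continue
        result.append(item)
        # only a kept `target` makes the following `target` a duplicate
        skip_next = (item == target)
    return result

print(drop_every_second(['10', '10', '00', 'DF', '20', '10', '10', '10', '10']))
# ['10', '00', 'DF', '20', '10', '10']
print(drop_every_second(['00', '00', '10', '10']))
# ['00', '00', '10']
```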

Updated version that should handle any duplicates accurately:

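res = []
temp = []  # buffer for the current run of adjacent duplicates

for i in range(1, len(x)):
    if x[i-1] == x[i]:
        # still inside a run: buffer the previous element
        temp.append(x[i-1])
    else:
        # run ended: keep half of the buffer plus the run's last element
        res.extend(temp[:len(temp)//2])
        temp = []
        res.append(x[i-1])

# flush the final run
res.extend(temp[:len(temp)//2])

if x:
    res.append(x[-1])

print(res)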

My approach is to compare each element to the previous one and put any duplicates in a buffer. When a comparison is unequal, dump half of the buffer into the result, clear the buffer, and append the previous element (the last one of its run) to the result. A run of n equal elements buffers n - 1 of them, so it contributes (n - 1) // 2 + 1 elements in total.

Changing if x[i-1] == x[i]: to if x[i-1] == x[i] and x[i] == '10': restricts the halving to runs of '10', if that is all you need.
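
With that change applied and the loop wrapped in a function, the code might look like this. The function name halve_target_runs and the target parameter are illustrative assumptions, not part of the code above.

```python
def halve_target_runs(x, target='10'):
    """Buffer-based halving, restricted to runs of `target` (hypothetical name)."""
    res = []
    temp = []  # holds all but the last element of the current `target` run
    for i in range(1, len(x)):
        if x[i-1] == x[i] and x[i] == target:
            temp.append(x[i-1])
        else:
            # run ended (or value is not `target`): flush half the buffer
            res.extend(temp[:len(temp)//2])
            temp = []
            res.append(x[i-1])
    # flush whatever is left from the final run
    res.extend(temp[:len(temp)//2])
    if x:
        res.append(x[-1])
    return res

print(halve_target_runs(['10', '10', '00', 'DF', '20', '10', '10', '10', '10']))
# ['10', '00', 'DF', '20', '10', '10']
print(halve_target_runs(['00', '00', '10', '10']))
# ['00', '00', '10']
```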

Play with it at a repl and let me know if you find any edge cases I missed.
