
From a list of tuples [(ID, date),(ID, date)..] create a new list of tuples with unique ID and most recent date

I have a list of tuples [(ID, date), (ID, date), ...]. The same ID can occur once or many times. If an ID occurs more than once, I only want the tuple with the most recent date.

lst = [(587,"2015-01-01"),
        (625,"2011-12-01"),
        (587,"1998-05-01")]

I want this:

list2 = [(587,"2015-01-01"),
        (625,"2011-12-01")]

One of the tuples looks like this:

(2, 14, 58875, 1, datetime.datetime(2009, 11, 1, 0, 0), u'RB', u'SYSTEM', datetime.datetime(2016, 6, 21, 9, 7, 38), u'SYSTEM', datetime.datetime(2016, 6, 21, 9, 7, 38))

The ID field is at index 2 and the date field is at index 4.

You can use a defaultdict() with an empty string as the default value:

lst = [(587,'2015-01-01'),
       (625,'2011-12-01'),
       (587,'1998-05-01')]

from collections import defaultdict
result = defaultdict(lambda: "")

for k, v in lst:
    if result[k] < v:
        result[k] = v

list(result.items())
# [(587, '2015-01-01'), (625, '2011-12-01')]  (insertion order on Python 3.7+; may differ on older versions)

If the elements in each tuple are too many to unpack as above, you can capture the tuple with a single variable and then use index to access it, for instance:

for x in lst: 
    if result[x[0]] < x[1]: 
        result[x[0]] = x[1]

list(result.items())
# [(587, '2015-01-01'), (625, '2011-12-01')]  (insertion order on Python 3.7+; may differ on older versions)
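Note that the empty-string default only works while the dates are strings. With datetime objects, as in the long tuple from the question, Python 3 raises a TypeError when comparing "" with a datetime. A minimal sketch of one way around that, assuming naive datetime values and using datetime.min as the default; the sample rows are made up for illustration, with the ID at index 2 and the date at index 4:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows shaped like the question's long tuple
# (ID at index 2, date at index 4; the other fields are filler).
rows = [
    (2, 14, 58875, 1, datetime(2009, 11, 1), "RB"),
    (2, 14, 58875, 1, datetime(2012, 3, 1), "RB"),
    (2, 15, 60001, 1, datetime(2010, 6, 1), "RB"),
]

# datetime.min compares lower than any other naive datetime,
# so it plays the role the empty string plays for date strings.
result = defaultdict(lambda: datetime.min)

for row in rows:
    if result[row[2]] < row[4]:
        result[row[2]] = row[4]

pairs = list(result.items())
print(pairs)
```

This keeps only (ID, date) pairs, just like the answer above. If you need the rest of each tuple, store the whole row instead of only the date.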

Sort the tuples, use itertools.groupby to group them by the first element, then select the last element from each group:

import itertools

# sorted() orders by ID, then by date, so the last tuple in each group is the most recent
groups = itertools.groupby(sorted(lst), lambda x: x[0])
[list(x[1])[-1] for x in groups]
# [(587, '2015-01-01'), (625, '2011-12-01')]
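The version above relies on the date being the second field, so that a plain sort also orders each group by date. When the ID and date sit at other positions, a variant that sorts only by ID and takes max() of each group by date may be easier to adapt. This is a sketch, not the original answer's code; get_id and get_date are names introduced here for illustration:

```python
from itertools import groupby
from operator import itemgetter

lst = [(587, '2015-01-01'),
       (625, '2011-12-01'),
       (587, '1998-05-01')]

get_id = itemgetter(0)    # position of the ID in each tuple
get_date = itemgetter(1)  # position of the date in each tuple

# groupby only merges adjacent items, so sort by ID first.
by_id = sorted(lst, key=get_id)

# For each ID, keep the tuple with the greatest date.
latest = [max(group, key=get_date) for _, group in groupby(by_id, key=get_id)]
print(latest)
# [(587, '2015-01-01'), (625, '2011-12-01')]
```

For the long tuples in the question you would use itemgetter(2) and itemgetter(4) instead.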

If you don't want to use any library, this should work:

list2 = []

for i in list1:  # list1 is the input list of (ID, date) tuples
    if i[0] not in [j[0] for j in list2]:
        list2.append(i)
    else:
        for k in range(len(list2)):
            if i[0] == list2[k][0] and i[1] > list2[k][1]:
                list2[k] = i

Thus, if the ID is not yet in list2, the tuple is appended; if it is already there and the date in i is more recent than the one stored in list2, the stored tuple is replaced.

If your tuples have other values, just adapt the indices to your ID and date positions. For tuples of the form (value, ID, value, value, date, value, ...) it would be:

list2 = []

for i in list1:
    if i[1] not in [j[1] for j in list2]:
        list2.append(i)
    else:
        for k in range(len(list2)):
            if i[1] == list2[k][1] and i[4] > list2[k][4]:
                list2[k] = i

Hope this helped!
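The nested loops rescan list2 for every input tuple, which can get slow on long lists. A rough sketch of the same no-import idea with a plain dict keyed by ID, so each tuple is handled once; latest_per_id and its parameters are hypothetical names, not part of the answer above:

```python
def latest_per_id(rows, id_idx=0, date_idx=1):
    """Return one row per ID, keeping the row with the greatest date."""
    best = {}  # maps ID -> full row with the most recent date seen so far
    for row in rows:
        key = row[id_idx]
        if key not in best or row[date_idx] > best[key][date_idx]:
            best[key] = row
    return list(best.values())


list1 = [(587, "2015-01-01"),
         (625, "2011-12-01"),
         (587, "1998-05-01")]

list2 = latest_per_id(list1)
print(list2)
```

For tuples of the form (value, ID, value, value, date, value, ...) the call would be latest_per_id(list1, id_idx=1, date_idx=4).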

Another approach uses filter():

my_list = [(587, '2015-01-01'),
        (625, '2011-12-01'),
        (587, '1998-05-01')]

my_keys = set(item[0] for item in my_list)  # to eliminate duplicates

res_list = []
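for key in my_keys:
    # keep only this key's tuples, then take the one with the latest date
    matches = filter(lambda item: item[0] == key, my_list)
    res_list.append(max(matches, key=lambda item: item[1]))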

Output:

>>> res_list
[(625, '2011-12-01'), (587, '2015-01-01')]
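
Another option is to parse the dates with datetime, sort, and let dict() keep the last (most recent) entry for each ID:

from datetime import datetime

lst = [(587,"2015-01-01"),
       (625,"2011-12-01"),
       (587,"1998-05-01")]

# sorting puts each ID's tuples in ascending date order
listsort = sorted((e[0], datetime.strptime(e[1], "%Y-%m-%d")) for e in lst)
# dict() keeps the last value seen per key, i.e. the most recent date
listfilter = sorted((k, v.strftime("%Y-%m-%d")) for k, v in dict(listsort).items())
print(listfilter)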

Output:

[(587, '2015-01-01'), (625, '2011-12-01')]
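For "YYYY-MM-DD" strings, plain string comparison already matches chronological order, so parsing mainly pays off when the dates arrive in another format. A small sketch under that assumption; the day/month/year input and the FMT constant are made up for illustration and do not come from the question:

```python
from datetime import datetime

# Hypothetical input with day/month/year strings (not from the question).
records = [(587, "01/01/2015"),
           (625, "01/12/2011"),
           (587, "01/05/1998")]

FMT = "%d/%m/%Y"  # assumed input format

latest = {}  # maps ID -> most recent parsed date
for key, text in records:
    when = datetime.strptime(text, FMT)
    if key not in latest or when > latest[key]:
        latest[key] = when

# Convert back to ISO-style strings for display.
result = [(key, when.strftime("%Y-%m-%d")) for key, when in latest.items()]
print(result)
```

Comparing the raw strings here would go wrong, because "01/12/2011" sorts after "01/01/2015" as text even though it is the earlier date.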


 