
Returning Max value grouping by N attributes

I am coming from a Java background and learning Python by applying it in my work environment whenever possible. I have a piece of functioning code that I would really like to improve.

Essentially I have a list of namedtuples with 3 numerical values and 1 time value.

from collections import namedtuple

complete = []
uniqueComplete = set()
screenedPartitions = namedtuple('screenedPartitions', ['feedID', 'partition', 'date', 'screeeningMode'])

I parse a log and after this is populated, I want to create a reduced set that is essentially the most recently dated member where feedID, partition and screeningMode are identical. So far I can only get it out by using a nasty nested loop.

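for a in complete:
    latest = a  # avoid the name max, which shadows the built-in
    for b in complete:
        # compare against the best match so far, not against a
        if latest.feedID == b.feedID and latest.partition == b.partition and \
                latest.screeeningMode == b.screeeningMode and latest.date < b.date:
            latest = b
    uniqueComplete.add(latest)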

Could anyone give me advice on how to improve this? It would be great to work it out with what's available in the stdlib, as my main goal here is to start thinking in terms of map/filter-style functionality.

The data looks like this:

FeedID | Partition | Date             | ScreeningMode
68     | 5         | 10/04/2017 12:40 | EPEP
164    | 1         | 09/04/2017 19:53 | ISCION
164    | 1         | 09/04/2017 20:50 | ISCION
180    | 1         | 10/04/2017 06:11 | ISAN
128    | 1         | 09/04/2017 21:16 | ESAN

So after the code is run, the second data row (164, 19:53) would be removed, as the third row is a more recent version of the same feedID, partition and screeningMode.

TL;DR: what would this SQL be in Python?

SELECT feedID, partition, screeeningMode, MAX(date)
FROM Complete
GROUP BY feedID, partition, screeeningMode

Try something like this:

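import pandas as pd

# build the frame from the list of records, not from the namedtuple class
df = pd.DataFrame(complete, columns=screenedPartitions._fields)
# latest date per (feedID, partition, screeeningMode) group
df = df.groupby(['feedID', 'partition', 'screeeningMode'])['date'].max()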

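It really depends on how your date is represented: max() only picks the most recent entry if the dates are datetime objects (or strings in a sortable format), whereas strings such as 10/04/2017 12:40 are compared character by character. If you provide data I think we can work something out.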
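
Since the question asks for something from the stdlib, here is a minimal sketch that does the same grouping in a single pass with a plain dict keyed on the three grouping fields. The helper names parse and latest_per_group are hypothetical, and the date format string is an assumption (DD/MM/YYYY HH:MM, guessed from the sample rows); adjust it to whatever the log actually contains.

```python
from collections import namedtuple
from datetime import datetime

screenedPartitions = namedtuple(
    'screenedPartitions', ['feedID', 'partition', 'date', 'screeeningMode'])


def parse(stamp):
    # Assumption: timestamps look like '10/04/2017 12:40' (day first).
    return datetime.strptime(stamp, '%d/%m/%Y %H:%M')


# Sample rows taken from the question.
complete = [
    screenedPartitions(68, 5, parse('10/04/2017 12:40'), 'EPEP'),
    screenedPartitions(164, 1, parse('09/04/2017 19:53'), 'ISCION'),
    screenedPartitions(164, 1, parse('09/04/2017 20:50'), 'ISCION'),
    screenedPartitions(180, 1, parse('10/04/2017 06:11'), 'ISAN'),
    screenedPartitions(128, 1, parse('09/04/2017 21:16'), 'ESAN'),
]


def latest_per_group(records):
    """Keep the most recently dated record for each grouping key."""
    latest = {}
    for rec in records:
        key = (rec.feedID, rec.partition, rec.screeeningMode)
        # Store the record if the key is new or this record is more recent.
        if key not in latest or rec.date > latest[key].date:
            latest[key] = rec
    return set(latest.values())


uniqueComplete = latest_per_group(complete)
```

This visits each record once instead of comparing every pair, so it scales linearly with the size of the log. Sorting by the key and using itertools.groupby with max(group, key=...) would be another stdlib option, at the cost of an extra sort.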
