I am coming from a Java background and learning Python by applying it in my work environment whenever possible. I have a piece of functioning code that I would really like to improve.
Essentially I have a list of namedtuples with 3 numerical values and 1 time value.
complete=[]
uniquecomplete=set()
screenedPartitions = namedtuple('screenedPartitions'['feedID','partition','date', 'screeeningMode'])
I parse a log and after this is populated, I want to create a reduced set that is essentially the most recently dated member where feedID, partition and screeningMode are identical. So far I can only get it out by using a nasty nested loop.
for a in complete:
max = a
for b in complete:
if a.feedID == b.feedID and a.partition == b.partition and\
a.screeeningMode == b.screeeningMode and a.date < b.date:
max = b
uniqueComplete.add(max)
Could anyone give me advice on how to improve this? It would be great to work it out with whats available in the stdlib, as I guess my main task here is to get me thinking about it with the map/filter functionality.
The data looks akin to
FeedID | Partition | Date | ScreeningMode
68 | 5 |10/04/2017 12:40| EPEP
164 | 1 |09/04/2017 19:53| ISCION
164 | 1 |09/04/2017 20:50| ISCION
180 | 1 |10/04/2017 06:11| ISAN
128 | 1 |09/04/2017 21:16| ESAN
So after the code is run line 2 would be removed as line 3 is a more recent version.
Tl;Dr, what would this SQL be in Python :
SELECT feedID,partition,screeeningMode,max(date)
from Complete
group by 'feedID','partition','screeeningMode'
Try something like this:
import pandas as pd
df = pd.DataFrame(screenedPartitions, columns=screenedPartitions._fields)
df = df.groupby(['feedID','partition','screeeningMode']).max()
It really depends on how your date is represented, but if you provide data I think we can work something out.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.