
How to create a sliding window for merging different entries?

I have the following DataFrame df:

id   datetime_event        cameraid    platenumber
11   2017-05-01T00:00:08   AAA         11A
12   2017-05-01T00:00:08   AAA         223
13   2017-05-01T00:00:08   BBB         11A
14   2017-05-01T00:00:09   BBB         33D
15   2017-05-01T00:00:09   DDD         44F
16   2017-05-01T01:01:00   AAA         44F
17   2017-05-01T01:01:01   BBB         44F
18   2017-05-01T01:01:09   AAA         556
19   2017-05-01T01:01:09   AAA         778
20   2017-05-01T01:01:11   EEE         666

For each hour of each day, I want to select up to 100 entries whose cameraid is in (AAA, BBB) and for which the same platenumber appears consecutively, first in AAA and then in BBB.

For the example DataFrame above, the expected output is:

id   datetime_event        cameraid    platenumber
11   2017-05-01T00:00:08   AAA         11A
13   2017-05-01T00:00:08   BBB         11A
16   2017-05-01T01:01:00   AAA         44F
17   2017-05-01T01:01:01   BBB         44F

The first 100 entries for each hour of each day can be extracted in the following way:

df = df[df.groupby(pd.to_datetime(df['datetime_event']).dt.floor('H')).cumcount() < 100]
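
As a minimal sketch of how this per-hour cap behaves, the snippet below rebuilds the sample DataFrame from the question and applies the same cumcount idea. The cap of 3 (instead of 100) and the names hour, limit and capped are assumptions chosen only so the effect is visible on ten rows. It uses the lowercase 'h' alias, which newer pandas versions prefer over 'H'.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": list(range(11, 21)),
    "datetime_event": [
        "2017-05-01T00:00:08", "2017-05-01T00:00:08", "2017-05-01T00:00:08",
        "2017-05-01T00:00:09", "2017-05-01T00:00:09", "2017-05-01T01:01:00",
        "2017-05-01T01:01:01", "2017-05-01T01:01:09", "2017-05-01T01:01:09",
        "2017-05-01T01:01:11",
    ],
    "cameraid": ["AAA", "AAA", "BBB", "BBB", "DDD", "AAA", "BBB", "AAA", "AAA", "EEE"],
    "platenumber": ["11A", "223", "11A", "33D", "44F", "44F", "44F", "556", "778", "666"],
})

# Bucket every event into the calendar hour it falls in
hour = pd.to_datetime(df["datetime_event"]).dt.floor("h")

# cumcount numbers the rows inside each hour 0, 1, 2, ...
# so keeping values below the cap keeps the first `limit` rows per hour
limit = 3  # hypothetical small cap for illustration; the question uses 100
capped = df[df.groupby(hour).cumcount() < limit]
print(capped)
```

With the sample data, each of the two hours holds five rows, so the first three rows of each hour are kept.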

However, how can I filter by cameraid and, most importantly, how can I match rows by platenumber so that the same platenumber appears consecutively, first in AAA and then in BBB?

Use groupby with filter:

EDIT:

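import pandas as pd

# first keep only AAA and BBB rows to work with less data
df = df[df['cameraid'].isin(['AAA','BBB'])]

# group by hour and platenumber; keep groups whose cameras are exactly AAA then BBB
# (tolist() avoids an elementwise comparison between sequences of different lengths)
df1 = (df.groupby([pd.to_datetime(df['datetime_event']).dt.floor('H'),'platenumber'])
         .filter(lambda x: x['cameraid'].tolist() == ['AAA','BBB']))
print (df1)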
   id       datetime_event cameraid platenumber
0  11  2017-05-01T00:00:08      AAA         11A
2  13  2017-05-01T00:00:08      BBB         11A
5  16  2017-05-01T01:01:00      AAA         44F
6  17  2017-05-01T01:01:01      BBB         44F
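
The filter above does not apply the cap of 100 entries per hour from the question. A minimal sketch that combines both steps could look like the following. The function name select_pairs and its parameters are hypothetical, and it assumes the rows are already sorted by datetime_event, because the AAA-then-BBB check relies on row order within each group.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": list(range(11, 21)),
    "datetime_event": [
        "2017-05-01T00:00:08", "2017-05-01T00:00:08", "2017-05-01T00:00:08",
        "2017-05-01T00:00:09", "2017-05-01T00:00:09", "2017-05-01T01:01:00",
        "2017-05-01T01:01:01", "2017-05-01T01:01:09", "2017-05-01T01:01:09",
        "2017-05-01T01:01:11",
    ],
    "cameraid": ["AAA", "AAA", "BBB", "BBB", "DDD", "AAA", "BBB", "AAA", "AAA", "EEE"],
    "platenumber": ["11A", "223", "11A", "33D", "44F", "44F", "44F", "556", "778", "666"],
})


def select_pairs(df, first="AAA", second="BBB", limit=100):
    """Hypothetical helper: keep first/second camera pairs, capped per hour."""
    # Keep only the two cameras of interest
    sub = df[df["cameraid"].isin([first, second])]
    # Hour bucket of every remaining row
    hour = pd.to_datetime(sub["datetime_event"]).dt.floor("h").rename("hour")
    # Keep (hour, platenumber) groups whose cameras are exactly [first, second]
    pairs = sub.groupby([hour, "platenumber"]).filter(
        lambda g: g["cameraid"].tolist() == [first, second]
    )
    # Keep at most `limit` of the surviving rows in each hour
    return pairs[pairs.groupby(hour.loc[pairs.index]).cumcount() < limit]


result = select_pairs(df)
print(result)
```

Note that the cap counts rows, not pairs, so an odd limit can cut a pair in half; an even value such as 100 keeps pairs intact.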

Old solution:

#first filter only AAA, BBB for less data
df = df[df['cameraid'].isin(['AAA','BBB'])]

# keep only groups of exactly 2 rows whose first cameraid is AAA and second is BBB
def f(x):
    return len(x) == 2 and \
           x['cameraid'].iat[0] == 'AAA' and \
           x['cameraid'].iat[1] == 'BBB'

df = df.groupby([pd.to_datetime(df['datetime_event']).dt.floor('H'),'platenumber']).filter(f)
print (df)
   id       datetime_event cameraid platenumber
0  11  2017-05-01T00:00:08      AAA         11A
2  13  2017-05-01T00:00:08      BBB         11A
5  16  2017-05-01T01:01:00      AAA         44F
6  17  2017-05-01T01:01:01      BBB         44F
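
Since filter calls a Python function once per group, it can become slow when there are many groups. A possible alternative, shown here only as a sketch, expresses the same three conditions (group size 2, first value AAA, last value BBB) with groupby transform. The names sub, hour, cams, mask and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": list(range(11, 21)),
    "datetime_event": [
        "2017-05-01T00:00:08", "2017-05-01T00:00:08", "2017-05-01T00:00:08",
        "2017-05-01T00:00:09", "2017-05-01T00:00:09", "2017-05-01T01:01:00",
        "2017-05-01T01:01:01", "2017-05-01T01:01:09", "2017-05-01T01:01:09",
        "2017-05-01T01:01:11",
    ],
    "cameraid": ["AAA", "AAA", "BBB", "BBB", "DDD", "AAA", "BBB", "AAA", "AAA", "EEE"],
    "platenumber": ["11A", "223", "11A", "33D", "44F", "44F", "44F", "556", "778", "666"],
})

# Keep only the two cameras of interest
sub = df[df["cameraid"].isin(["AAA", "BBB"])]
# Hour bucket of every remaining row
hour = pd.to_datetime(sub["datetime_event"]).dt.floor("h").rename("hour")

# cameraid values grouped by (hour, platenumber)
cams = sub.groupby([hour, "platenumber"])["cameraid"]

# transform broadcasts each per-group value back onto the rows of that group
mask = (
    cams.transform("size").eq(2)           # exactly two sightings in the hour
    & cams.transform("first").eq("AAA")    # first sighting at AAA
    & cams.transform("last").eq("BBB")     # second sighting at BBB
)
result = sub[mask]
print(result)
```

As with the filter version, "first" and "last" refer to row order, so this assumes the DataFrame is sorted by datetime_event.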
