How to create a sliding window for merging different entries?
I have the following DataFrame df:
id datetime_event cameraid platenumber
11 2017-05-01T00:00:08 AAA 11A
12 2017-05-01T00:00:08 AAA 223
13 2017-05-01T00:00:08 BBB 11A
14 2017-05-01T00:00:09 BBB 33D
15 2017-05-01T00:00:09 DDD 44F
16 2017-05-01T01:01:00 AAA 44F
17 2017-05-01T01:01:01 BBB 44F
18 2017-05-01T01:01:09 AAA 556
19 2017-05-01T01:01:09 AAA 778
20 2017-05-01T01:01:11 EEE 666
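To experiment with the table above, it can be rebuilt as a small DataFrame. This is only a sketch for reproduction: the question does not show how df was created, so the construction below is an assumption, and the timestamps are kept as strings as they appear in the table.

```python
import pandas as pd

# Rows copied from the table above.
rows = [
    (11, "2017-05-01T00:00:08", "AAA", "11A"),
    (12, "2017-05-01T00:00:08", "AAA", "223"),
    (13, "2017-05-01T00:00:08", "BBB", "11A"),
    (14, "2017-05-01T00:00:09", "BBB", "33D"),
    (15, "2017-05-01T00:00:09", "DDD", "44F"),
    (16, "2017-05-01T01:01:00", "AAA", "44F"),
    (17, "2017-05-01T01:01:01", "BBB", "44F"),
    (18, "2017-05-01T01:01:09", "AAA", "556"),
    (19, "2017-05-01T01:01:09", "AAA", "778"),
    (20, "2017-05-01T01:01:11", "EEE", "666"),
]
df = pd.DataFrame(rows, columns=["id", "datetime_event", "cameraid", "platenumber"])
print(df)
```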
For each hour of each day, I want to select up to 100 entries whose cameraid is in (AAA, BBB) and whose platenumber appears sequentially, first at AAA and then at BBB.
For example, for the DataFrame above the output would be:
id datetime_event cameraid platenumber
11 2017-05-01T00:00:08 AAA 11A
13 2017-05-01T00:00:08 BBB 11A
16 2017-05-01T01:01:00 AAA 44F
17 2017-05-01T01:01:01 BBB 44F
The first 100 entries for each hour of each day can be extracted in the following way:
df = df[df.groupby(pd.to_datetime(df['datetime_event']).dt.floor('H')).cumcount() < 100]
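To see how the per-hour cap works, here is a small sketch that breaks the one-liner into steps. The data are a subset of the table above, and the cap of 2 is a hypothetical value chosen only so the effect is visible on a few rows (the question uses 100). The frequency string "60min" is used instead of 'H' purely because the spelling of the hour alias differs between pandas versions; both floor to the start of the hour.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [11, 12, 13, 14, 15, 16, 17],
    "datetime_event": [
        "2017-05-01T00:00:08", "2017-05-01T00:00:08", "2017-05-01T00:00:08",
        "2017-05-01T00:00:09", "2017-05-01T00:00:09",
        "2017-05-01T01:01:00", "2017-05-01T01:01:01",
    ],
})

cap = 2  # hypothetical small cap for illustration; the question uses 100

# Hour bucket of every row, e.g. 2017-05-01 00:00:00 or 2017-05-01 01:00:00.
hour = pd.to_datetime(df["datetime_event"]).dt.floor("60min")

# cumcount() numbers the rows 0, 1, 2, ... inside each hour, in row order.
position_in_hour = df.groupby(hour).cumcount()

# Keep only the first `cap` rows of each hour.
capped = df[position_in_hour < cap]
print(capped)  # ids 11, 12 (hour 00) and 16, 17 (hour 01)
```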
However, how can I filter by cameraid and, most importantly, how can I merge by platenumber so that the same platenumber value appears in consecutive rows, first at AAA and then at BBB?
EDIT:
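#first filter only AAA, BBB for less data
df = df[df['cameraid'].isin(['AAA','BBB'])]
#keep only (hour, platenumber) groups whose cameras are exactly AAA, then BBB
#(comparing lists avoids a shape mismatch when a group does not have 2 rows)
df1 = (df.groupby([pd.to_datetime(df['datetime_event']).dt.floor('H'),'platenumber'])
         .filter(lambda x: x['cameraid'].tolist() == ['AAA','BBB']))
print (df1)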
   id       datetime_event cameraid platenumber
0  11  2017-05-01T00:00:08      AAA         11A
2  13  2017-05-01T00:00:08      BBB         11A
5  16  2017-05-01T01:01:00      AAA         44F
6  17  2017-05-01T01:01:01      BBB         44F
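The filter above keeps the AAA/BBB pairs but does not apply the limit of 100 entries per hour. One possible way to combine the two steps is sketched below. The helper name first_pairs_per_hour and its max_rows parameter are hypothetical, and the cap is applied per pair (max_rows // 2 plates per hour) so that a pair is never split; this is an assumption about what the question intends, not something the answer states.

```python
import pandas as pd

rows = [
    (11, "2017-05-01T00:00:08", "AAA", "11A"),
    (12, "2017-05-01T00:00:08", "AAA", "223"),
    (13, "2017-05-01T00:00:08", "BBB", "11A"),
    (14, "2017-05-01T00:00:09", "BBB", "33D"),
    (15, "2017-05-01T00:00:09", "DDD", "44F"),
    (16, "2017-05-01T01:01:00", "AAA", "44F"),
    (17, "2017-05-01T01:01:01", "BBB", "44F"),
    (18, "2017-05-01T01:01:09", "AAA", "556"),
    (19, "2017-05-01T01:01:09", "AAA", "778"),
    (20, "2017-05-01T01:01:11", "EEE", "666"),
]
df = pd.DataFrame(rows, columns=["id", "datetime_event", "cameraid", "platenumber"])


def first_pairs_per_hour(df, max_rows=100):
    """Hypothetical helper: AAA->BBB plate pairs, at most max_rows rows per hour."""
    # Drop every camera other than AAA and BBB.
    sub = df[df["cameraid"].isin(["AAA", "BBB"])]
    # "60min" floors to the start of the hour, like 'H' in the answer.
    hour = pd.to_datetime(sub["datetime_event"]).dt.floor("60min").rename("hour")

    # Keep (hour, plate) groups whose cameras are exactly AAA, then BBB.
    pairs = sub.groupby([hour, "platenumber"]).filter(
        lambda g: g["cameraid"].tolist() == ["AAA", "BBB"]
    )

    # Number the plates 0, 1, 2, ... within each hour by first appearance.
    pair_rank = pairs.groupby(hour.loc[pairs.index])["platenumber"].transform(
        lambda s: pd.factorize(s)[0]
    )

    # Each pair contributes two rows, so allow max_rows // 2 pairs per hour.
    return pairs[pair_rank < max_rows // 2]


result = first_pairs_per_hour(df)
print(result)  # ids 11, 13, 16, 17
```

Capping by pair rather than by row matters when pairs interleave: with rows ordered AAA x, AAA y, BBB x, BBB y, a plain row cap of 2 would keep the two AAA rows and lose both matches.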
Old solution:
#first filter only AAA, BBB for less data
df = df[df['cameraid'].isin(['AAA','BBB'])]
#keep only groups of exactly 2 rows where the 1st value is AAA and the 2nd is BBB
def f(x):
    return len(x) == 2 and \
           x['cameraid'].iat[0] == 'AAA' and \
           x['cameraid'].iat[1] == 'BBB'
df = df.groupby([pd.to_datetime(df['datetime_event']).dt.floor('H'),'platenumber']).filter(f)
print (df)
   id       datetime_event cameraid platenumber
0  11  2017-05-01T00:00:08      AAA         11A
2  13  2017-05-01T00:00:08      BBB         11A
5  16  2017-05-01T01:01:00      AAA         44F
6  17  2017-05-01T01:01:01      BBB         44F
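Both versions require a plate to have exactly two rows within the hour, so a plate seen three times (AAA, BBB, AAA) is dropped entirely. If that is too strict, one alternative is to look for adjacent AAA then BBB rows inside each (hour, platenumber) group with shift. This is a different technique from the answer's filter, shown here only as a sketch; the rows below are made-up illustration data, not taken from the question.

```python
import pandas as pd

# Made-up rows for illustration only.
rows = [
    (1, "2017-05-01T00:00:01", "AAA", "11A"),
    (2, "2017-05-01T00:00:02", "BBB", "11A"),
    (3, "2017-05-01T00:00:03", "AAA", "11A"),  # third sighting in the same hour
    (4, "2017-05-01T00:00:04", "BBB", "22B"),
    (5, "2017-05-01T00:00:05", "AAA", "22B"),  # wrong order: BBB came first
]
df = pd.DataFrame(rows, columns=["id", "datetime_event", "cameraid", "platenumber"])

sub = df[df["cameraid"].isin(["AAA", "BBB"])]
hour = pd.to_datetime(sub["datetime_event"]).dt.floor("60min").rename("hour")
cams = sub.groupby([hour, "platenumber"])["cameraid"]

# Camera of the previous / next row within the same (hour, plate) group.
prev_cam = cams.shift(1)
next_cam = cams.shift(-1)

# A row belongs to a pair if it is an AAA directly followed by a BBB,
# or a BBB directly preceded by an AAA.
is_first_of_pair = (sub["cameraid"] == "AAA") & (next_cam == "BBB")
is_second_of_pair = (sub["cameraid"] == "BBB") & (prev_cam == "AAA")

adjacent_pairs = sub[is_first_of_pair | is_second_of_pair]
print(adjacent_pairs)  # ids 1 and 2
```

On this data the two-row filter would return nothing for plate 11A, while the shift-based version keeps its first AAA/BBB pair and still rejects plate 22B, where BBB precedes AAA.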