Create a new column based on grouping of similar values in another column in pandas
Hi, I have an event data frame with datetimes, event IDs and sensor IDs. I would like to group the events that happen within one hour of each other for each sensor and, if possible, tag each event with its group number.
Original data frame:
sensor_id event_id time
0 A e1 2017-02-14 05:30:00
1 A e2 2017-02-14 05:45:00
2 A e3 2017-02-14 08:30:00
3 B e3 2017-02-14 05:20:00
4 B e4 2017-02-14 05:30:00
5 B e6 2017-02-14 05:45:00
6 C e1 2017-02-14 05:30:00
7 C e3 2017-02-14 07:30:00
8 C e7 2017-02-14 09:35:00
Desired result:
sensor_id event_id time group
0 A e1 2017-02-14 05:30:00 1
1 A e2 2017-02-14 05:45:00 1
2 A e3 2017-02-14 08:30:00 2
3 B e3 2017-02-14 05:20:00 1
4 B e4 2017-02-14 05:30:00 1
5 B e6 2017-02-14 05:45:00 1
6 C e1 2017-02-14 05:30:00 1
7 C e3 2017-02-14 07:30:00 2
8 C e7 2017-02-14 09:35:00 3
I understand that I should group by sensor and event, and then by time using a timedelta of 1 hour, but I have no clue how to do the rest. Any tips will be appreciated.
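One way to read "within one hour" is as a gap rule: an event joins its sensor's current group if it occurs at most one hour after that sensor's previous event, and otherwise starts a new group. This is an interpretation; the question does not pin it down. A minimal sketch of that rule using groupby, diff and cumsum on the sample data above (the column names come from the question; everything else is illustrative):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "sensor_id": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "event_id": ["e1", "e2", "e3", "e3", "e4", "e6", "e1", "e3", "e7"],
    "time": pd.to_datetime([
        "2017-02-14 05:30:00", "2017-02-14 05:45:00", "2017-02-14 08:30:00",
        "2017-02-14 05:20:00", "2017-02-14 05:30:00", "2017-02-14 05:45:00",
        "2017-02-14 05:30:00", "2017-02-14 07:30:00", "2017-02-14 09:35:00",
    ]),
})

# Put each sensor's events in chronological order.
df = df.sort_values(["sensor_id", "time"])

# Time elapsed since the previous event of the same sensor (NaT for the first one).
gap = df.groupby("sensor_id")["time"].diff()

# A group starts at a sensor's first event or after a gap longer than one hour.
new_group = gap.isna() | (gap > pd.Timedelta(hours=1))

# Counting group starts per sensor yields the group number 1, 2, 3, ...
df["group"] = new_group.astype(int).groupby(df["sensor_id"]).cumsum()
print(df)
```

Because the rule only compares consecutive events, a chain of events that are each less than an hour apart stays in one group even if the chain spans more than an hour. A gap of exactly one hour is kept in the same group, since the comparison is a strict >.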
I think you need a nested groupby: group by sensor_id, then within each sensor number the hour-of-day buckets of time with ngroup. This assumes the frame is sorted by sensor_id; if it isn't, sort it first. I.e.
# assumes df['time'] is datetime64 and the rows are sorted by sensor_id
df['new'] = df.groupby('sensor_id').apply(lambda x: x.groupby(x['time'].dt.hour).ngroup() + 1).values
Output:
sensor_id event_id time new
0 A e1 2017-02-14 05:30:00 1
1 A e2 2017-02-14 05:45:00 1
2 A e3 2017-02-14 08:30:00 2
3 B e3 2017-02-14 05:20:00 1
4 B e4 2017-02-14 05:30:00 1
5 B e6 2017-02-14 05:45:00 1
6 C e1 2017-02-14 05:30:00 1
7 C e3 2017-02-14 07:30:00 2
8 C e7 2017-02-14 09:35:00 3
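Note that .dt.hour keeps only the hour of the day, so if a sensor's events span several days, events at 05:xx on different dates fall into the same group. A hedged variation (not the answerer's code) keeps the date by flooring each timestamp to the start of its hour, then numbers the buckets per sensor with a dense rank. Like the original, it still buckets by clock hour (05:50 and 06:10 end up in different groups), but it needs neither apply nor pre-sorting by sensor_id:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "sensor_id": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "event_id": ["e1", "e2", "e3", "e3", "e4", "e6", "e1", "e3", "e7"],
    "time": pd.to_datetime([
        "2017-02-14 05:30:00", "2017-02-14 05:45:00", "2017-02-14 08:30:00",
        "2017-02-14 05:20:00", "2017-02-14 05:30:00", "2017-02-14 05:45:00",
        "2017-02-14 05:30:00", "2017-02-14 07:30:00", "2017-02-14 09:35:00",
    ]),
})

# Truncate each timestamp to the start of its hour; unlike .dt.hour this keeps the date.
hour_bucket = df["time"].dt.floor("h")

# Dense rank of the bucket within each sensor: 1 for the earliest hour,
# 2 for the next distinct hour, and so on. rank returns floats, hence astype(int).
df["new"] = hour_bucket.groupby(df["sensor_id"]).rank(method="dense").astype(int)
print(df)
```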
You can use pd.Grouper with a freq (pd.TimeGrouper in pandas versions before 1.0, where it was removed) + ngroup to group by time frequency.
import pandas as pd
df['time'] = pd.to_datetime(df['time'])
df['group'] = df.set_index('time').groupby(['sensor_id',
                                            pd.Grouper(freq='1h')], sort=False).ngroup().values
So far, we have what we want, except that ngroup numbers the sensor/hour combinations consecutively across all sensors. We need the group value to restart at 1 for each sensor_id, so another groupby call is in order.
# transform keeps df's original index, so the result aligns on assignment
df['group'] = df.groupby('sensor_id')['group'].transform(lambda x: x - x.min() + 1)
df
sensor_id event_id time group
0 A e1 2017-02-14 05:30:00 1
1 A e2 2017-02-14 05:45:00 1
2 A e3 2017-02-14 08:30:00 2
3 B e3 2017-02-14 05:20:00 1
4 B e4 2017-02-14 05:30:00 1
5 B e6 2017-02-14 05:45:00 1
6 C e1 2017-02-14 05:30:00 1
7 C e3 2017-02-14 07:30:00 2
8 C e7 2017-02-14 09:35:00 3