Counting column value occurrences within a specified amount of time
I have a multi-indexed dataframe, df:
name time activity
Bill 2013-10-09 05:27:00 run
2013-10-09 07:23:00 play
2013-10-09 07:25:00 hw
2013-10-09 08:25:00 swim
Rick 2014-11-07 06:27:00 eat
2014-11-07 07:25:00 swim
2014-11-07 08:25:00 hw
2014-11-07 10:30:00 play
with name and time as indices; time is a datetime type. I want a function,
def find_close_activities(df, a, nhr)
that returns the counts of activities occurring within nhr hour(s) of each instance of activity a.
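To experiment with this, the sample frame above can be rebuilt roughly as follows. This is a sketch that assumes the printed values are the complete data; `records` is just an illustrative variable name.

```python
import pandas as pd

# Rebuild the example frame: `name` and `time` form the MultiIndex,
# `activity` is the only column.
records = [
    ("Bill", "2013-10-09 05:27:00", "run"),
    ("Bill", "2013-10-09 07:23:00", "play"),
    ("Bill", "2013-10-09 07:25:00", "hw"),
    ("Bill", "2013-10-09 08:25:00", "swim"),
    ("Rick", "2014-11-07 06:27:00", "eat"),
    ("Rick", "2014-11-07 07:25:00", "swim"),
    ("Rick", "2014-11-07 08:25:00", "hw"),
    ("Rick", "2014-11-07 10:30:00", "play"),
]
df = pd.DataFrame(records, columns=["name", "time", "activity"])
df["time"] = pd.to_datetime(df["time"])  # make `time` a datetime column
df = df.set_index(["name", "time"])      # two-level index: (name, time)

print(df)
```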
For example,
find_close_activities(df, 'hw', 1)
would return
play: 1
swim: 2
IMPORTANT: Counts should not overlap between names. We should only search for activities occurring within nhr hours for the same person. I think this would require a groupby.
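Since the question suggests a groupby, here is a minimal, deliberately plain sketch of that idea: split the frame per person with `groupby(level=0)` and compare times only inside each group. It assumes that an activity lying near two instances of `a` is counted twice and that other instances of `a` itself are not counted; the question does not pin either point down.

```python
import pandas as pd

def find_close_activities(df, a, nhr):
    """Count activities within `nhr` hours of each occurrence of `a`,
    comparing times only within the same person (first index level).

    Assumptions: `df` has a (name, time) MultiIndex and an `activity`
    column; an activity near two occurrences of `a` is counted twice;
    occurrences of `a` itself are not counted.
    """
    window = pd.Timedelta(hours=nhr)
    counts = {}
    # groupby(level=0) yields one sub-frame per person, so no cross-person pairs
    for _, person in df.groupby(level=0):
        times = person.index.get_level_values(1)
        acts = person["activity"].to_numpy()
        anchor_times = times[acts == a]  # times at which `a` happened
        for t, act in zip(times, acts):
            if act == a:
                continue
            # number of `a` occurrences whose gap to t is within the window
            n = sum(abs(t - at) <= window for at in anchor_times)
            if n:
                counts[act] = counts.get(act, 0) + n
    return pd.Series(counts, dtype="int64").sort_values(ascending=False)
```

This is quadratic within each person, which is fine for small per-person histories; the join-based answer does the same pairing with vectorized pandas operations.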
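IIUC, you can do this with value_counts: a join on the name index pairs each row with every instance of the target activity for the same person (this plays the role of the groupby), and the time range is then compared on the joined rows: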
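To see what the join actually produces, here is a small illustration on a cut-down, made-up two-person frame. Once `time` is moved out of the index, `name` is the only index level, so `join` matches rows by name and each row is paired only with the `hw` rows of the same person.

```python
import pandas as pd

# Tiny two-person frame with a (name, time) MultiIndex (illustrative data)
idx = pd.MultiIndex.from_arrays(
    [["Bill", "Bill", "Rick", "Rick"],
     pd.to_datetime(["2013-10-09 07:23", "2013-10-09 07:25",
                     "2014-11-07 07:25", "2014-11-07 08:25"])],
    names=["name", "time"],
)
df = pd.DataFrame({"activity": ["play", "hw", "swim", "hw"]}, index=idx)

hw = df[df.activity == "hw"]
# Both frames are now indexed by `name` only, so join() pairs every row
# with every `hw` row of the same person; hw's columns get the suffix 'y'.
pairs = df.reset_index(level=1).join(hw.reset_index(level=1), rsuffix="y")
# Absolute gap between each activity and the `hw` it was paired with
pairs["gap"] = (pairs.time - pairs.timey).abs()
print(pairs)
```

Each person contributes (their rows) × (their `hw` rows) pairs, and no Bill row is ever compared with a Rick time.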
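```python
def youfunc(df, my, hour):
    df1 = df[df.activity == my]  # rows of the target activity
    # With `time` moved to a column, both frames are indexed by name only, so join()
    # pairs each row with every `my` row of the same person (df1 columns get suffix 'y')
    s = df.reset_index(level=1).join(df1.reset_index(level=1), rsuffix='y')
    s = s.loc[s.activity != s.activityy]  # drop the `my` rows themselves
    # Absolute gap in units of the window: <= 1 means within `hour` hours
    s['New'] = (s.time - s.timey).abs().dt.total_seconds() / (hour * 3600)
    # Pairs are already within one name, so filter and count directly
    return s.loc[s['New'] <= 1, 'activity'].value_counts()
```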
```
youfunc(df, 'hw', 1)
Out[363]:
swim    2
play    1
Name: activity, dtype: int64
```
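Note that every swim counted here sits exactly one hour from an hw (08:25 vs 07:25 for Bill, 07:25 vs 08:25 for Rick), so the expected output relies on the window being inclusive (`<= 1`). A minimal sketch that makes this choice explicit; `count_near` and its `inclusive` flag are hypothetical names, not part of the answer above.

```python
import pandas as pd

def count_near(df, my, hour, inclusive=True):
    """Same join-based count as youfunc, with a hypothetical `inclusive`
    flag deciding whether a gap of exactly `hour` hours is counted."""
    anchors = df[df.activity == my]
    # Pair each row with every `my` row of the same person (joined on name)
    s = df.reset_index(level=1).join(anchors.reset_index(level=1), rsuffix="y")
    s = s.loc[s.activity != s.activityy]
    gap = (s.time - s.timey).abs()
    window = pd.Timedelta(hours=hour)
    # <= keeps gaps equal to the window, < drops them; NaT gaps compare False
    near = gap <= window if inclusive else gap < window
    return s.loc[near, "activity"].value_counts()
```

On the question's data, `count_near(df, 'hw', 1)` gives swim 2 and play 1, while `inclusive=False` leaves only play 1.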