Count occurrences of column values within a specified time window

Question

I have a multi-indexed dataframe, df:

name time                    activity
Bill 2013-10-09 05:27:00     run
     2013-10-09 07:23:00     play
     2013-10-09 07:25:00     hw
     2013-10-09 08:25:00     swim
Rick 2014-11-07 06:27:00     eat
     2014-11-07 07:25:00     swim
     2014-11-07 08:25:00     hw
     2014-11-07 10:30:00     play

with name and time as indices. time is a datetime type. I want a function,

def find_close_activities(df, a, nhr)

that will return the count of activities that occur within nhr hour(s) of each instance of activity a.

So, as an example,

find_close_activities(df, 'hw', 1)

would return

play: 1
swim: 2

IMPORTANT: Counts should not overlap between names. We should only search for activities occurring within nhr hours of the target activity for the same person. I think this would require a groupby.
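For anyone who wants to reproduce this, here is a minimal sketch of how the frame shown above could be built. The `rows` list is just a helper name for illustration; the data is copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
rows = [
    ("Bill", "2013-10-09 05:27:00", "run"),
    ("Bill", "2013-10-09 07:23:00", "play"),
    ("Bill", "2013-10-09 07:25:00", "hw"),
    ("Bill", "2013-10-09 08:25:00", "swim"),
    ("Rick", "2014-11-07 06:27:00", "eat"),
    ("Rick", "2014-11-07 07:25:00", "swim"),
    ("Rick", "2014-11-07 08:25:00", "hw"),
    ("Rick", "2014-11-07 10:30:00", "play"),
]

df = pd.DataFrame(rows, columns=["name", "time", "activity"])
df["time"] = pd.to_datetime(df["time"])   # make time a real datetime
df = df.set_index(["name", "time"])       # name and time become the index levels
```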

Answer 1

IIUC, you can use value_counts with groupby. The join pairs every row with the rows for the chosen activity under the same name, so that the time gap between them can be compared.

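def youfunc(df, my, hour):
    # Rows for the activity of interest
    df1 = df[df.activity == my]
    # Join on name so each row is paired with every target row of the same person
    s = df.reset_index(level=1).join(df1.reset_index(level=1), rsuffix='y')
    # Drop pairs where the row is the target activity itself
    s = s.loc[s.activity != s.activityy].copy()
    # Time gap expressed as a fraction of the allowed window
    s['New'] = abs((s.time - s.timey).dt.total_seconds() / (hour * 3600))
    return s.groupby(level=0).apply(lambda x: x['activity'][x['New'] <= 1]).value_counts()

youfunc(df, 'hw', 1)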
Out[363]: 
swim    2
play    1
Name: activity, dtype: int64
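For comparison, here is a self-contained sketch of the same idea using the signature from the question. It is not the code above but a variation under stated assumptions: it uses merge on name instead of join, compares against a pd.Timedelta rather than a ratio, and skips the groupby because merging on name already keeps the pairs within one person. Column names such as time_target are made up for the illustration.

```python
import pandas as pd

def find_close_activities(df, a, nhr):
    """Count activities within nhr hours of each instance of a, per person."""
    flat = df.reset_index()
    # Times at which the target activity occurs, per name.
    targets = flat.loc[flat["activity"] == a, ["name", "time"]]
    # Pair every row with every target row of the same name.
    pairs = flat.merge(targets, on="name", suffixes=("", "_target"))
    gap = (pairs["time"] - pairs["time_target"]).abs()
    # Keep other activities that fall inside the window.
    close = pairs[(pairs["activity"] != a) & (gap <= pd.Timedelta(hours=nhr))]
    return close["activity"].value_counts()

# Sample frame from the question.
rows = [
    ("Bill", "2013-10-09 05:27:00", "run"),
    ("Bill", "2013-10-09 07:23:00", "play"),
    ("Bill", "2013-10-09 07:25:00", "hw"),
    ("Bill", "2013-10-09 08:25:00", "swim"),
    ("Rick", "2014-11-07 06:27:00", "eat"),
    ("Rick", "2014-11-07 07:25:00", "swim"),
    ("Rick", "2014-11-07 08:25:00", "hw"),
    ("Rick", "2014-11-07 10:30:00", "play"),
]
df = pd.DataFrame(rows, columns=["name", "time", "activity"])
df["time"] = pd.to_datetime(df["time"])
df = df.set_index(["name", "time"])

result = find_close_activities(df, "hw", 1)
print(result)
```

On the sample data this gives swim 2 and play 1, matching the output above. Note that the window is inclusive, so an activity exactly nhr hours away is counted.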

Solution 1: 1 accepted, 2018-03-28 22:40:14
