
Counting column value occurrences within a specified amount of time

I have a multi-indexed DataFrame, df:

name time                    activity
Bill 2013-10-09 05:27:00     run
     2013-10-09 07:23:00     play
     2013-10-09 07:25:00     hw
     2013-10-09 08:25:00     swim
Rick 2014-11-07 06:27:00     eat
     2014-11-07 07:25:00     swim
     2014-11-07 08:25:00     hw
     2014-11-07 10:30:00     play

with name and time as indices. time is a datetime type. I want a function,

def find_close_activities(df, a, nhr)

that returns the count of other activities that occur within nhr hour(s) of each instance of activity a.

So as an example,

find_close_activities(df, 'hw', 1)

would return

play: 1
swim: 2

IMPORTANT: Counts should not overlap between names. We should only search for activities occurring within nhr hours for the same person. I think this would require a groupby.
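For anyone who wants to reproduce this, the sample frame can be rebuilt roughly as follows. This construction is only a sketch and not part of the original post; the `rows` list is a helper name introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
rows = [
    ("Bill", "2013-10-09 05:27:00", "run"),
    ("Bill", "2013-10-09 07:23:00", "play"),
    ("Bill", "2013-10-09 07:25:00", "hw"),
    ("Bill", "2013-10-09 08:25:00", "swim"),
    ("Rick", "2014-11-07 06:27:00", "eat"),
    ("Rick", "2014-11-07 07:25:00", "swim"),
    ("Rick", "2014-11-07 08:25:00", "hw"),
    ("Rick", "2014-11-07 10:30:00", "play"),
]

df = pd.DataFrame(rows, columns=["name", "time", "activity"])
# Make time a real datetime column before it becomes an index level.
df["time"] = pd.to_datetime(df["time"])
# name and time together form the MultiIndex described in the question.
df = df.set_index(["name", "time"])
```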

IIUC, you can use value_counts with groupby; the join here pairs each row with the target activity's rows for the same name so that the time gap can be compared.

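def youfunc(df, my, hour):
    # Rows for the target activity, e.g. 'hw'
    df1 = df[df.activity == my]
    # Join on name so each row is paired with that person's target rows
    s = df.reset_index(level=1).join(df1.reset_index(level=1), rsuffix='y')
    # Drop the target activity itself
    s = s.loc[s.activity != s.activityy]
    # Time gap expressed as a fraction of the window (<= 1 means inside it)
    s['New'] = abs((s.time - s.timey).dt.total_seconds() / (hour * 3600))
    return s.groupby(level=0).apply(lambda x: x['activity'][x['New'] <= 1]).value_counts()
youfunc(df, 'hw', 1)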
Out[363]: 
swim    2
play    1
Name: activity, dtype: int64


 