Finding standard deviation and mean of time grouped by day of week in pandas

Is this the preferred way to obtain the standard deviation and mean of times based on the day of the week?

How do I group the mean time and standard deviation first by TargetUser, and second by day_of_week?

Also, how would I go about converting the series of standard deviations and means to a proper time format? I have successfully looped through each series and applied datetime.timedelta(seconds=item), but I would prefer a more idiomatic pandas way to do this. Thank you for your feedback.

I have a data set that contains date and time stamps, as below:

Date        Time       TargetUser
10/10/2012  20:20:01   joe
10/11/2012  02:20:01   bob
10/13/2012  21:20:01   smo
10/16/2012  22:20:01   joe

I am creating a day-of-week column as below:

df['my_dates'] = pd.to_datetime(df['Date'])
df['day_of_week'] = df['my_dates'].dt.dayofweek
days = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}  # dt.dayofweek: Monday=0
df['day_of_week'] = df['day_of_week'].apply(lambda x: days[x])

I am splitting the time into hour, minute and second columns and creating a column with the total number of seconds since midnight:

df[['HH', 'MM','SS']] = df['Time'].str.split(':', expand=True)
df['seconds'] = (((df['HH'].astype(int) * 60) + df['MM'].astype(int)) * 60) + df['SS'].astype(int)

I am then computing the mean time and standard deviation by day of week as below:

meantime = df['seconds'].groupby([df['day_of_week']]).mean()
std = df['seconds'].groupby([df['day_of_week']]).std(ddof=1)

Expected output (not based on the above data):

Name          Day_of_week       Mean        STD
joe           mon               15:01:01    00:08:02
              tue               10:01:01    00:01:06 
bob           mon               11:11:11    00:20:30
smo           thur              07:07:07    00:03:02

You should be able to greatly simplify your work by concatenating Date and Time and then using pandas' excellent datetime accessor, dt.

df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df['day_of_week'] = df.DateTime.dt.strftime('%a')
df['seconds'] = pd.to_timedelta(df.DateTime.dt.time.astype(str)).dt.seconds

Which gives you this:

         Date      Time TargetUser            DateTime day_of_week  seconds
0  10/10/2012  20:20:01        joe 2012-10-10 20:20:01         Wed    73201
1  10/11/2012  02:20:01        bob 2012-10-11 02:20:01         Thu     8401
2  10/13/2012  21:20:01        smo 2012-10-13 21:20:01         Sat    76801
3  10/16/2012  22:20:01        joe 2012-10-16 22:20:01         Tue    80401

Then group by user and day of week, computing the mean and standard deviation of seconds as timedeltas:

df1 = df.groupby(['TargetUser', 'day_of_week'])\
  .agg({'seconds': ['mean', 'std']})\
  .apply(pd.to_timedelta, unit='s')  # std uses ddof=1 by default; seconds -> timedelta

Final output of df1 (std is NaT here because every group in the sample data contains a single row):

                        seconds    
                           mean std
TargetUser day_of_week             
bob        Thu         02:20:01 NaT
joe        Tue         22:20:01 NaT
           Wed         20:20:01 NaT
smo        Sat         21:20:01 NaT

To remove the upper column level and turn the index into columns, you can then do this:

df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()  # reset_index returns a new DataFrame, so assign the result
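
For reference, here is a minimal end-to-end sketch of the steps above that also renders the results as HH:MM:SS strings like the expected output in the question. It rests on a few assumptions: the fifth row (joe on 10/17/2012) is made up so that one group has a defined standard deviation, format_hms is a hypothetical helper rather than a pandas function, and values are rounded to whole seconds.

```python
import pandas as pd

# Sample rows from the question, plus one made-up row (the last one) so that
# the joe/Wed group has two observations and a defined standard deviation.
df = pd.DataFrame({
    "Date": ["10/10/2012", "10/11/2012", "10/13/2012", "10/16/2012", "10/17/2012"],
    "Time": ["20:20:01", "02:20:01", "21:20:01", "22:20:01", "20:30:01"],
    "TargetUser": ["joe", "bob", "smo", "joe", "joe"],
})

# Combine date and time, then derive the weekday label and seconds since midnight.
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S")
df["day_of_week"] = stamp.dt.day_name().str[:3]  # e.g. "Wed"
df["seconds"] = stamp.dt.hour * 3600 + stamp.dt.minute * 60 + stamp.dt.second

# Mean and sample standard deviation (ddof=1) per user and weekday.
stats = (
    df.groupby(["TargetUser", "day_of_week"])["seconds"]
      .agg(["mean", "std"])
      .reset_index()
)


def format_hms(seconds):
    """Hypothetical helper: second counts -> 'HH:MM:SS' strings (missing -> None)."""
    def one(value):
        if pd.isna(value):
            return None
        hours, rem = divmod(int(round(value)), 3600)
        minutes, secs = divmod(rem, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return seconds.map(one)


stats["mean_hms"] = format_hms(stats["mean"])
stats["std_hms"] = format_hms(stats["std"])
print(stats[["TargetUser", "day_of_week", "mean_hms", "std_hms"]])
```

Groups with a single row still have no standard deviation, so their std_hms stays missing. If you want to keep doing arithmetic on the results, pd.to_timedelta(stats["mean"], unit="s") keeps them as timedeltas instead of strings.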

 