Finding standard deviation and mean of time grouped by day of week in pandas
Is this the preferred method to obtain the standard deviation and mean of times based on the day of the week?
How do I group the mean time and standard deviation first by TargetUser, and second by day_of_week?
Also, how would I go about converting the series of standard deviations and means to a proper time format? I have tried looping through the series and calling datetime.timedelta(seconds=item), which works, but I would prefer a more idiomatic pandas way to do this.
Thank you for your feedback.
I have a data set that contains date and time stamps, as below:
Date Time TargetUser
10/10/2012 20:20:01 joe
10/11/2012 02:20:01 bob
10/13/2012 21:20:01 smo
10/16/2012 22:20:01 joe
I am creating a day-of-week column as below:
df['my_dates'] = pd.to_datetime(df['Date'])
df['day_of_week'] = df['my_dates'].dt.dayofweek
days = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}
df['day_of_week'] = df['day_of_week'].map(days)
I am then splitting the time into separate columns and tallying the total number of seconds since midnight in a new column:
df[['HH', 'MM','SS']] = df['Time'].str.split(':', expand=True)
df['seconds'] = (((df['HH'].astype(int) * 60) + df['MM'].astype(int)) * 60) + df['SS'].astype(int)
I am then computing the mean time and standard deviation by day of week as below:
meantime = df['seconds'].groupby([df['day_of_week']]).mean()
std = df['seconds'].groupby([df['day_of_week']]).std(ddof=1)
Expected output (not based on the data above):
Name Day_of_week Mean STD
joe mon 15:01:01 00:08:02
tue 10:01:01 00:01:06
bob mon 11:11:11 00:20:30
smo thur 07:07:07 00:03:02
You should be able to greatly simplify your work by concatenating Date and Time and then using pandas' excellent datetime accessor, dt.
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df['day_of_week'] = df.DateTime.dt.strftime('%a')
df['seconds'] = pd.to_timedelta(df.DateTime.dt.time.astype(str)).dt.seconds
Which gives you this:
Date Time TargetUser DateTime day_of_week seconds
0 10/10/2012 20:20:01 joe 2012-10-10 20:20:01 Wed 73201
1 10/11/2012 02:20:01 bob 2012-10-11 02:20:01 Thu 8401
2 10/13/2012 21:20:01 smo 2012-10-13 21:20:01 Sat 76801
3 10/16/2012 22:20:01 joe 2012-10-16 22:20:01 Tue 80401
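The derived columns above can also be reproduced end to end. The following is a minimal sketch, not the answer's exact code: it rebuilds the four sample rows from the question, passes an explicit format to pd.to_datetime (assuming the dates are month/day/year), uses dt.day_name() in place of strftime('%a') so the names do not depend on the system locale, and computes seconds since midnight by subtracting the normalized date.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": ["10/10/2012", "10/11/2012", "10/13/2012", "10/16/2012"],
    "Time": ["20:20:01", "02:20:01", "21:20:01", "22:20:01"],
    "TargetUser": ["joe", "bob", "smo", "joe"],
})

# An explicit format avoids day/month ambiguity (assumes month/day/year input).
df["DateTime"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                format="%m/%d/%Y %H:%M:%S")

# day_name() returns English names by default; keep the first three letters.
df["day_of_week"] = df["DateTime"].dt.day_name().str[:3]

# Seconds since midnight: subtract the timestamp floored to midnight.
since_midnight = df["DateTime"] - df["DateTime"].dt.normalize()
df["seconds"] = since_midnight.dt.total_seconds().astype(int)

print(df[["TargetUser", "day_of_week", "seconds"]])
```

Subtracting the normalized timestamp keeps everything in datetime arithmetic and avoids the round trip through strings.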
Then, to group by user and day of week, do the following, which also labels the result columns mean and std:
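# Nested-dict renaming in agg() was removed in pandas 1.0; pass a list of functions instead.
df1 = df.groupby(['TargetUser', 'day_of_week'])\
.agg({'seconds': ['mean', 'std']})
# Convert float seconds to Timedelta; std() uses ddof=1 by default.
df1 = df1.apply(lambda col: pd.to_timedelta(col, unit='s'))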
Final output of df1 (std is NaT here because every group in the sample data contains a single row):
seconds
mean std
TargetUser day_of_week
bob Thu 02:20:01 NaT
joe Tue 22:20:01 NaT
Wed 20:20:01 NaT
smo Sat 21:20:01 NaT
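As an alternative to the aggregation above, named aggregation (available since pandas 0.25) produces flat column names directly, so there is no upper column level to drop afterwards. The sketch below uses hypothetical data, with two rows for joe on Wed, so that the standard deviation is defined for at least one group; the variable name summary is also only illustrative.

```python
import pandas as pd

# Hypothetical data: two observations for joe on Wed so that std is defined.
df = pd.DataFrame({
    "TargetUser": ["joe", "joe", "bob"],
    "day_of_week": ["Wed", "Wed", "Thu"],
    "seconds": [3600, 3720, 8401],
})

summary = (
    df.groupby(["TargetUser", "day_of_week"])
      # Named aggregation: output_column=(input_column, function).
      .agg(mean=("seconds", "mean"), std=("seconds", "std"))  # std uses ddof=1
      # Convert both float-second columns to Timedelta; NaN becomes NaT.
      .apply(lambda col: pd.to_timedelta(col, unit="s"))
      .reset_index()
)

print(summary)
```

Groups with a single row, such as bob here, still get NaT for std, because the sample standard deviation is undefined for one observation.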
To remove the upper column level and turn the index into columns, you can then do this:
df1.columns = df1.columns.droplevel()
df1.reset_index()
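If the Timedelta values need to be shown as plain HH:MM:SS strings, as in the expected output of the question, one option is a small formatting helper. This is only a sketch: format_hms is a hypothetical name, not a pandas function, and it rounds to whole seconds and returns an empty string for NaT.

```python
import pandas as pd

def format_hms(td):
    """Hypothetical helper: render a Timedelta as HH:MM:SS, or '' for NaT."""
    if pd.isna(td):
        return ""
    total = int(round(td.total_seconds()))
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Two mean times from the sample data, plus a missing value.
means = pd.to_timedelta(pd.Series([8401.0, 80401.0, float("nan")]), unit="s")
formatted = means.map(format_hms)

print(formatted.tolist())
```

The result is a column of strings, so it is best applied as the last step, after all arithmetic on the Timedelta values is done.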