Finding Standard Deviation and mean of time grouped by day of week in pandas

Is this the preferred way to obtain the standard deviation and mean of times grouped by day of the week?

How do I group the mean time and standard deviation first by TargetUser, and second by day_of_week?

Also, how would I go about converting the series of standard deviations and means to a proper time format? I have tried looping through the series and calling datetime.timedelta(seconds=item), which works, but I would prefer a more idiomatic pandas approach. Thank you for your feedback.

I have a data set that has date time stamps in it as below:

Date        Time       TargetUser
10/10/2012  20:20:01   joe
10/11/2012  02:20:01   bob
10/13/2012  21:20:01   smo
10/16/2012  22:20:01   joe

I am creating a day of week column as below:

df['my_dates'] = pd.to_datetime(df['Date'])
df['day_of_week'] = df['my_dates'].dt.dayofweek
days = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}  # dayofweek: Monday is 0
df['day_of_week'] = df['day_of_week'].apply(lambda x: days[x])

I am splitting the time into hour, minute and second columns and adding a column with the total number of seconds since midnight:

df[['HH', 'MM','SS']] = df['Time'].str.split(':', expand=True)
df['seconds'] = (((df['HH'].astype(int) * 60) + df['MM'].astype(int)) * 60) + df['SS'].astype(int)

I am then computing the mean time and standard deviation by day of week as below:

meantime = df['seconds'].groupby([df['day_of_week']]).mean()
std = df['seconds'].groupby([df['day_of_week']]).std(ddof=1)
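
As a rough sketch of the per-day calculation above, both statistics can also be computed in a single groupby call with agg. The DataFrame is rebuilt here from the four sample rows so the snippet runs on its own; the names hms and stats are only illustrative, and the date format is assumed to be month/day/year.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Date': ['10/10/2012', '10/11/2012', '10/13/2012', '10/16/2012'],
    'Time': ['20:20:01', '02:20:01', '21:20:01', '22:20:01'],
    'TargetUser': ['joe', 'bob', 'smo', 'joe'],
})

# Abbreviated weekday name (assumes month/day/year dates).
df['day_of_week'] = pd.to_datetime(df['Date'], format='%m/%d/%Y').dt.strftime('%a')

# Seconds since midnight from the HH:MM:SS string.
hms = df['Time'].str.split(':', expand=True).astype(int)
df['seconds'] = (hms[0] * 60 + hms[1]) * 60 + hms[2]

# One groupby, both statistics; pandas' std uses ddof=1 by default.
stats = df.groupby('day_of_week')['seconds'].agg(['mean', 'std'])
print(stats)
```

With only one row per weekday in the sample, every std entry comes out as NaN.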

Expected output (not based on the data above):

Name          Day_of_week       Mean        STD
joe           mon               15:01:01    00:08:02
              tue               10:01:01    00:01:06 
bob           mon               11:11:11    00:20:30
smo           thur              07:07:07    00:03:02

You should be able to greatly simplify your work by concatenating Date and Time and then using pandas' excellent datetime accessor, dt.

df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df['day_of_week'] = df.DateTime.dt.strftime('%a')
df['seconds'] = pd.to_timedelta(df.DateTime.dt.time.astype(str)).dt.seconds
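
If you would rather avoid the round trip through strings, one possible alternative (a sketch, not the only way) is to subtract each timestamp's midnight from the timestamp itself. The names stamps, since_midnight and seconds below are just illustrative.

```python
import pandas as pd

# The four Date + Time values from the question, parsed as month/day/year.
stamps = pd.to_datetime(pd.Series([
    '10/10/2012 20:20:01',
    '10/11/2012 02:20:01',
    '10/13/2012 21:20:01',
    '10/16/2012 22:20:01',
]), format='%m/%d/%Y %H:%M:%S')

# normalize() resets each timestamp to midnight, so the difference
# is the time of day expressed as a timedelta.
since_midnight = stamps - stamps.dt.normalize()
seconds = since_midnight.dt.total_seconds().astype(int)
print(seconds.tolist())  # [73201, 8401, 76801, 80401]
```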

This gives you:

         Date      Time TargetUser            DateTime day_of_week  seconds
0  10/10/2012  20:20:01        joe 2012-10-10 20:20:01         Wed    73201
1  10/11/2012  02:20:01        bob 2012-10-11 02:20:01         Thu     8401
2  10/13/2012  21:20:01        smo 2012-10-13 21:20:01         Sat    76801
3  10/16/2012  22:20:01        joe 2012-10-16 22:20:01         Tue    80401

Then, to group by user and day of week, aggregate the seconds and convert the results to timedeltas:

# mean and std are computed on the numeric seconds (std uses ddof=1 by default),
# then each resulting column is converted from seconds to a timedelta
df1 = df.groupby(['TargetUser', 'day_of_week'])\
  .agg({'seconds': ['mean', 'std']})\
  .apply(lambda col: pd.to_timedelta(col, unit='s'))

Final output of df1

                        seconds    
                           mean std
TargetUser day_of_week             
bob        Thu         02:20:01 NaT
joe        Tue         22:20:01 NaT
           Wed         20:20:01 NaT
smo        Sat         21:20:01 NaT
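
The NaT values in the std column are expected for this sample: every (user, day) group contains a single row, and the sample standard deviation (ddof=1) is undefined for one observation. A small sketch of that behaviour, using made-up series named one and two purely for illustration:

```python
import pandas as pd

one = pd.Series([73201])          # a group with a single observation
two = pd.Series([73201, 80401])   # two observations, pooled only for illustration

std_one = one.std()   # NaN: ddof=1 divides by n - 1 = 0
std_two = two.std()   # a finite number of seconds

print(pd.to_timedelta(std_one, unit='s'))  # NaT
print(pd.to_timedelta(std_two, unit='s'))
```

With real data that has several rows per user and weekday, the std column should contain actual durations.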

To remove the upper column level and turn the index into columns you can then do this:

df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()  # reset_index returns a new DataFrame, so assign it back
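
To get plain HH:MM:SS strings like those in the expected output, the timedelta columns can be formatted afterwards. The helper below, format_hms, is a hypothetical name and not part of pandas; it is a minimal sketch that assumes the durations are non-negative and rounds to whole seconds.

```python
import pandas as pd

def format_hms(td):
    """Render a Timedelta as HH:MM:SS; missing values become an empty string."""
    if pd.isna(td):
        return ''
    total = int(round(td.total_seconds()))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f'{hours:02d}:{minutes:02d}:{secs:02d}'

# Mean seconds for three of the sample groups, converted to timedeltas.
means = pd.to_timedelta(pd.Series([8401.0, 80401.0, 73201.0]), unit='s')
formatted = means.map(format_hms).tolist()
print(formatted)  # ['02:20:01', '22:20:01', '20:20:01']
```

The same function could be applied to both the mean and std columns, for example with df1['mean'].map(format_hms).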
