I am attempting to do a group by in Python. I have a data frame with two columns, Name and Time Difference. Time Difference is a timedelta variable with values like -1 days 14:00:0000, 0 days 00:08:0000, etc. Name has duplicates in it; it looks like Brad, Amy, Brad, Brad, Bill, Amy, and so on. What I want to do is find the mean of Time Difference by Name. Time Difference also has NA values in it.
I have tried
data_frame['NewMean'] = data_frame['TimeDifference'].values.astype(np.int64)
means = data_frame.groupby(data_frame['Name']).mean()
means['NewMean'] = pd.to_timedelta(means['NewMean'])
But I keep getting the error: invalid literal for int()
I know float fixes this, but I want to create a new data frame with this information that just lists the names (no dupes) and the mean for each name.
Try this:
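# .dt.days keeps only the whole-day component of each timedelta (NaT becomes NaN)
data_frame['TimeDifference'] = data_frame['TimeDifference'].dt.days
# transform('mean') broadcasts each name's mean back onto the original rows
data_frame['mean'] = data_frame.groupby('Name')['TimeDifference'].transform('mean')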
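Because .dt.days drops everything below a whole day, a value such as 0 days 00:08:00 becomes 0. A minimal sketch that keeps sub-day precision, assuming TimeDifference really is a timedelta64 column, is to average seconds instead and convert back afterwards. The sample data and the names mean_seconds and mean_td are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; NaT stands in for the missing values
data_frame = pd.DataFrame({
    'Name': ['Brad', 'Amy', 'Brad', 'Brad', 'Bill', 'Amy'],
    'TimeDifference': pd.to_timedelta(
        ['-1 days +14:00:00', '0 days 00:08:00', pd.NaT,
         '0 days 02:00:00', '0 days 00:30:00', '0 days 00:12:00']),
})

# Convert to float seconds (NaT becomes NaN), then average per name;
# the group mean skips NaN by default
mean_seconds = (data_frame['TimeDifference'].dt.total_seconds()
                .groupby(data_frame['Name']).mean())

# Convert the per-name means back to timedelta values
mean_td = pd.to_timedelta(mean_seconds, unit='s')
print(mean_td)
```

The result is a Series indexed by the unique names, so each name appears only once.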
There is a way to get the values without casting to int that also ignores NaN or NaT values. It involves a lambda expression, and the results are timedelta objects:
import numpy as np
time_groups = data_frame.groupby('Name').apply(
    lambda df: np.mean(df.TimeDifference)
)
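The question asks for a new data frame listing each name once with its mean. One possible way to get there from a per-name result is reset_index, sketched below. The sample data and the column name MeanTimeDifference are assumptions for illustration, and the column is selected before apply so that the lambda receives a Series rather than a whole frame.

```python
import pandas as pd

# Hypothetical sample data; NaT stands in for the missing values
data_frame = pd.DataFrame({
    'Name': ['Brad', 'Amy', 'Brad', 'Brad', 'Bill', 'Amy'],
    'TimeDifference': pd.to_timedelta(
        ['-1 days +14:00:00', '0 days 00:08:00', pd.NaT,
         '0 days 02:00:00', '0 days 00:30:00', '0 days 00:12:00']),
})

# Series.mean() skips NaT by default, so no casting to int is needed
time_groups = data_frame.groupby('Name')['TimeDifference'].apply(
    lambda s: s.mean()
)

# reset_index turns the Name index into a regular column,
# giving one row per unique name
means = time_groups.reset_index(name='MeanTimeDifference')
print(means)
```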