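I am trying to group rows on a column according to the order in which its values appear (by timestamp), so that each consecutive run of the same value forms its own group, and at the same time compute an aggregate (mean) of the other variables within each small group. I can group the rows successfully, but I am unable to aggregate.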
Here is my sample input:
Date T/F X1
12/02/19 T 10
12/02/19 T 20
12/02/19 F 15
12/02/19 T 12
12/03/19 F 10
12/03/19 F 20
12/03/19 T 30
12/04/19 T 40
Expected output:
Date T/F X1 Count
12/02/19 T 15 2
12/02/19 F 15 1
12/02/19 T 12 1
12/03/19 F 15 2
12/03/19 T 35 2
Here is the code I am using, which groups the rows and gives me the count for each group. How do I also get the average of X1 within each group?
import itertools
for key, group in itertools.groupby(df['T/F']):
    print(key, len(list(group)))
Thanks for the help!
You can use the function groupby with a key that increments each time T/F changes:
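import numpy as np

# The key increments whenever T/F differs from the previous row, so each consecutive run is its own group.
df1 = df.assign(Count=np.nan).\
    groupby(df['T/F'].ne(df['T/F'].shift()).cumsum(), as_index=False).\
    agg({'Date': 'first', 'T/F': 'first', 'X1': 'mean', 'Count': 'size'})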
print(df1)
Output:
Date T/F X1 Count
0 12/02/19 T 15 2
1 12/02/19 F 15 1
2 12/02/19 T 12 1
3 12/03/19 F 15 2
4 12/03/19 T 35 2
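To make the grouping key easier to follow, here is a step-by-step sketch of the same idea. It rebuilds the sample data from the question and names the intermediate pieces; the names starts, run_id and result are only illustrative, and named aggregation is used as an alternative to the placeholder Count column. It assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["12/02/19"] * 4 + ["12/03/19"] * 3 + ["12/04/19"],
    "T/F": ["T", "T", "F", "T", "F", "F", "T", "T"],
    "X1": [10, 20, 15, 12, 10, 20, 30, 40],
})

# True wherever T/F differs from the previous row, i.e. where a new run starts.
# The first row is compared with NaN, so it always starts a run.
starts = df["T/F"].ne(df["T/F"].shift())

# The cumulative sum turns run starts into run ids: 1, 1, 2, 3, 4, 4, 5, 5.
# "run" is an illustrative name, chosen so it does not clash with a column.
run_id = starts.cumsum().rename("run")

# Named aggregation: output column -> (input column, function).
# A dict is unpacked because "T/F" is not a valid keyword name.
result = (
    df.groupby(run_id)
      .agg(**{
          "Date": ("Date", "first"),
          "T/F": ("T/F", "first"),
          "X1": ("X1", "mean"),
          "Count": ("X1", "size"),
      })
      .reset_index(drop=True)
)

print(result)
```

Because the key is a run id rather than the T/F value itself, the two separate runs of T on 12/02/19 stay in different groups, which a plain groupby on 'T/F' would merge. Note that the mean makes X1 a float column.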