简体   繁体   中英

Pandas- Groupby multiple columns and mean from a single column

I have a large dataframe of the following structure which is simplified for the purpose of this question:

  A     B     C D ... J K 
  date1 time1 1 1 ... 1 1
  date2 time2 2 2 ... 2 2

Essentially the first 3 columns all represent string data ie times and dates. The dataframe I am reading in from a csv file and the data after the times has multiple points on a single day for each column.

What I wish to do is find a way to group all the data by dates and create mean values per day which account for multiple datapoints on a single day. This is handled nicely by using group.by(dates) however I lose all of the other data not included in the dates column so the result is like so:

Before group.by().mean():

   A     B     C D ... J K
   date1 time1 1 1 ... 1 1
   date2 time2 2 2 ... 2 2
   date2 time3 1 1 ... 1 1

After:

   A      C   D   ... J   K
   date1  1   1       1   1
   date2  1.5 1.5 ... 1.5 1.5

My ideal output would be to keep the data in the time column and date column whilst still creating mean values based on the days. This would result in the following:

Ideal output:

   A      B     C   D   ... J   K
   date1  time1 1   1       1   1
   date2  time2 1.5 1.5 ... 1.5 1.5

There is problem you need aggregate columns with strings and times too, eg by first , else are omited .

So possible solution is create dict of aggregation functions and use groupby + agg + reset_index + reindex_axis :

print (df)

   A      B      C  D  E  J  K
0  a  date1  time1  1  1  1  1
1  b  date2  time2  2  2  2  2
2  c  date2  time3  1  1  1  1

cols = ['A','B','C']
d = {x:'mean' for x in df.columns.difference(cols)}
d['A'] = 'first'
d['C'] = 'first'
print (d)
{'E': 'mean', 'D': 'mean', 'J': 'mean', 'A': 'first', 'C': 'first', 'K': 'mean'}

df1 = df.groupby('B').agg(d).reset_index().reindex_axis(df.columns, axis=1)
print (df1)
   A      B      C    D    E    J    K
0  a  date1  time1  1.0  1.0  1.0  1.0
1  b  date2  time2  1.5  1.5  1.5  1.5

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM