
Groupby or pivot in pandas?

Can someone guide me on aggregating data in pandas?

I have a massive file of timestamped survey data from thousands of different people at more than 20 locations. Each survey has four levels of 'Reasons', which I have listed as Driver1, Driver2 and so on (there are 4). There is also a column that counts the surveys and one column for each question. Since each row of the raw data is an individual survey, the count is always 1 and the score is -1, 0 or 1.

       Date        Location    Person  Driver1  Driver2  Surveys   Question1   
-----------------------------------------------------------------------------
 4/30/2014 21:41    a123b      xyz234   Quest    Ion       1         -1

My goal is to:

  • Create a new raw data set by aggregating the daily total surveys (sum) and mean scores per question
  • This should be at a daily (no timestamp) level, per location, per person and per driver (4 levels)

    Date      Location    Person  Driver1  Driver2  Surveys   Question1
-----------------------------------------------------------------------------
 4/30/2014     a123b      xyz234   Quest    Ion       3         0.33
 4/30/2014     a123b      xyz234   Quest    Bear      6         1

This will vastly reduce the file size but still give me detailed data. I want to know the performance of each person for survey drivers per day so I can track monthly/weekly progress.

I assume it must be something like:

df2 = df.groupby['Date','Location','Person','Driver1','Driver2','Driver3','Driver4']
df2['Surveys'].sum()
df2['Question1'].mean()

You're close. groupby is a method, so it has to be called with () around the list of columns:

df2 = df.groupby(['Date','Location','Person','Driver1','Driver2','Driver3','Driver4'])

Then you can combine the next two lines into a single agg call if you'd like:

df2.agg({'Surveys' : 'sum', 'Question1' : 'mean'})
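
One detail the question leaves open is that the sample row shows Date with a time of day (4/30/2014 21:41), so grouping on that column directly would probably produce one group per timestamp rather than one per day. Below is a minimal sketch of the whole pipeline, assuming Date is stored as a string in month/day/year hour:minute form. The sample rows are made up for illustration, and only Driver1 and Driver2 are included because those are the only driver columns shown in the question; Driver3 and Driver4 would simply be added to the key list.

```python
import pandas as pd

# Made-up sample rows in the shape described in the question.
df = pd.DataFrame({
    "Date": ["4/30/2014 21:41", "4/30/2014 09:15",
             "4/30/2014 13:02", "5/1/2014 08:30"],
    "Location": ["a123b"] * 4,
    "Person": ["xyz234"] * 4,
    "Driver1": ["Quest"] * 4,
    "Driver2": ["Ion"] * 4,
    "Surveys": [1, 1, 1, 1],
    "Question1": [-1, 1, 1, 0],
})

# Parse the strings (assumed format) and reset the time of day to midnight,
# so that all surveys from one calendar day share the same key.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M").dt.normalize()

# Add "Driver3" and "Driver4" here if the real data has them.
keys = ["Date", "Location", "Person", "Driver1", "Driver2"]

# as_index=False keeps the grouping keys as ordinary columns,
# which matches the flat layout asked for in the question.
daily = df.groupby(keys, as_index=False).agg(
    {"Surveys": "sum", "Question1": "mean"}
)
print(daily)
```

With these rows, the three surveys on 4/30/2014 collapse into one row with Surveys equal to 3 and a Question1 mean of about 0.33, and the single survey on 5/1/2014 stays as its own row.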


 