Groupby or pivot in pandas?

Question

Can someone guide me on aggregating data in pandas?

I have a massive file of timestamped survey data from thousands of different people across more than 20 locations. Each survey has four levels of 'Reasons', which I have listed as Driver1, Driver2 and so on (there are 4). Then there is a column that counts the surveys and a few columns for the questions. Since each row of the raw data is an individual survey, the count is always 1 and the score is one of -1, 0 or 1.

       Date        Location    Person  Driver1  Driver2  Surveys   Question1   
-----------------------------------------------------------------------------
 4/30/2014 21:41    a123b      xyz234   Quest    Ion       1         -1

My goal is to:

Create a new dataset by aggregating the raw data into daily total surveys (sum) and mean scores per question.

This should be at a daily level (no timestamp), per location, per person and per driver (4 levels).

   Date        Location    Person  Driver1  Driver2  Surveys   Question1
-----------------------------------------------------------------------------
 4/30/2014      a123b      xyz234   Quest    Ion       3         0.33
 4/30/2014      a123b      xyz234   Quest    Bear      6         1

This will vastly reduce the file size but still give me detailed data. I want to know the performance of each person for survey drivers per day so I can track monthly/weekly progress.

I assume it must be something like:

df2 = df.groupby['Date','Location','Person','Driver1','Driver2','Driver3','Driver4']
df2['Surveys'].sum()
df2['Question1'].mean()

Answer 1

You're close. groupby is a method, so the list of keys needs () around it:

df2 = df.groupby(['Date','Location','Person','Driver1','Driver2','Driver3','Driver4'])

Then you can combine the next two lines into one if you'd like:

df2.agg({'Surveys' : 'sum', 'Question1' : 'mean'})
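One thing to watch for: if Date still carries the time of day, every timestamp becomes its own group and nothing is rolled up to daily totals. Here is a minimal sketch of the whole step, assuming the column names from the question and a Date string format like the sample row. The sample rows are made up for illustration, and only Driver1 and Driver2 are included, so extend the key list with Driver3 and Driver4 if your data has them.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's raw data (one survey per row).
df = pd.DataFrame({
    "Date": ["4/30/2014 21:41", "4/30/2014 22:05", "4/30/2014 23:10", "5/1/2014 09:00"],
    "Location": ["a123b"] * 4,
    "Person": ["xyz234"] * 4,
    "Driver1": ["Quest"] * 4,
    "Driver2": ["Ion"] * 4,
    "Surveys": [1, 1, 1, 1],
    "Question1": [-1, 1, 1, 0],
})

# Parse the strings and drop the time of day so that all surveys
# from one calendar day share the same grouping key.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M").dt.normalize()

# Grouping keys; add "Driver3" and "Driver4" here if those columns exist.
keys = ["Date", "Location", "Person", "Driver1", "Driver2"]

# as_index=False keeps the keys as ordinary columns in the result.
daily = df.groupby(keys, as_index=False).agg({"Surveys": "sum", "Question1": "mean"})
print(daily)
```

The result has one row per day, location, person and driver combination, with Surveys summed and Question1 averaged, which is the flat layout the question asks for.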

solution1
1 ACCEPTED 2014-10-23 01:42:56