Groupby or pivot in pandas?

Question

Can someone guide me on aggregating data in pandas?

I have a massive file of per-timestamp survey data from thousands of different people at over 20 different locations. Each survey has four levels of 'Reasons', which I have listed as Driver1, Driver2, and so on (there are 4). Then there is a column that counts the surveys, plus a few columns for each question. Since each row of the raw data is an individual survey, the count is always 1 and the score can be -1, 0, or 1.

       Date        Location    Person  Driver1  Driver2  Surveys   Question1   
-----------------------------------------------------------------------------
 4/30/2014 21:41    a123b      xyz234   Quest    Ion       1         -1

My goal is to:

Create a new raw data set by aggregating the daily total surveys (sum) and the mean score per question.

This should be at a daily (no timestamp) level per location, per person, and per driver (4 levels).

   Date      Location    Person  Driver1  Driver2  Surveys   Question1
-----------------------------------------------------------------------------
 4/30/2014    a123b      xyz234   Quest    Ion       3         0.33
 4/30/2014    a123b      xyz234   Quest    Bear      6         1

This will vastly reduce the file size but still give me detailed data. I want to know each person's performance per survey driver per day so I can track monthly/weekly progress.

I assume it must be something like:

df2 = df.groupby['Date','Location','Person','Driver1','Driver2','Driver3','Driver4']
df2['Surveys'].sum()
df2['Question1'].mean()

Answer 1

You're close. groupby is a method, so the list of columns needs to be wrapped in parentheses:

df2 = df.groupby(['Date','Location','Person','Driver1','Driver2','Driver3','Driver4'])

Then, if you'd like, you can combine the next two lines into one:

df2.agg({'Surveys' : 'sum', 'Question1' : 'mean'})
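The question also asks for a daily (no timestamp) level, which the grouping above does not handle by itself if Date still carries the time of day. A minimal sketch of one way to do it, assuming the Date column is a string in the format shown in the question; the sample rows below are made up for illustration, and only Driver1 and Driver2 are included to keep it short:

```python
import pandas as pd

# Hypothetical sample rows in the shape the question describes.
df = pd.DataFrame({
    "Date": ["4/30/2014 21:41", "4/30/2014 22:05",
             "4/30/2014 23:10", "5/1/2014 08:00"],
    "Location": ["a123b"] * 4,
    "Person": ["xyz234"] * 4,
    "Driver1": ["Quest"] * 4,
    "Driver2": ["Ion", "Ion", "Ion", "Bear"],
    "Surveys": [1, 1, 1, 1],
    "Question1": [-1, 1, 1, 1],
})

# Parse the timestamps, then reset the time of day to midnight so that
# every survey taken on the same day shares one Date value.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M").dt.normalize()

# Group on the daily date plus the other keys; add Driver3 and Driver4
# to this list for the full four driver levels.
keys = ["Date", "Location", "Person", "Driver1", "Driver2"]

# as_index=False keeps the grouping keys as ordinary columns, so the
# result has the same flat layout as the raw data.
daily = df.groupby(keys, as_index=False).agg(
    {"Surveys": "sum", "Question1": "mean"}
)

print(daily)
```

Keeping the summed Surveys column alongside the mean is useful later: a weekly or monthly mean can be rebuilt from the daily rows by weighting each daily mean by its survey count.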

Answer 1: accepted, 2014-10-23 01:42:56

Groupby还是以熊猫为中心？

问题描述

1 个解决方案

解决方案1 1 已采纳 2014-10-23 01:42:56

解决方案1
1 已采纳 2014-10-23 01:42:56