
Groupby or pivot in pandas?

Can someone guide me on aggregating data in pandas?

I have a massive file of timestamped survey data from thousands of different people across more than 20 locations. Each survey has four levels of 'Reasons', which I have listed as Driver1, Driver2, and so on (there are 4). Then there is a column which counts the surveys and a few columns for each question. Since each row of the raw data is an individual survey, the count is always 1 and the score is one of -1, 0, or 1.

       Date        Location    Person  Driver1  Driver2  Surveys   Question1   
-----------------------------------------------------------------------------
 4/30/2014 21:41    a123b      xyz234   Quest    Ion       1         -1

My goal is to:

  • Create a new data set by aggregating the raw data to daily total surveys (sum) and mean scores per question
  • This should be at a daily (no timestamp) level per location, per person, and per driver (4 levels)

      Date     Location    Person  Driver1  Driver2  Surveys   Question1
-----------------------------------------------------------------------------
 4/30/2014      a123b      xyz234   Quest    Ion       3         0.33
 4/30/2014      a123b      xyz234   Quest    Bear      6         1

This will vastly reduce the file size but still give me detailed data. I want to know how each person performs on each survey driver per day, so I can track weekly and monthly progress.

I assume it must be something like:

df2 = df.groupby['Date','Location','Person','Driver1','Driver2','Driver3','Driver4']
df2['Surveys'].sum()
df2['Question1'].mean()

You're close. groupby is a method, so the list of column names needs to go inside parentheses:

df2 = df.groupby(['Date','Location','Person','Driver1','Driver2','Driver3','Driver4'])
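One caveat worth checking: the sample row shows Date holding a full timestamp (4/30/2014 21:41), so grouping on it as-is would give one group per distinct timestamp rather than one per day. Here is a minimal sketch of truncating to the day first, assuming the column is stored as strings in the month/day/year hour:minute layout shown in the question. The toy rows are made up for illustration.

```python
import pandas as pd

# Toy rows in the question's layout; the values are invented for illustration.
df = pd.DataFrame({
    "Date": ["4/30/2014 21:41", "4/30/2014 09:15", "5/1/2014 08:00"],
    "Location": ["a123b", "a123b", "a123b"],
    "Surveys": [1, 1, 1],
})

# Parse the strings as timestamps, then reset the time-of-day to midnight
# so that all rows from the same calendar day share one key.
# The format string is an assumption based on the sample row.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M").dt.normalize()

# Grouping on the normalized column now yields one group per day and location.
daily = df.groupby(["Date", "Location"])["Surveys"].sum()
```

If the column is already a datetime dtype, the `pd.to_datetime` call is harmless and only the `.dt.normalize()` step matters.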

Then you can combine the next two lines into one if you'd like:

df2.agg({'Surveys' : 'sum', 'Question1' : 'mean'})
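To see the shape of the result, here is a small self-contained sketch on made-up rows. It assumes the dates are already at day level and uses only Driver1 and Driver2 to keep the toy data short. Passing `as_index=False` is an optional choice here, not something the answer requires; it keeps the grouping keys as ordinary columns, which matches the flat table in the question.

```python
import pandas as pd

# Invented rows: five individual surveys on one day for one person.
df = pd.DataFrame({
    "Date": ["4/30/2014"] * 5,
    "Location": ["a123b"] * 5,
    "Person": ["xyz234"] * 5,
    "Driver1": ["Quest"] * 5,
    "Driver2": ["Ion", "Ion", "Ion", "Bear", "Bear"],
    "Surveys": [1, 1, 1, 1, 1],
    "Question1": [1, 1, -1, 1, 1],
})

keys = ["Date", "Location", "Person", "Driver1", "Driver2"]

# One output row per unique key combination:
# Surveys is summed, Question1 is averaged.
result = df.groupby(keys, as_index=False).agg(
    {"Surveys": "sum", "Question1": "mean"}
)
print(result)
```

With this toy data the Ion group has 3 surveys and a mean score of about 0.33, and the Bear group has 2 surveys and a mean of 1. Note that by default groupby drops rows whose key columns contain missing values, so if the lower driver levels are sometimes empty it is worth checking that no surveys are lost.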


 