Pandas: pivot dataframe and preserve additional non-numeric column

I have some data in long (list) format: figures for 150-odd organisations, with one figure per organisation for each month of a 12-month series. In its raw form it looks like this:

Name Size   Date  Figure
Org1 Medium Jun16 8.36
Org1 Medium Jul16 7.55
Org1 Medium Aug16 8.57
...
Org1 Medium May17 9.41
Org2 Large  Jun16 12.12
Org2 Large  Jul16 11.44
...

So each organisation has a unique name, twelve months of data, and one of three sizes (small, medium, large). I've successfully pivoted these figures to give me a time series for each organisation, i.e.:

Name Jun16 Jul16 Aug16 Sep16 Oct16...
Org1 8.36  7.55  8.57  7.66  9.43
Org2 12.12 11.44 11.01 12.01 10.44...

But I want to include another column containing the size of each organisation. The code I've used for the pivot is:

dataPivot = dataRaw.pivot_table(index='Name', columns='Date',
                                aggfunc='sum', values='Figure').fillna(0)

where dataRaw is the raw data read in from a .csv. I've tried adding 'Size' to the columns argument, but this just gives me 12 additional columns for each size!

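One way of doing that is to build a second DataFrame with groupby and attach it to the pivot table with concat. Note that groupby('Name').size() returns the number of rows per organisation, not the Size category: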

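import pandas as pd

table = df.pivot_table(index='Name', columns='Date', aggfunc='sum', values='Figure').fillna(0)

# Rows per organisation (a count, not the 'Size' category)
size = df.groupby('Name').size().to_frame(name='size')

ndf = pd.concat([table, size], axis=1)  # axis must be passed by keyword in recent pandas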

Output based on sample data:

Aug16  Jul16  Jun16  May17  size
Name                                  
Org1   8.57   7.55   8.36   9.41     4
Org2   0.00  11.44  12.12   0.00     2
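
If the goal is the Size label itself rather than a row count, one alternative (not part of the original answer) is to take one Size value per Name and join it onto the pivot. The following is a minimal sketch on a hypothetical sample that mirrors the rows in the question; it assumes Size never varies within an organisation, and the names `sizes` and `result` are purely illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
df = pd.DataFrame({
    "Name": ["Org1", "Org1", "Org1", "Org1", "Org2", "Org2"],
    "Size": ["Medium"] * 4 + ["Large"] * 2,
    "Date": ["Jun16", "Jul16", "Aug16", "May17", "Jun16", "Jul16"],
    "Figure": [8.36, 7.55, 8.57, 9.41, 12.12, 11.44],
})

# Same pivot as in the answer: one row per organisation, one column per month.
table = df.pivot_table(index="Name", columns="Date",
                       values="Figure", aggfunc="sum").fillna(0)

# One Size label per organisation (assumes Size is constant within a Name).
sizes = df.drop_duplicates("Name").set_index("Name")["Size"]

# Join on the shared 'Name' index so Size becomes an ordinary extra column.
result = table.join(sizes)
print(result)
```

Because the join aligns on the Name index, the month columns stay numeric and Size is appended as a single text column.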

If you instead mean to keep the Size column that is already present in the dataframe, add that column name to the index parameter rather than to columns, i.e.:

# 'Size' is already part of the index, so only 'Figure' needs to be listed in values
df.pivot_table(index=['Name', 'Size'], columns=['Date'], aggfunc='sum', values=['Figure']).fillna(0).reset_index()

Output:

      Name    Size Figure                    
Date                Aug16  Jul16  Jun16 May17
0     Org1  Medium   8.57   7.55   8.36  9.41
1     Org2   Large   0.00  11.44  12.12  0.00
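
The output above has two header levels and its month columns are in alphabetical order. As a hedged sketch of one way to get a flat table closer to the layout in the question, the example below passes values as a plain string (which avoids the extra 'Figure' level) and then reorders the months chronologically. It assumes every date label follows the 'Jun16' pattern and that month abbreviations are parsed in an English locale; the sample data and the names `flat` and `ordered` are hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
df = pd.DataFrame({
    "Name": ["Org1", "Org1", "Org1", "Org1", "Org2", "Org2"],
    "Size": ["Medium"] * 4 + ["Large"] * 2,
    "Date": ["Jun16", "Jul16", "Aug16", "May17", "Jun16", "Jul16"],
    "Figure": [8.36, 7.55, 8.57, 9.41, 12.12, 11.44],
})

# A scalar `values` yields single-level month columns.
flat = df.pivot_table(index=["Name", "Size"], columns="Date",
                      values="Figure", aggfunc="sum").fillna(0)

# Sort month labels by the date they represent (assumes the 'Jun16' pattern).
ordered = sorted(flat.columns, key=lambda c: pd.to_datetime(c, format="%b%y"))

# Turn Name and Size back into ordinary columns.
flat = flat[ordered].reset_index()
flat.columns.name = None  # drop the leftover 'Date' label on the column axis
print(flat)
```

With the full twelve months of data the same sort would place Jun16 first and May17 last, matching the series in the question.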
