
Pandas pivot table for multiple columns at once

Let's say I have a DataFrame:

   nj  ptype  wd  wpt
0   2      1   2    1
1   3      2   1    2
2   1      1   3    1
3   2      2   3    3
4   3      1   2    2
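
For anyone who wants to reproduce the examples, the frame above can be built as follows. This is only a convenience sketch; the variable name `df` matches the code used later in the thread.

```python
import pandas as pd

# Sample data copied from the table above (default integer index 0..4).
df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})
print(df)
```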

I would like to aggregate this data using ptype as the index, counting how often each value occurs in every other column, like so:

             nj             wd            wpt
       1.0  2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0
    1    1    1    1    0    2    1    2    1    0
    2    0    1    1    1    0    1    0    1    1

You could build each of the top-level columns of the final result by creating a pivot table with aggfunc='count' and then concatenating them all, like so:

# .ix was removed in pandas 1.0; .loc selects the same top-level column here
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
        nj             wd            wpt
         1    2    3    1    2    3    1    2    3
1      1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
2      0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0

But I really dislike this and it feels wrong. I would like to know if there is a simpler way to do this, preferably with a built-in method. Thanks in advance!

Instead of doing it in one step, you can do the aggregation first and then pivot the result using the unstack method:

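# Count the values of nj, wd and wpt within each ptype group using
# groupby + value_counts, then unstack the counted values into columns
(df.set_index('ptype').groupby(level='ptype')
 .apply(lambda g: g.apply(lambda s: s.value_counts()))
 .unstack(level=1).fillna(0))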

#      nj             wd            wpt
#       1    2    3    1    2    3    1    2    3
#1    1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
#2    0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0

Another option that avoids the apply method:
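
A minimal sketch of one apply-free route, assuming the sample DataFrame from the question: reshape to long form with melt, count each (ptype, column, value) combination with groupby(...).size(), then unstack the two inner levels into the columns. The names `long` and `counts`, and the level names `column` and `value`, are illustrative choices and do not come from the original post.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

# One row per (ptype, original column name, value).
long = df.melt(id_vars='ptype', var_name='column', value_name='value')

# Count every combination, then pivot the column/value levels into the
# columns; combinations that never occur are filled with 0.
counts = (long.groupby(['ptype', 'column', 'value'])
              .size()
              .unstack(level=['column', 'value'], fill_value=0)
              .sort_index(axis=1))
print(counts)
```

Because the counting happens in a single groupby over the long frame, no Python-level lambda runs per group, and the result keeps integer counts rather than floats.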



Naive timing on the sample data:

Original solution:

# .ix was removed in pandas 1.0; .loc selects the same top-level column here
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
# 100 loops, best of 3: 12 ms per loop

Option one:

(df.set_index('ptype').groupby(level='ptype')
 .apply(lambda g: g.apply(lambda s: s.value_counts()))
 .unstack(level=1).fillna(0))
# 100 loops, best of 3: 10.1 ms per loop

Option two:

# 100 loops, best of 3: 4.3 ms per loop

Another solution using groupby and unstack:

cols = ['nj', 'wd', 'wpt']  # keys=cols labels each block with its source column
df2 = pd.concat([df.groupby(['ptype', e])[e].count().unstack() for e in cols], axis=1, keys=cols).fillna(0).astype(int)

       nj          wd         wpt        
      1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0
1       1   1   1   0   2   1   2   1   0
2       0   1   1   1   0   1   0   1   1

An easier solution is:

employee.pivot_table(index='Title', values='Salary', aggfunc=['mean', 'median', 'min', 'max', 'std'], fill_value=0)

In this case, several different aggregate functions are applied to the Salary column.
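
As a rough illustration of that call, here is a small sketch with a made-up `employee` frame. The data and the `summary` name are hypothetical and only the `Title` and `Salary` column names come from the answer above.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
employee = pd.DataFrame({
    'Title':  ['Engineer', 'Engineer', 'Analyst', 'Analyst'],
    'Salary': [100, 120, 80, 90],
})

# Passing a list to aggfunc yields one top-level column per function.
summary = employee.pivot_table(index='Title', values='Salary',
                               aggfunc=['mean', 'median', 'min', 'max', 'std'],
                               fill_value=0)
print(summary)
```

Note that this summarizes a single column with several functions. It does not count the distinct values of several columns, which is what the question asks for.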
