
Pivot Tables or Group By for Pandas?

I have a hopefully straightforward question that has been giving me a lot of difficulty for the last 3 hours. It should be easy.

Here's the challenge.

I have a pandas dataframe:

+--------------------------+
|     Col 'X'    Col 'Y'  |
+--------------------------+
|     class 1      cat 1  |
|     class 2      cat 1  |
|     class 3      cat 2  |
|     class 2      cat 3  |
+--------------------------+

What I want to transform the dataframe into:

+------------------------------------------+
|                  cat 1    cat 2    cat 3 |
+------------------------------------------+
|     class 1         1        0        0  |
|     class 2         1        0        1  |
|     class 3         0        1        0  |
+------------------------------------------+

The values are counts of how often each class/cat pair occurs. Does anybody have any insight? Thanks!

Here are a couple of ways to reshape your data df

In [27]: df
Out[27]:
     Col X  Col Y
0  class 1  cat 1
1  class 2  cat 1
2  class 3  cat 2
3  class 2  cat 3
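
If you want to follow along, here is a minimal sketch that rebuilds the frame above. The column names 'Col X' and 'Col Y' are taken from the printed output; everything else is just the four rows from the question.

```python
import pandas as pd

# Rebuild the four-row example frame from the question.
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})
print(df)
```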

1) Using pd.crosstab()

In [28]: pd.crosstab(df['Col X'], df['Col Y'])
Out[28]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0

2) Or, use groupby on 'Col X', 'Col Y' and unstack over Col Y, filling the missing combinations with zeros via fill_value=0.

In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0)
Out[29]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0

3) Or, use pd.pivot_table() with index=Col X, columns=Col Y

In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
Out[30]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
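
This frame has no separate value column to aggregate, so a variant worth trying is aggfunc='size', which counts the rows in each (Col X, Col Y) group directly. This is only a sketch, assuming a reasonably recent pandas; it is not what was run in the session above.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# 'size' counts rows per (Col X, Col Y) group, so no value column is needed.
counts = df.pivot_table(index="Col X", columns="Col Y",
                        aggfunc="size", fill_value=0)
print(counts)
```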

4) Or, use set_index with unstack

In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0)
Out[492]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
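
One caveat on option 4: it marks each pair with v=1 rather than counting, so it only matches the others when every (Col X, Col Y) pair appears at most once. The sketch below checks that three of the approaches agree on the example data, then adds a hypothetical duplicate row (not in the original question) to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

a = pd.crosstab(df["Col X"], df["Col Y"])
b = df.groupby(["Col X", "Col Y"]).size().unstack("Col Y", fill_value=0)
c = df.assign(v=1).set_index(["Col X", "Col Y"])["v"].unstack(fill_value=0)

# Compare the raw counts produced by the three approaches.
same = a.values.tolist() == b.values.tolist() == c.values.tolist()
print(same)

# Hypothetical extra row repeating the existing (class 2, cat 1) pair.
dup = pd.concat([df, df.iloc[[1]]], ignore_index=True)
dup_count = pd.crosstab(dup["Col X"], dup["Col Y"]).loc["class 2", "cat 1"]
print(dup_count)

try:
    dup.assign(v=1).set_index(["Col X", "Col Y"])["v"].unstack(fill_value=0)
    set_index_ok = True
except ValueError:
    # unstack cannot reshape an index that contains duplicate entries
    set_index_ok = False
print(set_index_ok)
```

So if pairs can repeat in your real data, crosstab or groupby(...).size() is the safer choice.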
