Calculating column values based on values in other columns of a Pandas DataFrame

Question

I'm trying to count the number of storms in each category for each unique x and y combination. For example, my dataframe looks like:

x   y  year  Category
1   1  1988     3
2   1  1977     1
2   1  1999     2
3   2  1990     4

I want to create a dataframe that looks like:

x   y   Category 1   Category 2   Category 3  Category 4
1   1        0           0            1           0
2   1        1           1            0           0
3   2        0           0            0           1

I have tried various combinations of .groupby() and .count(), but I am still not getting the desired result. The closest thing I could get is:

df[['x','y','Category']].groupby(['Category']).count()

However, the result counts rows per category across all x and y values, not per unique (x, y) pair:

Cat       x           y     
1       3773         3773
2       1230         1230
3       604          604
4       266          266
5       50           50
NA      27620        27620
TS      16884        16884

Does anyone know how to do a count operation on one column based on the uniqueness of two other columns in a dataframe?

Answer 1

pivot_table sounds like what you want. A bit of a hack is to add a column of 1s to use for counting. This allows pivot_table to add 1 for each occurrence of a particular x-y and Category combination. Set this new column as the values parameter in pivot_table and the aggfunc parameter to np.sum. You'll probably want to set fill_value to 0 as well:

import numpy as np

# Helper column of 1s: summing it per group gives the row count.
df['count'] = 1
result = df.pivot_table(
    index=['x', 'y'], columns='Category', values='count',
    fill_value=0, aggfunc=np.sum
)

result:

Category  1  2  3  4
x y                 
1 1       0  0  1  0
2 1       1  1  0  0
3 2       0  0  0  1

If you're interested in keeping x and y as columns and having the other column names as Category X, you can rename the columns and use reset_index:

result.columns = [f'Category {x}' for x in result.columns]
result = result.reset_index()
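For anyone who wants to try the steps above end to end, here is a minimal self-contained sketch. It rebuilds the four sample rows from the question, and it makes two assumptions that are not in the answer: the helper column is added with assign so the original df is left unmodified, and the string "sum" is passed as aggfunc in place of np.sum. The name counts is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 2, 2, 3],
    "y": [1, 1, 1, 2],
    "year": [1988, 1977, 1999, 1990],
    "Category": [3, 1, 2, 4],
})

counts = (
    df.assign(count=1)  # temporary helper column of 1s; df itself is unchanged
      .pivot_table(
          index=["x", "y"],
          columns="Category",
          values="count",
          aggfunc="sum",   # assumption: string alias used instead of np.sum
          fill_value=0,
      )
)

# Rename the category columns and turn x and y back into ordinary columns.
counts.columns = [f"Category {c}" for c in counts.columns]
counts = counts.reset_index()
print(counts)
```

Using assign keeps the counting column out of the original dataframe, which may matter if df is reused later.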

Answer 2

You can use pd.get_dummies after setting the index with set_index, then group by the index levels and sum to collapse the rows (older pandas versions also accepted .sum(level=[0,1]) for this step, but that parameter was removed in pandas 2.0):

pd.get_dummies(df.set_index(['x','y'])['Category'].astype(str),
               prefix='Category ',
               prefix_sep='')\
  .groupby(level=[0,1]).sum()\
  .reset_index()

Output:

   x  y  Category 1  Category 2  Category 3  Category 4
0  1  1           0           0           1           0
1  2  1           1           1           0           0
2  3  2           0           0           0           1

Answer 3

Or use groupby twice, along with a few extra steps such as get_dummies inside apply, like:

>>> (df.join(df.groupby(['x','y'], group_keys=False)['Category']
...            .apply(lambda x: x.astype(str).str.get_dummies().add_prefix('Category ')))
...    .groupby(['x','y']).sum().fillna(0).drop(['year','Category'], axis=1).reset_index())
   x  y  Category 1  Category 2  Category 3  Category 4
0  1  1         0.0         0.0         1.0         0.0
1  2  1         1.0         1.0         0.0         0.0
2  3  2         0.0         0.0         0.0         1.0
>>>

Answer 4

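You can use groupby first (the output below includes a count column, so df here appears to already carry the helper column of 1s from Answer 1):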

df_new = df.groupby(['x', 'y', 'Category']).count()
df_new
                  year  count
x   y   Category        
1   1      3       1    1
2   1      1       1    1
           2       1    1
3   2      4       1    1

Then pivot_table:

df_new = df_new.pivot_table(index=['x', 'y'], columns='Category', values='count', fill_value=0)
df_new
Category    1   2   3   4
x   y               
1   1       0   0   1   0
2   1       1   1   0   0
3   2       0   0   0   1
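The group-then-pivot idea above can also be sketched as a single chain. This is a different technique from the one in the answer: it uses groupby(...).size() followed by unstack rather than count() and pivot_table, so no helper column is needed. The sample rows come from the question, and the name wide is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 2, 2, 3],
    "y": [1, 1, 1, 2],
    "year": [1988, 1977, 1999, 1990],
    "Category": [3, 1, 2, 4],
})

wide = (
    df.groupby(["x", "y", "Category"])
      .size()                             # number of rows per (x, y, Category)
      .unstack("Category", fill_value=0)  # categories become columns; missing combos get 0
)
print(wide)
```

The result keeps x and y as a MultiIndex; calling reset_index() on it would turn them back into columns, as in Answer 1.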
