Using the pandas groupby function on multiple columns

Question

I have a DataFrame similar to this:

Key    Departure    Species1   Species2   Status
1         R          Carlan     Carlan      D
1         R          Scival     Carex       C
2         R          Carlan     Scival      D
2         R          Scival     Bougra      C  
3         D          Carlan     Carlan      D
3         D          Scival     Scival      C
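The answers below can be made reproducible with a small sketch that builds the same sample DataFrame. The values are typed in by hand from the table above, and `pd` is simply the conventional alias for pandas.

```python
import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    'Key': [1, 1, 2, 2, 3, 3],
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Species2': ['Carlan', 'Carex', 'Scival', 'Bougra', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})
print(df)
```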

I want to count the occurrences of each unique Species1 for every combination of Departure (R or D) and Status (D or C).

My desired output is:

Species1   RD    RC    DD    DC
Carlan     2     NaN   1     NaN
Scival     NaN   2     NaN   1

Answer 1

Make a new column that combines Departure and Status:

df['comb'] = df.Departure + df.Status
df
#  Key Departure Species1 Species2 Status comb
#0   1         R   Carlan   Carlan      D   RD
#1   1         R   Scival    Carex      C   RC
#2   2         R   Carlan   Scival      D   RD
#3   2         R   Scival   Bougra      C   RC
#4   3         D   Carlan   Carlan      D   DD
#5   3         D   Scival   Scival      C   DC

Then you can group by both Species1 and the new column:

gb = df.groupby(['Species1', 'comb'])
gb.groups
#{('Carlan', 'DD'): [4],
#('Carlan', 'RD'): [0, 2],
#('Scival', 'DC'): [5],
#('Scival', 'RC'): [1, 3]}
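Each value in `gb.groups` is the set of row labels belonging to one (Species1, comb) group, so its length is the occurrence count. A minimal sketch, assuming the question's sample data, showing that `gb.size()` yields the same counts directly (the names `counts` and `sizes` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})
df['comb'] = df.Departure + df.Status
gb = df.groupby(['Species1', 'comb'])

# gb.groups maps each (Species1, comb) key to the row labels in that group
counts = {key: len(rows) for key, rows in gb.groups.items()}

# gb.size() counts rows per group, returned as a Series with a MultiIndex
sizes = gb.size()
print(sizes)
```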

Now organize the results into a list in which each element is a tuple (column, Series(data, index)) representing a single data point of the new DataFrame:

# key is a (Species1, comb) tuple; val holds the row labels of that group
items = [(key[1], pandas.Series([len(val)], index=[key[0]]))
         for key, val in gb.groups.items()]

And build a new DataFrame from the items:

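# DataFrame.from_items was removed in pandas 1.0; a dict of Series aligns on the index the same way
result = pandas.DataFrame(dict(items))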
result
#        RC  DC  DD  RD
#Carlan NaN NaN   1   2
#Scival   2   1 NaN NaN
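The same wide table can also be obtained without building the intermediate list: count each (Species1, comb) group with `size()` and pivot `comb` into columns with `unstack`. A minimal sketch on the question's sample data; the explicit column order simply mirrors the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})
df['comb'] = df.Departure + df.Status

# Count rows per (Species1, comb) pair, then turn 'comb' into columns.
# Pairs that never occur become NaN, as in the desired output.
result = df.groupby(['Species1', 'comb']).size().unstack('comb')
result = result[['RD', 'RC', 'DD', 'DC']]  # column order from the question
print(result)
```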

Extra info

See this link for ideas on creating new DataFrames from various objects. When you want to create a DataFrame from individual data points (e.g. (Species1, comb)), from_items was the best option at the time of writing. DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0; passing a dict of Series to the DataFrame constructor does the same job.
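Another way to build a frame from individual data points is a nested dict: outer keys become columns, inner keys become row labels, and any missing (row, column) pair is filled with NaN. A minimal sketch; the counts are copied by hand from the question's desired output.

```python
import pandas as pd

# {column: {row: value}}; pairs not listed end up as NaN
counts = {
    'RD': {'Carlan': 2},
    'RC': {'Scival': 2},
    'DD': {'Carlan': 1},
    'DC': {'Scival': 1},
}
result = pd.DataFrame(counts)
print(result)
```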

Answer 2

Use the pandas.crosstab() function. It takes a single line of code:

pd.crosstab(df.Species1, [df.Departure, df.Status])

The resulting table: 结果表：
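A minimal sketch that rebuilds the question's sample data and runs the same call. With this data, Carlan is counted 2 times under (R, D) and once under (D, D), Scival 2 times under (R, C) and once under (D, C), and every other combination gets 0 rather than NaN. The name `table` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})

# Rows: Species1; columns: a two-level MultiIndex of (Departure, Status)
table = pd.crosstab(df.Species1, [df.Departure, df.Status])
print(table)
```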

If you combine it with @dermen's 'comb' column,

df['comb'] = df.Departure + df.Status
pd.crosstab(df.Species1, df.comb)

you'll get: 你会得到：
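A minimal sketch of the flat version on the question's sample data: the columns are now the single-level labels DC, DD, RC and RD, and absent combinations are again counted as 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})
df['comb'] = df.Departure + df.Status

# One column per Departure+Status code, one row per Species1
table = pd.crosstab(df.Species1, df.comb)
print(table)
```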

If you really want those NaNs instead of zeros, just tack on .replace(0, np.nan), like so (assuming import numpy as np has already been done):

pd.crosstab(df.Species1, df.comb).replace(0, np.nan)
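Because crosstab returns integer counts, the value to replace has to be the integer 0; the string '0' matches nothing and leaves the zeros in place. A minimal sketch on the question's sample data contrasting the two (`unchanged` and `with_nan` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})
df['comb'] = df.Departure + df.Status
table = pd.crosstab(df.Species1, df.comb)

# The counts are integers, so the string '0' is never matched
unchanged = table.replace('0', np.nan)

# Replacing the integer 0 turns every empty combination into NaN
with_nan = table.replace(0, np.nan)
print(with_nan)
```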

Answer 3

You can run a groupby on multiple columns and use the .agg function to count the occurrences:

df.groupby(['Species1', 'Departure', 'Status']).agg(['count'])
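Note that .agg(['count']) counts the non-null values of every remaining column (here Key and Species2), so the result is in long format with one row per (Species1, Departure, Status) combination. A minimal sketch on the question's sample data; the second step, using size() and unstack to move Departure and Status into columns, is an addition for reaching the wide layout asked for, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': [1, 1, 2, 2, 3, 3],
    'Departure': ['R', 'R', 'R', 'R', 'D', 'D'],
    'Species1': ['Carlan', 'Scival', 'Carlan', 'Scival', 'Carlan', 'Scival'],
    'Species2': ['Carlan', 'Carex', 'Scival', 'Bougra', 'Carlan', 'Scival'],
    'Status': ['D', 'C', 'D', 'C', 'D', 'C'],
})
keys = ['Species1', 'Departure', 'Status']

# Long format: one count column per remaining column (Key, Species2)
long_counts = df.groupby(keys).agg(['count'])

# Wide format: count rows once per group, then pivot Departure and Status into columns
wide = df.groupby(keys).size().unstack(['Departure', 'Status'])
print(wide)
```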

3 answers

Answer 1: score 3, posted 2015-07-21 19:16:15

Answer 2: score 2, accepted, posted 2015-07-21 18:54:56

Answer 3: score 0, posted 2015-07-22 00:46:50
