Using Pandas to get the count of specific values and the total, based on another column's value in a dataframe

Question

I have the following dataframe df:

names	status
John	Completed
James	To Do
Jill	To Do
Robert	In Progress
Jill	To Do
Jill	To Do
Marina	Completed
Evy	Completed
Evy	Completed

Now I want the count of each type of status for each user. I can get it like this for all types of statuses:

df = pd.crosstab(df.names,df.status).reset_index("names")

So now the resulting df is:

status	names	Completed	In Progress	To Do
0	James	0	0	1
1	Robert	0	1	0
2	John	1	0	0
3	Marina	1	0	0
4	Jill	0	0	3
5	Evy	2	0	0

So my problem is: how can I specify that only particular status values are counted? For example, I want only the values of In Progress and Completed, and not To Do. And how can I add an extra column to the above, called Total Statuses, that holds the total number of rows for each name in the original dataframe?

Desired dataframe:

status	names	Completed	In Progress	Total
0	James	0	0	1
1	Robert	0	1	1
2	John	1	0	1
3	Marina	1	0	1
4	Jill	0	0	3
5	Evy	2	0	2

Answer 1

Another way:

Pass the margins and margins_name parameters to pd.crosstab():

df=(pd.crosstab(df.names,df.status,margins=True,margins_name='Total').iloc[:-1]
      .reset_index().drop(columns='To Do'))

OR

via crosstab() + assign():

df=(pd.crosstab(df.names,df.status).assign(Total=lambda x:x.sum(axis=1))
      .reset_index().drop(columns='To Do'))

OR

In 2 steps:

df=pd.crosstab(df.names,df.status)
df=df.assign(Total=df.sum(axis=1)).drop(columns='To Do').reset_index()
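
For reference, here is a self-contained sketch of the margins variant run against the sample data from the question. It uses the keyword form drop(columns=...), since the positional axis argument to drop() was removed in pandas 2.0. The names counts and result are only for illustration and are not from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill',
              'Marina', 'Evy', 'Evy'],
    'status': ['Completed', 'To Do', 'To Do', 'In Progress', 'To Do',
               'To Do', 'Completed', 'Completed', 'Completed'],
})

# margins=True appends both a 'Total' row and a 'Total' column.
counts = pd.crosstab(df.names, df.status, margins=True, margins_name='Total')

result = (counts.iloc[:-1]              # drop the trailing 'Total' row
                .drop(columns='To Do')  # keep only the statuses of interest
                .reset_index())         # turn the 'names' index into a column
print(result)
```

The Total column is computed before To Do is dropped, so it still counts every row for each name.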

Answer 2

You can create the total by adding the three count columns:

df['Total'] = (df['Completed'] + df['In Progress'] + df['To Do'])

Then you can drop the 'To Do' column from your new dataframe as follows:

df = df.drop(columns=['To Do'])
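
If the set of statuses is not fixed, the same idea can be written without naming every column. This is only a sketch: wanted is a hypothetical list of the statuses to keep, and counts, total and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill',
              'Marina', 'Evy', 'Evy'],
    'status': ['Completed', 'To Do', 'To Do', 'In Progress', 'To Do',
               'To Do', 'Completed', 'Completed', 'Completed'],
})

wanted = ['Completed', 'In Progress']  # hypothetical: statuses to keep

counts = pd.crosstab(df.names, df.status)

# Sum across all status columns, so the total still includes 'To Do'.
total = counts.sum(axis=1)

# reindex (rather than counts[wanted]) fills a wanted status that never
# occurs in the data with zeros instead of raising a KeyError.
result = (counts.reindex(columns=wanted, fill_value=0)
                .assign(Total=total)
                .reset_index())
print(result)
```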

Answer 3

import pandas as pd

df = pd.DataFrame({'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill', 'Marina', 'Evy', 'Evy'],
                   'status':['Completed', 'To Do', 'To Do', 'In Progress', 'To Do', 'To Do', 'Completed', 'Completed', 'Completed']})
# Count each status per name, then move 'names' from the index into a column
df = pd.crosstab(df.names,df.status).reset_index("names")
# Total rows per name, computed before 'To Do' is dropped
df['Total'] = df['Completed'] + df['In Progress'] + df['To Do']
df = df.drop(columns=['To Do'])
print(df)

Output:

status   names  Completed  In Progress  Total
0          Evy          2            0      2
1        James          0            0      1
2         Jill          0            0      3
3         John          1            0      1
4       Marina          1            0      1
5       Robert          0            1      1

I can't tell what ordering you are using for the rows, but I think you will manage to do that part yourself.
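
On the ordering: crosstab sorts the row labels alphabetically, and the order in the desired table matches neither that nor the order of first appearance in the data. If a specific order is needed, one option is to reindex by an explicit list. In this sketch, order is simply copied from the desired table in the question; it is not derived from any rule.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill',
              'Marina', 'Evy', 'Evy'],
    'status': ['Completed', 'To Do', 'To Do', 'In Progress', 'To Do',
               'To Do', 'Completed', 'Completed', 'Completed'],
})

# Row order copied by hand from the desired table in the question.
order = ['James', 'Robert', 'John', 'Marina', 'Jill', 'Evy']

counts = pd.crosstab(df.names, df.status)
result = (counts.assign(Total=counts.sum(axis=1))  # total over all statuses
                .drop(columns='To Do')
                .reindex(order)                    # impose the explicit row order
                .reset_index())
print(result)
```

To follow the order of first appearance instead, order = df.names.unique() could be passed to reindex in the same way.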

Answer 1
1 accepted 2021-07-29 12:11:38

Answer 2
0 2021-07-29 11:31:00

Answer 3
0 2021-07-29 11:38:01
