Get count of particular values and the total based on another column value in dataframe using Pandas
I have the following dataframe df:
names | status |
---|---|
John | Completed |
James | To Do |
Jill | To Do |
Robert | In Progress |
Jill | To Do |
Jill | To Do |
Marina | Completed |
Evy | Completed |
Evy | Completed |
Now I want the count of each type of status for each user. I can get it like this for all types of statuses:
df = pd.crosstab(df.names,df.status).reset_index("names")
So now the resulting df is:
status | names | Completed | In Progress | To Do |
---|---|---|---|---|
0 | James | 0 | 0 | 1 |
1 | Robert | 0 | 1 | 0 |
2 | John | 1 | 0 | 0 |
3 | Marina | 1 | 0 | 0 |
4 | Jill | 0 | 0 | 3 |
5 | Evy | 2 | 0 | 0 |
So my problem is: how can I specify only particular status values to be counted? For example, I want only the counts of In Progress and Completed, and not To Do. And how can I add an extra column to the above, called Total Statuses, that holds the total number of rows for each name in the original dataframe?
Desired Dataframe:
status | names | Completed | In Progress | Total |
---|---|---|---|---|
0 | James | 0 | 0 | 1 |
1 | Robert | 0 | 1 | 1 |
2 | John | 1 | 0 | 1 |
3 | Marina | 1 | 0 | 1 |
4 | Jill | 0 | 0 | 3 |
5 | Evy | 2 | 0 | 2 |
Another way: pass the margins and margins_name parameters to pd.crosstab():
df = (pd.crosstab(df.names, df.status, margins=True, margins_name='Total').iloc[:-1]
      .reset_index().drop(columns='To Do'))
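To make the role of `.iloc[:-1]` clearer, here is a minimal sketch using the sample data from the question. It shows that `margins=True` adds both a `Total` column (row sums) and a `Total` row (column sums); the slice removes only the extra row. The variable names `with_margins` and `result` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill",
              "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do",
               "To Do", "Completed", "Completed", "Completed"],
})

# margins=True appends a "Total" column (per-name sums)
# and a final "Total" row (per-status sums)
with_margins = pd.crosstab(df.names, df.status,
                           margins=True, margins_name="Total")
print(with_margins)

# Drop the final "Total" row, keep the "Total" column,
# then remove the status that should not be shown
result = with_margins.iloc[:-1].reset_index().drop(columns="To Do")
print(result)
```

Because the `Total` column is computed before `To Do` is dropped, it still counts every row for each name.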
OR via crosstab() + assign():
df = (pd.crosstab(df.names, df.status).assign(Total=lambda x: x.sum(axis=1))
      .reset_index().drop(columns='To Do'))
OR in 2 steps:
df = pd.crosstab(df.names, df.status)
df = df.assign(Total=df.sum(axis=1)).drop(columns='To Do').reset_index()
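If you would rather name the statuses to keep than the one to drop, a possible variation is sketched below. It assumes a list `wanted` (a hypothetical name, not from the question) and uses `reindex` with `fill_value=0`, so a wanted status that never occurs in the data still gets a column of zeros.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill",
              "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do",
               "To Do", "Completed", "Completed", "Completed"],
})

# Hypothetical list of the statuses that should appear as columns
wanted = ["Completed", "In Progress"]

# Full table of counts, one column per status
counts = pd.crosstab(df.names, df.status)

result = (
    counts.reindex(columns=wanted, fill_value=0)   # keep only wanted statuses
          .assign(Total=counts.sum(axis=1))        # total over ALL statuses
          .reset_index()
)
print(result)
```

The total is taken from the full `counts` table, so it still reflects the `To Do` rows even though that column is not shown.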
You can create the total from the addition of the three previous columns:
df['Total'] = (df['Completed'] + df['In Progress'] + df['To Do'])
Then you can drop the 'To Do' column from your new data frame as follows:
df = df.drop(columns=['To Do'])
import pandas as pd

df = pd.DataFrame({'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill', 'Marina', 'Evy', 'Evy'],
                   'status': ['Completed', 'To Do', 'To Do', 'In Progress', 'To Do', 'To Do', 'Completed', 'Completed', 'Completed']})
df = pd.crosstab(df.names,df.status).reset_index("names")
df['Total'] = df['Completed'] + df['In Progress'] + df['To Do']
df = df.drop(columns=['To Do'])
print(df)
Output:
status names Completed In Progress Total
0 Evy 2 0 2
1 James 0 0 1
2 Jill 0 0 3
3 John 1 0 1
4 Marina 1 0 1
5 Robert 0 1 1
I can't comprehend what kind of sorting system you are using. But I think you will manage to do that yourself.
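On the ordering: `pd.crosstab` returns the names sorted alphabetically. If a specific row order is needed, one option is to reindex against an explicit list. In this sketch the list `order` is simply copied from the rows of the question's desired table; it is an assumption for illustration, not a rule inferred from the data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill",
              "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do",
               "To Do", "Completed", "Completed", "Completed"],
})

# Assumed row order, copied from the question's desired table
order = ["James", "Robert", "John", "Marina", "Jill", "Evy"]

counts = pd.crosstab(df.names, df.status)
result = (
    counts.assign(Total=counts.sum(axis=1))  # total over all statuses
          .drop(columns="To Do")
          .reindex(order)                    # rearrange rows to the given order
          .rename_axis(index="names")        # make sure the index is labelled "names"
          .reset_index()
)
print(result)
```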