
Get count of particular values and the total based on another column value in dataframe using Pandas

I have the following dataframe df:

names   status
John    Completed
James   To Do
Jill    To Do
Robert  In Progress
Jill    To Do
Jill    To Do
Marina  Completed
Evy     Completed
Evy     Completed

Now I want the count of each type of status for each user. I can get it like this for all types of statuses:

df = pd.crosstab(df.names,df.status).reset_index("names")

So now the resulting df is:

status   names  Completed  In Progress  To Do
0        James          0            0      1
1       Robert          0            1      0
2         John          1            0      0
3       Marina          1            0      0
4         Jill          0            0      3
5          Evy          2            0      0
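To reproduce the table above, here is a self-contained sketch that rebuilds the sample data and runs the same crosstab call. It assumes pandas is installed. Note that crosstab sorts the row labels, so the names will likely come out alphabetically rather than in the order shown above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill", "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do", "To Do",
               "Completed", "Completed", "Completed"],
})

# One row per name, one column per status; each cell counts the matching rows.
# reset_index("names") turns the 'names' index level back into a column.
counts = pd.crosstab(df.names, df.status).reset_index("names")
print(counts)
```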

So my problem is: how can I specify that only particular status values are counted? For example, I want only the values of In Progress and Completed, not To Do. And how can I add an extra column to the above, called Total Statuses, that holds the total number of rows for each name in the original dataframe?

Desired dataframe:

status   names  Completed  In Progress  Total
0        James          0            0      1
1       Robert          0            1      1
2         John          1            0      1
3       Marina          1            0      1
4         Jill          0            0      3
5          Evy          2            0      2

Another way:

Pass the margins and margins_name parameters to pd.crosstab():

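df = (pd.crosstab(df.names, df.status, margins=True, margins_name='Total')
        .iloc[:-1]                # drop the last row (the grand-total 'Total' row)
        .reset_index()
        .drop(columns='To Do'))   # keyword form; drop('To Do', 1) is rejected by newer pandas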
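To see why the margins answer uses .iloc[:-1], here is a sketch on the question's sample data. The names ct and per_name are just for illustration. With margins=True, crosstab appends both a totals column and a grand-total row, each labelled with margins_name. Slicing off the last row removes the grand total and keeps the per-name 'Total' column.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill", "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do", "To Do",
               "Completed", "Completed", "Completed"],
})

# margins=True adds a 'Total' column (row sums) and a 'Total' row (column sums)
ct = pd.crosstab(df.names, df.status, margins=True, margins_name="Total")
print(ct)

# The grand-total row is the last row; removing it leaves one row per name
per_name = ct.iloc[:-1]
print(per_name)
```

If you prefer selecting by label, ct.drop(index='Total') removes the same row.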

OR

via crosstab() + assign():

df = (pd.crosstab(df.names, df.status)
        .assign(Total=lambda x: x.sum(axis=1))   # row sum = total statuses per name
        .reset_index()
        .drop(columns='To Do'))

OR

In 2 steps:

df = pd.crosstab(df.names, df.status)
df = df.assign(Total=df.sum(axis=1)).drop(columns='To Do').reset_index()
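The answers above hard-code the status to remove with drop('To Do'). You may prefer to list the statuses to keep instead. One sketch reindexes the crosstab columns to that list; the list name wanted is a hypothetical choice. Total is computed from the full crosstab before filtering, so it still counts every row for each name.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill", "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do", "To Do",
               "Completed", "Completed", "Completed"],
})

# Hypothetical list of statuses to report; edit as needed
wanted = ["Completed", "In Progress"]

counts = pd.crosstab(df.names, df.status)

result = (
    counts.reindex(columns=wanted, fill_value=0)  # keep only the wanted statuses
    .assign(Total=counts.sum(axis=1))             # total over ALL statuses per name
    .reset_index()
)
print(result)
```

With fill_value=0, reindex yields a column of zeros for any wanted status that never occurs in the data. By contrast, drop() raises a KeyError if asked to drop a label that is not present.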

You can create the total by adding the three status columns:

df['Total'] = (df['Completed'] + df['In Progress'] + df['To Do'])

Then you can drop the 'To Do' column from your new dataframe as follows:

df = df.drop(columns=['To Do'])

Full example:

import pandas as pd

df = pd.DataFrame({'names': ['John', 'James', 'Jill', 'Robert', 'Jill', 'Jill', 'Marina', 'Evy', 'Evy'],
                   'status': ['Completed', 'To Do', 'To Do', 'In Progress', 'To Do', 'To Do', 'Completed', 'Completed', 'Completed']})
# per-name status counts, with 'names' back as a regular column
df = pd.crosstab(df.names, df.status).reset_index("names")
# total rows per name = sum of all three status counts
df['Total'] = df['Completed'] + df['In Progress'] + df['To Do']
df = df.drop(columns=['To Do'])
print(df)

Output:

status   names  Completed  In Progress  Total
0          Evy          2            0      2
1        James          0            0      1
2         Jill          0            0      3
3         John          1            0      1
4       Marina          1            0      1
5       Robert          0            1      1

I can't work out what sort order you are using, but I think you can manage that yourself.
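Here is a sketch of a different technique from the answers above. It counts (name, status) pairs with groupby().value_counts() and pivots the statuses into columns with unstack(fill_value=0). The per-name total comes from groupby().size(), which is exactly the number of rows for each name in the original dataframe.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["John", "James", "Jill", "Robert", "Jill", "Jill", "Marina", "Evy", "Evy"],
    "status": ["Completed", "To Do", "To Do", "In Progress", "To Do", "To Do",
               "Completed", "Completed", "Completed"],
})

# Count each (name, status) pair, then turn statuses into columns;
# pairs that never occur (e.g. Jill/Completed) are filled with 0
counts = df.groupby("names")["status"].value_counts().unstack(fill_value=0)

# Add the total row count per name, then drop the unwanted status
result = (counts.assign(Total=df.groupby("names").size())
                .drop(columns="To Do")
                .reset_index())
print(result)
```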


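Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.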

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM