New column with unique groupby results in data frame

Question

I have a data frame with duplicate rows (repeated 'id' values).

I want to aggregate the data, but first I need to count the unique sessions per id.

id     session
123      X
123      X 
123      Y
123      Z
234      T
234      T

This aggregation works well, but not when I try to add its result to my data frame as a new column 'ncount':

df['ncount'] = df.groupby('id')['session'].nunique().reset_index()
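To see why this assignment cannot work, here is a minimal sketch on the sample data (assuming pandas is installed): the aggregation returns one value per id, not one value per row of the original frame, so there is nothing to line up row for row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": [123, 123, 123, 123, 234, 234],
                   "session": ["X", "X", "Y", "Z", "T", "T"]})

# Aggregation: one distinct-session count per id, indexed by id
counts = df.groupby("id")["session"].nunique()
print(counts)

# 2 aggregated rows versus 6 original rows: the shapes do not match,
# so the aggregated result cannot be used directly as a column of df
print(len(counts), len(df))
```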

I tried using transform, but it didn't work either:

df['ncount'] = df.groupby('id')['session'].transform('nunique')

This is the result of the transform code (my data has duplicate ids):

id     session    ncount
123      X          1
123      X          1
123      Y          1
123      Z          1
234      T          1
234      T          1

This is the result I want:

id     session    ncount
123      X          3
123      X          3
123      Y          3
123      Z          3
234      T          1
234      T          1
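For reference, a minimal check with the sample data exactly as posted (assuming a reasonably recent pandas): on this data, `transform('nunique')` does produce the desired column, because `transform` broadcasts each group's result back to every row of that group. If the real data gives all 1s, the cause is probably something in the real data or code that the post does not show; this is an assumption, since the full data is not available.

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 123, 123, 123, 234, 234],
                   "session": ["X", "X", "Y", "Z", "T", "T"]})

# transform returns a Series with the same index (and length) as df,
# so it can be assigned directly as a new column
df["ncount"] = df.groupby("id")["session"].transform("nunique")
print(df)
```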

Answer 1

Use the following steps:

1. Group the data and store the result in a separate variable.

2. Merge it back into the original data frame.

Code:

import pandas as pd

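df = pd.DataFrame({"id":[123,123,123,123,234,234],"session":["X","X","Y","Z","T","T"]})

# 1. Count distinct sessions per id and name the count column "ncount"
x = df.groupby(["id"])['session'].nunique().reset_index(name="ncount")

# 2. Left-merge the per-id counts back onto every original row
res = pd.merge(df,x,how="left",on="id")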

print(res)

You can rename the columns if required.
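An alternative to the merge step, offered as a sketch rather than part of this answer: `Series.map` can look up each row's id in the per-id counts directly, which adds the column in place without building and merging a second frame.

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 123, 123, 123, 234, 234],
                   "session": ["X", "X", "Y", "Z", "T", "T"]})

# Per-id distinct-session counts: a Series indexed by id
per_id = df.groupby("id")["session"].nunique()

# Replace each row's id with the count stored under that id
df["ncount"] = df["id"].map(per_id)
print(df)
```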

Answer 2

Using .count()

Steps:

1: Group the data by "id" and count the rows for each id.

2: Decrease each count by one (for index format) and merge the two DataFrames.

import pandas as pd

df = pd.DataFrame({"id":[123,123,123,123,234,234],"session":["X","X","Y","Z","T","T"]})

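# Count the non-null sessions (rows) per id; this is not a distinct count
uniq_df = df.groupby(["id"])["session"].count().reset_index()
# Subtract one, as described in step 2
uniq_df["session"] = uniq_df["session"] - 1

# Left-merge back; the overlapping column appears as session_x / session_y
result = pd.merge(df,uniq_df,how="left",on="id")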

print(result)
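Note that `count()` counts rows, not distinct values, so "count minus one" only matches the unique-session count on the question's sample by coincidence. A minimal sketch with hypothetical data (not from the question) where id 123 has sessions X, X, X, Y shows the two diverging:

```python
import pandas as pd

# Hypothetical data: id 123 has only two distinct sessions (X and Y)
df = pd.DataFrame({"id": [123, 123, 123, 123, 234, 234],
                   "session": ["X", "X", "X", "Y", "T", "T"]})

g = df.groupby("id")["session"]
count_minus_one = g.count() - 1   # the approach in this answer
distinct = g.nunique()            # distinct sessions per id

print(pd.DataFrame({"count_minus_one": count_minus_one, "nunique": distinct}))
```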

Answer 1: score 1, accepted, 2020-08-12 06:05:56

Answer 2: score 0, 2020-08-12 06:47:41