New column with unique groupby results in data frame
I have a data frame with duplicate rows per 'id'.
I want to aggregate the data, but first I need to count the unique sessions per id.
id session
123 X
123 X
123 Y
123 Z
234 T
234 T
The groupby below works well on its own, but not when I try to add its result to my data frame as a new column 'ncount'.
df['ncount'] = df.groupby('id')['session'].nunique().reset_index()
I tried using transform, but it didn't work either.
df['ncount'] = df.groupby('id')['session'].transform('nunique')
This is the result from the transform code (my data has duplicate ids):
id session ncount
123 X 1
123 X 1
123 Y 1
123 Z 1
234 T 1
234 T 1
This is the result I'm interested in:
id session ncount
123 X 3
123 X 3
123 Y 3
123 Z 3
234 T 1
234 T 1
Use the following steps:
1. Group the data and store the unique counts in a separate variable.
2. Merge that result back into the original data frame.
Code:
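import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id":[123,123,123,123,234,234],"session":["X","X","Y","Z","T","T"]})
# Number of distinct sessions per id, as a two-column frame (id, session)
x = df.groupby(["id"])['session'].nunique().reset_index()
# Left-merge the per-id counts back onto every original row
res = pd.merge(df,x,how="left",on="id")
print(res)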
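You can rename the columns if required. Because both frames contain a 'session' column, pandas suffixes them in the merged result as 'session_x' (the original values) and 'session_y' (the unique count).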
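As a sketch of the renaming step, the count column can be named before the merge, which avoids the suffixed column names. The names counts and via_transform are illustrative only, and the transform line is shown as a possible merge-free alternative that, on this sample frame with a reasonably recent pandas, is expected to give the same per-row values:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [123, 123, 123, 123, 234, 234],
    "session": ["X", "X", "Y", "Z", "T", "T"],
})

# Name the aggregated column up front so the merge does not
# produce session_x / session_y.
counts = df.groupby("id")["session"].nunique().reset_index(name="ncount")
res = df.merge(counts, how="left", on="id")

# Possible alternative without a merge: transform broadcasts the
# per-group distinct count back to every row of that group.
via_transform = df.groupby("id")["session"].transform("nunique")

print(res)
```

If the transform variant returns 1 on every row for real data, it may be worth checking that the grouping column and the counted column are the intended ones.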
Using .count()
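Steps:
1: Group the data by "id" and count the session rows for each id.
2: Subtract one from each count, then merge the two DataFrames.
Note that this matches the desired output for the sample data only because each id has exactly one repeated row; .count() counts all non-null rows, not distinct values, so for other data use .nunique() instead.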
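import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id":[123,123,123,123,234,234],"session":["X","X","Y","Z","T","T"]})
# Number of rows (not distinct sessions) per id
uniq_df = df.groupby(["id"])["session"].count().reset_index()
# Subtract one; equals the distinct count only when each id has exactly one repeated row
uniq_df["session"] = uniq_df["session"] - 1
# Left-merge the adjusted counts back onto every original row
result = pd.merge(df,uniq_df,how="left",on="id")
print(result)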
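To see how count-minus-one and a true distinct count can diverge, here is a small sketch on made-up data (df2 and compare are hypothetical names, not from the question) in which one id has two repeated sessions:

```python
import pandas as pd

# Hypothetical data: id 1 has two sessions, each repeated once;
# id 2 has a single row.
df2 = pd.DataFrame({
    "id": [1, 1, 1, 1, 2],
    "session": ["A", "A", "B", "B", "C"],
})

grouped = df2.groupby("id")["session"]

# Side-by-side comparison of the two approaches, indexed by id.
compare = pd.DataFrame({
    "count_minus_one": grouped.count() - 1,  # row count minus one
    "nunique": grouped.nunique(),            # distinct sessions
})

print(compare)
```

Here the two columns disagree for both ids, which suggests .nunique() is the safer choice when the number of duplicates per id is not known in advance.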