New column with unique groupby results in data frame

I have a data frame in which the same 'id' appears on several rows.

I want to aggregate the data, but first I need to count the unique sessions per id.

id     session
123      X
123      X 
123      Y
123      Z
234      T
234      T

This code works well on its own, but not when I try to add the result to my data frame as a new column 'ncount'.

df['ncount'] = df.groupby('id')['session'].nunique().reset_index()

I tried using transform and it didn't work.

df['ncount'] = df.groupby('id')['session'].transform('nunique')

This is the result from the transform code (my data has duplicate ids):

id     session    ncount
123      X          1
123      X          1
123      Y          1
123      Z          1
234      T          1
234      T          1

This is the result I'm interested in:

id     session    ncount
123      X          3
123      X          3
123      Y          3
123      Z          3
234      T          1
234      T          1

Use the following steps:

1. Group the data and store the result in a separate variable.

2. Merge that result back into the original data frame.

Code:

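import pandas as pd

df = pd.DataFrame({"id": [123, 123, 123, 123, 234, 234], "session": ["X", "X", "Y", "Z", "T", "T"]})

# Number of distinct sessions per id, as a two-column frame (id, session)
x = df.groupby(["id"])["session"].nunique().reset_index()

# Left-merge on id; both frames have a "session" column, so pandas
# suffixes them as session_x (original value) and session_y (the count)
res = pd.merge(df, x, how="left", on="id")

print(res)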

You can rename the columns if required.
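As a minimal sketch of the renaming step, the count column can be named while the index is reset, so the merge yields the 'ncount' column from the question directly. The variable name `counts` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [123, 123, 123, 123, 234, 234],
    "session": ["X", "X", "Y", "Z", "T", "T"],
})

# name="ncount" labels the aggregated column as the index is reset,
# so the merge does not produce session_x / session_y.
counts = df.groupby("id")["session"].nunique().reset_index(name="ncount")

# Left-merge the per-id counts back onto every original row.
res = df.merge(counts, how="left", on="id")
print(res)
```

With this variant the merged frame has the columns id, session and ncount, and no separate rename is needed.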

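Using .count()

Steps:

1: Group the data by "id" and count the session rows for each id.

2: Subtract one from each count, then merge the result back into the original DataFrame.

Note that this matches the desired output only because each id in the sample data has exactly one repeated row. count() counts non-null rows rather than distinct values, so on other data it will not generally equal nunique().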

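import pandas as pd

df = pd.DataFrame({"id": [123, 123, 123, 123, 234, 234], "session": ["X", "X", "Y", "Z", "T", "T"]})

# Count the rows per id (count() counts non-null values, not distinct ones)
uniq_df = df.groupby(["id"])["session"].count().reset_index()
# Subtract one from each count
uniq_df["session"] = uniq_df["session"] - 1

# Merge back on id; the columns come out as session_x and session_y
result = pd.merge(df, uniq_df, how="left", on="id")

print(result)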
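To check how far the count-based approach generalizes, the sketch below compares it with nunique() on hypothetical data. This data is not from the question: id 123 is given three rows of session "X" instead of two. The names `comparison` and `distinct` are illustrative, and mapping the per-id result onto the id column is just one way to broadcast it back to the rows.

```python
import pandas as pd

# Hypothetical data (not from the question): session "X" appears three times for id 123.
df = pd.DataFrame({
    "id": [123, 123, 123, 123, 123, 234, 234],
    "session": ["X", "X", "X", "Y", "Z", "T", "T"],
})

grouped = df.groupby("id")["session"]

rows_minus_one = grouped.count() - 1  # rows per id, minus one
distinct = grouped.nunique()          # distinct session values per id

# Side-by-side view of the two per-id results.
comparison = pd.DataFrame({"count_minus_1": rows_minus_one, "nunique": distinct})
print(comparison)

# Broadcast the distinct count to every row by looking up each row's id.
df["ncount"] = df["id"].map(distinct)
print(df)
```

On this data the two columns disagree for id 123, which suggests that nunique() is the safer choice whenever an id can repeat a session more than once.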
