pandas groupby(col).nunique() returns NaN

Question

In a df with two columns, chain_id and chain_event_id, I'm trying to create a third one that counts the unique values of chain_event_id within each chain_id group. For instance, if chain_id 511 has three rows with chain_event_id values 1, 2, 1, I expect the new column to hold the value 2.

Consider this sample data set:

    d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
    df = pd.DataFrame(data=d)

I tried using

df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()

as well as .apply(lambda x: len(x.unique())) and .agg('nunique'), but the result is the same: each group gets all NaN values.

The printout for this piece of code:

    import pandas as pd

    d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
    df = pd.DataFrame(data=d)
    print(df)
    print(df[df['chain_id'] == 511][['chain_id', 'chain_event_id']])
    print(df[df['chain_id'] == 511]['chain_event_id'].unique())
    print(df[df['chain_id'] == 511]['chain_event_id'].nunique())
    df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()
    print(df[df['chain_id'] == 511]['events_in_chain'])

is this:

   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
3       666               1
   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
[1 2]
2
0   NaN
1   NaN
2   NaN
Name: events_in_chain, dtype: float64

I'm losing my mind here... Why does events_in_chain keep getting NaN and not 2 :-( What the heck am I missing?

Thanks

Answer 1

IIUC, you want to create a new column with the nunique per group, so you need to use transform('nunique'):

df['events_in_chain'] = df.groupby('chain_id')['chain_event_id'].transform('nunique')

Output:

   chain_id  chain_event_id  events_in_chain
0       511               1                2
1       511               2                2
2       511               1                2
3       666               1                1
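
As for why the original attempt produced NaN, here is a minimal sketch, assuming df has the default 0..3 index as in the question. groupby(...).nunique() returns one value per group, indexed by chain_id, and column assignment aligns on index labels rather than on position. The names per_group, aligned, via_transform and via_map are illustrative only, and reindex is used here to mimic the alignment that the assignment performs.

```python
import pandas as pd

df = pd.DataFrame({'chain_id': [511, 511, 511, 666],
                   'chain_event_id': [1, 2, 1, 1]})

# One value per group: the result is indexed by chain_id (511, 666),
# not by the original row labels (0, 1, 2, 3).
per_group = df.groupby('chain_id')['chain_event_id'].nunique()

# Assigning a Series to a column aligns it on df's index labels.
# None of 0..3 exist in per_group's index, so every row becomes NaN.
aligned = per_group.reindex(df.index)

# transform broadcasts the per-group result back onto the original rows.
via_transform = df.groupby('chain_id')['chain_event_id'].transform('nunique')

# Alternative: look up each row's chain_id in the per-group result.
via_map = df['chain_id'].map(per_group)
```

Both via_transform and via_map are indexed like df, so either can be assigned to df['events_in_chain'] directly.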

Solution 1
1 2021-11-06 21:55:11
