pandas groupby(col).nunique() returns NaN

In a df with 2 columns, chain_id and chain_event_id, I'm trying to create a third one that counts the unique values of chain_event_id within each chain_id group. For instance, if chain_id 511 has three rows with chain_event_id values 1, 2, 1, I expect the new column to hold the value 2.

Consider this sample data set:

    import pandas as pd

    d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
    df = pd.DataFrame(data=d)

I tried using

    df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()

as well as .apply(lambda x: len(x.unique())) and .agg('nunique'), but the result is the same: every group gets all NaN values.
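A quick check on the sample data above (a sketch, not part of the original question) shows what each of those three attempts actually returns: one value per chain_id, indexed by the group key rather than by the rows of df.

```python
import pandas as pd

d = {'chain_id': [511, 511, 511, 666], 'chain_event_id': [1, 2, 1, 1]}
df = pd.DataFrame(data=d)

grouped = df.groupby('chain_id')['chain_event_id']

# All three attempts aggregate: one row per group, indexed by chain_id.
via_nunique = grouped.nunique()
via_apply = grouped.apply(lambda x: len(x.unique()))
via_agg = grouped.agg('nunique')

for result in (via_nunique, via_apply, via_agg):
    print(result.index.tolist(), result.tolist())
# [511, 666] [2, 1]  (printed three times)

# df's own index is 0..3, so none of the labels 511/666 line up with it.
print(df.index.tolist())  # [0, 1, 2, 3]
```

Because the result has 2 rows keyed by chain_id while df has 4 rows keyed by position labels, the shapes and labels simply do not correspond.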

The printout for this piece of code:

    import pandas as pd

    d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
    df = pd.DataFrame(data=d)
    print(df)
    print(df[df['chain_id'] == 511][['chain_id', 'chain_event_id']])
    print(df[df['chain_id'] == 511]['chain_event_id'].unique())
    print(df[df['chain_id'] == 511]['chain_event_id'].nunique())
    df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()
    print(df[df['chain_id'] == 511]['events_in_chain'])

is this:

   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
3       666               1
   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
[1 2]
2
0   NaN
1   NaN
2   NaN
Name: events_in_chain, dtype: float64

I'm losing my mind here... Why does events_in_chain keep getting NaN instead of 2? :-( What the heck am I missing?

Thanks

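IIUC, you want to create a new column holding the number of unique values per group. groupby(...).nunique() aggregates: it returns one value per chain_id, indexed by chain_id (511, 666). Assigning that Series to df aligns it on df's index (0, 1, 2, 3), where none of the labels match, so every row gets NaN. transform('nunique') instead returns a Series aligned with the original rows, so you need to use it:

    df['events_in_chain'] = df.groupby('chain_id')['chain_event_id'].transform('nunique')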

Output:

   chain_id  chain_event_id  events_in_chain
0       511               1                2
1       511               2                2
2       511               1                2
3       666               1                1
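A minimal sketch on the same sample data that puts the failing assignment, the transform fix, and one alternative side by side. The map approach (looking up each row's chain_id in the aggregated counts) is an addition for comparison, not part of the original answer; the column names misaligned, via_transform and via_map are hypothetical.

```python
import pandas as pd

d = {'chain_id': [511, 511, 511, 666], 'chain_event_id': [1, 2, 1, 1]}
df = pd.DataFrame(data=d)

counts = df.groupby('chain_id')['chain_event_id'].nunique()  # index: 511, 666

# Assignment aligns on index labels; 0..3 never match 511/666, so every row is NaN.
df['misaligned'] = counts

# transform broadcasts each group's result back to that group's rows (index 0..3).
df['via_transform'] = df.groupby('chain_id')['chain_event_id'].transform('nunique')

# Alternative: look up each row's chain_id in the aggregated Series.
df['via_map'] = df['chain_id'].map(counts)

print(df)
```

The map variant can be convenient if you already need the per-group counts elsewhere; transform avoids building the intermediate Series yourself.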

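Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.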

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM