pandas groupby(col).nunique() returns NaN

Question

In a df with two columns, chain_id and chain_event_id, I'm trying to create a third one that counts the unique values of chain_event_id within each chain_id group. For instance, if chain_id 511 has three rows with chain_event_id values 1, 2, 1, I expect the new column to hold the value 2.

Consider this sample data set:

    d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
    df = pd.DataFrame(data=d)

I tried using

    df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()

as well as .apply(lambda x: len(x.unique())) and .agg('nunique'), but the result is the same: every row in each group gets NaN.

The printout for this piece of code:

    import pandas as pd

    d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
    df = pd.DataFrame(data=d)
    print(df)
    print(df[df['chain_id'] == 511][['chain_id', 'chain_event_id']])
    print(df[df['chain_id'] == 511]['chain_event_id'].unique())
    print(df[df['chain_id'] == 511]['chain_event_id'].nunique())
    df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()
    print(df[df['chain_id'] == 511]['events_in_chain'])

is this:

   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
3       666               1
   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
[1 2]
2
0   NaN
1   NaN
2   NaN
Name: events_in_chain, dtype: float64

I'm losing my mind here... Why does events_in_chain keep getting NaN instead of 2? What am I missing?

Thanks

Answer 1

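IIUC, you want to create a new column with the nunique per group. groupby('chain_id').nunique() returns one value per group, indexed by chain_id (511, 666), so assigning it to a column aligns on the index, finds no matching labels among 0..3, and fills in NaN. To get a result aligned with the original rows, use transform('nunique'):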

    df['events_in_chain'] = df.groupby('chain_id')['chain_event_id'].transform('nunique')

Output:

   chain_id  chain_event_id  events_in_chain
0       511               1                2
1       511               2                2
2       511               1                2
3       666               1                1
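
To see where the NaN values come from, here is a minimal sketch using the sample data from the question. The names per_group, misaligned, via_transform and via_map are only illustrative, and the reindex call is meant to mimic roughly what the column assignment does internally, not to reproduce pandas' exact code path. It also shows map as an alternative to transform:

```python
import pandas as pd

df = pd.DataFrame({'chain_id': [511, 511, 511, 666],
                   'chain_event_id': [1, 2, 1, 1]})

# One value per group, indexed by chain_id (511, 666), not by df's 0..3 index.
per_group = df.groupby('chain_id')['chain_event_id'].nunique()

# Assigning a Series to a column aligns on index labels. Reindexing to
# df.index imitates that: labels 0..3 never match 511/666, so all NaN.
misaligned = per_group.reindex(df.index)

# transform broadcasts each group's result back onto the original rows.
via_transform = df.groupby('chain_id')['chain_event_id'].transform('nunique')

# Alternative: look up each row's chain_id in the per-group Series.
via_map = df['chain_id'].map(per_group)

print(per_group)
print(misaligned)
print(via_transform)
print(via_map)
```

Both via_transform and via_map share df's index, so either can be assigned directly to df['events_in_chain'].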
