In a df with two columns, chain_id and chain_event_id, I'm trying to create a third one that counts the unique values of chain_event_id within each chain_id group. For instance, if chain_id number 511 has three rows with chain_event_id values 1,2,1, I expect the new column to hold the value 2.
Consider this sample data set:
d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
df = pd.DataFrame(data=d)
I tried using
df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()
as well as .apply(lambda x: len(x.unique())) and .agg('nunique'), but the result is the same: every row in the new column is NaN.
The printout for this piece of code:
import pandas as pd
d = {'chain_id': [511,511,511,666],'chain_event_id':[1,2,1,1]}
df = pd.DataFrame(data=d)
print(df)
print(df[df['chain_id'] == 511][['chain_id', 'chain_event_id']])
print(df[df['chain_id'] == 511]['chain_event_id'].unique())
print(df[df['chain_id'] == 511]['chain_event_id'].nunique())
df['events_in_chain'] = df.groupby('chain_id').chain_event_id.nunique()
print(df[df['chain_id'] == 511]['events_in_chain'])
is this:
   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
3       666               1
   chain_id  chain_event_id
0       511               1
1       511               2
2       511               1
[1 2]
2
0   NaN
1   NaN
2   NaN
Name: events_in_chain, dtype: float64
I'm losing my mind here... Why does events_in_chain keep getting NaN and not 2 :-( What the heck am I missing?
Thanks
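IIUC, you want to create a new column with the number of unique values per group. A plain nunique() returns one value per group, indexed by chain_id, so it does not line up with the row index of df. Use transform('nunique') instead, which returns a result with the same index as the original rows: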
df['events_in_chain'] = df.groupby('chain_id')['chain_event_id'].transform('nunique')
Output:
   chain_id  chain_event_id  events_in_chain
0       511               1                2
1       511               2                2
2       511               1                2
3       666               1                1
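To see where the NaN values in the question come from, here is a small sketch using the same sample data. The variable names counts, misaligned, via_map and via_transform are mine, for illustration only. The reindex call is meant to mimic the index alignment pandas performs when a Series is assigned to a column; it is an approximation of that step, not the exact internal code path.

```python
import pandas as pd

d = {'chain_id': [511, 511, 511, 666], 'chain_event_id': [1, 2, 1, 1]}
df = pd.DataFrame(data=d)

# Aggregation: one value per group, and the index is chain_id (511, 666)
counts = df.groupby('chain_id')['chain_event_id'].nunique()
print(counts)

# Assigning counts to a column aligns it on df's index (0, 1, 2, 3).
# None of those labels exist in counts, so every row ends up NaN.
misaligned = counts.reindex(df.index)
print(misaligned)

# Alternative to transform: look up each row's chain_id in counts
via_map = df['chain_id'].map(counts)

# transform broadcasts the per-group result back onto the original rows
via_transform = df.groupby('chain_id')['chain_event_id'].transform('nunique')

df['events_in_chain'] = via_transform
print(df)
```

Both via_map and via_transform should give the same column here; transform is usually the more direct choice when the goal is a per-row column.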