
How do I efficiently assign a single value per groupby group in Pandas?

I have a Pandas DataFrame with a column of non-unique numbers. I want to generate a different random number for each distinct value in that column, but repeat the same random number on every row where that value appears, so that the output DataFrame of random numbers has the same shape as the ungrouped DataFrame.

I can do this like: df.groupby('NonUnique').transform(lambda x: np.random.rand())

This returns a different random number for each group in each column of df, as desired.

This is slow for large DataFrames, however, whereas np.random.rand(df.size) is very fast. Is there a more efficient way to achieve what I want? I can't seem to find a way to vectorise the assignment per group...

Create an array of random numbers with one entry per unique value, then use factorize with NumPy indexing to repeat each entry across the rows of its group:

import numpy as np
import pandas as pd

np.random.seed(123)

df = pd.DataFrame({'A':list('aaabbb')})

a = np.random.rand(len(df['A'].unique()))

df['B'] = a[pd.factorize(df.A)[0]]
print (df)
   A         B
0  a  0.696469
1  a  0.696469
2  a  0.696469
3  b  0.286139
4  b  0.286139
5  b  0.286139

Detail:

print (pd.factorize(df.A)[0])
[0 0 0 1 1 1]
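
The question asks for output shaped like the whole DataFrame, with a separate draw per group in each column. A minimal sketch of how the same factorize idea might extend to that case is below; the example data, the column names 'x' and 'y', and the use of np.random.default_rng are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Hypothetical example data: 'NonUnique' is the grouping column.
df = pd.DataFrame({
    'NonUnique': [7, 7, 3, 3, 3, 9],
    'x': range(6),
    'y': range(6),
})

# One integer label per row, plus the distinct keys in order of appearance.
# Note: missing keys are labelled -1, which would index the LAST row of
# `draws` below, so handle NaN keys separately if they can occur.
codes, uniques = pd.factorize(df['NonUnique'])

value_cols = df.columns.drop('NonUnique')

# One row of draws per group, one column per value column.
draws = rng.random((len(uniques), len(value_cols)))

# Indexing with `codes` repeats each group's row of draws for every member row.
out = pd.DataFrame(draws[codes], index=df.index, columns=value_cols)
print(out)
```

The random numbers are drawn once per group rather than once per Python-level call, which is where the speed-up over transform with a lambda comes from.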

If you're grouping anyway, you can just use ngroup() to get an integer label for each group:

df.groupby('column').ngroup()

or

df.groupby('column').transform('ngroup')
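
ngroup() on its own returns group numbers, not random values. A short sketch of how those numbers could be used to index an array of per-group draws follows; the example data and the output column name 'rand' are assumptions for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)

# Hypothetical example data.
df = pd.DataFrame({'column': list('bbaacc')})

g = df.groupby('column')

# Group number for each row; with the default sort=True the numbers
# follow the sorted order of the keys, not their order of appearance.
codes = g.ngroup()

# One random number per group.
a = np.random.rand(g.ngroups)

# Repeat each group's number on every row belonging to that group.
df['rand'] = a[codes.to_numpy()]
print(df)
```

Unlike pd.factorize, which labels keys in order of first appearance, the labels here follow sorted key order, but either labelling works as an index into the array of draws.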

