I have a pandas DataFrame with a column of non-unique numbers. I want to generate a different random number for each distinct value, but return the same random number in every row where that value appears, so that the output DataFrame of random numbers has the same shape as the ungrouped DataFrame.
I can do this like: df.groupby('NonUnique').transform(lambda x: np.random.rand())
This returns a different random number for each column in df, as desired.
However, this is slow for large DataFrames, whereas np.random.rand(df.size) is very fast. Is there any way to achieve what I want more efficiently? I can't seem to find a way to vectorise the assignment per group...
Create an array of random numbers with one entry per unique value, then use factorize with NumPy indexing to repeat each entry across the rows of its group:
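import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A':list('aaabbb')})
# one random number per unique value in A
a = np.random.rand(len(df['A'].unique()))
# factorize gives each row the integer code of its value, used here to index into a
df['B'] = a[pd.factorize(df.A)[0]]
print (df)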
A B
0 a 0.696469
1 a 0.696469
2 a 0.696469
3 b 0.286139
4 b 0.286139
5 b 0.286139
Detail:
print (pd.factorize(df.A)[0])
[0 0 0 1 1 1]
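The same idea can be wrapped in a small helper, sketched below. The name random_per_group is hypothetical (it is not a pandas function), and the sketch assumes the key column has no missing values, since factorize codes those as -1. It uses np.random.default_rng rather than np.random.seed, which is only a choice made for this example.

```python
import numpy as np
import pandas as pd


def random_per_group(keys, rng):
    """Hypothetical helper: one random float per distinct key, repeated per row."""
    # codes[i] is the integer code of keys[i]; uniques holds the distinct keys
    codes, uniques = pd.factorize(keys)
    # draw exactly one number per distinct key
    draws = rng.random(len(uniques))
    # NumPy integer indexing repeats each draw across the rows of its group
    return draws[codes]


df = pd.DataFrame({'A': list('abcab') * 2})
rng = np.random.default_rng(0)
df['B'] = random_per_group(df['A'], rng)

# sanity check: every group should contain exactly one distinct random value
per_group_counts = df.groupby('A')['B'].nunique()
print(per_group_counts)
```

Only as many random numbers are drawn as there are distinct keys, and the repeat step is a single array index operation, so no Python function is called per group.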
If you're grouping anyway, you can just use ngroup() to get an integer label for each group:
df.groupby('column').ngroup()
or
df.groupby('column').transform('ngroup')
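A minimal sketch of how those group numbers might be combined with the random array, assuming a single key column named 'column' with no missing values. The output column name 'rand' and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': [3, 1, 3, 2, 1, 3]})
rng = np.random.default_rng(0)

# ngroup() labels each row with its group number, from 0 to n_groups - 1
group_ids = df.groupby('column').ngroup()
n_groups = int(group_ids.max()) + 1

# one draw per group, then index by group number to repeat it per row
draws = rng.random(n_groups)
df['rand'] = draws[group_ids.to_numpy()]
print(df)
```

With the default sort=True, the groups are numbered in sorted key order, so the rows with key 1 get label 0 here.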