Starting from this simple dataframe df:
df = pd.DataFrame({'c':[1,1,2,2,2,2,3,3,3], 'n':[1,2,3,4,5,6,7,8,9], 'N':[1,1,2,2,2,2,2,2,2]})
I'm trying to select N random values from n for each c. So far I have managed to group by c and get a single element per group with:
sample = df.groupby('c').apply(lambda x :x.iloc[np.random.randint(0, len(x))])
that returns:
   N  c  n
c
1  1  1  2
2  2  2  4
3  2  3  8
My expected output would be something like:
   N  c  n
c
1  1  1  2
2  2  2  4
2  2  2  3
3  2  3  8
3  2  3  7
so getting 1 sample for c=1 and 2 samples each for c=2 and c=3, according to the N column.
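Pandas objects now have a .sample method that returns a random sample of rows. Since N is the same for every row of a group, you can take its first value as the sample size within each group: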
>>> df.groupby('c').apply(lambda g: g.n.sample(g.N.iloc[0]))
c
1  1    2
2  5    6
   2    3
3  6    7
   7    8
Name: n, dtype: int64
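If you want the full rows rather than just the n column, and a draw you can reproduce, one possible sketch is to sample each group separately and concatenate the pieces. The helper name sample_group and its seed argument are illustrative, not part of pandas, and the code assumes N is constant within each c group:

```python
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 2, 2, 2, 2, 3, 3, 3],
    'n': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'N': [1, 1, 2, 2, 2, 2, 2, 2, 2],
})

def sample_group(g, seed=0):
    # Hypothetical helper: draw N rows from one group.
    # Assumes N is the same for every row of the group, so the first value is used.
    return g.sample(n=int(g['N'].iloc[0]), random_state=seed)

# Iterate over the groups explicitly and stack the sampled rows back together.
sampled = pd.concat([sample_group(g) for _, g in df.groupby('c')])
print(sampled)
```

Drop the random_state argument (or pass a different seed) to get a different draw on each run. The result keeps the original row labels, so the sampled rows can be traced back to df.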