Example Dataframe:
Name Group_Id
AAA 1
ABC 1
BDF 1
CCC 2
XYZ 2
DEF 3
How can I randomly select a fixed number of rows for each Group_Id? This answer suggests using:
df.groupby('Group_Id').apply(lambda x: x.sample(2)).reset_index(drop=True)
But it throws an error if any group has fewer than 2 rows.
In that case I want to select all of the group's rows. .head()
allows that, but I want random samples, not the first rows.
Say I want at most two random draws per Group_Id; I would get, for example:
Name Group_Id
AAA 1
BDF 1
CCC 2
XYZ 2
DEF 3
You can sample only when a group has more than n rows, and otherwise keep the whole group:
n = 2
(df.groupby('Group_Id')
   .apply(lambda x: x.sample(n) if len(x) > n else x)
   .reset_index(drop=True)
)
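For reference, here is a self-contained sketch of the same idea that caps the sample size at the group size with min(n, len(group)) instead of branching. It iterates over the groups and concatenates the pieces rather than calling apply, which is a design choice of this sketch and not part of the answer above. The names parts and sampled and the fixed random_state are assumptions added for illustration.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "BDF", "CCC", "XYZ", "DEF"],
    "Group_Id": [1, 1, 1, 2, 2, 3],
})

n = 2  # maximum number of rows to draw per group

# Draw min(n, group size) rows from each group, so small groups are kept whole.
# random_state is fixed here only to make the sketch reproducible (an assumption).
parts = [
    group.sample(n=min(n, len(group)), random_state=0)
    for _, group in df.groupby("Group_Id")
]

# Stitch the per-group samples back together and renumber the rows.
sampled = pd.concat(parts).reset_index(drop=True)
print(sampled)
```

Drop random_state (or pass a different seed) to get a different draw on each run.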
You can also try shuffling the whole DataFrame and then using groupby().head():
df.sample(frac=1).groupby('Group_Id').head(2)
Output:
Name Group_Id
5 DEF 3
0 AAA 1
2 BDF 1
3 CCC 2
4 XYZ 2
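As the output shows, the rows come back in shuffled order with their original index labels. If the original row order matters, one possible follow-up is to sort by the index afterwards. The sketch below assumes the example DataFrame from the question; the names picked and ordered and the random_state value are hypothetical and added only for illustration.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "BDF", "CCC", "XYZ", "DEF"],
    "Group_Id": [1, 1, 1, 2, 2, 3],
})

# Shuffle all rows, then keep the first two rows seen for each group.
# random_state is set only so the sketch is reproducible (an assumption).
picked = df.sample(frac=1, random_state=42).groupby("Group_Id").head(2)

# The original index labels survive the shuffle, so sorting by index
# restores the order the rows had in df.
ordered = picked.sort_index()
print(ordered)
```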
You can shuffle each subgroup and take the first n rows. Slicing automatically takes the smaller of n and the group's actual size.
n=2
df2 = df.groupby('Group_Id').apply(lambda x: x.sample(frac=1)[:n]).reset_index(drop=True)