How to randomly select a fixed number of rows per group in pandas, or all rows if the group is smaller?

Question

Example Dataframe:

    Name Group_Id
    AAA  1
    ABC  1
    BDF  1
    CCC  2
    XYZ  2
    DEF  3

How can I randomly select a fixed number of rows for each Group_Id? This answer suggests the following:

df.groupby('Group_Id').apply(lambda x: x.sample(2)).reset_index(drop=True)

But it raises a ValueError if any group has fewer than 2 rows. In that case I want to select all rows of the group. .head() does that, but I want random samples, not the first rows.
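For reference, the failure can be reproduced on the example data with a sketch like the one below. It calls sample() directly on the single-row group instead of going through groupby().apply(); the variable names small_group and error_message are only illustrative.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "BDF", "CCC", "XYZ", "DEF"],
    "Group_Id": [1, 1, 1, 2, 2, 3],
})

# Group 3 has a single row, so asking for 2 rows without replacement fails.
small_group = df[df["Group_Id"] == 3]
try:
    small_group.sample(2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```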

For example, with at most two random draws per Group_Id, I would expect something like:

    Name Group_Id
    AAA  1
    BDF  1
    CCC  2
    XYZ  2
    DEF  3

Answer 1

You can sample only when a group has more than n rows, and return the group unchanged otherwise:

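n = 2  # maximum number of rows to keep per group
(df.groupby('Group_Id')
   # sample n rows from groups larger than n; keep smaller groups whole
   .apply(lambda x: x.sample(n) if len(x) > n else x)
   .reset_index(drop=True)
)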
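A roughly equivalent sketch, in case a self-contained version is useful: instead of branching on the group size, it asks sample() for min(n, len(group)) rows. It loops over the groups and concatenates them rather than using apply(), since recent pandas releases warn when apply() operates on the grouping column. The fixed random_state and the names parts and result are my own additions, there only to make the example reproducible.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "BDF", "CCC", "XYZ", "DEF"],
    "Group_Id": [1, 1, 1, 2, 2, 3],
})

n = 2  # maximum number of rows to keep per group

# Sample min(n, group size) rows from each group, so small groups are kept whole.
# random_state is fixed only to make this sketch reproducible.
parts = [
    group.sample(n=min(n, len(group)), random_state=0)
    for _, group in df.groupby("Group_Id")
]
result = pd.concat(parts, ignore_index=True)
print(result)
```

Drop random_state (or pass a different seed) to get a different draw on each run.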

Alternatively, shuffle the whole DataFrame and take the first rows of each group with groupby().head():

df.sample(frac=1).groupby('Group_Id').head(2)

Output (one possible result; the selected rows and their order vary between runs):

  Name  Group_Id
5  DEF         3
0  AAA         1
2  BDF         1
3  CCC         2
4  XYZ         2
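As the output shows, head() keeps the shuffled row order and the original index labels. If the original order matters, a sort_index() at the end should restore it. A minimal sketch, assuming the example data from the question; the seed 42 and the variable names are arbitrary choices of mine:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "BDF", "CCC", "XYZ", "DEF"],
    "Group_Id": [1, 1, 1, 2, 2, 3],
})

n = 2  # maximum number of rows to keep per group

# Shuffle all rows once; random_state is fixed only for reproducibility.
shuffled = df.sample(frac=1, random_state=42)

# head(n) keeps the first n rows of each group in shuffled order,
# or the whole group when it has fewer than n rows.
picked = shuffled.groupby("Group_Id").head(n)

# The original index labels survive, so sorting by index restores the input order.
picked_in_original_order = picked.sort_index()
print(picked_in_original_order)
```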

Answer 2

You can shuffle each group and take the first n rows. The slice returns min(n, group size) rows, so groups with fewer than n rows are kept whole.

n=2
df2 = df.groupby('Group_Id').apply(lambda x: x.sample(frac=1)[:n]).reset_index(drop=True)
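The same "shuffle, then keep the first n" idea can also be written as a boolean mask, which avoids apply() entirely. This is only a sketch under my own assumptions: the seed and the name position_in_group are not from the answer above, and it shuffles the whole frame once rather than each group separately.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "BDF", "CCC", "XYZ", "DEF"],
    "Group_Id": [1, 1, 1, 2, 2, 3],
})

n = 2  # maximum number of rows to keep per group

# Shuffle all rows once; random_state is fixed only for reproducibility.
shuffled = df.sample(frac=1, random_state=1)

# cumcount() numbers the rows of each group 0, 1, 2, ... in their shuffled order.
position_in_group = shuffled.groupby("Group_Id").cumcount()

# Keep rows whose position within their group is below n.
df2 = shuffled[position_in_group < n].reset_index(drop=True)
print(df2)
```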

solution1
1 ACCEPTED 2020-10-17 03:16:38

solution2
0 2020-10-17 08:18:59
