Randomly selecting a subset of rows from a pandas dataframe based on existing column values

Question

I have a dataframe that contains multiple columns, and I would like to randomly select an equal number of rows for each value of a specific column. I tried using df.groupby['...'], but it did not work. Here is an example:

[Image: dataframe]

Suppose I would like to randomly select one row per GroupID. How can I accomplish this? For example, selecting one random row per GroupID would yield the following:

[Image: dataframe after random selection]

That is, the output contains a single row for each value of GroupID. Assume, for example, that the rows are sorted by GroupID; I would then select n rows from those that belong to GroupID 1, 2, 3, and so on. Any information would be helpful.

Also, if I need to select a specific number of rows per GroupID (let's say 1 row for GroupID=100, 4 rows for GroupID=200, etc.), any ideas?

[update] I extended the recommended answer below to choose a specific n for each group, as follows:

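samples = []
values = [1, 1, 2, ...]  # rows to draw per group, in the order of df.GroupID.unique()
index = 0
for group in df.GroupID.unique():
    # Sample values[index] rows from the rows that belong to this group
    s = df.loc[df.GroupID == group].sample(n=values[index]).reset_index(drop=True)
    samples.append(s)
    index = index + 1

# Stack the per-group samples into one dataframe
sample = pd.concat(samples, axis=0)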
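
One possible variation on the loop above, as a rough sketch: keep the per-group counts in a dictionary keyed by GroupID, so the counts do not depend on the order in which the groups appear. The example data and the n_per_group mapping are hypothetical; only the GroupID column name comes from the question.

```python
import pandas as pd

# Hypothetical example data; only the GroupID column name is from the question.
df = pd.DataFrame({
    "GroupID": [100, 100, 100, 200, 200, 200, 200, 200, 300, 300],
    "Value": list(range(10)),
})

# Hypothetical mapping: GroupID -> number of rows to draw from that group.
n_per_group = {100: 1, 200: 4, 300: 2}

# Iterate over the groups and sample the requested number of rows from each.
samples = [
    group_rows.sample(n=n_per_group[group_id], random_state=0)
    for group_id, group_rows in df.groupby("GroupID")
]

# Stack the per-group samples; the original row index is kept.
sample = pd.concat(samples, axis=0)
```

Note that sample() raises an error if a group has fewer rows than requested, unless replace=True is passed.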

Answer 1

I hope this code snippet will work for you

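samples = []
for group in df.GroupID.unique():
    # Pick one random row from the rows that belong to this group
    s = df.loc[df.GroupID == group].sample(n=1).reset_index(drop=True)
    samples.append(s)

# Stack the per-group samples into one dataframe
sample = pd.concat(samples, axis=0)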

The code takes each GroupID in turn and samples one observation from that subgroup. Concatenating the per-group subsamples (each holding a single GroupID) then gives the needed sample.
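
As an alternative sketch, assuming pandas 1.1 or newer, the same one-row-per-group draw can be written without an explicit loop by calling sample() on the grouped dataframe. The example data below is hypothetical; only the GroupID column name comes from the question.

```python
import pandas as pd

# Hypothetical example data; only the GroupID column name is from the question.
df = pd.DataFrame({
    "GroupID": [100, 100, 100, 200, 200, 200, 200, 300, 300],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Draw one random row from each GroupID.
# random_state is optional and only makes the draw repeatable.
one_per_group = df.groupby("GroupID").sample(n=1, random_state=0)
```

Unlike the loop, this keeps the original row index, so the selected rows can be traced back to df.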

solution1
1 ACCEPTED 2022-04-13 13:15:44
