
How can I randomly select records of a data frame based on the values of a column of that data frame, subject to some conditions?

I have a dataset like this:

[image: screenshot of the dataframe df]

I want to randomly select "MI" rows for each Ranking value and add them to another dataframe. For instance, if Ranking = 0, I want to randomly select MI = 2 rows, and if Ranking = 1, I want to select MI = 5 rows. This is my code, but it does not work:

All_M=pd.DataFrame()
A= df['Ranking'].min()

for i in range(0 , len(df)):
    x6 = df[(df['Ranking'] == A)].apply(lambda x:x.sample(int(df["MI"][i])).reset_index(drop=True))
    All_M= x6.append(All_M)
    A = A + 1 

IIUC, I think you can do this without loops using a groupby statement:

# Group on the 'Ranking' column from the question and sample MI rows per group
new_df = df.groupby('Ranking', group_keys=False)\
           .apply(lambda x: x.sample(int(x['MI'].iloc[0])))
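For anyone who wants to try this end to end, here is a minimal sketch of the same per-group sampling idea written as an explicit loop over the groups instead of groupby.apply. The sample data, the "value" column, the helper name sample_by_mi and its seed parameter are all made up for illustration. It assumes MI is constant within each Ranking group, and it caps the sample size at the group size, since sample raises a ValueError when asked for more rows than exist (without replacement).

```python
import pandas as pd

# Hypothetical data: MI is assumed to be the same for every row of a Ranking group.
df = pd.DataFrame({
    "Ranking": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "MI":      [2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "value":   range(10),
})


def sample_by_mi(frame, group_col="Ranking", n_col="MI", seed=None):
    """Randomly pick n_col rows from each group_col group (illustrative helper)."""
    parts = []
    for _, group in frame.groupby(group_col):
        # Read the requested sample size from the group's first row,
        # capped at the group size so sample() cannot fail.
        n = min(int(group[n_col].iloc[0]), len(group))
        parts.append(group.sample(n=n, random_state=seed))
    if not parts:
        # Empty input: return an empty frame with the same columns.
        return frame.iloc[0:0]
    return pd.concat(parts)


new_df = sample_by_mi(df, seed=0)
print(new_df.groupby("Ranking").size())
```

Passing a seed makes the selection reproducible; leave it as None for a different random draw on each run.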
