How can I randomly select records of a data frame based on the values of one of its columns, subject to some conditions?

Question

I have a dataset like this:

df

I want to select "MI" samples for each ranking and add them to another dataframe. For instance, if Ranking = 0, I want to randomly select MI = 2 rows, and if Ranking = 1, I want to select MI = 5 rows. This is my code, but it does not work:

All_M=pd.DataFrame()
A= df['Ranking'].min()

for i in range(0 , len(df)):
    x6 = df[(df['Ranking'] == A)].apply(lambda x:x.sample(int(df["MI"][i])).reset_index(drop=True))
    All_M= x6.append(All_M)
    A = A + 1

Answer 1

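IIUC, you can do this without loops using a groupby statement on the Ranking column:

 new_df = df.groupby('Ranking', group_keys=False)\
            .apply(lambda x: x.sample(x.iloc[0, x.columns.get_loc('MI')]))

This reads MI from the first row of each Ranking group, so it assumes MI is the same for every row within a group.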
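For reference, here is a minimal self-contained sketch of the same per-group sampling idea. The sample data, the `value` column, and the helper name `sample_by_mi` are made up for illustration, since the original data is not shown. It assumes MI is constant within each Ranking and the frame is non-empty. It iterates over the groups and uses `pd.concat` rather than `groupby(...).apply`, which may be a bit more predictable across pandas versions.

```python
import pandas as pd

# Hypothetical sample data: every row with the same Ranking carries the same MI.
df = pd.DataFrame({
    "Ranking": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "MI":      [2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "value":   range(10),
})

def sample_by_mi(frame, group_col="Ranking", count_col="MI", seed=None):
    """Randomly pick `count_col` rows from each `group_col` group.

    Hypothetical helper: assumes `frame` is non-empty and that
    `count_col` holds one constant value per group.
    """
    parts = []
    for _, group in frame.groupby(group_col):
        # Never ask for more rows than the group has (sampling without replacement).
        n = min(int(group[count_col].iloc[0]), len(group))
        parts.append(group.sample(n=n, random_state=seed))
    return pd.concat(parts)

new_df = sample_by_mi(df, seed=0)
print(new_df)
```

With this sample data, `new_df` holds 2 rows with Ranking 0 and 5 rows with Ranking 1. Pass `seed=None` to get a different random selection on each run.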
