
How to obtain a balanced dataframe in Python

I have a DataFrame containing 4000 rows. I'd like to select 20 random rows from this dataframe.

The new DataFrame must be balanced. I have an attribute called default that can take two values, yes or no, so the new DataFrame must contain 10 rows with yes and 10 rows with no.

Can you help me?

This may not be the most elegant solution.

First, group the rows by the class column, which in your case is default:

group_object = df.groupby('default')

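Then, for each group, apply a lambda that draws 10 random rows. Passing n=10 gives exactly 10 rows per class; a fraction such as frac=0.0025 only gives 10 rows if the group happens to contain 4000 rows:

group_object.apply(lambda x: x.sample(n=10))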

See the pandas documentation for DataFrame.sample for the other options (n, frac, replace, random_state).
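Putting the steps above together, here is a minimal, self-contained sketch. The toy DataFrame, the amount column, and the 30/70 class split are made up for illustration; only the default column comes from the question. It asks for an exact count per group (n=10) rather than a fraction, and it uses GroupBy.sample, which I believe requires pandas 1.1 or newer. On older versions the groupby/apply form should work the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 4000-row DataFrame from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.integers(100, 10_000, size=4000),  # made-up feature column
    "default": rng.choice(["yes", "no"], size=4000, p=[0.3, 0.7]),
})

# Draw exactly 10 random rows from each value of `default`.
# random_state is fixed only to make the example reproducible.
balanced = df.groupby("default").sample(n=10, random_state=42)

# Optional: shuffle so the yes/no rows are interleaved, and reset the index.
balanced = balanced.sample(frac=1, random_state=42).reset_index(drop=True)

print(balanced["default"].value_counts())
```

Note that sample raises a ValueError if a class has fewer than 10 rows, unless you pass replace=True to sample with replacement.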
