I have a DataFrame with 4000 rows, and I'd like to select 20 random rows from it.
The new DataFrame must be balanced: there is a column called default that takes two values, yes or no, so the sample must contain 10 rows with yes and 10 rows with no.
Can you help me?
This may not be the most elegant solution.
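First, group the rows by the class column (default in this question):
group_object = df.groupby('default')
Then apply a lambda that draws a fixed number of rows from each group. Use n=10 rather than frac: frac is a fraction of each group's own size, so frac=0.0025 only yields 10 rows from a group of 4000 rows, not 10 rows from each class.
group_object.apply(lambda x: x.sample(n=10))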
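A self-contained sketch of the same per-group sampling, assuming a synthetic stand-in for the 4000-row DataFrame (the amount column and the helper name balanced_sample are hypothetical). It loops over the groups and concatenates the samples instead of calling apply, which also avoids the deprecation warning that recent pandas versions emit when apply operates on the grouping column. It fails loudly if a class has fewer than 10 rows, since sampling without replacement cannot return more rows than a group contains.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's 4000-row DataFrame:
# an imbalanced 'default' column plus one illustrative feature column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "default": rng.choice(["yes", "no"], size=4000, p=[0.3, 0.7]),
    "amount": rng.normal(1000.0, 250.0, size=4000),
})

def balanced_sample(frame, column, n_per_class, seed=None):
    """Return n_per_class random rows for every distinct value of column."""
    parts = []
    for value, group in frame.groupby(column):
        if len(group) < n_per_class:
            raise ValueError(f"class {value!r} has only {len(group)} rows")
        # n= fixes the count per class; frac= would scale with the group size.
        parts.append(group.sample(n=n_per_class, random_state=seed))
    # Stack the per-class samples and renumber the rows 0..N-1.
    return pd.concat(parts).reset_index(drop=True)

balanced = balanced_sample(df, "default", n_per_class=10, seed=42)
print(balanced["default"].value_counts())
```

Passing an integer seed makes the draw repeatable; leave it as None to get a different sample on each run.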
See the pandas documentation for DataFrame.sample for its other parameters, such as replace and random_state.
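Since pandas 1.1, groupby objects have their own sample method, so the same result can be written without a lambda. A minimal sketch, assuming hypothetical data shaped like the question (the income column and the seeds are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question: 4000 rows, 'default' in {"yes", "no"}.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "default": rng.choice(["yes", "no"], size=4000),
    "income": rng.integers(20_000, 120_000, size=4000),
})

# DataFrameGroupBy.sample draws n rows from every group and keeps all
# columns, including 'default'. random_state makes the draw repeatable.
balanced = df.groupby("default").sample(n=10, random_state=7)

# Optional: shuffle the 20 rows so the classes are interleaved, then renumber.
balanced = balanced.sample(frac=1, random_state=7).reset_index(drop=True)
print(balanced["default"].value_counts())
```

Without the final shuffle, the rows come out grouped by class (all no rows, then all yes rows) with their original index labels.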