
Slice a Pandas Dataframe according to the percentage of a column category

I would like to slice a DataFrame according to the percentage of each category in one of its columns. Suppose I have a DataFrame like this:

>> age  height weight obese
   28    6.7   82      0
   22    5.10  67      0
   18    6     77      0
   19    5.2   88      1
   21    5.3   89      1
   24    5.9   68      0

I would like to slice the data based on the obese column so that its categories appear in a chosen proportion. For example:

>> df_equal
   age  height weight obese
   28    6.7   82      0
   21    5.3   89      1

>> df_minority
   age  height weight obese
   28    6.7   82      0
   19    5.2   88      1
   21    5.3   89      1

>> df_majority
   age  height weight obese
   28    6.7   82      0
   22    5.10  67      0
   18    6     77      0
   19    5.2   88      1

What I would like is a percentage of the minority class I mention: that is, to slice the complete DataFrame so that a particular column has a given class percentage. For example, if I want the split in the column to be 50:50, the resulting DataFrame should contain 50% of samples with obese == 0 and 50% with obese == 1, like df_equal, and similarly for other percentages.

What I would like is a percentage of the minority class I mention

df['obese'].value_counts(normalize=True)

will return the relative frequencies of the unique values.
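
To go from measuring the class percentages to actually producing a slice with a chosen percentage, one possible approach is to downsample each group so that the requested share holds. The following is only a minimal sketch under a few assumptions: the helper name `slice_by_fraction` and its parameters are hypothetical (not part of pandas), rows are picked at random with `DataFrame.sample`, and the result hits the requested fraction exactly only when the group sizes allow it.

```python
import math

import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "age": [28, 22, 18, 19, 21, 24],
    "height": [6.7, 5.10, 6.0, 5.2, 5.3, 5.9],
    "weight": [82, 67, 77, 88, 89, 68],
    "obese": [0, 0, 0, 1, 1, 0],
})

# Relative frequency of each class in the full frame
freqs = df["obese"].value_counts(normalize=True)


def slice_by_fraction(frame, column, value, fraction, random_state=0):
    """Hypothetical helper: return a random subset of `frame` in which rows
    with frame[column] == value make up roughly `fraction` of the rows."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")

    target = frame[frame[column] == value]
    rest = frame[frame[column] != value]

    # Largest number of target rows that the remaining rows can balance
    # (the small epsilon guards against floating-point round-off).
    ratio = fraction / (1 - fraction)
    n_target = min(len(target), math.floor(len(rest) * ratio + 1e-9))
    # Number of non-target rows needed to reach the requested share
    n_rest = min(len(rest), round(n_target * (1 - fraction) / fraction))

    parts = [
        target.sample(n=n_target, random_state=random_state),
        rest.sample(n=n_rest, random_state=random_state),
    ]
    # Restore the original row order
    return pd.concat(parts).sort_index()


df_equal = slice_by_fraction(df, "obese", 1, 0.5)      # 50% obese == 1
df_majority = slice_by_fraction(df, "obese", 1, 0.25)  # 25% obese == 1
```

Calling `df_equal["obese"].value_counts(normalize=True)` on the result is a quick way to check the share that was actually achieved. Because rows are sampled at random, the selected rows will generally differ from the ones shown in the question; pass a different `random_state` to get a different draw.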

