Slice a Pandas Dataframe according to the percentage of a column category

Question

I would like to slice a dataframe according to the percentage of each category in one of its columns. Suppose I have a dataframe like this:

>> df
   age  height weight obese
   28    6.7   82      0
   22    5.10  67      0
   18    6     77      0
   19    5.2   88      1
   21    5.3   89      1
   24    5.9   68      0

I would like to slice the data so that the categories of the obese column appear in a chosen proportion. For example:

>> df_equal
   age  height weight obese
   28    6.7   82      0
   21    5.3   89      1

>> df_minority
   age  height weight obese
   28    6.7   82      0
   19    5.2   88      1
   21    5.3   89      1

>> df_majority
   age  height weight obese
   28    6.7   82      0
   22    5.10  67      0
   18    6     77      0
   19    5.2   88      1

What I would like is a percentage of the minority class I mention: the complete dataframe should be sliced so that a particular column has that category percentage. For example, if I want the category percentage of the column to be 50:50, then my dataframe should contain 50% of samples with obese == 0 and 50% with obese == 1, as in df_equal, and similarly for other percentages.

Answer 1

"What I would like is a percentage of the minority class I mention"

df['obese'].value_counts(normalize=True)

will return the relative frequencies of the unique values in the obese column, i.e. the share of rows belonging to each class.
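
The answer only reports the class shares; it does not build the slices asked for in the question. A minimal sketch of one way to go from those shares to a resampled dataframe is below. Note that `slice_by_share` is a hypothetical helper written for this example, not a pandas function, and it assumes the column holds exactly two classes and that the requested share is strictly between 0 and 1. Row counts are rounded, so the resulting share is only approximate on small data.

```python
import math

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "age": [28, 22, 18, 19, 21, 24],
    "height": [6.7, 5.10, 6.0, 5.2, 5.3, 5.9],
    "weight": [82, 67, 77, 88, 89, 68],
    "obese": [0, 0, 0, 1, 1, 0],
})

# Relative frequency of each class: 4 of 6 rows are 0, 2 of 6 rows are 1.
shares = df["obese"].value_counts(normalize=True)


def slice_by_share(frame, column, minority_share, random_state=0):
    """Hypothetical helper (not part of pandas).

    Returns a random subset of `frame` in which the less frequent class of
    `column` makes up roughly `minority_share` of the rows.
    Assumes `column` contains exactly two classes.
    """
    if not 0 < minority_share < 1:
        raise ValueError("minority_share must be strictly between 0 and 1")

    counts = frame[column].value_counts()  # sorted, most frequent class first
    if len(counts) != 2:
        raise ValueError("this sketch only handles a two-class column")

    majority_label, minority_label = counts.index[0], counts.index[-1]
    n_major_avail, n_minor_avail = int(counts.iloc[0]), int(counts.iloc[-1])

    # Minority rows needed per majority row to reach the requested share.
    ratio = minority_share / (1 - minority_share)

    # Keep as many minority rows as the available majority rows can balance.
    n_minor = min(n_minor_avail, math.floor(n_major_avail * ratio))
    n_major = min(n_major_avail, round(n_minor / ratio))

    parts = [
        frame[frame[column] == minority_label].sample(n=n_minor, random_state=random_state),
        frame[frame[column] == majority_label].sample(n=n_major, random_state=random_state),
    ]
    # Restore the original row order.
    return pd.concat(parts).sort_index()


df_equal = slice_by_share(df, "obese", 0.5)      # 50% obese == 1
df_majority = slice_by_share(df, "obese", 0.25)  # 25% obese == 1
```

With the six rows from the question, both calls return four rows: two of each class for `df_equal`, and three rows with obese == 0 plus one with obese == 1 for `df_majority`. Which rows are picked depends on `random_state`.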

solution1
0 2020-09-25 12:50:35