Adding new column in pandas dataframe based on another column

Question

I have a dataframe with a bmi column. Based on that column, I want to create another column that shows the BMI range for the bmi value in each row. Below is my code:

for i in range(df["bmi"].count()):
    if df["bmi"][i] < 18.5:
        df["bmi_category"] = "Under Weight"
    elif 25 > df["bmi"][i] >= 18.5:
        df["bmi_category"] = "Healthy Weight"
    elif 30 > df["bmi"][i] >= 25:
        df["bmi_category"] = "Overweight"
    elif df["bmi"][i] >= 30:
        df["bmi_category"] = "Obese"

But when I run this code, I get this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 228

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-220-e7569ff34eec> in <module>
      1 for i in range(cardio["bmi"].count()):
----> 2     if cardio["bmi"][i] < 18.5:
      3         cardio["bmi_category"] = "Under Weight"
      4     elif 25 > cardio["bmi"][i] >= 18.5:
      5         cardio["bmi_category"] = "Healthy Weight"

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    849 
    850         elif key_is_scalar:
--> 851             return self._get_value(key)
    852 
    853         if is_hashable(key):

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    957 
    958         # Similar to Index.get_value, but we do not fall back to positional
--> 959         loc = self.index.get_loc(label)
    960         return self.index._get_values_for_loc(self, loc, label)
    961 

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 228

Can anyone tell me what I did wrong here, and how to fix it?

Answer 1

The following maps each value in the bmi column to a value in the bmi_category column:

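import pandas as pd

def get_category(bmi):
    # Missing values (None/NaN) get no category. NaN is truthy, so a plain
    # "if not bmi" check would let it fall through to "Obese".
    if pd.isna(bmi):
        return None
    if bmi < 18.5:
        return "Under Weight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"

# Apply the function to every value in the bmi column
df['bmi_category'] = df['bmi'].apply(get_category)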
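For context on why the loop in the question fails, here is a minimal sketch. It assumes the dataframe had rows dropped or filtered earlier, so its index is no longer a clean 0..n-1 range. The `KeyError: 228` in the traceback is consistent with a missing label 228, but the question does not confirm this. The sample data and the `lookup_by_label` helper are made up for illustration.

```python
import pandas as pd

# Hypothetical data: a row was dropped earlier, so label 1 is missing from the index.
df = pd.DataFrame({"bmi": [17.0, 22.0, 27.0, 31.0]}).drop(index=1)

def lookup_by_label(frame, label):
    """Return frame["bmi"][label], or None if that label is not in the index."""
    try:
        return frame["bmi"][label]
    except KeyError:
        return None

# With an integer index, df["bmi"][i] looks up the index *label* i, not the i-th row.
present = lookup_by_label(df, 0)   # label 0 exists
missing = lookup_by_label(df, 1)   # label 1 was dropped, so the lookup raises KeyError

# Separately, assigning a scalar to a column writes that value into every row at once,
# so the loop would overwrite the whole column on each iteration.
df["bmi_category"] = "Obese"
all_same = df["bmi_category"].nunique() == 1
```

The `apply` approach never indexes by position or label, so it works regardless of what the index looks like.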

PS: If you find yourself iterating over a dataframe, there's almost always a function that will do it more quickly and cleanly.
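As one illustration of that point, a fully vectorized version can be sketched with `np.select`. This is an alternative technique, not the method used in the answer above. The sample values and the "Unknown" default label are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Made-up sample values, including the boundaries and a missing value
df = pd.DataFrame({"bmi": [17.0, 18.5, 24.9, 25.0, 30.0, np.nan]})

# Conditions are checked in order; the first one that is True wins for each row.
conditions = [
    df["bmi"] < 18.5,
    df["bmi"] < 25,
    df["bmi"] < 30,
    df["bmi"] >= 30,
]
labels = ["Under Weight", "Healthy Weight", "Overweight", "Obese"]

# Rows where no condition matches (NaN compares False everywhere) get the default.
df["bmi_category"] = np.select(conditions, labels, default="Unknown")
```

Each condition is evaluated on the whole column at once, so there is no Python-level loop over rows.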

Answer 2

You can use pd.cut to do this efficiently.

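import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(16, 35, (50, 1)), columns=["bmi"])  # random sample data
df['bmi_category'] = pd.cut(df['bmi'], [0, 18.5, 25, 30, np.inf], labels=["Under Weight", "Healthy Weight", "Overweight", "Obese"], right=False)  # right=False makes each bin [low, high); np.infty was removed in NumPy 2.0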
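To see how `right=False` treats the boundary values, here is a small check on hand-picked inputs. The sample values are chosen for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hand-picked values sitting on and just below each bin edge
df = pd.DataFrame({"bmi": [18.4, 18.5, 24.9, 25.0, 29.9, 30.0]})

bins = [0, 18.5, 25, 30, np.inf]
labels = ["Under Weight", "Healthy Weight", "Overweight", "Obese"]

# right=False closes each bin on the left: [0, 18.5), [18.5, 25), [25, 30), [30, inf)
df["bmi_category"] = pd.cut(df["bmi"], bins=bins, labels=labels, right=False)

# pd.cut returns a categorical column; convert to plain strings for inspection
categories = df["bmi_category"].astype(str).tolist()
```

With these bins, 18.5 lands in "Healthy Weight" and 30.0 in "Obese", which matches the `>=` comparisons in the question's loop.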


solution1
2 ACCEPTED 2021-04-02 21:11:49

solution2
1 2021-04-02 21:25:54
