
Adding new column in pandas dataframe based on another column

I have a dataframe with a bmi column. Based on that column, I want to create another column that shows the BMI range for the bmi value in each row. Here is my code:

for i in range(df["bmi"].count()):
    if df["bmi"][i] < 18.5:
        df["bmi_category"] = "Under Weight"
    elif 25 > df["bmi"][i] >= 18.5:
        df["bmi_category"] = "Healthy Weight"
    elif 30 > df["bmi"][i] >= 25:
        df["bmi_category"] = "Overweight"
    elif df["bmi"][i] >= 30:
        df["bmi_category"] = "Obese"

But when I run this code, I get this error.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 228

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-220-e7569ff34eec> in <module>
      1 for i in range(cardio["bmi"].count()):
----> 2     if cardio["bmi"][i] < 18.5:
      3         cardio["bmi_category"] = "Under Weight"
      4     elif 25 > cardio["bmi"][i] >= 18.5:
      5         cardio["bmi_category"] = "Healthy Weight"

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    849 
    850         elif key_is_scalar:
--> 851             return self._get_value(key)
    852 
    853         if is_hashable(key):

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    957 
    958         # Similar to Index.get_value, but we do not fall back to positional
--> 959         loc = self.index.get_loc(label)
    960         return self.index._get_values_for_loc(self, loc, label)
    961 

c:\users\hridoy\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 228

Can anyone tell me what I am doing wrong here, and how to fix it?

The following maps each value in the bmi column to a value in the bmi_category column:

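import pandas as pd

def get_category(bmi):
    # Missing values (None/NaN) get no category; a plain `not bmi`
    # check would let NaN fall through and be labelled "Obese".
    if pd.isna(bmi):
        return None
    if bmi < 18.5:
        return "Under Weight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"

# Apply the function to every value in the bmi column
df['bmi_category'] = df['bmi'].apply(get_category)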

PS: If you find yourself iterating over a dataframe, there is almost always a function that will do the job faster and more cleanly.
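As for why the loop in the question fails, here is a minimal sketch of what is probably going on. It assumes the original dataframe's index has gaps (for example after rows were filtered out), which is consistent with `KeyError: 228` but is not confirmed by the question. The small dataframe below is hypothetical. `df["bmi"][i]` looks up the index label `i`, not the i-th row, and `df["bmi_category"] = "..."` assigns the same string to every row.

```python
import pandas as pd

# Hypothetical data: dropping a row leaves a gap in the index labels (0, 2, 3).
df = pd.DataFrame({"bmi": [17.0, 22.0, 27.5, 31.0]}).drop(index=1)

try:
    df["bmi"][1]  # looks up the *label* 1, which no longer exists
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access works regardless of the index labels.
second_value = df["bmi"].iloc[1]

# Assigning a scalar to a column sets it for *every* row, not just row i.
df["bmi_category"] = "Obese"
categories = df["bmi_category"].unique().tolist()
```

So even with `.iloc`, the loop would leave the whole column set to whichever category the last row falls into, which is why mapping the column in one step is the better fix.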

You can do this efficiently with pd.cut.

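import numpy as np
import pandas as pd

# Sample data: 50 random integer BMI values between 16 and 34
df = pd.DataFrame(np.random.randint(16, 35, (50, 1)), columns=["bmi"])
# right=False makes each bin include its left edge: [0, 18.5), [18.5, 25), ...
# np.inf is used because the np.infty alias was removed in NumPy 2.0
df['bmi_category'] = pd.cut(df['bmi'], [0, 18.5, 25, 30, np.inf], labels=["Under Weight", "Healthy Weight", "Overweight", "Obese"], right=False)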
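To see how the bin edges behave, here is a small check using hypothetical values that sit exactly on the boundaries, plus one missing value. It is only a sketch: the sample values are made up, and it uses `np.inf` for the open upper edge.

```python
import numpy as np
import pandas as pd

# Hypothetical BMI values on and around the bin edges, plus one missing value.
bmi = pd.Series([18.4, 18.5, 24.9, 25.0, 30.0, np.nan])

labels = ["Under Weight", "Healthy Weight", "Overweight", "Obese"]
# right=False closes each bin on the left: [0, 18.5), [18.5, 25), [25, 30), [30, inf)
category = pd.cut(bmi, [0, 18.5, 25, 30, np.inf], labels=labels, right=False)

print(category.tolist())
# ['Under Weight', 'Healthy Weight', 'Healthy Weight', 'Overweight', 'Obese', nan]
```

With `right=False`, a BMI of exactly 18.5, 25 or 30 goes into the higher category, matching the `>=` comparisons in the question, and missing values stay missing instead of being given a label.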

