Create a new column based on conditions on other columns in a Pandas DataFrame

Question

I have this DataFrame:

+------+--------------+------------+
| ID   | Education    |      Score | 
+------+--------------+------------+
|    1 |  High School |      7.884 |     
|    2 |  Bachelors   |      6.952 |     
|    3 |  High School |      8.185 |   
|    4 |  High School |      6.556 | 
|    5 |  Bachelors   |      6.347 | 
|    6 |  Master      |      6.794 |   
+------+--------------+------------+

I want to create a new column that categorizes the Score column, using the labels "Bad", "Good", and "Very good".

It might look like this:

+------+--------------+------------+------------+
| ID   | Education    |      Score | Labels     |
+------+--------------+------------+------------+
|    1 |  High School |      7.884 | Good       |
|    2 |  Bachelors   |      6.952 | Bad        |
|    3 |  High School |      8.185 | Very good  |   
|    4 |  High School |      6.556 | Bad        |
|    5 |  Bachelors   |      6.347 | Bad        |
|    6 |  Master      |      6.794 | Bad        |
+------+--------------+------------+------------+

How can I do this?

Thanks in advance.

Answer 1

import pandas as pd 

# initialize list of lists 
data = [[1,'High School',7.884], [2,'Bachelors',6.952], [3,'High School',8.185], [4,'High School',6.556],[5,'Bachelors',6.347],[6,'Master',6.794]] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score']) 

df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df

    ID  Education    Score    Labels
0   1   High School  7.884    Good
1   2   Bachelors    6.952    Bad
2   3   High School  8.185    Very Good
3   4   High School  6.556    Bad
4   5   Bachelors    6.347    Bad
5   6   Master       6.794    Bad
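For larger frames, the same thresholds can be expressed without a Python-level loop. The following is a minimal sketch using `pd.cut`; the bin edges and the `right=False` setting are my reading of the conditions in the list comprehension above (`x < 7`, `7 <= x < 8`, otherwise `'Very Good'`), not something the answer itself states.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Education": ["High School", "Bachelors", "High School",
                  "High School", "Bachelors", "Master"],
    "Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# Intervals: [-inf, 7) -> Bad, [7, 8) -> Good, [8, inf) -> Very Good.
# right=False closes each interval on the left, matching x < 7 and 7 <= x < 8.
bins = [float("-inf"), 7.0, 8.0, float("inf")]
labels = ["Bad", "Good", "Very Good"]

df["Labels"] = pd.cut(df["Score"], bins=bins, labels=labels, right=False)
print(df)
```

Two differences from the list comprehension are worth knowing: the new column has a categorical dtype rather than plain strings, and a missing score stays missing instead of falling through to 'Very Good'.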

Answer 2

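I assume it is the score that you want to map to a label. You can define a mapping function that takes the score as input and returns the label: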

def map_score(score):
  if score >= 8:
    return "Very good"
  elif score >= 7:
    return "Good"
  else:
    return "Bad"

# apply() calls map_score once per value; no lambda wrapper is needed
df["Labels"] = df["Score"].apply(map_score)
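If the per-row function call becomes a bottleneck, the same if/elif chain could be written as a vectorized `np.select`. This is only a sketch: the small `df` here is rebuilt from the question's scores so the snippet runs on its own, and the ordering of the conditions mirrors `map_score` (the first matching condition wins).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Conditions are checked in order, like if / elif; rows matching none get the default.
conditions = [df["Score"] >= 8, df["Score"] >= 7]
choices = ["Very good", "Good"]

df["Labels"] = np.select(conditions, choices, default="Bad")
print(df)
```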

Answer 3

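Here is my solution. I tried to avoid if-else and make the solution more flexible.

The main idea is to create a labels DataFrame with minimum and maximum values, then find the correct label for each score value.

Code: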

import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]


class Labels:
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]

    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        lbs = Labels.labels_df
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
        'ID': range(6),
        'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
        'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

df['Label'] = df.apply(lambda row: Labels.get_label(row['Score']), axis=1)

print(df)

Output:

   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
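The table-driven idea can also be done in one vectorized lookup instead of filtering the labels DataFrame once per row. A rough sketch with `np.searchsorted` follows. It assumes the ranges are sorted by `Min` and contiguous (each `Max` equals the next `Min`), and that every score falls inside the overall range; these assumptions hold for the table above but are not checked by the code.

```python
import numpy as np
import pandas as pd

# Same label table as above: name, inclusive minimum, exclusive maximum.
labels_df = pd.DataFrame(
    [["Bad", 0, 7], ["Good", 7, 8], ["Very good", 8, 100]],
    columns=["Label", "Min", "Max"],
)

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# For each score, locate the last row whose Min is <= score.
# side="right" keeps a score exactly equal to a Min in the range that starts there.
mins = labels_df["Min"].to_numpy()
pos = np.searchsorted(mins, df["Score"].to_numpy(), side="right") - 1

df["Label"] = labels_df["Label"].to_numpy()[pos]
print(df)
```

Adding or changing a category then only means editing the rows of `labels_df`; the lookup itself stays the same.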

Answer 1
5 Accepted 2020-01-08 09:16:40

Answer 2
4 2020-01-08 09:04:29

Answer 3
1 2020-01-08 10:41:07
