
Creating New Column based on condition on Other Column in Pandas DataFrame

I have this DataFrame:

+------+--------------+------------+
| ID   | Education    |      Score | 
+------+--------------+------------+
|    1 |  High School |      7.884 |     
|    2 |  Bachelors   |      6.952 |     
|    3 |  High School |      8.185 |   
|    4 |  High School |      6.556 | 
|    5 |  Bachelors   |      6.347 | 
|    6 |  Master      |      6.794 |   
+------+--------------+------------+

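I want to create a new column that categorizes the Score column. I want to label each score as "Bad", "Good", or "Very good".

The result might look like this: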

+------+--------------+------------+------------+
| ID   | Education    |      Score | Labels     |
+------+--------------+------------+------------+
|    1 |  High School |      7.884 | Good       |
|    2 |  Bachelors   |      6.952 | Bad        |
|    3 |  High School |      8.185 | Very good  |   
|    4 |  High School |      6.556 | Bad        |
|    5 |  Bachelors   |      6.347 | Bad        |
|    6 |  Master      |      6.794 | Bad        |
+------+--------------+------------+------------+

How can I do this?

Thanks in advance.

import pandas as pd 

# Build the rows as a list of lists: [ID, Education, Score]
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952], [3, 'High School', 8.185],
        [4, 'High School', 6.556], [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])

# Label each score: below 7 is 'Bad', from 7 up to (not including) 8 is 'Good', otherwise 'Very Good'
df['Labels'] = ['Bad' if x < 7 else 'Good' if x < 8 else 'Very Good' for x in df['Score']]
df

    ID  Education    Score    Labels
0   1   High School  7.884    Good
1   2   Bachelors    6.952    Bad
2   3   High School  8.185    Very Good
3   4   High School  6.556    Bad
4   5   Bachelors    6.347    Bad
5   6   Master       6.794    Bad
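
For larger frames, the same three-way rule can probably be written without a Python-level loop. The following is only a sketch using numpy.select, which is a different technique from the list comprehension above; the thresholds 7 and 8 and the label spellings are copied from that answer, and the variable names conditions and choices are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Education': ['High School', 'Bachelors', 'High School',
                  'High School', 'Bachelors', 'Master'],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# Conditions are checked in order; the first one that is True wins,
# so the second condition effectively means 7 <= Score < 8.
conditions = [df['Score'] < 7, df['Score'] < 8]
choices = ['Bad', 'Good']

# Rows matching no condition (Score >= 8) get the default label.
df['Labels'] = np.select(conditions, choices, default='Very Good')

print(df)
```

Note that, as in the list comprehension, a missing score (NaN) would fall through to the default label, so it may be worth checking for NaN first if the data can contain gaps.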

I assume it is the score that you want to map to a label. You can define a mapping function that takes a score as input and returns the label:

def map_score(score):
  if score >= 8:
    return "Very good"
  elif score >= 7:
    return "Good"
  else:
    return "Bad"

df["Labels"] = df["Score"].apply(map_score)
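
If the cut-offs are fixed, pandas.cut may be a more compact way to express the same mapping. This is a sketch, not part of the answer above: the bin edges are an assumption chosen to mirror map_score (a score of exactly 7 counts as "Good" and exactly 8 as "Very good"), which is why right=False is passed so that each bin includes its left edge.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Bins are [-inf, 7), [7, 8), [8, inf) because right=False closes the left edge.
bins = [-np.inf, 7, 8, np.inf]
names = ["Bad", "Good", "Very good"]

# The result is a categorical column; use .astype(str) if plain strings are needed.
df["Labels"] = pd.cut(df["Score"], bins=bins, labels=names, right=False)

print(df)
```

One difference from map_score: a NaN score stays NaN here instead of being labelled "Bad".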

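Here is my solution. I tried to avoid if-else and make the solution more flexible.

The main idea is to create a labels DataFrame holding a minimum and maximum value for each label, and then find the matching label for each score.

Code: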

import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]


class Labels:
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]

    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        lbs = Labels.labels_df
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
        'ID': range(6),
        'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
        'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

df['Label'] = df['Score'].apply(Labels.get_label)

print(df)

Output:

   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
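
The min/max table idea can also be sketched without the helper classes, by turning the table into a pandas IntervalIndex and looking up all scores at once. This is an alternative formulation rather than the code above: the bounds are written as floats by my own choice, closed='left' is assumed so that it matches the Min <= score < Max test in get_label, and the names intervals and positions are illustrative.

```python
import pandas as pd

# Same table as labels_df above: one row per label with its [Min, Max) range.
labels_df = pd.DataFrame({
    'Label': ['Bad', 'Good', 'Very good'],
    'Min': [0.0, 7.0, 8.0],
    'Max': [7.0, 8.0, 100.0],
})

# closed='left' means Min <= score < Max for every interval.
intervals = pd.IntervalIndex.from_arrays(
    labels_df['Min'], labels_df['Max'], closed='left')

df = pd.DataFrame({'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# get_indexer returns, for each score, the row position of the interval
# that contains it, or -1 when no interval matches.
positions = intervals.get_indexer(df['Score'])
if (positions == -1).any():
    raise ValueError('some scores fall outside every label range')

df['Label'] = labels_df['Label'].to_numpy()[positions]

print(df)
```

Adding or changing a label then only means editing a row of labels_df, and out-of-range scores raise an explicit error.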

