Creating New Column based on condition on Other Column in Pandas DataFrame
I have this DataFrame:
+------+--------------+------------+
| ID | Education | Score |
+------+--------------+------------+
| 1 | High School | 7.884 |
| 2 | Bachelors | 6.952 |
| 3 | High School | 8.185 |
| 4 | High School | 6.556 |
| 5 | Bachelors | 6.347 |
| 6 | Master | 6.794 |
+------+--------------+------------+
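I want to create a new column that categorizes the Score column. I want to label each score as "Bad", "Good", or "Very good".
The result might look like this: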
+------+--------------+------------+------------+
| ID | Education | Score | Labels |
+------+--------------+------------+------------+
| 1 | High School | 7.884 | Good |
| 2 | Bachelors | 6.952 | Bad |
| 3 | High School | 8.185 | Very good |
| 4 | High School | 6.556 | Bad |
| 5 | Bachelors | 6.347 | Bad |
| 6 | Master | 6.794 | Bad |
+------+--------------+------------+------------+
How can I achieve this?
Thanks in advance.
import pandas as pd
# Initialize the rows as a list of lists
data = [
    [1, 'High School', 7.884], [2, 'Bachelors', 6.952],
    [3, 'High School', 8.185], [4, 'High School', 6.556],
    [5, 'Bachelors', 6.347], [6, 'Master', 6.794],
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])
# Below 7 is 'Bad', from 7 up to (but excluding) 8 is 'Good', 8 and above is 'Very Good'
df['Labels'] = ['Bad' if x < 7.000 else 'Good' if 7.000 <= x < 8.000 else 'Very Good' for x in df['Score']]
df
   ID    Education  Score     Labels
0   1  High School  7.884       Good
1   2    Bachelors  6.952        Bad
2   3  High School  8.185  Very Good
3   4  High School  6.556        Bad
4   5    Bachelors  6.347        Bad
5   6       Master  6.794        Bad
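For larger frames, the same thresholds can also be applied without a Python-level loop. The following is only a sketch of one alternative, using `numpy.select` instead of the list comprehension from the answer above; the cut-offs (7 and 8) and the label spellings are copied from that answer, and nothing else is assumed about the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Education": ["High School", "Bachelors", "High School",
                  "High School", "Bachelors", "Master"],
    "Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# Conditions are evaluated in order and the first one that is True wins,
# so the second condition effectively means 7 <= Score < 8.
conditions = [df["Score"] < 7, df["Score"] < 8]
choices = ["Bad", "Good"]

# Rows matching no condition (Score >= 8) receive the default label.
df["Labels"] = np.select(conditions, choices, default="Very Good")
print(df)
```

Note that a missing score (NaN) matches neither condition and would therefore receive the default label, which is the same behaviour as the list comprehension.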
I assume it is the score that you want to map to a label. You can define a mapping function that takes a score as input and returns the label:
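def map_score(score):
    # Check the highest threshold first so each score falls into exactly one branch
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"

# Pass the function directly; wrapping it in a lambda is unnecessary
df["Labels"] = df["Score"].apply(map_score)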
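If more categories are added later, the if/elif chain grows with them. As a hedged sketch of one way to keep the mapping function short, the thresholds can be stored in a sorted list and looked up with `bisect.bisect_right` from the standard library. The names `THRESHOLDS` and `LABELS` are illustrative and not part of the answer above; the cut-offs are the same 7 and 8.

```python
from bisect import bisect_right

import pandas as pd

# Hypothetical names for illustration: sorted lower bounds and their labels.
# LABELS must have exactly one more entry than THRESHOLDS.
THRESHOLDS = [7, 8]
LABELS = ["Bad", "Good", "Very good"]


def map_score(score):
    # bisect_right counts how many thresholds are <= score,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(THRESHOLDS, score)]


df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})
df["Labels"] = df["Score"].apply(map_score)
print(df)
```

A score exactly on a threshold is assigned to the higher category (7 is "Good", 8 is "Very good"), matching the `>=` comparisons in the function above.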
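Here is my solution. I tried to avoid if-else and to make the solution more flexible.
The main idea is to create a labels DataFrame holding the minimum and maximum value of each label, then look up the right label for each score.
Code: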
import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        # Return the label as a row: [name, min, max]
        return [self.name, self.min, self.max]


class Labels:
    # Each range includes its minimum and excludes its maximum
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]
    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        lbs = Labels.labels_df
        # Select the row whose [Min, Max) range contains the score
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
    'ID': range(6),
    'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})
df['Label'] = df['Score'].apply(Labels.get_label)
print(df)
Output:
   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
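A labels table of this shape can also drive `pandas.cut`, which bins a whole column in one call instead of filtering the table once per row. This is a sketch of an alternative, not the answerer's code, and it assumes the ranges in the table are sorted and contiguous (each Max equals the next Min), as they are here.

```python
import pandas as pd

# Same label ranges as in the answer above: Min inclusive, Max exclusive.
labels_df = pd.DataFrame(
    [["Bad", 0, 7], ["Good", 7, 8], ["Very good", 8, 100]],
    columns=["Label", "Min", "Max"],
)

# Bin edges are every Min plus the final Max: [0, 7, 8, 100].
# This only works because the ranges are sorted and contiguous (assumption).
edges = labels_df["Min"].tolist() + [labels_df["Max"].iloc[-1]]

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# right=False makes each bin closed on the left and open on the right,
# matching the (Min <= score) & (Max > score) test.
df["Label"] = pd.cut(
    df["Score"],
    bins=edges,
    labels=labels_df["Label"].tolist(),
    right=False,
)
print(df)
```

One behavioural difference: a score outside every range (below 0, or 100 and above) becomes NaN with `pd.cut`, whereas `get_label` raises an `IndexError` because the filtered table is empty. The resulting column is categorical rather than plain strings.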