Creating a new column based on a condition on another column in a pandas DataFrame
I have this DataFrame:
+------+--------------+------------+
| ID | Education | Score |
+------+--------------+------------+
| 1 | High School | 7.884 |
| 2 | Bachelors | 6.952 |
| 3 | High School | 8.185 |
| 4 | High School | 6.556 |
| 5 | Bachelors | 6.347 |
| 6 | Master | 6.794 |
+------+--------------+------------+
I want to create a new column that categorizes the Score column, using the labels "Bad", "Good", and "Very good".
The result might look like this:
+------+--------------+------------+------------+
| ID | Education | Score | Labels |
+------+--------------+------------+------------+
| 1 | High School | 7.884 | Good |
| 2 | Bachelors | 6.952 | Bad |
| 3 | High School | 8.185 | Very good |
| 4 | High School | 6.556 | Bad |
| 5 | Bachelors | 6.347 | Bad |
| 6 | Master | 6.794 | Bad |
+------+--------------+------------+------------+
How can I do this?
Thanks in advance.
import pandas as pd
# initialize list of lists
data = [[1,'High School',7.884], [2,'Bachelors',6.952], [3,'High School',8.185], [4,'High School',6.556],[5,'Bachelors',6.347],[6,'Master',6.794]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score'])
df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df
ID Education Score Labels
0 1 High School 7.884 Good
1 2 Bachelors 6.952 Bad
2 3 High School 8.185 Very Good
3 4 High School 6.556 Bad
4 5 Bachelors 6.347 Bad
5 6 Master 6.794 Bad
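A vectorized alternative, not part of the answer above: `pd.cut` can bin the scores directly. This is a minimal sketch that assumes the same thresholds as the list comprehension (below 7 is "Bad", 7 up to but not including 8 is "Good", 8 and above is "Very Good"). Passing `right=False` makes each bin closed on the left, so a score of exactly 7 or 8 lands in the higher bin, as in the original code.

```python
import numpy as np
import pandas as pd

data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952],
        [3, 'High School', 8.185], [4, 'High School', 6.556],
        [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])

# Bin edges are an assumption taken from the thresholds used above:
# [-inf, 7) -> Bad, [7, 8) -> Good, [8, inf) -> Very Good
df['Labels'] = pd.cut(
    df['Score'],
    bins=[-np.inf, 7, 8, np.inf],
    labels=['Bad', 'Good', 'Very Good'],
    right=False,
)
print(df)
```

Note that the new column has a categorical dtype rather than plain strings; `df['Labels'].astype(str)` converts it if needed.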
I assume it is the score that you want to map to a label. You can define a mapping function that takes a score as input and returns a label:
def map_score(score):
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"

df["Labels"] = df["Score"].apply(map_score)
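If the DataFrame is large, the same mapping can probably be expressed without a Python-level function call per row. The following is only a sketch using `numpy.select`, which is not what the answer above uses. It assumes the same cut-offs as `map_score`, and it relies on `np.select` taking the first condition that is true for each element, so the conditions are listed from the highest threshold down.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Education": ["High School", "Bachelors", "High School",
                  "High School", "Bachelors", "Master"],
    "Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# Conditions are checked in order; the first match wins,
# mirroring the if / elif / else chain in map_score.
conditions = [df["Score"] >= 8, df["Score"] >= 7]
choices = ["Very good", "Good"]

df["Labels"] = np.select(conditions, choices, default="Bad")
print(df)
```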
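Here is my solution. I tried to avoid if-else and to make the solution more flexible.
The main idea is to create a labels DataFrame holding a minimum and maximum value for each label, and then find the matching label for each score.
Code: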
import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]


class Labels:
    # Each row is [label, inclusive lower bound, exclusive upper bound]
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]
    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        lbs = Labels.labels_df
        # Keep the single row whose range contains the score
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
    'ID': range(6),
    'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})
df['Label'] = df.apply(lambda row: Labels.get_label(row['Score']), axis=1)
print(df)
Output:
ID Education Score Label
0 0 High School 7.884 Good
1 1 Bachelors 6.952 Bad
2 2 High School 8.185 Very good
3 3 High School 6.556 Bad
4 4 Bachelors 6.347 Bad
5 5 Master 6.794 Bad
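The same min/max lookup idea can likely be done for all scores at once instead of filtering the labels table once per row. The sketch below is an assumption on my part, not the code from the answer: it builds a `pd.IntervalIndex` from the `Min` and `Max` columns (closed on the left, matching `Min <= score < Max`) and uses `get_indexer` to find the row of the labels table that contains each score. The bounds are written as floats, and scores outside every range raise an error rather than being labeled silently.

```python
import pandas as pd

# Labels table with the same ranges as in the answer above
labels_df = pd.DataFrame({
    'Label': ['Bad', 'Good', 'Very good'],
    'Min': [0.0, 7.0, 8.0],
    'Max': [7.0, 8.0, 100.0],
})

df = pd.DataFrame({
    'ID': range(6),
    'Education': ['High School', 'Bachelors', 'High School',
                  'High School', 'Bachelors', 'Master'],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# One interval [Min, Max) per label row
intervals = pd.IntervalIndex.from_arrays(
    labels_df['Min'], labels_df['Max'], closed='left')

# Position of the interval containing each score, or -1 if none does
idx = intervals.get_indexer(df['Score'].to_numpy())
if (idx < 0).any():
    raise ValueError('some scores fall outside every label range')

df['Label'] = labels_df['Label'].to_numpy()[idx]
print(df)
```

Adding or changing a label then only means editing `labels_df`, which keeps the flexibility the answer was aiming for.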