[英]Pandas series replace values
我有一個熊貓系列,其值如下:
Bachelors Degree 639
Diploma 291
O - Level 264
Masters Degree 149
Certificate 126
A - Level 69
PGD 40
Bachelors Degree 28
A-Level 20
O-Level 15
Masters 10
Bachelors 6
diploma 5
certificate 5
Ph.D 4
A- Level 2
Post Graduate Diploma 1
Msc Environment 1
BBA 1
O- Level 1
Masters 1
PhD 1
我從excel中獲取數據。
我想使用熊貓來進行數據清理,比如替換所有擁有碩士學位的案例(我可以在 excel 中完成,但我正在學習熊貓)。
我試過了
mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":"diploma",
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":"certificate",
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].map(mapp)
結果僅針對只有一個值的證書密鑰返回。
似乎我不能使用列表作為字典鍵的值。
任何關於如何替換值的建議將不勝感激。 Ronald 這是實際數據在 excel 列中的顯示方式。
我已經添加了數據如何在列中的圖像。 挑戰是如何替換說“碩士學位”的各種變體。
一種想法是將一個元素值轉換為一個元素列表,例如"diploma"
到["diploma"]
:
mapp1={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":["diploma"],
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":["certificate"],
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d = {k.lower(): oldk for oldk, oldv in mapp1.items() for k in oldv}
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].str.lower().map(d)
print (df)
EDUCATION_LEVEL VAL
0 Bachelor's Degree 639
1 Ordinary Diploma 291
2 Ordinary Level 264
3 Master's Degree 149
4 Certificate 126
5 Advanced Level 69
6 Post Graduate Diploma 40
7 Bachelor's Degree 28
8 Advanced Level 20
9 Ordinary Level 15
10 Master's Degree 10
11 Bachelor's Degree 6
12 Ordinary Diploma 5
13 Certificate 5
14 PHD 4
15 NaN 2
16 Post Graduate Diploma 1
17 Master's Degree 1
18 Bachelor's Degree 1
19 Ordinary Level 1
20 Master's Degree 1
21 PHD 1
如果不可能,則使用:
d = {}
for k, v in mapp.items():
if isinstance(v, list):
for x in v:
d[x.lower()] = k
else:
d[v.lower()] = k
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].str.lower().map(d)
print (df)
EDUCATION_LEVEL VAL
0 Bachelor's Degree 639
1 Ordinary Diploma 291
2 Ordinary Level 264
3 Master's Degree 149
4 Certificate 126
5 Advanced Level 69
6 Post Graduate Diploma 40
7 Bachelor's Degree 28
8 Advanced Level 20
9 Ordinary Level 15
10 Master's Degree 10
11 Bachelor's Degree 6
12 Ordinary Diploma 5
13 Certificate 5
14 PHD 4
15 NaN 2
16 Post Graduate Diploma 1
17 Master's Degree 1
18 Bachelor's Degree 1
19 Ordinary Level 1
20 Master's Degree 1
21 PHD 1
首先通過將所有值設置為列表對您的 mapp dict 稍作更改:
mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":["diploma"],
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":["certificate"],
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
mapp_new = [{l:k for l in v} for k,v in mapp.items()]
mapp_new = {k.lower(): v for d in mapp_new for k, v in d.items()}
df.EDUCATION_LEVEL.apply(lambda x: mapp_new.get(x.lower(), x))
0 Bachelor's Degree
1 Ordinary Diploma
2 Ordinary Level
3 Master's Degree
4 Certificate
5 Advanced Level
6 Post Graduate Diploma
7 Bachelor's Degree
8 Advanced Level
9 Ordinary Level
10 Master's Degree
11 Bachelor's Degree
12 Ordinary Diploma
13 Certificate
14 PHD
15 A- Level
16 Post Graduate Diploma
17 Master's Degree
18 Bachelor's Degree
19 Ordinary Level
20 Master's Degree
21 PHD
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.