
Pandas series replace values

I have a pandas Series with the following values:

Bachelors Degree         639
Diploma                  291
O - Level                264
Masters Degree           149
Certificate              126
A - Level                 69
PGD                       40
Bachelors Degree          28
A-Level                   20
O-Level                   15
Masters                   10
Bachelors                  6
diploma                    5
certificate                5
Ph.D                       4
A- Level                   2
Post Graduate Diploma      1
Msc Environment            1
BBA                        1
O- Level                   1
Masters                    1
PhD                        1

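I am reading the data from Excel.

I want to use pandas for data cleaning, for example replacing every variant of "Master's Degree" with a single value (I could do this in Excel, but I am learning pandas).

I tried: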

mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":"diploma",
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":"certificate",
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].map(mapp)

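This returns a result only for the Certificate key, which has a single value.

It seems I cannot use lists as the values of the dictionary keys.

Any suggestions on how to replace the values would be much appreciated.

Ronald

This is how the actual data appears in the Excel column: [image]

I have added an image showing how the data looks in the column. The challenge is how to replace the different variants of, say, "Master's Degree".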

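`Series.map` looks up each value of the Series as a key in the dictionary, so the mapping has to be inverted: each variant becomes a key and the canonical label becomes its value. One idea is to first convert the single-string values into one-element lists, e.g. "diploma" to ["diploma"]: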

mapp1={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":["diploma"],
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":["certificate"],
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }

# invert the dict: each lower-cased variant maps to its canonical label
# http://stackoverflow.com/a/31674731/2901002
d = {k.lower(): oldk for oldk, oldv in mapp1.items() for k in oldv}
# lower-case the column so the lookup is case-insensitive
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].str.lower().map(d)
print(df)
          EDUCATION_LEVEL  VAL
0       Bachelor's Degree  639
1        Ordinary Diploma  291
2          Ordinary Level  264
3         Master's Degree  149
4             Certificate  126
5          Advanced Level   69
6   Post Graduate Diploma   40
7       Bachelor's Degree   28
8          Advanced Level   20
9          Ordinary Level   15
10        Master's Degree   10
11      Bachelor's Degree    6
12       Ordinary Diploma    5
13            Certificate    5
14                    PHD    4
15                    NaN    2
16  Post Graduate Diploma    1
17        Master's Degree    1
18      Bachelor's Degree    1
19         Ordinary Level    1
20        Master's Degree    1
21                    PHD    1

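Row 15 is NaN because "A- Level" is not in the mapping (the "Advanced Level" list contains "- Level" instead), and `map` returns NaN for any value it cannot find in the dictionary.

If changing the dictionary that way is not possible, use a loop that handles both lists and single strings: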

d = {}
for k, v in mapp.items():
    if isinstance(v, list):
        for x in v:
            d[x.lower()] = k
    else:
        d[v.lower()] = k


df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].str.lower().map(d)
print (df)
          EDUCATION_LEVEL  VAL
0       Bachelor's Degree  639
1        Ordinary Diploma  291
2          Ordinary Level  264
3         Master's Degree  149
4             Certificate  126
5          Advanced Level   69
6   Post Graduate Diploma   40
7       Bachelor's Degree   28
8          Advanced Level   20
9          Ordinary Level   15
10        Master's Degree   10
11      Bachelor's Degree    6
12       Ordinary Diploma    5
13            Certificate    5
14                    PHD    4
15                    NaN    2
16  Post Graduate Diploma    1
17        Master's Degree    1
18      Bachelor's Degree    1
19         Ordinary Level    1
20        Master's Degree    1
21                    PHD    1
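
Both outputs above contain a NaN for the one value that has no entry in the lookup. If the original text should be kept in that case, one option is to fill the gaps from the unmapped column with `fillna`. This is only a minimal sketch: the Series `s` and the shortened dictionary `d` are made-up samples for illustration, not the full data from the question.

```python
import pandas as pd

# Hypothetical sample: a few of the variants from the question.
s = pd.Series(["Masters", "PGD", "A- Level", "diploma"])

# Inverted, lower-cased lookup (variant -> canonical label), shortened here.
d = {
    "masters": "Master's Degree",
    "pgd": "Post Graduate Diploma",
    "diploma": "Ordinary Diploma",
}

mapped = s.str.lower().map(d)   # "A- Level" is not in d, so it becomes NaN
cleaned = mapped.fillna(s)      # fall back to the original value where unmapped

print(cleaned.tolist())
```

Values that are found get their canonical label, and anything else passes through unchanged, so it can be reviewed and added to the mapping later.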

First, change your mapp dict slightly so that every value is a list:

mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":["diploma"],
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":["certificate"],
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }

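# one small dict per canonical label: variant -> label
mapp_new = [{l:k for l in v} for k,v in mapp.items()]
# merge them into a single dict with lower-cased keys
mapp_new = {k.lower(): v for d in mapp_new for k, v in d.items()}
# case-insensitive lookup; keep the original value when there is no match
df.EDUCATION_LEVEL.apply(lambda x: mapp_new.get(x.lower(), x))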


0         Bachelor's Degree
1          Ordinary Diploma
2            Ordinary Level
3           Master's Degree
4               Certificate
5            Advanced Level
6     Post Graduate Diploma
7         Bachelor's Degree
8            Advanced Level
9            Ordinary Level
10          Master's Degree
11        Bachelor's Degree
12         Ordinary Diploma
13              Certificate
14                      PHD
15                 A- Level
16    Post Graduate Diploma
17          Master's Degree
18        Bachelor's Degree
19           Ordinary Level
20          Master's Degree
21                      PHD
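
Because this approach leaves unmatched values as they were (row 15 is still "A- Level"), it can help to list whatever the lookup does not cover before trusting the result. The following is a rough sketch under assumed sample data; `s` and the shortened `mapp_new` are hypothetical stand-ins for the real column and dictionary.

```python
import pandas as pd

# Hypothetical sample column and a shortened lower-cased lookup.
s = pd.Series(["Masters", "A- Level", "PhD", "A- Level", "Msc Environment"])
mapp_new = {
    "masters": "Master's Degree",
    "msc environment": "Master's Degree",
    "phd": "PHD",
}

# Keep only the rows whose lower-cased value has no key in the lookup.
unmapped = s[~s.str.lower().isin(list(mapp_new.keys()))]

# Count each leftover variant so it can be added to the mapping.
print(unmapped.value_counts())
```

Each variant reported here is a candidate for a new entry in the mapping, for example adding "A- Level" to the "Advanced Level" list.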
