
Find matching value in column and create another column pandas dataframe

Suppose I have the following dataframe:

ID  Country Employee    Location
1   AE       Jay        AAA
2   AE       Mary       aa
3   AE       Peter      bbb
3   AE       Peter      ddd
6   DK       Donk       ddd
7   CZ       Cesar      fff
7   CZ       Cesar      GGg
7   CZ       Cesar      
8   CZ       Carlos     #

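I need to use the lookup dataframe below to check whether each Location value is valid for its Country, and to create an additional column named "Legacy Location Name" according to these rules:

  • If the value matches the lookup dataframe (regardless of case), put "CORRECT VALUE" in the "Legacy Location Name" column.

  • If the value of Location is incorrect, move the value previously held in "Location" to "Legacy Location Name", and put the first location listed for that country in the lookup dataframe into "Location".

  • If the value of Location is empty (as in the second-to-last row), put "LOCATION NOT PROVIDED" in "Legacy Location Name", and put the first location listed for that country in the lookup dataframe into "Location".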


Country Location
AE      bbb
AE      aaa
AE      ccc
DK      ddd
DK      eee
DK      fff
CZ      ggg
CZ      hhh
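
To make the example reproducible, the two tables above can be built as pandas objects. This is only a sketch: the names `df` and `lookup_df` follow the answer's code, and storing the blank Location as NaN is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the blank Location cell is stored as NaN (assumption).
df = pd.DataFrame({
    'ID': [1, 2, 3, 3, 6, 7, 7, 7, 8],
    'Country': ['AE', 'AE', 'AE', 'AE', 'DK', 'CZ', 'CZ', 'CZ', 'CZ'],
    'Employee': ['Jay', 'Mary', 'Peter', 'Peter', 'Donk',
                 'Cesar', 'Cesar', 'Cesar', 'Carlos'],
    'Location': ['AAA', 'aa', 'bbb', 'ddd', 'ddd', 'fff', 'GGg', np.nan, '#'],
})

# Lookup of valid locations per country, in the order listed above.
lookup_df = pd.DataFrame({
    'Country': ['AE', 'AE', 'AE', 'DK', 'DK', 'DK', 'CZ', 'CZ'],
    'Location': ['bbb', 'aaa', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh'],
})
```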

Expected output:

ID  Country Employee    Location    Legacy Location
1   AE      Jay         AAA         CORRECT VALUE
2   AE      Mary        bbb         aa
3   AE      Peter       bbb         CORRECT VALUE
3   AE      Peter       bbb         ddd
6   DK      Donk        ddd         CORRECT VALUE
7   CZ      Cesar       ggg         fff
7   CZ      Cesar       GGg         CORRECT VALUE
7   CZ      Cesar                   LOCATION NOT PROVIDED
8   CZ      Carlos      ggg         #




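# first listed location per country, used as the replacement value
s = (lookup_df.drop_duplicates('Country')
     .set_index('Country')['Location'])

out = (df
 # handle location independently of case
 # (assumption: the lookup locations are lowercase, as in the example)
 .assign(loc=lambda d: d['Location'].str.lower())
 # identify the correct values by merging
 .merge(lookup_df.assign(**{'Legacy Location': 'CORRECT VALUE'}),
        left_on=['Country', 'loc'], right_on=['Country', 'Location'],
        suffixes=(None, '_'), how='left')
 .drop(columns=['loc', 'Location_'])
 # replace invalid locations
 .assign(**{'Location': lambda d: df['Location'].mask(d['Legacy Location'].isna()).fillna(df['Country'].map(s).mask(df['Location'].isna())),
 # add previous invalid locations
            'Legacy Location': lambda d: d['Legacy Location'].fillna(df['Location'].fillna('LOCATION NOT PROVIDED'))})
)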


NB. For simplicity, this assumes that all empty cells are NaN.
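
If the real data holds empty strings or whitespace instead of NaN, the blanks could be normalized first. The snippet below is a minimal sketch on a made-up Series; `location` and `cleaned` are hypothetical names, not part of the original answer.

```python
import pandas as pd

# Hypothetical column with several kinds of "empty" cells
location = pd.Series(['AAA', '', '   ', None, 'bbb'], name='Location')

# Strip surrounding whitespace, then turn empty strings into NaN;
# cells that were already missing stay missing.
stripped = location.str.strip()
cleaned = stripped.mask(stripped == '')
```

Applied to the question's data, the same two steps would be run on `df['Location']` before the merge.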


   ID Country Employee Location        Legacy Location
0   1      AE      Jay      AAA          CORRECT VALUE
1   2      AE     Mary      bbb                     aa
2   3      AE    Peter      bbb          CORRECT VALUE
3   3      AE    Peter      bbb                    ddd
4   6      DK     Donk      ddd          CORRECT VALUE
5   7      CZ    Cesar      ggg                    fff
6   7      CZ    Cesar      GGg          CORRECT VALUE
7   7      CZ    Cesar      NaN  LOCATION NOT PROVIDED
8   8      CZ   Carlos      ggg                      #
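
As a cross-check of the result, the three rules from the question can also be written as a plain per-row function. This is an illustrative sketch rather than the answer's method: `fix_location` and the `valid` dictionary are hypothetical names, and an empty Location is left empty because that is what the output above shows.

```python
import numpy as np
import pandas as pd

def fix_location(country, location, valid):
    """Return (location, legacy) for a single row.

    `valid` maps each country to its list of allowed lowercase locations.
    """
    if pd.isna(location):
        # nothing provided: keep Location empty, flag it in the legacy column
        return location, 'LOCATION NOT PROVIDED'
    allowed = valid.get(country, [])
    if location.lower() in allowed:
        # case-insensitive match: keep the value as typed
        return location, 'CORRECT VALUE'
    # invalid value: substitute the first listed location, remember the old one
    replacement = allowed[0] if allowed else np.nan
    return replacement, location

# Allowed locations per country, copied from the lookup table
valid = {
    'AE': ['bbb', 'aaa', 'ccc'],
    'DK': ['ddd', 'eee', 'fff'],
    'CZ': ['ggg', 'hhh'],
}

rows = [('AE', 'AAA'), ('AE', 'aa'), ('AE', 'ddd'), ('CZ', np.nan), ('CZ', '#')]
results = [fix_location(country, loc, valid) for country, loc in rows]
```

The dictionary could presumably be derived from the lookup frame with `lookup_df.groupby('Country')['Location'].agg(list).to_dict()`. The vectorized merge is likely preferable for large frames, while the row-wise version is easier to read and to test.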

