Find matching values in a column and create another column in a pandas dataframe

Question

Suppose I have the following dataframe:

ID  Country Employee    Location
1   AE       Jay        AAA
2   AE       Mary       aa
3   AE       Peter      bbb
3   AE       Peter      ddd
6   DK       Donk       ddd
7   CZ       Cesar      fff
7   CZ       Cesar      GGg
7   CZ       Cesar      
8   CZ       Carlos     #

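I need to use the dataframe below to check whether the Location values are valid (for their country) and create an additional column named "Legacy Location Name" containing the following:

If the value matches the lookup dataframe (regardless of case), add "CORRECT VALUE" to the "Legacy Location Name" column.
If the Location value is incorrect, add the value previously used in the "Location" column to "Legacy Location Name", and set "Location" to the first location listed for that country in the lookup dataframe.
If the Location value is empty (as in the second-to-last row), add the value "LOCATION NOT PROVIDED" to "Legacy Location Name", and set "Location" to the first location listed for that country in the lookup dataframe.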

Lookup df:

Country Location
AE      bbb
AE      aaa
AE      ccc
DK      ddd
DK      eee
DK      fff
CZ      ggg
CZ      hhh
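
To reproduce the example, the two frames above could be built roughly as follows. This is only a sketch: it assumes the blank Location cell is stored as NaN, and the names `df` and `lookup_df` are chosen to match the answer below.

```python
import numpy as np
import pandas as pd

# Main frame from the question; the blank Location is stored as NaN (assumption).
df = pd.DataFrame({
    'ID': [1, 2, 3, 3, 6, 7, 7, 7, 8],
    'Country': ['AE', 'AE', 'AE', 'AE', 'DK', 'CZ', 'CZ', 'CZ', 'CZ'],
    'Employee': ['Jay', 'Mary', 'Peter', 'Peter', 'Donk',
                 'Cesar', 'Cesar', 'Cesar', 'Carlos'],
    'Location': ['AAA', 'aa', 'bbb', 'ddd', 'ddd', 'fff', 'GGg', np.nan, '#'],
})

# Lookup frame: valid locations per country, in the order given in the question.
lookup_df = pd.DataFrame({
    'Country': ['AE', 'AE', 'AE', 'DK', 'DK', 'DK', 'CZ', 'CZ'],
    'Location': ['bbb', 'aaa', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh'],
})
```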

Expected output

ID  Country Employee    Location    Legacy Location
1   AE      Jay         AAA         CORRECT VALUE
2   AE      Mary        bbb         aa
3   AE      Peter       bbb         CORRECT VALUE
3   AE      Peter       bbb         ddd
6   DK      Donk        ddd         CORRECT VALUE
7   CZ      Cesar       ggg         fff
7   CZ      Cesar       GGg         CORRECT VALUE
7   CZ      Cesar                   LOCATION NOT PROVIDED
8   CZ      Carlos      ggg         #

What is the best way to achieve this?

Thanks!

Answer 1

It's not complicated, but it takes several steps:

# first listed location for each country, used as the replacement value
s = (lookup_df.drop_duplicates('Country')
     .set_index('Country')['Location']
     )

out = (df
 # handle location independently of case
 .assign(Location=df['Location'].str.casefold())
 # identify the correct values by merging
 .merge(lookup_df.assign(**{'Legacy Location': 'CORRECT VALUE'}),
        how='left')
 .assign(**{
     # replace invalid locations, leaving missing ones as NaN
     'Location': lambda d: (df['Location']
                            .mask(d['Legacy Location'].isna())
                            .fillna(df['Country'].map(s)
                                    .mask(df['Location'].isna()))),
     # add previous invalid locations
     'Legacy Location': lambda d: (d['Legacy Location']
                                   .fillna(df['Location']
                                           .fillna('LOCATION NOT PROVIDED'))),
 })
 )

print(out)

NB. For simplicity, this assumes that all empty cells are NaN.
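
If the blanks in the real data are empty or whitespace-only strings rather than NaN, one way to meet that assumption is to convert them first. A minimal sketch, using a hypothetical `raw` column for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw column in which blanks arrive as '' or whitespace.
raw = pd.Series(['AAA', '', '   ', None, '#'], name='Location')

# Turn empty or whitespace-only strings into NaN; other values are untouched.
cleaned = raw.replace(r'^\s*$', np.nan, regex=True)

print(cleaned.isna().tolist())
```

The same `replace` call could be applied to `df['Location']` before running the steps above.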

Output:

   ID Country Employee Location        Legacy Location
0   1      AE      Jay      AAA          CORRECT VALUE
1   2      AE     Mary      bbb                     aa
2   3      AE    Peter      bbb          CORRECT VALUE
3   3      AE    Peter      bbb                    ddd
4   6      DK     Donk      ddd          CORRECT VALUE
5   7      CZ    Cesar      ggg                    fff
6   7      CZ    Cesar      GGg          CORRECT VALUE
7   7      CZ    Cesar      NaN  LOCATION NOT PROVIDED
8   8      CZ   Carlos      ggg                      #
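
The approach above relies on `df` having a default 0..n-1 index (so that it aligns with the merged frame) and on the lookup locations already being lowercase. Where either may not hold, a variant along these lines might be easier to adapt. It is a sketch rather than the answer's method: `fix_locations` is a hypothetical helper name, it uses a set-membership check instead of a merge, and it assumes the index of `df` is unique.

```python
import pandas as pd


def fix_locations(df: pd.DataFrame, lookup_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate Location per Country, ignoring case."""
    # First listed location per country, used as the replacement value.
    first_loc = (lookup_df.drop_duplicates('Country')
                 .set_index('Country')['Location'])

    # Valid (Country, casefolded Location) pairs from the lookup table.
    valid_pairs = set(zip(lookup_df['Country'],
                          lookup_df['Location'].str.casefold()))

    missing = df['Location'].isna()
    is_valid = pd.Series(
        [pair in valid_pairs
         for pair in zip(df['Country'], df['Location'].str.casefold())],
        index=df.index,
    )
    invalid = ~is_valid & ~missing

    out = df.copy()
    # Replace only the invalid (but present) locations; missing ones stay NaN.
    out.loc[invalid, 'Location'] = df.loc[invalid, 'Country'].map(first_loc)
    # Keep the old value for invalid rows, flag missing and valid rows.
    out['Legacy Location'] = (df['Location']
                              .fillna('LOCATION NOT PROVIDED')
                              .mask(is_valid, 'CORRECT VALUE'))
    return out
```

Calling `fix_locations(df, lookup_df)` on the question's data should give the same Location and Legacy Location columns as the output shown above.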

Solution 1
0 2022-09-07 20:33:16
