Combine DataFrame columns based on the value in another column

Question

Input df (example)

Country     SubregionA      SubregionB
BRA         State of Acre   Brasiléia
BRA         State of Acre   Cruzeiro do Sul
USA         AL              Bibb County
USA         AL              Blount County
USA         AL              Bullock County

Output df

Country     SubregionA      SubregionB
BRA         State of Acre   State of Acre - Brasiléia
BRA         State of Acre   State of Acre - Cruzeiro do Sul
USA         AL              AL Bibb County
USA         AL              AL Blount County
USA         AL              AL Bullock County

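The code snippet is self-explanatory, but it seems to run forever when executed. What could be going wrong? (The DataFrame "data" is also large, about 250K+ rows.)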

for row in data.itertuples():
     region = data['Country']

     if region == 'ARG' :
          data['SubregionB'] = data[['SubregionA' 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
     elif region == 'BRA' :
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
     elif region == 'USA':
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
     else:
          pass

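Explanation: I am trying to concatenate the columns SubregionA and SubregionB based on the value in the column named "Country". The separator differs by country, so I wrote multiple if-else statements. Execution takes far too long. How can I make it faster?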

Answer 1

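You can use numpy.select with Series.isin to choose a result per row, and join the columns with +, which works on whole columns at once instead of looping over rows: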

print (df)
  Country     SubregionA       SubregionB
0     BRA  State of Acre         Brasilia
1     BRA  State of Acre  Cruzeiro do Sul
2     USA             AL      Bibb County
3     USA             AL    Blount County
4     USA             AL   Bullock County
5     JAP            AAA             BBBB

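import numpy as np

# Countries grouped by the separator they use
reg1 = ['ARG','BRA']   # joined with ' - '
reg2 = ['USA']         # joined with ' '

# Pick the joined string per row by country; any other country keeps SubregionB unchanged
a = np.select([df['Country'].isin(reg1), df['Country'].isin(reg2)],
              [df['SubregionA'] + ' - ' + df['SubregionB'],
               df['SubregionA'] + ' ' + df['SubregionB']],
              default=df['SubregionB'])

df['SubregionB'] = a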
print (df)
  Country     SubregionA                       SubregionB
0     BRA  State of Acre         State of Acre - Brasilia
1     BRA  State of Acre  State of Acre - Cruzeiro do Sul
2     USA             AL                   AL Bibb County
3     USA             AL                 AL Blount County
4     USA             AL                AL Bullock County
5     JAP            AAA                             BBBB
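
If the list of countries grows, one possible variant is to keep the separators in a dictionary and look them up with Series.map. This is only a sketch and not part of the accepted answer: the `separators` dict and the `combine_subregions` helper are hypothetical names made up for illustration, and the sample frame mirrors the one printed above.

```python
import pandas as pd

def combine_subregions(df, separators):
    """Return a copy of df with SubregionA prefixed onto SubregionB.

    `separators` is a hypothetical dict mapping a country code to the
    separator string for that country. Rows whose country is not in the
    dict are left unchanged.
    """
    out = df.copy()
    sep = out["Country"].map(separators)   # missing value for unmapped countries
    mask = sep.notna()                      # rows that have a separator
    out.loc[mask, "SubregionB"] = (
        out.loc[mask, "SubregionA"] + sep[mask] + out.loc[mask, "SubregionB"]
    )
    return out

df = pd.DataFrame({
    "Country": ["BRA", "BRA", "USA", "JAP"],
    "SubregionA": ["State of Acre", "State of Acre", "AL", "AAA"],
    "SubregionB": ["Brasiléia", "Cruzeiro do Sul", "Bibb County", "BBBB"],
})

# Assumed separators, matching the ones used in the answer above
separators = {"ARG": " - ", "BRA": " - ", "USA": " "}

result = combine_subregions(df, separators)
print(result)
```

Like the numpy.select version, this does a handful of column-wide operations rather than one apply per row, so it should scale to a frame of 250K+ rows.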

Solution 1
1 Accepted 2020-10-30 10:22:28