Combine columns of a dataframe based on the value in another column

Question

Input df (sample)

Country     SubregionA      SubregionB
BRA         State of Acre   Brasiléia
BRA         State of Acre   Cruzeiro do Sul
USA         AL              Bibb County
USA         AL              Blount County
USA         AL              Bullock County

Output df

Country     SubregionA      SubregionB
BRA         State of Acre   State of Acre - Brasiléia
BRA         State of Acre   State of Acre - Cruzeiro do Sul
USA         AL              AL Bibb County
USA         AL              AL Blount County
USA         AL              AL Bullock County

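The code snippet is self-explanatory, but it seems to run forever when executed. What could be going wrong? (The dataframe "data" is also large, about 250K+ rows.)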

for row in data.itertuples():
     region = data['Country']

     if region == 'ARG' :
          data['SubregionB'] = data[['SubregionA' 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
     elif region == 'BRA' :
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
     elif region == 'USA':
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
     else:
          pass

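Explanation: I am trying to concatenate the columns SubregionA and SubregionB based on the value in the column named "Country". The separator differs by country, so I wrote multiple if-else statements. Execution takes too long; how can I make it faster?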

Answer 1

You can use numpy.select with Series.isin, and join the columns with +:

print (df)
  Country     SubregionA       SubregionB
0     BRA  State of Acre         Brasilia
1     BRA  State of Acre  Cruzeiro do Sul
2     USA             AL      Bibb County
3     USA             AL    Blount County
4     USA             AL   Bullock County
5     JAP            AAA             BBBB

import numpy as np

# countries grouped by the separator they use
reg1 = ['ARG','BRA']
reg2 = ['USA']

a = np.select([df['Country'].isin(reg1), df['Country'].isin(reg2)], 
              [df['SubregionA'] + ' - ' + df['SubregionB'],
               df['SubregionA'] + ' ' + df['SubregionB']],
              default=df['SubregionB'])

df['SubregionB'] = a
print (df)
  Country     SubregionA                       SubregionB
0     BRA  State of Acre         State of Acre - Brasilia
1     BRA  State of Acre  State of Acre - Cruzeiro do Sul
2     USA             AL                   AL Bibb County
3     USA             AL                 AL Blount County
4     USA             AL                AL Bullock County
5     JAP            AAA                             BBBB
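
If the number of separators grows, listing one condition per group can get unwieldy. A possible variation, sketched here under assumptions and not part of the original answer, is to keep a country-to-separator lookup and apply it with Series.map. The dictionary name `separators` and the helper masks `sep` and `has_sep` are hypothetical names chosen for illustration; the sample frame mirrors the one printed above.

```python
import pandas as pd

# Sample data mirroring the frame shown in the answer.
df = pd.DataFrame({
    "Country": ["BRA", "BRA", "USA", "USA", "USA", "JAP"],
    "SubregionA": ["State of Acre", "State of Acre", "AL", "AL", "AL", "AAA"],
    "SubregionB": ["Brasilia", "Cruzeiro do Sul", "Bibb County",
                   "Blount County", "Bullock County", "BBBB"],
})

# Hypothetical lookup table: country code -> separator to use.
separators = {"ARG": " - ", "BRA": " - ", "USA": " "}

# Map each row's country to its separator; unlisted countries become NaN.
sep = df["Country"].map(separators)
has_sep = sep.notna()

# Concatenate only the rows that have a separator; the rest keep SubregionB.
df.loc[has_sep, "SubregionB"] = (
    df.loc[has_sep, "SubregionA"] + sep[has_sep] + df.loc[has_sep, "SubregionB"]
)

print(df)
```

Like the numpy.select version, this works on whole columns at once instead of looping over rows in Python, which is what matters for a frame with 250K+ rows. Countries missing from the lookup (JAP here) are left unchanged.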

Answer 1 (accepted), 2020-10-30 10:22:28
