Combining columns of dataframe based on value in another column
Input df (sample)
Country  SubregionA     SubregionB
BRA      State of Acre  Brasiléia
BRA      State of Acre  Cruzeiro do Sul
USA      AL             Bibb County
USA      AL             Blount County
USA      AL             Bullock County
Output df
Country  SubregionA     SubregionB
BRA      State of Acre  State of Acre - Brasiléia
BRA      State of Acre  State of Acre - Cruzeiro do Sul
USA      AL             AL Bibb County
USA      AL             AL Blount County
USA      AL             AL Bullock County
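The code snippet is self-explanatory, but it seems to run forever when executed. What could be going wrong? (The dataframe "data" is also large, about 250K+ rows.)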
for row in data.itertuples():
    region = data['Country']
    if region == 'ARG':
        data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
    elif region == 'BRA':
        data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
    elif region == 'USA':
        data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
    else:
        pass
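Explanation: I am trying to concatenate the columns SubregionA and SubregionB based on the value in the column named "Country". The separator differs by country, so I wrote multiple if-else statements. It takes too long to execute. How can I make it faster?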
You can use numpy.select with Series.isin, and join the columns with +:
print(df)
  Country     SubregionA       SubregionB
0     BRA  State of Acre         Brasilia
1     BRA  State of Acre  Cruzeiro do Sul
2     USA             AL      Bibb County
3     USA             AL    Blount County
4     USA             AL   Bullock County
5     JAP            AAA             BBBB
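import numpy as np

# Countries joined with ' - ' (reg1) and countries joined with a single space (reg2)
reg1 = ['ARG', 'BRA']
reg2 = ['USA']

# Pick the joined string per row by country; all other rows keep SubregionB unchanged
a = np.select([df['Country'].isin(reg1), df['Country'].isin(reg2)],
              [df['SubregionA'] + ' - ' + df['SubregionB'],
               df['SubregionA'] + ' ' + df['SubregionB']],
              default=df['SubregionB'])
df['SubregionB'] = a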
print(df)
  Country     SubregionA                       SubregionB
0     BRA  State of Acre         State of Acre - Brasilia
1     BRA  State of Acre  State of Acre - Cruzeiro do Sul
2     USA             AL                   AL Bibb County
3     USA             AL                 AL Blount County
4     USA             AL                AL Bullock County
5     JAP            AAA                             BBBB
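If the list of countries grows, a variation on the same vectorized idea may be easier to maintain: keep one separator per country in a dict and look it up with Series.map. This is only a sketch and not part of the original answer; the SEPARATORS mapping and the combine_subregions helper are hypothetical names, and the sample frame simply mirrors the one printed above.

```python
import pandas as pd

# Hypothetical mapping (an assumption for illustration): one separator per country.
SEPARATORS = {"ARG": " - ", "BRA": " - ", "USA": " "}


def combine_subregions(df, separators=SEPARATORS):
    """Return a copy of df where SubregionB is prefixed with SubregionA
    for every country that has a separator defined."""
    out = df.copy()
    # Countries missing from the mapping become NaN and are left untouched.
    sep = out["Country"].map(separators)
    mask = sep.notna()
    # Element-wise string concatenation on the selected rows only.
    out.loc[mask, "SubregionB"] = (
        out.loc[mask, "SubregionA"] + sep[mask] + out.loc[mask, "SubregionB"]
    )
    return out


# Sample data mirroring the frame used in the answer.
df = pd.DataFrame({
    "Country": ["BRA", "BRA", "USA", "USA", "USA", "JAP"],
    "SubregionA": ["State of Acre", "State of Acre", "AL", "AL", "AL", "AAA"],
    "SubregionB": ["Brasilia", "Cruzeiro do Sul", "Bibb County",
                   "Blount County", "Bullock County", "BBBB"],
})

result = combine_subregions(df)
print(result)
```

Like the numpy.select version, this works on whole columns at once instead of looping over rows in Python, so adding a country is just one more dict entry.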