[英]Combining columns of dataframe based on value in another column
Input df(example)输入 df(示例)
Country SubregionA SubregionB
BRA State of Acre Brasiléia
BRA State of Acre Cruzeiro do Sul
USA AL Bibb County
USA AL Blount County
USA AL Bullock County
Output df输出 df
Country SubregionA SubregionB
BRA State of Acre State of Acre - Brasiléia
BRA State of Acre State of Acre - Cruzeiro do Sul
USA AL AL Bibb County
USA AL AL Blount County
USA AL AL Bullock County
The code snippet is quite self explanatory, but when executed seems to run forever.代码片段是不言自明的,但执行时似乎永远运行。 What could be going wrong(Also the dataframe '
data
' is quite large around 250K+ rows)可能出什么问题了(数据框“
data
”也很大,大约有 250K+ 行)
for row in data.itertuples():
region = data['Country']
if region == 'ARG' :
data['SubregionB'] = data[['SubregionA' 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
elif region == 'BRA' :
data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
elif region == 'USA':
data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
else:
pass
Explanation : Trying to join columns SubregionA and SubregionB based on values in the column name 'Country'.说明:尝试根据列名称“Country”中的值连接列 SubregionA 和 SubregionB。 The separators are different and thus have written multiple if-else statements.
分隔符不同,因此编写了多个 if-else 语句。 Takes too long to execute, how can I make this faster?
执行时间太长,我怎样才能让它更快?
You can use numpy.select
with Series.isin
and join columns with +
:您可以使用
numpy.select
与Series.isin
和联接列与+
:
print (df)
Country SubregionA SubregionB
0 BRA State of Acre Brasilia
1 BRA State of Acre Cruzeiro do Sul
2 USA AL Bibb County
3 USA AL Blount County
4 USA AL Bullock County
5 JAP AAA BBBB
reg1 = ['ARG','BRA']
reg2 = ['USA']
a = np.select([df['Country'].isin(reg1), df['Country'].isin(reg2)],
[df['SubregionA'] + ' - ' + df['SubregionB'],
df['SubregionA'] + ' ' + df['SubregionB']],
default=df['SubregionB'])
df['SubregionB'] = a
print (df)
Country SubregionA SubregionB
0 BRA State of Acre State of Acre - Brasilia
1 BRA State of Acre State of Acre - Cruzeiro do Sul
2 USA AL AL Bibb County
3 USA AL AL Blount County
4 USA AL AL Bullock County
5 JAP AAA BBBB
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.