How to create a column in a dataframe based on another value in the row (Python)
I have the following data:
country | code | continent | plants | invertebrates | vertebrates | total |
---|---|---|---|---|---|---|
Afghanistan | AFG | Asia | 5 | 2 | 33 | 40 |
Albania | ALB | Europe | 5 | 71 | 61 | 137 |
Algeria | DZA | Africa | 24 | 40 | 81 | 145 |
I want to add a hemisphere column whose value is determined by looking up the continent in a list. I want to do it using a custom function (and not using lambda).
I attempted the following:
northern = ['North America', 'Asia', 'Europe']
southern = ['Africa','South America', 'Oceania']
def hem(x,y):
    if y in northern:
        x = 'northern'
        return x
    elif y in southern:
        x = 'southern'
        return x
    else:
        x = 'Not Found'
        return x
species_custom['hemisphere'] = species_custom.apply(hem, args=(species_custom['continent'],), axis=1)
I receive the following error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
Any ideas where I am going wrong?
hem is defined as taking two arguments, but in the apply call you pass only one extra argument through args (apply itself supplies the row as the first). And the argument you pass is the full continent column, so the test y in northern compares a whole Series against the list, which is what raises the ValueError. Probably not what you want.
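If a custom function is still preferred over lambda, one possible fix is to have hem take a single continent value and apply it to the continent column only. This is a minimal sketch, not necessarily the only approach; the small species_custom frame built here is a stand-in for the question's data and contains just the columns needed for the example.

```python
import pandas as pd

northern = ['North America', 'Asia', 'Europe']
southern = ['Africa', 'South America', 'Oceania']

def hem(continent):
    """Return the hemisphere label for a single continent name."""
    if continent in northern:
        return 'northern'
    elif continent in southern:
        return 'southern'
    return 'Not Found'

# Stand-in for the question's data (assumption: only these columns are needed).
species_custom = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'continent': ['Asia', 'Europe', 'Africa'],
})

# Series.apply calls hem once per value, so each call sees one string,
# not the whole column.
species_custom['hemisphere'] = species_custom['continent'].apply(hem)
```

Because hem now receives a plain string, the membership tests return an ordinary bool and the ambiguous truth value error does not arise.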
You could simplify by using nested numpy where calls:
import numpy as np
df['hemisphere'] = np.where(df['continent'].isin(northern), 'northern', np.where(df['continent'].isin(southern),'southern','Not Found'))
Result
       country  code  continent  plants  invertebrates  vertebrates  total  hemisphere
0  Afghanistan   AFG       Asia       5              2           33     40    northern
1      Albania   ALB     Europe       5             71           61    137    northern
2      Algeria   DZA     Africa      24             40           81    145    southern