pandas - change value in column based on another column
Say I have data, all_data, like this:
Id Zone Neighb
1 NaN IDOTRR
2 RL Veenker
3 NaN IDOTRR
4 RM Crawfor
5 NaN Mitchel
I want to fill in the missing values in the Zone column: where Neighb is "IDOTRR", set Zone to "RM", and where Neighb is "Mitchel", set it to "RL".
I tried:
all_data.loc[all_data.MSZoning.isnull()
& all_data.Neighborhood == "IDOTRR", "MSZoning"] = "RM"
all_data.loc[all_data.MSZoning.isnull()
& all_data.Neighborhood == "Mitchel", "MSZoning"] = "RL"
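I get:
TypeError: invalid type comparison
C:\Users\pprun\Anaconda3\lib\site-packages\pandas\core\ops.py:798: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = getattr(x, name)(y)
I'm sure this should be simple, but I've been messing with it for too long. Please help.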
Use np.select, i.e.
df['Zone'] = np.select([df['Neighb'] == 'IDOTRR', df['Neighb'] == 'Mitchel'], ['RM', 'RL'], df['Zone'])
   Id Zone   Neighb
0   1   RM   IDOTRR
1   2   RL  Veenker
2   3   RM   IDOTRR
3   4   RM  Crawfor
4   5   RL  Mitchel
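If you also want the null check, build the conditions as boolean masks and pass them to np.select:
# Boolean mask of condition 1
m1 = (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "IDOTRR")
# Boolean mask of condition 2
m2 = (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "Mitchel")
# Rows matching neither mask keep their current MSZoning value
all_data["MSZoning"] = np.select([m1, m2], ['RM', 'RL'], all_data["MSZoning"])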
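To check the masked version end to end, here is a minimal self-contained sketch. It assumes the shortened column names from the sample table (Zone, Neighb) rather than MSZoning and Neighborhood, and the frame is rebuilt by hand purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question (column names are assumed).
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Zone": [np.nan, "RL", np.nan, "RM", np.nan],
    "Neighb": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel"],
})

# Each comparison is parenthesized so that & combines two boolean Series.
m1 = df["Zone"].isnull() & (df["Neighb"] == "IDOTRR")
m2 = df["Zone"].isnull() & (df["Neighb"] == "Mitchel")

# Rows matching neither mask keep their existing Zone value.
df["Zone"] = np.select([m1, m2], ["RM", "RL"], default=df["Zone"])
```

Because both masks include the isnull() test, rows that already have a Zone are left alone even if their neighborhood is IDOTRR or Mitchel.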
Or use fillna together with replace:
df.Zone = df.Zone.fillna(df.Neighb.replace({'IDOTRR': 'RM', 'Mitchel': 'RL'}))
df
Out[784]:
Id Zone Neighb
0 1 RM IDOTRR
1 2 RL Veenker
2 3 RM IDOTRR
3 4 RM Crawfor
4 5 RL Mitchel
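One caveat with the replace-based one-liner: replace leaves neighborhood names that are not in the dictionary unchanged, so a missing Zone in, say, a "Veenker" row would be filled with the string "Veenker". A possible variant, sketched below, uses Series.map instead, which yields NaN for keys missing from the dictionary. The sixth row and the name fill_map are illustrative assumptions, not part of the question's data.

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus one hypothetical extra row (Id 6)
# whose Zone is missing but whose neighborhood has no fill rule.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Zone": [np.nan, "RL", np.nan, "RM", np.nan, np.nan],
    "Neighb": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel", "Veenker"],
})

fill_map = {"IDOTRR": "RM", "Mitchel": "RL"}  # neighborhood -> zone to fill

# map() gives NaN for neighborhoods missing from fill_map,
# so those rows stay NaN instead of receiving the neighborhood name.
df["Zone"] = df["Zone"].fillna(df["Neighb"].map(fill_map))
```

Whether leaving such rows as NaN is the right behavior depends on the data; the point is only that map and replace differ for unmapped keys.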
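In Python, & has higher precedence than ==:
http://www.annedawson.net/Python_Precedence.htm
So when you write all_data.MSZoning.isnull() & all_data.Neighborhood == "Mitchel", it is interpreted as (all_data.MSZoning.isnull() & all_data.Neighborhood) == "Mitchel". Python first tries to AND a boolean Series with a str Series, and then checks whether the result equals the single str "Mitchel". The solution is to wrap the tests in parentheses: (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "Mitchel"). Sometimes, if I have a lot of selectors, I assign them to variables and then AND them together, for example: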
null_zoning = all_data.MSZoning.isnull()
Mitchel_neighb = all_data.Neighborhood == "Mitchel"
all_data.loc[null_zoning & Mitchel_neighb, "MSZoning"] = "RL"
This not only takes care of the order-of-operations problem, it also means all_data.loc[null_zoning & Mitchel_neighb, "MSZoning"] = "RL" fits on one line.
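The precedence rule is easy to see in isolation. The sketch below uses two small hand-made Series (is_null and neighb are hypothetical stand-ins for the real columns) and then repeats the comparison on plain integers, where the unparenthesized form does not raise but quietly gives a different answer.

```python
import pandas as pd

# Hypothetical stand-ins for all_data.MSZoning.isnull() and all_data.Neighborhood.
is_null = pd.Series([True, False, True])
neighb = pd.Series(["IDOTRR", "Veenker", "Mitchel"])

# With parentheses, the comparison runs first and & combines two boolean Series.
mask = is_null & (neighb == "Mitchel")

# The same rule on plain integers:
# 1 & 2 == 2 parses as (1 & 2) == 2, which is 0 == 2.
unparenthesized = 1 & 2 == 2
# Here the comparison runs first, giving 1 & True.
parenthesized = 1 & (2 == 2)
```

Here mask selects only the last row, unparenthesized is False, and parenthesized is 1.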