Add new column based on boolean values in a different column
I'm trying to add a new column to a DataFrame based on the boolean values in another column.
Given a DataFrame like this:
from pandas import DataFrame
snr = DataFrame({ 'name': ['A', 'B', 'C', 'D', 'E'], 'seniority': [False, False, False, True, False] })
The furthest I've got so far is this:
def refine_seniority(contact):
    contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'
snr.apply(refine_seniority)
but I'm getting this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-208-0694ebf79a50> in <module>()
2 contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'
3
----> 4 snr.apply(refine_seniority)
5
6 snr
/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds )
4414 return self._apply_raw(f, axis)
4415 else:
-> 4416 return self._apply_standard(f, axis)
4417 else:
4418 return self._apply_broadcast(f, axis)
/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
4489 # no k defined yet
4490 pass
-> 4491 raise e
4492
4493
KeyError: ('seniority', u'occurred at index name')
It feels like I'm missing some fundamental understanding of DataFrames, but I'm stuck.
What's the proper way to add a new column based on boolean values in a different column?
You can create a dict and call map:
In [176]:
temp = {True:'senior', False:'Non-senior'}
snr['refined_seniority'] = snr['seniority'].map(temp)
snr
Out[176]:
name seniority refined_seniority
0 A False Non-senior
1 B False Non-senior
2 C False Non-senior
3 D True senior
4 E False Non-senior
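For reference, here is a self-contained sketch of the same dict-and-map approach, rebuilding the sample data from the question. One detail worth knowing: any value that is not a key in the dict becomes NaN in the new column.

```python
import pandas as pd

# Same sample data as in the question
snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# Map each boolean to a label; values missing from the dict would become NaN
labels = {True: 'senior', False: 'Non-senior'}
snr['refined_seniority'] = snr['seniority'].map(labels)
print(snr)
```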
As user @Jeff has pointed out, using map or apply should be a last resort if a vectorised solution can be applied.
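One vectorised option that stays entirely within pandas (a swapped-in alternative, not one of the answers in this thread) is to fill the new column with the default label and then overwrite only the rows selected by the boolean column, using .loc with the boolean Series as a row mask:

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# Start every row with the default label ...
snr['refined_seniority'] = 'Non-senior'
# ... then overwrite only the rows where 'seniority' is True
snr.loc[snr['seniority'], 'refined_seniority'] = 'senior'
print(snr)
```

This generalises to more than two labels by applying several masks in turn, at the cost of one pass over the data per mask.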
Or use numpy where:
In [178]:
import numpy as np
snr['refined_seniority'] = np.where(snr['seniority'], 'senior', 'Non-senior')
snr
Out[178]:
name seniority refined_seniority
0 A False Non-senior
1 B False Non-senior
2 C False Non-senior
3 D True senior
4 E False Non-senior
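A self-contained sketch of the np.where version. The boolean column can be passed directly as the condition, so the == True comparison is not needed: np.where takes the element from the second argument where the condition is True and from the third argument otherwise.

```python
import numpy as np
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# Element-wise choice: 'senior' where seniority is True, 'Non-senior' elsewhere
snr['refined_seniority'] = np.where(snr['seniority'], 'senior', 'Non-senior')
print(snr)
```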
If you modified your function like this, it would work:
In [187]:
def refine_seniority(contact):
    # contact is a single boolean value from the 'seniority' column
    if contact:
        return 'senior'
    else:
        return 'Non-senior'
snr['refined_seniority'] = snr['seniority'].apply(refine_seniority)
snr
Out[187]:
name seniority refined_seniority
0 A False Non-senior
1 B False Non-senior
2 C False Non-senior
3 D True senior
4 E False Non-senior
What you wrote fails because you are calling apply on the whole DataFrame. With the default axis=0, apply passes each column to your function as a Series indexed by the row labels, so 'seniority' does not exist as a label inside it. See below:
In [193]:
def refine_seniority(contact):
    print(contact)
snr['refined_seniority'] = snr.apply(refine_seniority)
0 A
1 B
2 C
3 D
4 E
Name: name, dtype: object
0 False
1 False
2 False
3 True
4 False
Name: seniority, dtype: object
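Here you can see that the function is called once per column, with two pandas Series whose index is the row labels 0 to 4. Neither has a 'seniority' key, hence the KeyError (the first call, on the 'name' column, is where it fails, which matches "occurred at index name").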
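If you want to keep a row-wise function like the one in the question, a sketch is to pass axis=1 so that each call receives one row (a Series indexed by column names, so 'seniority' exists) and to return the label instead of assigning into the row. The name refine_seniority_row is hypothetical.

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

def refine_seniority_row(row):
    # With axis=1, row is a Series indexed by column names
    return 'senior' if row['seniority'] else 'Non-senior'

# axis=1 calls the function once per row; the returned values form the new column
snr['refined_seniority'] = snr.apply(refine_seniority_row, axis=1)
print(snr)
```

Row-wise apply is the slowest of these options, so it is mainly worth using when the rule genuinely depends on several columns at once.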
Or inline the dict in a single line:
snr['refined_seniority'] = snr['seniority'].map({True: 'senior', False: 'Non-senior'})