
Add new column based on boolean values in a different column

I'm trying to add a new column to a DataFrame based on the boolean values in another column.

Given a DataFrame like this:

from pandas import DataFrame
snr = DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'seniority': [False, False, False, True, False]})

The furthest I've come so far is this:

def refine_seniority(contact):
    contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'

snr.apply(refine_seniority)

Yet I'm getting this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-208-0694ebf79a50> in <module>()
      2     contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'
      3 
----> 4 snr.apply(refine_seniority)
      5 
      6 snr

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds )
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                     # no k defined yet
   4490                     pass
-> 4491                 raise e
   4492 
   4493 

KeyError: ('seniority', u'occurred at index name')

It feels like I'm missing some fundamental understanding of DataFrames, but I'm stuck.

What's the proper way to add a new column based on boolean values in a different column?

You can create a dict and call map:

In [176]:

temp = {True:'senior', False:'Non-senior'}
snr['refined_seniority'] = snr['seniority'].map(temp)
snr
Out[176]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior
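
As a self-contained sketch of the dict-plus-map approach (the names `labels`, `partial`, and `n_missing` are illustrative, not from the original answer), the following also shows one thing worth knowing: a value with no matching key in the dict becomes NaN instead of raising an error.

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# Map each boolean to its label through a dict lookup.
labels = {True: 'senior', False: 'Non-senior'}
snr['refined_seniority'] = snr['seniority'].map(labels)

# If a key is missing from the dict, the corresponding rows become NaN.
partial = snr['seniority'].map({True: 'senior'})
n_missing = int(partial.isna().sum())

print(snr)
print(n_missing)
```

Covering both True and False in the dict, as the answer does, avoids those silent NaN values.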

As user @Jeff has pointed out, map or apply should be a last resort when a vectorised solution is available.

Or use NumPy's where:

In [178]:

import numpy as np
snr['refined_seniority'] = np.where(snr['seniority'], 'senior', 'Non-senior')
snr
Out[178]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior

If you modified your function to this and applied it to the column, then it would work:

In [187]:

def refine_seniority(contact):
    if contact == True:
        return 'senior'
    else:
        return 'Non-senior'

snr['refined_seniority'] = snr['seniority'].apply(refine_seniority)
snr
Out[187]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior
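
A quick way to convince yourself that the map, np.where, and column-wise apply approaches are interchangeable on this data is to compute all three and compare them. This is only a sketch; the variable names (`via_map`, `via_where`, `via_apply`, `all_agree`) are made up for illustration.

```python
import numpy as np
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# 1. Dict lookup with Series.map
via_map = snr['seniority'].map({True: 'senior', False: 'Non-senior'})

# 2. Vectorised selection with numpy.where (returns a NumPy array)
via_where = np.where(snr['seniority'], 'senior', 'Non-senior')

# 3. Element-wise function with Series.apply
via_apply = snr['seniority'].apply(lambda flag: 'senior' if flag else 'Non-senior')

# Compare as plain Python lists so dtype differences do not matter.
all_agree = via_map.tolist() == via_where.tolist() == via_apply.tolist()
print(all_agree)
```

Since the results match, the choice comes down to readability and speed; the vectorised np.where form avoids calling a Python function per element.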

What you wrote fails because you are calling apply on the whole df. By default, apply passes each column to the function as a Series indexed by the row labels (0 to 4), so the label 'seniority' does not exist in it. See below:

In [193]:

def refine_seniority(contact):
    print(contact)


snr['refined_seniority'] = snr.apply(refine_seniority)

0    A
1    B
2    C
3    D
4    E
Name: name, dtype: object
0    False
1    False
2    False
3     True
4    False
Name: seniority, dtype: object

Here you can see that the function receives two pandas Series, one per column. Neither has a key 'seniority', hence the KeyError.
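
If you do want to keep a function that looks up `'seniority'` by label, one option is to pass `axis=1` so that apply hands the function one row at a time. The sketch below assumes the function returns the label instead of assigning into the row; the variable `label_in_column_index` is hypothetical and only there to show why the original lookup failed.

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# With the default axis=0, each column arrives as a Series indexed 0..4,
# so 'seniority' is not one of its labels.
label_in_column_index = 'seniority' in snr['name'].index

def refine_seniority(row):
    # With axis=1, `row` is a Series indexed by the column labels.
    return 'senior' if row['seniority'] else 'Non-senior'

snr['refined_seniority'] = snr.apply(refine_seniority, axis=1)

print(label_in_column_index)
print(snr)
```

Row-wise apply is usually the slowest of the options on this page, so it is best kept for logic that cannot be expressed with map or np.where.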

snr['refined_seniority'] = snr['seniority'].map({True: 'senior', False: 'Non-senior'})

