pandas df: apply a condition on multiple columns

Question

I have a df that looks like this:

import pandas as pd

df = pd.DataFrame.from_dict({'master_feature':['ab',float('NaN'),float('NaN')],
    'feature':[float('NaN'),float('NaN'),'pq'],
    'epic':[float('NaN'),'fg',float('NaN')]})

I want to create a new column named promoted from the columns master_feature, epic, and feature:

The value of promoted will be:

master feature if adjacent master_feature column value is not null.
feature if adjacent feature column value is not null, and likewise for epic

Something like:

df.promoted = 'master feature' if not pd.isnull(df.master_feature) 
               elseif 'feature' if not pd.isnull(df.feature) 
               elseif 'epic' pd.isnull(df.epic) 
               else 'Na'

How can I achieve this using df.apply? Is it much more efficient if I use np.select?

Answer 1

np.select is the way to go. Try the code below. I think I got the logic correct based on your question. Also, there is a discrepancy in your logic: "feature if adjacent feature column value is not null, and likewise for epic" is not the same as "elseif 'epic' pd.isnull(df.epic)", so I went with: if df['epic'] is not null then 'epic'. Let me know if that is correct.

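import numpy as np

cond = [df['master_feature'].notna(), # if master_feature is not null then 'master feature'
        df['feature'].notna(), # if feature is not null then 'feature'
        df['epic'].notna()] # if epic is not null then 'epic'

choice = ['master feature', 
          'feature', 
          'epic']

# A string default ('Na', as in the question) matches the string choices;
# a float np.nan default does not share a dtype with them.
df['promoted'] = np.select(cond, choice, default='Na')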

  master_feature feature epic        promoted
0             ab     NaN  NaN  master feature
1            NaN     NaN   fg            epic
2            NaN      pq  NaN         feature
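For comparison, the df.apply version the question asks about could look roughly like the sketch below. The helper name pick_label is made up for illustration, and the 'Na' fallback is taken from the question's pseudocode. It calls a Python function once per row, whereas np.select evaluates whole columns at once, which is usually why np.select tends to be faster on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'master_feature': ['ab', np.nan, np.nan],
    'feature': [np.nan, np.nan, 'pq'],
    'epic': [np.nan, 'fg', np.nan],
})

def pick_label(row):
    # Hypothetical helper: return the label of the first non-null column,
    # checked in the priority order given in the question.
    if pd.notna(row['master_feature']):
        return 'master feature'
    if pd.notna(row['feature']):
        return 'feature'
    if pd.notna(row['epic']):
        return 'epic'
    return 'Na'  # fallback when all three columns are null

# axis=1 passes each row to pick_label as a Series
df['promoted'] = df.apply(pick_label, axis=1)
print(df)
```

On the three sample rows this should give the same promoted column as the np.select version.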

Answer 2

It can be done with a combination of apply() and numpy argmin():

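import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({'master_feature':['ab',float('NaN'),float('NaN')],
    'feature':[float('NaN'),float('NaN'),'pq'],
    'epic':[float('NaN'),'fg',float('NaN')]})

# For each row, argmin over the isna() mask is the position of the first non-null entry;
# .iloc picks that entry, so promoted holds the first non-null value in the row (e.g. 'ab').
df = df.assign(promoted=lambda dfa: dfa.apply(lambda r: r.iloc[np.argmin(r.isna().to_numpy())], axis=1))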
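If the goal is the name of the first non-null column rather than its value, one possible variant is sketched below. It is not part of the original answer. It assumes the priority order master_feature, feature, epic, adds a fourth all-null row purely for illustration, and returns the raw column names (with the underscore), so 'master_feature' would still need renaming to match the question's 'master feature'.

```python
import numpy as np
import pandas as pd

# The last row (all null) is an added illustration, not from the question.
df = pd.DataFrame({
    'master_feature': ['ab', np.nan, np.nan, np.nan],
    'feature': [np.nan, np.nan, 'pq', np.nan],
    'epic': [np.nan, 'fg', np.nan, np.nan],
})

cols = ['master_feature', 'feature', 'epic']  # assumed priority order
present = df[cols].notna()                    # boolean mask of non-null cells

# idxmax along axis=1 returns the first column whose mask value is True
labels = present.idxmax(axis=1)

# Rows with no non-null cell would otherwise report the first column, so mask them
df['promoted'] = labels.where(present.any(axis=1), 'Na')
print(df['promoted'].tolist())
```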


2 answers

Answer 1
1 Accepted 2021-01-15 13:44:51

Answer 2
1 2021-01-15 13:59:48
