Create a new column based on value in another column
I have a data frame:
id|concept |description
12| |rewards member
12|tier one |
12|not avail |rewards member
GOAL: Create a new column final_desc with the content of either the concept or description column.
There are 4 possible scenarios:
1. There is a value in the concept column and not in description, in which case final_desc is the value in concept.
2. There is a value in the description column and not in concept, in which case final_desc is the value in description.
3. The value in the concept column is 'not avail', in which case final_desc is the value in description.
4. Both the concept and description columns are empty, in which case final_desc is empty.
I tried using a where statement, but that does not account for scenario 3.
df['final_desc'] = np.where(df['concept'].isnull(), df['description'], df['concept'])
I think I need a custom function, but I am not sure how to write one that works across columns.
You can combine replace and ffill/bfill:
df['final_desc'] = (df[['concept','description']].replace('not avail',np.nan)
                      .bfill(axis=1)['concept']
                   )
Output:
id concept description final_desc
0 12 NaN rewards member rewards member
1 12 tier one NaN tier one
2 12 not avail rewards member rewards member
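For anyone who wants to try this end to end, here is a minimal self-contained sketch of the replace and back-fill idea. It assumes the blank cells in the question are NaN (as the output above suggests) rather than empty strings, and it adds a hypothetical fourth row, not in the original data, to cover scenario 4.

```python
import numpy as np
import pandas as pd

# Sample frame from the question. The fourth row is an assumption added
# here to exercise scenario 4 (both columns empty).
df = pd.DataFrame({
    "id": [12, 12, 12, 12],
    "concept": [np.nan, "tier one", "not avail", np.nan],
    "description": ["rewards member", np.nan, "rewards member", np.nan],
})

# Treat 'not avail' as missing, then back-fill along each row so that
# 'concept' takes the value of 'description' wherever 'concept' is missing.
filled = df[["concept", "description"]].replace("not avail", np.nan).bfill(axis=1)
df["final_desc"] = filled["concept"]

print(df)
```

If the blanks are actually empty strings, they would probably need to be replaced with NaN as well before back-filling.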
This might do the trick:
df['final_desc'] = df['concept'].replace('not avail', np.nan).fillna(df['description'])
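Since the question asks about a custom function that works across columns, the same rules could also be written row by row. This is only a sketch: pick_desc is a hypothetical helper name, the blanks are assumed to be NaN, and the fourth row is added here to cover scenario 4. Row-wise apply is usually slower than the vectorized versions on large frames, but the logic is easier to read and extend.

```python
import numpy as np
import pandas as pd

def pick_desc(row):
    """Return concept when it is present and usable, otherwise description."""
    concept = row["concept"]
    if pd.notna(concept) and concept != "not avail":
        return concept
    # Falls through for a missing or 'not avail' concept; the result is
    # NaN when description is missing too (scenario 4).
    return row["description"]

# Sample frame from the question; the fourth row is an added assumption.
df = pd.DataFrame({
    "id": [12, 12, 12, 12],
    "concept": [np.nan, "tier one", "not avail", np.nan],
    "description": ["rewards member", np.nan, "rewards member", np.nan],
})

# axis=1 passes each row to pick_desc as a Series indexed by column name.
df["final_desc"] = df.apply(pick_desc, axis=1)

print(df)
```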