
Pandas: Creating new column based on values from existing column

I have a pandas DataFrame with two columns, as follows:

A      B
Yes    No
Yes    Yes
No     Yes
No     No
NA     Yes
NA     NA

I want to create a new column based on these values such that if either column is Yes, the new column is also Yes. If both columns are No, the new column is No. Finally, if both columns are NA, the new column is NA as well. Example output for the data above:

C
Yes
Yes
Yes
No
Yes
NA

I wrote a loop over the length of the DataFrame that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?

Something like:

df.fillna('').max(axis=1)
Out[106]: 
0    Yes
1    Yes
2    Yes
3     No
4    Yes
5       
dtype: object
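
The one-liner above works because strings compare lexicographically, so 'Yes' sorts after 'No' and both sort after the empty string. It leaves an empty string, not a missing value, in the last row. A minimal sketch of one way to restore the missing value, assuming the NA entries are real missing values (NaN) and the frame holds only columns A and B:

```python
import numpy as np
import pandas as pd

# Example frame from the question; NA is assumed to be a real missing value.
# dtype=object keeps the cells as plain Python strings.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
}, dtype=object)

# Row-wise max on strings: "Yes" > "No" > "" lexicographically.
c = df.fillna("").max(axis=1)

# Turn the empty string (both columns missing) back into a missing value.
df["C"] = c.mask(c == "")
```

With this approach a row holding one No and one NA comes out as No, a case the question does not specify.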

Try:

(df == 'Yes').eval('A | B').astype(str).mask(df['A'].isna() & df['B'].isna())
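
Note that astype(str) on a boolean result produces the strings 'True' and 'False'. If the labels Yes and No are wanted instead, one option is to map them explicitly. A minimal sketch, assuming real missing values (NaN) and only columns A and B; it uses any(axis=1) in place of eval('A | B'), which is a substitution made here for illustration:

```python
import numpy as np
import pandas as pd

# Example frame from the question; NA is assumed to be a real missing value.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
}, dtype=object)

# True where either column is "Yes"; a missing value compares as False.
any_yes = (df == "Yes").any(axis=1)

# True where both columns are missing.
both_na = df["A"].isna() & df["B"].isna()

# Map booleans to labels, then blank out rows where both inputs are missing.
df["C"] = any_yes.map({True: "Yes", False: "No"}).mask(both_na)
```

Rows with one No and one NA come out as No here, a case the question leaves open.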

Another way of doing it. Hard-coded, though:

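import numpy as np
# isna() detects real missing values; == 'NaN' would only match the literal string.
conditions = [(df['A'] == 'Yes') | (df['B'] == 'Yes'), (df['A'] == 'No') & (df['B'] == 'No'), df['A'].isna() & df['B'].isna()]
choicelist = ['Yes', 'No', 'NaN']
# A string default keeps every choice the same dtype (the implicit default is the integer 0).
df['C'] = np.select(conditions, choicelist, default='NaN')
df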

