Pandas: Creating new column based on values from existing column

Question

I have a pandas DataFrame with two columns, as follows:

A      B
Yes    No
Yes    Yes
No     Yes
No     No
NA     Yes
NA     NA

I want to create a new column based on these values such that if either column is Yes, the new column is also Yes. If both columns are No, the new column is No. Finally, if both columns are NA, the new column is also NA. The expected output for the data above is:

C
Yes
Yes
Yes
No
Yes
NA

I wrote a loop over the length of the DataFrame that checks each value to build the new column. However, it takes a long time for 10M records. Is there a faster, more Pythonic way to achieve this?

Answer 1

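Something like the following. It relies on string ordering ('' < 'No' < 'Yes'), so the row-wise maximum picks Yes whenever it is present. Note that the row where both columns are NA comes out as an empty string rather than NA: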

df.fillna('').max(axis=1)
Out[106]: 
0    Yes
1    Yes
2    Yes
3     No
4    Yes
5       
dtype: object
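
If the last row should hold a real missing value instead of an empty string, the result can be masked afterwards. This is a minimal sketch, assuming the NA cells are actual NaN values; the frame construction and the name `result` are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Illustrative copy of the question's data; NA is assumed to be a real NaN.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
}, dtype=object)

# '' < 'No' < 'Yes' in string ordering, so the row-wise max prefers Yes, then No.
result = df.fillna("").max(axis=1)

# Restore a missing value wherever the max was the empty-string placeholder.
df["C"] = result.mask(result == "")
```

The mask step only touches rows where both columns were missing, because any row containing Yes or No has a non-empty maximum.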

Answer 2

Try:

(df == 'Yes').eval('A | B').astype(str).mask(df['A'].isna() & df['B'].isna())
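
Note that the expression above produces the strings 'True' and 'False' (plus NaN), not Yes and No. One possible variant maps the booleans explicitly. It is a sketch under the assumption that the NA cells are real NaN; the frame construction and the names `any_yes` and `all_missing` are illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative copy of the question's data; NA is assumed to be a real NaN.
df = pd.DataFrame({
    "A": ["Yes", "Yes", "No", "No", np.nan, np.nan],
    "B": ["No", "Yes", "Yes", "No", "Yes", np.nan],
}, dtype=object)

any_yes = (df == "Yes").any(axis=1)   # True where A or B is 'Yes'
all_missing = df.isna().all(axis=1)   # True where both A and B are NaN

# Map the booleans to labels, then blank out the rows that were entirely missing.
df["C"] = any_yes.map({True: "Yes", False: "No"}).mask(all_missing)
```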

Answer 3

Another way of doing it, though it is hard-coded:

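import numpy as np
# assuming missing values are real NaN, test them with isna() rather than == 'NaN'
conditions = ((df['A'] == 'Yes') | (df['B'] == 'Yes'),
              (df['A'] == 'No') & (df['B'] == 'No'),
              df['A'].isna() & df['B'].isna())
choicelist = ('Yes', 'No', 'NaN')
df['C'] = np.select(conditions, choicelist, default='NaN')  # string default matches the string choices
df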

3 answers

solution1
7 ACCEPTED 2020-05-01 22:10:34

solution2
2 2020-05-01 22:03:00

solution3
0 2020-05-01 22:19:10