
Create Combined Column From Three Columns With The Most Data (Not NA)

I have data like:

import pandas as pd

df = pd.DataFrame(data=[[1,-2,3,0,0], [0,0,0,4,0], [0,0,0,0,5]]).T

df.columns = ['col1', 'col2', 'col3']
    
> df

  col1  col2  col3
     1     0     0
    -2     0     0
     3     0     0
     0     4     0
     0     0     5

I want to create a fourth column ("col4") that takes its value from whichever column is non-zero in that row.

So result would be:

  col1  col2  col3  col4
     1     0     0     1
    -2     0     0    -2
     3     0     0     3
     0     4     0     4
     0     0     5     5

EDIT: If two columns are non-zero, always use col1. Also, the numbers may be negative. I have updated the df to reflect this.

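Taking the maximum across the columns is one possibility, but only when every non-zero value is positive: for the row containing -2 it returns 0 rather than -2, and it does not give col1 priority when two columns are non-zero.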

df['col4'] = df.max(axis=1)
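
For data with negative values, one alternative is to treat zeros as missing and take the first remaining value from left to right, which also gives col1 priority. This is a minimal sketch, assuming the three columns are named as in the question; `cols` and `non_zero` are illustrative names, not from the original post.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, -2, 3, 0, 0], [0, 0, 0, 4, 0], [0, 0, 0, 0, 5]]).T
df.columns = ['col1', 'col2', 'col3']

cols = ['col1', 'col2', 'col3']  # order defines priority: col1 first

# Replace zeros with NaN so they count as "missing".
non_zero = df[cols].mask(df[cols].eq(0))

# Back-fill across each row so the first column holds the first non-missing
# value, then fall back to 0 for rows where every column was zero.
df['col4'] = non_zero.bfill(axis=1).iloc[:, 0].fillna(0).astype(int)

print(df)
```

Masking converts the columns to floats, so the final `astype(int)` is only appropriate if the original data are integers.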

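Here's an example using a row-wise function:

def func(row):
    # Scan col1, col2, col3 in order and return the first non-zero value,
    # so col1 wins when more than one column is non-zero.
    for value in row:
        if value != 0:
            return value
    return 0  # every column in the row is zero

df['col4'] = df[['col1', 'col2', 'col3']].apply(func, axis=1)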
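
On larger frames, a row-wise `apply` can be slow. The same "first non-zero value" rule can be sketched with NumPy indexing instead. The extra row (7, 8, 0) is hypothetical, added only to show the col1 priority from the EDIT, and `values` and `first_non_zero` are illustrative names.

```python
import numpy as np
import pandas as pd

# The question's data plus one hypothetical row (7, 8, 0) with two non-zero values.
df = pd.DataFrame(
    {
        'col1': [1, -2, 3, 0, 0, 7],
        'col2': [0, 0, 0, 4, 0, 8],
        'col3': [0, 0, 0, 0, 5, 0],
    }
)

values = df[['col1', 'col2', 'col3']].to_numpy()

# argmax on a boolean array gives the position of the first True in each row
# (and 0 when the whole row is zero, which then selects the zero in col1).
first_non_zero = (values != 0).argmax(axis=1)
df['col4'] = values[np.arange(len(values)), first_non_zero]

print(df['col4'].tolist())  # [1, -2, 3, 4, 5, 7]
```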


 