How to populate new column based on values in other columns?

Question

I want to create a new column and populate it based on whether other columns hold a value. I have three columns to compare, and the values should be picked in a fixed order of preference.

Say I have three columns (A, B, C) and I want to populate a new column (Y) with the value from A, B or C, in ranked order. If column A has a value, it should populate Y, taking precedence over B and C. If A is empty and B has a value, B takes precedence over C. Column C is used only when both A and B are empty.

What I have:

A   B   C   Y        
1   NA  NA             
NA  2   NA
NA  3   NA
NA  NA  4        
5   NA  NA
6   6   NA
7   NA  NA
NA  NA  8
9   NA  9
10  10  10

What I want:

A   B   C   Y        
1   NA  NA  1           
NA  2   NA  2
NA  3   NA  3
NA  NA  4   4     
5   NA  NA  5
6   6   NA  6
7   NA  NA  7
NA  NA  8   8
9   NA  9   9
10  10  10  10

Answer 1

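Use np.where() for a vectorized approach. Nest the calls so that A is checked first, then B, with C as the last resort.

# NaN never compares equal to anything, so test with notna() rather than == or != np.nan
df['Y'] = np.where(df['A'].notna(), df['A'],
                   np.where(df['B'].notna(), df['B'], df['C']))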

Since your question doesn't include a DataFrame that can be reused, I've only written the lines you need.

Next time you ask a question, please include a code snippet that can be used to test possible answers. Welcome to the community :D
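
For anyone who wants to test this, here is a minimal runnable sketch of the np.where() approach. It assumes the NA entries are real missing values (NaN); the DataFrame is rebuilt by hand from the table in the question. Missing values are detected with notna(), because comparisons against np.nan with == or != never match.

```python
import numpy as np
import pandas as pd

# Data copied from the question's table; NA is assumed to mean NaN
df = pd.DataFrame({
    'A': [1, np.nan, np.nan, np.nan, 5, 6, 7, np.nan, 9, 10],
    'B': [np.nan, 2, 3, np.nan, np.nan, 6, np.nan, np.nan, np.nan, 10],
    'C': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8, 9, 10],
})

# Take A where present, otherwise B where present, otherwise C
df['Y'] = np.where(df['A'].notna(), df['A'],
                   np.where(df['B'].notna(), df['B'], df['C']))
print(df)
```

The outer np.where() handles A, and the inner one is evaluated only as the fallback, which is what gives A precedence over B and B precedence over C.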

If NA in your DataFrame is a string:

The above code won't work in that case. Compare against the actual string value instead.

df['Y'] = np.where(df['A'] != "NA", df['A'],
                   np.where(df['B'] != "NA", df['B'], df['C']))
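
A small sketch of the string case, assuming every cell is stored as text and a missing value is the literal string "NA". The four-row frame below is made up for illustration and is not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where missing values are the literal string "NA"
df = pd.DataFrame({
    'A': ['1', 'NA', 'NA', 'NA'],
    'B': ['NA', '2', 'NA', 'NA'],
    'C': ['NA', 'NA', '4', 'NA'],
})

# Same precedence as before, but tested with a string comparison
df['Y'] = np.where(df['A'] != 'NA', df['A'],
                   np.where(df['B'] != 'NA', df['B'], df['C']))
print(df)
```

In the last row all three columns are "NA", so Y simply receives the "NA" string from column C.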

One last note: another possible input is a row where all three columns are NA.

That case is not specified in your question, but if you want to handle it, add one more np.where() check on column C.

Then, as the false return value, put the value you want to use when C is also NA.
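
A sketch of that extra check, assuming real NaN values. The name fallback and its value 0 are hypothetical, so pick whatever default suits your data.

```python
import numpy as np
import pandas as pd

# Small made-up frame; the last row has no value in any column
df = pd.DataFrame({
    'A': [1, np.nan, np.nan, np.nan],
    'B': [np.nan, 2, np.nan, np.nan],
    'C': [np.nan, np.nan, 4, np.nan],
})

fallback = 0  # hypothetical default used when A, B and C are all missing

# The third np.where() checks C and supplies the fallback as its false value
df['Y'] = np.where(df['A'].notna(), df['A'],
                   np.where(df['B'].notna(), df['B'],
                            np.where(df['C'].notna(), df['C'], fallback)))
print(df)
```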

Answer 2

Great question. I think there are a lot of ways to approach this. One that immediately comes to mind for me is to loop over the rows, treating each row as a Series, and populate the Y column with the first entry in that Series that is not NA. The general code would look like:

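import pandas as pd

for idx, row in df[['A', 'B', 'C']].iterrows():
    # Each row is a Series whose entries come in A, B, C order
    for entry in row:
        if pd.isna(entry):
            continue  # skip missing values
        df.loc[idx, 'Y'] = entry
        break  # stop at the first non-missing value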

Again, this isn't an exact copy-and-paste solution, but this approach should give you what you want. Good luck and happy coding!

Edit: And from one new user to another, welcome to the community!

Answer 3

You can use pandas to construct the data structure, and then a careful use of the apply() function can help you get the transformation that you want.

import pandas as pd
import math

data = (
  [1,    None, None,],
  [None, 2   , None,],
  [None, 3   , None,],
  [None, None, 4,   ],
  [5   , None, None,],
  [6   , 6   , None,],
  [7   , None, None,],
  [None, None, 8,   ],
  [9   , None, 9,   ],
  [10,   10,   10,  ],
)

# Load in data (None becomes NaN in the numeric columns).
# DataFrame.append() was removed in pandas 2.0, so build the frame in one call.
df = pd.DataFrame(list(data), columns=['A', 'B', 'C'])
print(df)

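def calc_y(row):
  # Return the first non-missing value, scanning in A, B, C order.
  # Falls through (returns None) when every column is missing.
  for item in row:
    if not math.isnan(item):
      return item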

df['Y'] = df.apply(calc_y, axis=1)

print(df)
