Create new column in data.frame based on if-else assessment of other columns

Question

I am trying to add a new column (x_new) to my data frame that depends on the value given in a 'definition' column. The definition column, x_definition, contains one of the following record types:
- A constant number
- A string describing the operation needed
- NA

I want the resulting column, x_new, to look as follows:
- If x_definition is NA, then x_new remains NA.
- If x_definition is a string, then it requires a certain calculation. For example, if it's 'equal_to_z' then the result should be z, or if it's 'third_of_z' then x_new should be z/3. There are more definitions than just these, some of which call for more complex functions of z.
- If x_definition is any number, then x_new should just be that number.

I wrote the following code, which handles those cases but is a cumbersome group of nested ifelse statements. I am looking for a cleaner method.

data <- data %>% mutate(x_new = ifelse(
  is.na(x_definition), NA, ifelse(
    x_definition=='equal_to_z', z, ifelse(
      x_definition=='third_of_z', z/3, NA
      )
    )
  )
)

I also considered using switch, but ran into the problem that I don't know how to say "if it's a number, leave it as a number":

a <- data %>% mutate(x_new = switch(x_definition,
  'equal_to_z' = z,
  'third_of_z' = z / 3,
  <number???> = x_definition
  )
)

What would be an appropriate process for addressing this?

Answer 1

I think case_when is exactly what you're looking for.

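data = data %>%
    mutate(x_new = case_when(
        # NA_real_ keeps every branch numeric (older dplyr versions reject a bare logical NA here)
        is.na(x_definition) ~ NA_real_,
        x_definition == 'equal_to_z' ~ z,
        x_definition == 'third_of_z' ~ z / 3,
        # anything that parses as a number is used as that number;
        # suppressWarnings() hides the coercion warning raised for the non-numeric strings
        !is.na(suppressWarnings(as.numeric(x_definition))) ~ suppressWarnings(as.numeric(x_definition))))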

Answer 2

Yes, this is a very common need and it has a very good solution.

Your logic is:

- If x_definition is NA, then x_new remains NA.
- If x_definition is a string, then it requires a certain calculation. For example, if it's 'equal_to_z' then the result should be z, or if it's 'third_of_z' then x_new should be z/3. There are more definitions than just these, some of which call for more complex functions of z.
- If x_definition is any number, then x_new should just be that number.

I can rewrite it as

np.nan if pd.isna(row['x_definition'])
else row['z'] if row['x_definition'] == 'equal_to_z'
else row['z']/3 if row['x_definition'] == 'third_of_z'
else row['x_definition'] if isinstance(row['x_definition'], (int, float))
else np.nan

then you can do

import numpy as np
import pandas as pd

# pd.isna() catches every NaN/None, whereas `is np.nan` only matches that one object
df['x_new'] = df.apply(lambda row: np.nan if pd.isna(row['x_definition'])
                    else row['z'] if row['x_definition'] == 'equal_to_z'
                    else row['z']/3 if row['x_definition'] == 'third_of_z'
                    else row['x_definition'] if isinstance(row['x_definition'], (int, float))
                    else np.nan, axis=1)

or if you want to be more elegant

def logic_for_x_new(row):
 ...
 return x_new

df['x_new'] = df.apply(logic_for_x_new, axis=1)
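
For illustration only, here is one way the body of such a row function might look. This is a sketch under assumptions, not the answerer's actual code: the OPERATIONS lookup table is a hypothetical helper, numbers stored as text (such as "2.5") are assumed to count as numbers, and plain dicts stand in for DataFrame rows so the example runs without pandas. The function only reads row['x_definition'] and row['z'], so it should be usable with df.apply(logic_for_x_new, axis=1) as well.

```python
import math
from numbers import Real

# Hypothetical lookup table: definition string -> function of z.
# More complex definitions can be added here without touching the logic below.
OPERATIONS = {
    "equal_to_z": lambda z: z,
    "third_of_z": lambda z: z / 3,
}


def logic_for_x_new(row):
    """Return x_new for one row (any mapping with 'x_definition' and 'z')."""
    definition = row["x_definition"]

    # Missing definition (None or a float NaN): x_new stays missing.
    if definition is None or (isinstance(definition, float) and math.isnan(definition)):
        return math.nan

    # Named operation: apply the matching function to z.
    if isinstance(definition, str) and definition in OPERATIONS:
        return OPERATIONS[definition](row["z"])

    # Already a number: use it as is.
    if isinstance(definition, Real):
        return float(definition)

    # Assumption: a number stored as text (e.g. "2.5") counts as a number.
    try:
        return float(definition)
    except (TypeError, ValueError):
        return math.nan  # unknown definition string


# Plain dicts standing in for DataFrame rows.
rows = [
    {"x_definition": "equal_to_z", "z": 6.0},
    {"x_definition": "third_of_z", "z": 6.0},
    {"x_definition": 5, "z": 6.0},
    {"x_definition": "2.5", "z": 6.0},
    {"x_definition": None, "z": 6.0},
]
results = [logic_for_x_new(r) for r in rows]
print(results)
```

Keeping the string definitions in a dictionary means each new definition is one added entry rather than one more nested branch.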

Just be careful how you check for NaN in pandas. I use the trick that x == x is False when x is NaN (so x != x detects it); pd.isna(x) is a more explicit check.
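
The self-comparison behaviour of NaN can be checked with the standard library alone. The snippet below is a small sketch; is_missing is a hypothetical helper name, not a pandas function. An identity test such as `x is np.nan` only matches that one specific object, which is why a value-based check is safer.

```python
import math


def is_missing(x):
    """True for None or a float NaN; strings and ordinary numbers are not missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


nan = float("nan")

checks = {
    "nan == nan": nan == nan,                # NaN never compares equal to itself
    "nan != nan": nan != nan,                # so x != x flags NaN
    "'abc' != 'abc'": "abc" != "abc",        # ordinary values do equal themselves
    "is_missing(nan)": is_missing(nan),
    "is_missing(None)": is_missing(None),
    "is_missing('equal_to_z')": is_missing("equal_to_z"),
    "is_missing(5)": is_missing(5),
}

for label, value in checks.items():
    print(f"{label}: {value}")
```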
