
Create new column in data.frame based on if-else assessment of other columns

I am trying to add a new column (x_new) to my data frame whose value depends on a 'definition' column. The definition column, x_definition, contains one of the following record types:
- A constant number
- A string describing the operation needed
- NA

I want the resulting column, x_new, to look as follows:
- If x_definition is NA, then x_new remains NA.
- If x_definition is a string, then it requires a certain calculation. For example, if it's 'equal_to_z' then the result should be z, and if it's 'third_of_z' then x_new should be z/3. There are more definitions than just these, and some of them call for more complex functions of z.
- If x_definition is any number, then x_new should just be that number.

I wrote the following code, which handles the NA and string cases but is a cumbersome group of nested ifelse statements. I am looking for a cleaner method.

library(dplyr)

data <- data %>% mutate(x_new = ifelse(
  is.na(x_definition), NA, ifelse(
    x_definition == 'equal_to_z', z, ifelse(
      x_definition == 'third_of_z', z / 3,
      NA  # numeric definitions also fall through to here, so they become NA
      )
    )
  )
)

I also considered using switch, but I don't know how to say "if it's a number, leave it as a number":

a <- data %>% mutate(x_new = switch(x_definition,
  'equal_to_z' = z,
  'third_of_z' = z / 3,
  <number???> = x_definition
  )
)

What would be an appropriate process for addressing this?

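I think dplyr's case_when() is exactly what you're looking for. It checks the conditions in order and, for each row, uses the first one that is TRUE (rows that match nothing get NA). Unlike switch(), which only accepts a single value, it works on whole columns inside mutate().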

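library(dplyr)
data <- data %>%
  mutate(x_num = suppressWarnings(as.numeric(x_definition)),  # NA for non-numeric text
         x_new = case_when(is.na(x_definition) ~ NA_real_,    # typed NA: every branch is double
                           x_definition == 'equal_to_z' ~ z,
                           x_definition == 'third_of_z' ~ z / 3,
                           !is.na(x_num) ~ x_num)) %>%
  select(-x_num)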
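The key property of case_when() is "first matching condition wins, otherwise NA". As an illustration of that rule only (not how dplyr is implemented), here is a per-row sketch in Python, the language of the other answer. first_match and as_number are hypothetical helpers, rows are plain dicts, and None stands in for NA.

```python
import math
from typing import Any, Callable, Sequence, Tuple

# A (condition, value) pair; both are functions of one row (a plain dict here).
Branch = Tuple[Callable[[dict], bool], Callable[[dict], Any]]


def first_match(row: dict, branches: Sequence[Branch], default: Any = math.nan) -> Any:
    """Return the value of the first branch whose condition is true for this row."""
    for condition, value in branches:
        if condition(row):
            return value(row)
    return default  # nothing matched: case_when() would give NA here


def as_number(text: Any) -> float:
    """Parse a number, or return NaN (like as.numeric() turning non-numbers into NA)."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


branches = [
    (lambda r: r["x_definition"] is None, lambda r: math.nan),
    (lambda r: r["x_definition"] == "equal_to_z", lambda r: r["z"]),
    (lambda r: r["x_definition"] == "third_of_z", lambda r: r["z"] / 3),
    (lambda r: not math.isnan(as_number(r["x_definition"])),
     lambda r: as_number(r["x_definition"])),
]

rows = [
    {"x_definition": None, "z": 9.0},
    {"x_definition": "equal_to_z", "z": 9.0},
    {"x_definition": "third_of_z", "z": 9.0},
    {"x_definition": "5", "z": 9.0},
    {"x_definition": "unknown_op", "z": 9.0},
]
print([first_match(r, branches) for r in rows])
```

One difference worth knowing: case_when() evaluates every right-hand side over the whole column, which is why as.numeric() on the string rows produces coercion warnings in R; this per-row sketch only evaluates the winning branch.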

Yes, this is a very common need and it has a very good solution.

Your logic is:

- If x_definition is NA, then x_new remains NA.
- If x_definition is a string, then it requires a certain calculation. For example, if it's 'equal_to_z' then the result should be z, and if it's 'third_of_z' then x_new should be z/3. There are more definitions than just these, and some of them call for more complex functions of z.
- If x_definition is any number, then x_new should just be that number.

In pandas, I can rewrite it as a chained conditional expression, where row is one row of the DataFrame:

(np.nan if pd.isna(row['x_definition'])
 else row['z'] if row['x_definition'] == 'equal_to_z'
 else row['z'] / 3 if row['x_definition'] == 'third_of_z'
 else row['x_definition'] if isinstance(row['x_definition'], numbers.Real)
 else np.nan)

Then you can apply it to every row:

import numbers
import numpy as np
import pandas as pd

df['x_new'] = df.apply(lambda row: np.nan if pd.isna(row['x_definition'])
                       else row['z'] if row['x_definition'] == 'equal_to_z'
                       else row['z'] / 3 if row['x_definition'] == 'third_of_z'
                       else row['x_definition'] if isinstance(row['x_definition'], numbers.Real)
                       else np.nan, axis=1)

Or, more readably, move the logic into a named function:

def logic_for_x_new(row):
    ...  # the same if/else chain as the lambda above, assigning its result to x_new
    return x_new

df['x_new'] = df.apply(logic_for_x_new, axis=1)
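Filled in, the named-function version might look like the following sketch. It is an illustration under assumptions rather than the answer's own code: OPERATIONS and is_missing are hypothetical names, numeric text such as "5" is treated as a number (as as.numeric() would in R), and the example rows are plain dicts, which index the same way as the pandas Series that df.apply(..., axis=1) passes in.

```python
import math
import numbers
from typing import Any, Callable, Dict, Mapping

# Hypothetical lookup table: one entry per definition string, each a function of z.
OPERATIONS: Dict[str, Callable[[float], float]] = {
    "equal_to_z": lambda z: z,
    "third_of_z": lambda z: z / 3,
}


def is_missing(value: Any) -> bool:
    """True for None or a float NaN (NaN is the only value not equal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def logic_for_x_new(row: Mapping[str, Any]) -> float:
    """Compute x_new for one row holding 'x_definition' and 'z'."""
    definition = row["x_definition"]
    if is_missing(definition):
        return math.nan  # NA stays NA
    if isinstance(definition, str):
        op = OPERATIONS.get(definition)
        if op is not None:
            return op(row["z"])  # a named operation on z
        try:
            return float(definition)  # numeric text such as "5" (assumption)
        except ValueError:
            return math.nan  # unrecognised definition
    if isinstance(definition, numbers.Real):
        return float(definition)  # a constant is passed through unchanged
    return math.nan


rows = [
    {"x_definition": None, "z": 6.0},
    {"x_definition": "equal_to_z", "z": 6.0},
    {"x_definition": "third_of_z", "z": 6.0},
    {"x_definition": 4, "z": 6.0},
]
print([logic_for_x_new(r) for r in rows])
```

With pandas the call stays df['x_new'] = df.apply(logic_for_x_new, axis=1). Keeping the definitions in a dictionary means each additional function of z is one more entry rather than another nested branch.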

Just be careful how you check for NaN in pandas. I use the trick that x == x is False when x is NaN, because NaN never equals itself. Note that this trick does not catch None, which pd.isna() does treat as missing.
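A quick plain-Python check of the behaviour that the NaN trick relies on, and of its blind spots. Only the standard library is used; the variable name nan is just for illustration.

```python
import math

nan = float("nan")

# NaN is never equal to itself, which is what the "x == x" trick relies on.
print(nan == nan)           # False
print(nan != nan)           # True

# An identity check only matches one specific NaN object, not NaN in general.
print(nan is math.nan)      # False: a different NaN object

# None compares equal to itself, so "x == x" would wrongly treat None as present.
print(None == None)         # True

# math.isnan() is explicit, but it raises TypeError for strings and None.
print(math.isnan(nan))      # True
```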
