I am trying to add a new column (x_new) to my dataframe that depends on the value given in a 'definition' column. The definition column, x_definition, contains one of the following record types:
- A constant number
- A string describing the operation needed
- NA
I want the resultant column, x_new, to look as follows:
- If x_definition is NA, then x_new remains NA.
- If x_definition is a string, then it requires a certain calculation. For example, if it's 'equal_to_z' then the result should be z, or if it's 'third_of_z', then x_new should be z/3. There are more definitions than just these, some of which require more complex functions of z.
- If x_definition is any number, then x_new should just be that number.
I wrote the following code, which handles those cases but is a cumbersome group of nested ifelse statements. I am looking for a cleaner method.
data <- data %>% mutate(x_new = ifelse(
  is.na(x_definition), NA, ifelse(
    x_definition == 'equal_to_z', z, ifelse(
      x_definition == 'third_of_z', z / 3, NA
    )
  )
)
)
I also considered using switch, but ran into the problem that I don't know how to say "if it's a number, leave it as a number":
a <- data %>% mutate(x_new = switch(x_definition,
'equal_to_z' = z,
'third_of_z' = z / 3,
<number???> = x_definition
)
)
What would be an appropriate process for addressing this?
I think case_when is exactly what you're looking for.
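library(dplyr)
data <- data %>%
  mutate(x_new = case_when(
    is.na(x_definition) ~ NA_real_,  # typed NA keeps every branch numeric
    x_definition == 'equal_to_z' ~ z,
    x_definition == 'third_of_z' ~ z / 3,
    # as.numeric() warns on non-numeric strings, hence suppressWarnings()
    TRUE ~ suppressWarnings(as.numeric(x_definition))
  ))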
Yes, this is a very common need and it has a very good solution.
Your logic is:
- If x_definition is NA, then x_new remains NA.
- If x_definition is a string, then it requires a certain calculation. For example, if it's 'equal_to_z' then the result should be z, or if it's 'third_of_z', then x_new should be z/3. There are also more than just these definitions that indicate more complex functions of z are required.
- If x_definition is any number, then x_new should just be that number.
I can rewrite it as
(np.nan if pd.isna(row['x_definition'])
 else row['z'] if row['x_definition'] == 'equal_to_z'
 else row['z'] / 3 if row['x_definition'] == 'third_of_z'
 else row['x_definition'] if isinstance(row['x_definition'], (int, float))
 else np.nan)
then you can do
import numpy as np
import pandas as pd
# Row-wise conditional: NaN stays NaN, named operations use z, numbers pass through.
df['x_new'] = df.apply(lambda row: np.nan if pd.isna(row['x_definition'])
                       else row['z'] if row['x_definition'] == 'equal_to_z'
                       else row['z'] / 3 if row['x_definition'] == 'third_of_z'
                       else row['x_definition'] if isinstance(row['x_definition'], (int, float))
                       else np.nan, axis=1)
or if you want to be more elegant
def logic_for_x_new(row):
    ...
    return x_new
df['x_new'] = df.apply(logic_for_x_new, axis=1)
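The body of logic_for_x_new is left open above. One possible way to fill it in, as a rough sketch only: keep the named operations in a lookup table so that further definitions can be added without more branches. The names OPERATIONS and is_missing are made up for this illustration, and the sketch assumes the row can be indexed by column name (a plain dict or a pandas row both allow that).

```python
import math
import numbers

# Hypothetical lookup table: definition name -> function of z.
# Further definitions would be added here.
OPERATIONS = {
    "equal_to_z": lambda z: z,
    "third_of_z": lambda z: z / 3,
}


def is_missing(value):
    """Return True for None or a float NaN (assumed markers for 'NA')."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def logic_for_x_new(row):
    """Compute x_new for a single row, indexed by column name."""
    definition = row["x_definition"]
    if is_missing(definition):
        return math.nan
    if isinstance(definition, str):
        operation = OPERATIONS.get(definition)
        # Unrecognised strings fall back to NaN, like the one-liner above.
        return operation(row["z"]) if operation else math.nan
    if isinstance(definition, numbers.Real):
        return definition  # a plain number is passed through unchanged
    return math.nan


print(logic_for_x_new({"x_definition": "third_of_z", "z": 6}))  # 2.0
print(logic_for_x_new({"x_definition": 5, "z": 6}))             # 5
```

Because the function only indexes the row by name, it can be tried on plain dicts as here before being passed to df.apply with axis=1.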
Be careful about how you check for NaN in pandas. I use the trick that x == x is False when x is NaN, but use it with care; pd.isna(x) is a more explicit check.
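To make the NaN caveat concrete, here is a small standard-library sketch of the self-comparison trick and of why an identity check is fragile. The helper name is_nan is made up for this illustration.

```python
import math


def is_nan(value):
    """Self-comparison trick: NaN is the only float not equal to itself."""
    return value != value


nan = float("nan")
print(is_nan(nan))           # True
print(is_nan(1.5))           # False
print(is_nan("third_of_z"))  # False: strings compare equal to themselves
print(math.isnan(nan))       # True; note math.isnan raises TypeError on strings
print(float("nan") is nan)   # False: identity checks miss NaNs created elsewhere
```

The last line is the reason a test such as `value is np.nan` can silently fail: it only matches that one specific object, not every NaN value.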