Creating a column whose values are dependent on multiple other columns
I'm trying to create a new column ("newcol") in a dataframe ("data"), whose values will be determined by the contents of up to two other columns in the dataframe ("B_stance" and "C_stance"). The values within B_stance are either "L", "R", "U" or "N". Within C_stance they are either "L" or "R".
Please excuse the semi-logical language, but I need R code which will achieve this for the contents of newcol:
if (data$B_stance = "L" AND data$C_stance = "L") then (data$newcol = "N")
if (data$B_stance = "L" AND data$C_stance = "R") then (data$newcol = "Y")
if (data$B_stance = "R" AND data$C_stance = "R") then (data$newcol = "N")
if (data$B_stance = "R" AND data$C_stance = "L") then (data$newcol = "Y")
if (data$B_stance = "U") then (data$newcol = "N")
if (data$B_stance = "N") then (data$newcol = "N")
I've tried to see if/how "ifelse" could achieve this, but cannot find an example of how to draw on multiple column values when determining the new value.
It may be easier to create a key/value dataset and then do a join:
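library(dplyr)

# Lookup table: one row per (B_stance, C_stance) pair with a defined result
keydat <- data.frame(B_stance = c('L', 'L', 'R', 'R'),
                     C_stance = c('L', 'R', 'R', 'L'),
                     newcol = c('N', 'Y', 'N', 'Y'),
                     stringsAsFactors = FALSE)

# Pairs missing from keydat come back as NA after the join; default them to 'N'
left_join(data, keydat, by = c('B_stance', 'C_stance')) %>%
  mutate(newcol = replace(newcol, is.na(newcol), 'N'))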
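For readers without dplyr, the same lookup idea can be sketched in base R. This is a minimal sketch rather than the answer's own method: the sample rows in data are invented for illustration, and match() on pasted keys stands in for the join.

```r
# Hypothetical sample data; the values are invented for illustration.
data <- data.frame(B_stance = c("L", "L", "R", "R", "U", "N"),
                   C_stance = c("R", "L", "L", "R", "L", "R"),
                   stringsAsFactors = FALSE)

keydat <- data.frame(B_stance = c("L", "L", "R", "R"),
                     C_stance = c("L", "R", "R", "L"),
                     newcol   = c("N", "Y", "N", "Y"),
                     stringsAsFactors = FALSE)

# Build one key per row and find each data row's position in the lookup table.
idx <- match(paste(data$B_stance, data$C_stance),
             paste(keydat$B_stance, keydat$C_stance))

# Rows with no matching pair get NA from the lookup; default those to "N".
data$newcol <- keydat$newcol[idx]
data$newcol[is.na(data$newcol)] <- "N"

data$newcol
```

Because match() only looks positions up, the rows of data stay in their original order.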
In base R, the ifelse function is the most useful for these conditions. The dplyr library also includes a stricter, more robust if_else function and a case_when function. ifelse takes a logical test as its first argument; for each element it returns the value from the second argument where the test is TRUE and the value from the third argument where the test is FALSE.
data <- read.table(text="
B_stance C_stance
L R
L L
U X
R L
R R
N X
X X
", header= TRUE)
data$newcol <- ifelse(data$B_stance == "L" & data$C_stance == "L", "N",
               ifelse(data$B_stance == "L" & data$C_stance == "R", "Y",
               ifelse(data$B_stance == "R" & data$C_stance == "R", "N",
               ifelse(data$B_stance == "R" & data$C_stance == "L", "Y",
               ifelse(data$B_stance == "U", "N",
               ifelse(data$B_stance == "N", "N",
                      NA))))))
data
# B_stance C_stance newcol
# 1 L R Y
# 2 L L N
# 3 U X N
# 4 R L Y
# 5 R R N
# 6 N X N
# 7 X X <NA>
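The six rules in the question reduce to one test: newcol is "Y" only when both stances are "L" or "R" and they differ. A minimal sketch of that collapsed form follows. It is a simplification rather than the answer's code, and it assumes that any unlisted value (such as "X") should give "N" instead of NA.

```r
data <- read.table(text = "
B_stance C_stance
L R
L L
U X
R L
R R
N X
X X
", header = TRUE, stringsAsFactors = FALSE)

sides <- c("L", "R")

# "Y" only when both columns hold a side and the two sides differ.
data$newcol <- ifelse(data$B_stance %in% sides &
                        data$C_stance %in% sides &
                        data$B_stance != data$C_stance,
                      "Y", "N")
data$newcol
```

If unexpected values should remain visible as NA, the nested form is the safer choice.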
With dplyr you can use case_when. It's a little cleaner than nested if_else calls if you have numerous conditions.
library(dplyr)

df <- data.frame(
  B_stance = c('L', 'L', 'R', 'R'),
  C_stance = c('L', 'R', 'R', 'L'),
  stringsAsFactors = FALSE
)

df %>% mutate(
  newcol = case_when(
    B_stance == 'U' ~ 'N',
    B_stance == 'N' ~ 'N',
    B_stance == 'L' & C_stance == 'L' ~ 'N',
    B_stance == 'L' & C_stance == 'R' ~ 'Y',
    B_stance == 'R' & C_stance == 'L' ~ 'Y',
    B_stance == 'R' & C_stance == 'R' ~ 'N',
    TRUE ~ B_stance  # fallback: keep the B_stance value when nothing matches
  )
)
# B_stance C_stance newcol
# 1 L L N
# 2 L R Y
# 3 R R N
# 4 R L Y
Note that case_when checks its conditions in order: for each row, the first condition that is true determines the result, and later conditions are ignored for that row. The final TRUE ensures there's a fallback in case no earlier condition is true; here the fallback is the row's B_stance value.
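To make the ordering concrete, here is a small sketch. The vector x and the labels are invented for illustration and are not part of the original answer.

```r
library(dplyr)

x <- c(1, 5, 10)

# 10 satisfies both conditions, but only the first match is used,
# so the "large" branch is never reached.
size <- case_when(
  x > 3 ~ "medium",
  x > 8 ~ "large",
  TRUE  ~ "small"
)
size
```

Putting the most specific condition first (x > 8 before x > 3) would let the "large" branch apply.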