Efficient way to create a new variable from multiple columns in an R data frame
I'm trying to create a new variable called DRG from a set of 480 variables based on a condition. The new variable is a binary flag: if any of the columns in the data frame has the value 060 or 191, then DRG = 1; otherwise DRG = 0.
for (i in 1:nrow(DATA_opioid)){
for (j in 42:480)
{ if (!is.na(DATA_opioid[i,j]) {
if ( (DATA_opioid[i,j]) == '060' | (DATA_opioid[i,j]) == '191'| (DATA_opioid[i,j+1]))==
'060' |(!is.na(DATA_opioid[i,j+1]))=='191')
{
DATA_opioid$DRG =1
}
else DATA_opioid$DRG =0
}
}
I have been unable to arrive at working code, although I did succeed when I tried it for one of the columns. But there are 480 variables, all starting with the prefix 'RX'. Any useful suggestion to solve this is most welcome.
for (i in 1:nrow(DATA_opioid)){
if (DATA_opioid$RX1CAT1[i] == "060" | DATA_opioid$RX1CAT1[i] == "191"){
DATA_opioid$DRG[i] =1
}
else DATA_opioid$DRG[i] =0
}
You don't need loops for such operations. There are many ways to do this. Here are a few.
Using rowSums
df$DRG <- +(rowSums(df == '191' | df == '060') > 0)
# a b DRG
#1 1 2 0
#2 2 3 0
#3 3 4 0
#4 4 060 1
#5 5 3 0
#6 191 4 1
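To see why this one-liner works, the expression can be evaluated in stages. The sketch below rebuilds the test data used in this answer; the intermediate names `hit` and `n_hits` are made up for illustration only.

```r
# Test data from this answer: 'a' is numeric, 'b' holds character codes
df <- data.frame(a = c(1:5, 191), b = c(2:4, '060', 3:4))

# Step 1: element-wise comparison gives a logical matrix, one cell per value
hit <- df == '191' | df == '060'

# Step 2: rowSums() counts the TRUE cells in each row
n_hits <- rowSums(hit)

# Step 3: "> 0" flags rows with at least one match;
# the unary + converts TRUE/FALSE to 1/0
df$DRG <- +(n_hits > 0)
```

Because the comparison is vectorised over the whole data frame, no explicit loop over rows or columns is needed.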
Using apply
df$DRG <- +(apply(df == '191' | df == '060', 1, any))
We can also use rowSums in a dplyr chain
library(dplyr)
df %>% mutate(DRG = +(rowSums(. == '191' | . == '060') > 0))
If you want to test this only on some columns, subset the data frame to those columns in the above solution. For example, to test columns 3 to 5 you can do
df$DRG <- +(apply(df[3:5] == '191' | df[3:5] == '060', 1, any))
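For the situation described in the question, where the columns of interest share the prefix 'RX' and may contain missing values, the same idea might look like the sketch below. The data frame `dat` and its contents are a hypothetical stand-in for DATA_opioid, and picking columns with `grep("^RX", ...)` is an assumption about how the real columns are named.

```r
# Hypothetical stand-in for DATA_opioid: an id column plus RX* code columns with NAs
dat <- data.frame(
  id      = 1:4,
  RX1CAT1 = c("060", "123", NA, "045"),
  RX1CAT2 = c(NA, "191", NA, "300"),
  stringsAsFactors = FALSE
)

# Select the columns to test by name prefix instead of by position
rx_cols <- grep("^RX", names(dat), value = TRUE)

# Comparisons involving NA can yield NA; na.rm = TRUE makes such cells
# count as "no match" instead of turning the whole row sum into NA
hit <- dat[rx_cols] == "060" | dat[rx_cols] == "191"
dat$DRG <- +(rowSums(hit, na.rm = TRUE) > 0)
```

Here rows 1 and 2 get DRG = 1 and the remaining rows, including the all-NA row, get DRG = 0.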
Data
Tested on this data:
df <- data.frame(a = c(1:5, 191), b = c(2:4, '060', 3:4))
Assuming your data frame is called df:
DRG <- apply(df, 1, function(x) {
  # 1 if any value in the row equals "060" or "191", otherwise 0
  max(x == "060" | x == "191")
})
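For comparison, here is one way the nested loop from the question could be repaired. In the original attempt, `DATA_opioid$DRG = 1` assigns the whole column rather than row i, and the else branch resets it again on the next non-matching cell. This is only a sketch on a small hypothetical data set (`dat`), not a recommendation over the vectorised answers above.

```r
# Hypothetical small data set standing in for DATA_opioid
dat <- data.frame(
  RX1CAT1 = c("060", "123", NA),
  RX1CAT2 = c(NA, "191", "300"),
  stringsAsFactors = FALSE
)

dat$DRG <- 0                          # start with "no match" for every row
rx_cols <- grep("^RX", names(dat))    # positions of the columns to scan

for (i in seq_len(nrow(dat))) {
  for (j in rx_cols) {
    value <- dat[i, j]
    # %in% returns FALSE for NA, so no separate is.na() check is needed
    if (value %in% c("060", "191")) {
      dat$DRG[i] <- 1                 # set only row i, not the whole column
      break                           # one match is enough for this row
    }
  }
}
```

On 480 columns and many rows this will be much slower than the rowSums version, which is why the loop-free forms are usually preferred.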