Efficient way to create a new variable from multiple columns in an R data frame

Question

I'm trying to create a new binary variable called DRG from a set of 480 variables, based on a condition: if any of the columns in the data frame has the value 060 or 191, then DRG = 1; otherwise DRG = 0.

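# Attempt over all columns (does not produce the desired result)
for (i in 1:nrow(DATA_opioid)) {
  for (j in 42:480) {
    if (!is.na(DATA_opioid[i, j])) {
      if (DATA_opioid[i, j] == '060' | DATA_opioid[i, j] == '191' |
          DATA_opioid[i, j + 1] == '060' | DATA_opioid[i, j + 1] == '191') {
        DATA_opioid$DRG = 1   # assigns to the whole column, not only row i
      } else {
        DATA_opioid$DRG = 0   # likewise overwrites every row
      }
    }
  }
}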

I have been unable to arrive at working code, although I did succeed when I tried it for one of the columns (shown below). However, there are 480 variables, all starting with the prefix 'RX'. Any useful suggestion to solve this is most welcome.

for (i in 1:nrow(DATA_opioid)) {
  if (DATA_opioid$RX1CAT1[i] == "060" | DATA_opioid$RX1CAT1[i] == "191") {
    DATA_opioid$DRG[i] = 1
  } else {
    DATA_opioid$DRG[i] = 0
  }
}

Answer 1

You don't need loops for this kind of operation. There are many ways to do it; here are a few.

Using rowSums

df$DRG <- +(rowSums(df == '191' | df == '060') > 0)

#    a   b DRG
#1   1   2   0
#2   2   3   0
#3   3   4   0
#4   4 060   1
#5   5   3   0
#6 191   4   1
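The question's data contains NA values, and comparing NA with a string yields NA, so a row with missing cells and no match gets NA rather than 0. A minimal sketch, assuming missing entries should simply count as "no match", passes na.rm = TRUE to rowSums. The toy data frame df_na is made up for illustration and is not from the original post.

```r
# Hypothetical toy data with missing values (not from the original post)
df_na <- data.frame(
  a = c("060", NA, "5", NA),
  b = c(NA, "191", "7", NA),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell holds one of the codes, NA where the cell is NA
hit <- df_na == "191" | df_na == "060"

# na.rm = TRUE drops NA cells before summing, so a row of only NAs sums to 0
df_na$DRG <- +(rowSums(hit, na.rm = TRUE) > 0)
df_na$DRG
```

Without na.rm = TRUE, rows 3 and 4 of this toy data would not be affected equally: row 3 has no missing cells and gives 0, while row 4 would give NA.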

Using apply

df$DRG <- +(apply(df == '191' | df == '060', 1, any))

We can also use rowSums in a dplyr chain

library(dplyr)
df %>% mutate(DRG = +(rowSums(. == '191' | . == '060') > 0))

If you want to test only some of the columns, subset the data frame to those columns in any of the solutions above. For example, to test columns 3 to 5 you can do

df$DRG <- +(apply(df[3:5] == '191' | df[3:5] == '060', 1, any))
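Since the question says the relevant columns all start with the prefix 'RX', they could also be selected by name instead of by position. The following is only a sketch under assumptions: the data frame dat and the column names id and RX1CAT2 are hypothetical (only RX1CAT1 appears in the question), and the codes are assumed to be stored as character.

```r
# Hypothetical example data; only the name RX1CAT1 comes from the question
dat <- data.frame(
  id      = 1:4,
  RX1CAT1 = c("060", "100", NA, "200"),
  RX1CAT2 = c(NA, "191", NA, "300"),
  stringsAsFactors = FALSE
)

# Names of all columns beginning with "RX"
rx_cols <- grep("^RX", names(dat), value = TRUE)

# %in% returns FALSE (not NA) for missing values, so no na.rm is needed here
hit <- sapply(dat[rx_cols], function(col) col %in% c("060", "191"))

# One row per observation in `hit`; flag rows with at least one match
dat$DRG <- +(rowSums(hit) > 0)
dat$DRG
```

Using %in% with a vector of codes also keeps the condition short if more codes need to be checked later.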

Data

Tested on this data:

df <- data.frame(a = c(1:5, 191), b = c(2:4, '060', 3:4))

Answer 2

Assuming your data frame is called df:

# apply() passes each row as a character vector, so compare against strings
DRG <- apply(df, 1, function(x) {
  max(x == "060" | x == "191")
})
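One caveat with calling apply directly on a data frame: the data frame is first converted to a matrix, and when it mixes numeric and character columns the numeric columns are formatted to a common width, which can pad shorter numbers with spaces. The sketch below illustrates this on a made-up data frame df2 (hypothetical, not from the question); if all code columns are already character, as the '060' values in the question suggest, the padding does not occur.

```r
# Hypothetical mixed-type data: `a` is numeric, `b` is character
df2 <- data.frame(a = c(60, 191), b = c("x", "y"), stringsAsFactors = FALSE)

# as.matrix() is what apply() uses internally; numbers are padded to equal width
m <- as.matrix(df2)
m[, "a"]

# The padded value no longer equals the plain string "60"
padded_match <- apply(df2, 1, function(x) any(x == "60"))

# Comparing the data frame itself works column by column, without padding
direct_match <- rowSums(df2 == "60") > 0
```

Here padded_match is FALSE for the first row while direct_match is TRUE, which is one reason to prefer the column-wise comparison when numeric columns are involved.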


Answer 1
4 2019-12-27 02:10:54

Answer 2
2 2019-12-27 01:41:27
