Efficient way to create a new variable from multiple columns in an R data frame

I'm trying to create a new binary variable called DRG from a set of 480 variables, based on a condition: if any of the columns in the data frame has the value 060 or 191, then DRG = 1; otherwise DRG = 0.

 for (i in 1:nrow(DATA_opioid)){
   for (j in 42:480)
     { if (!is.na(DATA_opioid[i,j])  {
     if ( (DATA_opioid[i,j]) == '060' | (DATA_opioid[i,j]) == '191'| (DATA_opioid[i,j+1]))== '060' |(!is.na(DATA_opioid[i,j+1]))=='191')
        {
          DATA_opioid$DRG =1
        }
      else DATA_opioid$DRG =0
       }
   }

I have been unable to arrive at working code, although I did succeed when I tried it on one of the columns (code below), but there are 480 variables, all starting with the prefix 'RX'. Any suggestions for solving this are most welcome.

for (i in 1:nrow(DATA_opioid)){
    if (DATA_opioid$RX1CAT1[i]  == "060" | DATA_opioid$RX1CAT1[i] == "191"){
        DATA_opioid$DRG[i] =1
    }
    else DATA_opioid$DRG[i] =0
}

You don't need loops for operations like this. There are many ways to do it; here are a few.

Using rowSums

df$DRG <- +(rowSums(df == '191' | df == '060') > 0)

#    a   b DRG
#1   1   2   0
#2   2   3   0
#3   3   4   0
#4   4 060   1
#5   5   3   0
#6 191   4   1
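
The loop in the question checks is.na(), so the real columns presumably contain missing values. In that case the comparison yields NA, and rowSums() returns NA for any row that contains one. A minimal sketch of one way around this, assuming a made-up data frame `toy` with hypothetical RX column names (not the asker's data), is to pass na.rm = TRUE:

```r
# Hypothetical toy data with missing values (not the asker's data)
toy <- data.frame(
  RX1CAT1 = c("060", NA, "123", NA),
  RX1CAT2 = c(NA, "191", NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrix; a cell is NA wherever the underlying value is NA
hit <- toy == "191" | toy == "060"

# Without na.rm, every row that contains an NA comes out as NA
drg_with_na <- +(rowSums(hit) > 0)

# na.rm = TRUE treats missing cells as "no match"
toy$DRG <- +(rowSums(hit, na.rm = TRUE) > 0)
toy
```

Whether a missing code should count as "no match" depends on the analysis; this sketch simply assumes it should.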

Using apply

df$DRG <- +(apply(df == '191' | df == '060', 1, any))

We can also use rowSums in a dplyr chain

library(dplyr)
df %>% mutate(DRG = +(rowSums(. == '191' | . == '060') > 0))

If you want to test only some of the columns, subset the data frame to those columns in any of the solutions above. For example, to test columns 3 to 5 you can do

df$DRG <- +(apply(df[3:5] == '191' | df[3:5] == '060', 1, any))
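
Since the question says the relevant columns all start with the prefix 'RX', selecting them by name may be more robust than selecting by position. The following is only a sketch: the data frame `toy` and its column names are hypothetical, and the `id` column is there to show that a matching value outside the RX columns is ignored.

```r
# Hypothetical data: an id column plus RX-prefixed code columns
toy <- data.frame(
  id      = c(191, 2, 3),
  RX1CAT1 = c("123", "060", "456"),
  RX2CAT1 = c("789", "000", "191"),
  stringsAsFactors = FALSE
)

# Logical index of the columns whose names start with "RX"
rx_cols <- startsWith(names(toy), "RX")

# Test only the RX columns; the 191 in `id` does not count
toy$DRG <- +(rowSums(toy[rx_cols] == "191" | toy[rx_cols] == "060") > 0)
toy
```

Here the first row has 191 in `id` but no matching RX code, so its DRG is 0.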

Data

Tested on this data:

df <- data.frame(a = c(1:5, 191), b = c(2:4, '060', 3:4))

Assuming your data frame is called df:

DRG <- apply(df, 1, function(x) {
  # 1 if any cell in the row equals "060" or "191", otherwise 0
  max(x == "060" | x == "191")
})
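
One caveat with apply() on a data frame is that it converts the data frame to a matrix first. With mixed numeric and character columns the numeric values can be formatted with padding (for example " 191"), which would then not equal "191". A sketch that avoids the conversion by checking each column with %in% is shown below; the data frame `toy` and the vector `codes` are made up for illustration.

```r
# Codes to look for (from the question)
codes <- c("060", "191")

# Hypothetical data mixing a numeric and a character column
toy <- data.frame(
  a = c(1, 191, 1000),
  b = c("060", "2", NA),
  stringsAsFactors = FALSE
)

# Check each column on its own, so no matrix conversion is involved;
# %in% returns FALSE (not NA) for missing values
hits <- lapply(toy, function(col) col %in% codes)

# Combine the per-column results with an element-wise OR
toy$DRG <- +Reduce(`|`, hits)
toy
```

Because %in% never returns NA, this variant also treats missing cells as "no match" without extra handling.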

