How to add multiple columns in R with a different condition for each column?

Here is my data set. I would like to add 5 new columns to mydata, each defined by a different condition.

mydata=data.frame(sub=rep(c(1:4),c(3,4,5,5)),t=c(1:3,1:4,1:5,1:5),
                      y.val=c(10,20,13,
                          5,7,8,0,
                          45,17,25,12,10,
                          40,0,0,5,8))
mydata
   sub t y.val
1    1 1    10
2    1 2    20
3    1 3    13
4    2 1     5
5    2 2     7
6    2 3     8
7    2 4     0
8    3 1    45
9    3 2    17
10   3 3    25
11   3 4    12
12   3 5    10
13   4 1    40
14   4 2     0
15   4 3     0
16   4 4     5
17   4 5     8

I would like to add the following 5 columns (5 being the maximum of the t column):

mydata$It1=ifelse(mydata$t==1 & mydata$y.val>0,1,0)
mydata$It2=ifelse(mydata$t==2 & mydata$y.val>0,1,0)
mydata$It3=ifelse(mydata$t==3 & mydata$y.val>0,1,0)
mydata$It4=ifelse(mydata$t==4 & mydata$y.val>0,1,0)
mydata$It5=ifelse(mydata$t==5 & mydata$y.val>0,1,0)

Here is the expected outcome.

> mydata
   sub t y.val It1 It2 It3 It4 It5
1    1 1    10   1   0   0   0   0
2    1 2    20   0   1   0   0   0
3    1 3    13   0   0   1   0   0
4    2 1     5   1   0   0   0   0
5    2 2     7   0   1   0   0   0
6    2 3     8   0   0   1   0   0
7    2 4     0   0   0   0   0   0
8    3 1    45   1   0   0   0   0
9    3 2    17   0   1   0   0   0
10   3 3    25   0   0   1   0   0
11   3 4    12   0   0   0   1   0
12   3 5    10   0   0   0   0   1
13   4 1    40   1   0   0   0   0
14   4 2     0   0   0   0   0   0
15   4 3     0   0   0   0   0   0
16   4 4     5   0   0   0   1   0
17   4 5     8   0   0   0   0   1

I would appreciate help writing this as a function, using a for loop or any other technique.

You could use sapply / lapply:

n <- seq_len(5)
mydata[paste0("It", n)] <- +(sapply(n, function(x) mydata$t==x & mydata$y.val>0))
mydata

#   sub t y.val It1 It2 It3 It4 It5
#1    1 1    10   1   0   0   0   0
#2    1 2    20   0   1   0   0   0
#3    1 3    13   0   0   1   0   0
#4    2 1     5   1   0   0   0   0
#5    2 2     7   0   1   0   0   0
#6    2 3     8   0   0   1   0   0
#7    2 4     0   0   0   0   0   0
#8    3 1    45   1   0   0   0   0
#9    3 2    17   0   1   0   0   0
#10   3 3    25   0   0   1   0   0
#11   3 4    12   0   0   0   1   0
#12   3 5    10   0   0   0   0   1
#13   4 1    40   1   0   0   0   0
#14   4 2     0   0   0   0   0   0
#15   4 3     0   0   0   0   0   0
#16   4 4     5   0   0   0   1   0
#17   4 5     8   0   0   0   0   1

mydata$t==x & mydata$y.val>0 returns a logical vector of TRUE / FALSE values based on the condition. The unary + converts those logical values to 1 / 0 respectively (try +c(FALSE, TRUE)). This avoids the need for ifelse, i.e. ifelse(condition, 1, 0).
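Since the question asks for a function using a for loop, the same idea can be wrapped up as follows. This is only a sketch: the name add_indicator_cols and its times and prefix arguments are made up for illustration, and it assumes the data frame has columns named t and y.val as in the question.

```r
# Hypothetical helper (not from the original answers): adds one 0/1
# indicator column per time point, using the same condition as the question.
add_indicator_cols <- function(data, times = seq_len(max(data$t)), prefix = "It") {
  for (k in times) {
    # TRUE where the row is at time k and y.val is positive;
    # the unary + turns TRUE/FALSE into 1/0
    data[[paste0(prefix, k)]] <- +(data$t == k & data$y.val > 0)
  }
  data
}

# Data set from the question
mydata <- data.frame(sub = rep(1:4, c(3, 4, 5, 5)),
                     t = c(1:3, 1:4, 1:5, 1:5),
                     y.val = c(10, 20, 13,
                               5, 7, 8, 0,
                               45, 17, 25, 12, 10,
                               40, 0, 0, 5, 8))

result <- add_indicator_cols(mydata)
```

By default the number of new columns follows max(data$t), so nothing is hard-coded to 5; pass times explicitly if only some time points are wanted.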

Here's another approach based on multiplying a model matrix by the logical vector y.val > 0.

df <- cbind(mydata[1:3], model.matrix(~ factor(t) + 0, mydata)*(mydata$y.val>0))

Which gives:

   sub t y.val factor.t.1 factor.t.2 factor.t.3 factor.t.4 factor.t.5
1    1 1    10          1          0          0          0          0
2    1 2    20          0          1          0          0          0
3    1 3    13          0          0          1          0          0
4    2 1     5          1          0          0          0          0
5    2 2     7          0          1          0          0          0
6    2 3     8          0          0          1          0          0
7    2 4     0          0          0          0          0          0
8    3 1    45          1          0          0          0          0
9    3 2    17          0          1          0          0          0
10   3 3    25          0          0          1          0          0
11   3 4    12          0          0          0          1          0
12   3 5    10          0          0          0          0          1
13   4 1    40          1          0          0          0          0
14   4 2     0          0          0          0          0          0
15   4 3     0          0          0          0          0          0
16   4 4     5          0          0          0          1          0
17   4 5     8          0          0          0          0          1

To clean up the names you can do:

names(df) <- sub("factor.t.", "It", names(df), fixed = TRUE)
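The fixed = TRUE argument matters here because a . is a wildcard in a regular expression. A small illustration on a plain character vector (the name factor_t_2 is invented purely to show the difference):

```r
# "factor_t_2" is a made-up name used only to contrast the two modes
nms <- c("sub", "t", "y.val", "factor.t.1", "factor_t_2")

# Literal matching: only the exact text "factor.t." is replaced
literal <- sub("factor.t.", "It", nms, fixed = TRUE)

# Regex matching: each "." matches any character, so "factor_t_" matches too
regex <- sub("factor.t.", "It", nms)
```

With fixed = TRUE only factor.t.1 is renamed (to It1); without it, factor_t_2 would also be renamed.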

You can use sapply to compare t for equality against each of 1:5 and combine the result with y.val>0 using &.

within(mydata, It <- +(sapply(1:5, `==`, t) & y.val>0))
#   sub t y.val It.1 It.2 It.3 It.4 It.5
#1    1 1    10    1    0    0    0    0
#2    1 2    20    0    1    0    0    0
#3    1 3    13    0    0    1    0    0
#4    2 1     5    1    0    0    0    0
#5    2 2     7    0    1    0    0    0
#6    2 3     8    0    0    1    0    0
#7    2 4     0    0    0    0    0    0
#8    3 1    45    1    0    0    0    0
#9    3 2    17    0    1    0    0    0
#10   3 3    25    0    0    1    0    0
#11   3 4    12    0    0    0    1    0
#12   3 5    10    0    0    0    0    1
#13   4 1    40    1    0    0    0    0
#14   4 2     0    0    0    0    0    0
#15   4 3     0    0    0    0    0    0
#16   4 4     5    0    0    0    1    0
#17   4 5     8    0    0    0    0    1
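To see why this works, it may help to look at the intermediate matrix on a miniature input. The vectors t_small and y_small below are invented for illustration and are not part of the question's data.

```r
# Hypothetical miniature versions of the t and y.val columns
t_small <- c(1, 2, 3, 1)
y_small <- c(10, 0, 13, 5)

# One column per time point: column k holds t_small == k
eq <- sapply(1:3, `==`, t_small)   # 4 x 3 logical matrix

# y_small > 0 has one element per row and is recycled down each column,
# so every column is combined with the same row-wise condition
ind <- +(eq & y_small > 0)
```

Because the result is a matrix, within() stores it as a single matrix column called It, which is why the printed names are It.1, It.2 and so on rather than It1, It2.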

Here's a tidyverse solution, using pivot_wider:

library(tidyverse)

mydata %>%
  mutate(new_col = paste0("It", t),
         y_test = as.integer(y.val > 0)) %>%
  pivot_wider(id_cols = c(sub, t, y.val),
              names_from = new_col,
              values_from = y_test,
              values_fill = list(y_test = 0))

     sub     t y.val   It1   It2   It3   It4   It5
   <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1    10     1     0     0     0     0
 2     1     2    20     0     1     0     0     0
 3     1     3    13     0     0     1     0     0
 4     2     1     5     1     0     0     0     0
 5     2     2     7     0     1     0     0     0
 6     2     3     8     0     0     1     0     0
 7     2     4     0     0     0     0     0     0
 8     3     1    45     1     0     0     0     0
 9     3     2    17     0     1     0     0     0
10     3     3    25     0     0     1     0     0
11     3     4    12     0     0     0     1     0
12     3     5    10     0     0     0     0     1
13     4     1    40     1     0     0     0     0
14     4     2     0     0     0     0     0     0
15     4     3     0     0     0     0     0     0
16     4     4     5     0     0     0     1     0
17     4     5     8     0     0     0     0     1

Explanation:

  • Make two columns: new_col (the new column names, prefixed with "It") and y_test (1 if y.val > 0, otherwise 0).
  • Pivot the new_col values into column names.
  • Fill in the resulting NA values with zeros.

One purrr and dplyr option could be:

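library(dplyr)
library(purrr)

# Build one indicator column per value of .x, then bind them to the original data
map_dfc(.x = 1:5,
        ~ mydata %>%
         mutate(!!paste0("It", .x) := as.integer(t == .x & y.val > 0)) %>%
         select(starts_with("It"))) %>%
 bind_cols(mydata)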

   It1 It2 It3 It4 It5 sub t y.val
1    1   0   0   0   0   1 1    10
2    0   1   0   0   0   1 2    20
3    0   0   1   0   0   1 3    13
4    1   0   0   0   0   2 1     5
5    0   1   0   0   0   2 2     7
6    0   0   1   0   0   2 3     8
7    0   0   0   0   0   2 4     0
8    1   0   0   0   0   3 1    45
9    0   1   0   0   0   3 2    17
10   0   0   1   0   0   3 3    25
11   0   0   0   1   0   3 4    12
12   0   0   0   0   1   3 5    10
13   1   0   0   0   0   4 1    40
14   0   0   0   0   0   4 2     0
15   0   0   0   0   0   4 3     0
16   0   0   0   1   0   4 4     5
17   0   0   0   0   1   4 5     8

Or, if you want to do it dynamically according to the range of the t column:

map_dfc(.x = reduce(as.list(range(mydata$t)), `:`),
        ~ mydata %>%
         mutate(!!paste0("It", .x) := as.integer(t == .x & y.val > 0)) %>%
         select(starts_with("It"))) %>%
 bind_cols(mydata)
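For reference, the sequence of time points produced by the reduce() expression can also be built in base R. A minimal sketch, assuming t holds integer time points (t_col below is just a copy of the question's t column):

```r
# Copy of the t column from the question's data
t_col <- c(1:3, 1:4, 1:5, 1:5)

# range() returns c(min, max); seq() expands it to every value in between
rng <- range(t_col)
time_points <- seq(rng[1], rng[2])

# Base R counterpart of purrr's reduce(as.list(range(t_col)), `:`)
same <- Reduce(`:`, as.list(range(t_col)))
```

Both give the values 1 to 5 here. If t did not start at 1, the generated column names would follow the actual values of t rather than 1, 2, 3 and so on.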
