简体   繁体   中英

How do I take a subset of a column and make it into a new column with 0s instead of NAs for missing values in R?

Let's say I have a column of values and another column of factors:

    vals factors
 1:   58       B
 2:   42       B
 3:   64       A
 4:   64       A
 5:   26       B
 6:   64       A
 7:   20       A
 8:   20       A
 9:   22       A
10:   29       B
11:   60       B
12:   41       A
13:   79       A
14:   82       A
15:   11       A
16:   97       A
17:    1       B
18:   29       B
19:   90       B
20:    2       A

I want to create a new column for each factor with the vals value still in the corresponding row and a 0 for rows that that are for the other factors . So, to do this manually, it would look like this:

    vals factors     A
 1:   58       B     0
 2:   42       B     0
 3:   64       A    64
 4:   64       A    64
 5:   26       B     0
 6:   64       A    64
 7:   20       A    20
 8:   20       A    20
 9:   22       A    22
10:   29       B     0

and the same for all factors. How would I go about doing this? I tried the simplest way, which is to do:

dt$A <-  dt[factors == "A",]$vals

But that very predictably doesn't work:

Error in set(x, j = name, value = value) : 
  Supplied 12 items to be assigned to 20 items of column 'A'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

in base R you would do:

cbind(df, as.data.frame.matrix(xtabs(vals~.,cbind(x=1:nrow(df), df))))

    vals factors  A  B
1:    58       B  0 58
2:    42       B  0 42
3:    64       A 64  0
4:    64       A 64  0
5:    26       B  0 26
6:    64       A 64  0
7:    20       A 20  0
8:    20       A 20  0
9:    22       A 22  0
10:   29       B  0 29
11:   60       B  0 60
12:   41       A 41  0
13:   79       A 79  0
14:   82       A 82  0
15:   11       A 11  0
16:   97       A 97  0
17:    1       B  0  1
18:   29       B  0 29
19:   90       B  0 90
20:    2       A  2  0

Another way in base R will be:

cbind(df, model.matrix(~0+factors, df)* df$vals)

    vals factors factorsA factorsB
1:    58       B        0       58
2:    42       B        0       42
3:    64       A       64        0
4:    64       A       64        0
5:    26       B        0       26
6:    64       A       64        0
7:    20       A       20        0
8:    20       A       20        0
9:    22       A       22        0
10:   29       B        0       29
11:   60       B        0       60
12:   41       A       41        0
13:   79       A       79        0
14:   82       A       82        0
15:   11       A       11        0
16:   97       A       97        0
17:    1       B        0        1
18:   29       B        0       29
19:   90       B        0       90
20:    2       A        2        0

Using dplyr you could mutate the new variables with an ifelse statement where if the name of the factor is equal to the name of the variable, then show the vals value, else, get a zero.

library(dplyr)

df <- df %>% mutate(A = ifelse(factors == "A", vals, 0),
                    B = ifelse(factors == "B", vals, 0))


   vals factors  A  B
1    58       B  0 58
2    42       B  0 42
3    64       A 64  0
4    64       A 64  0
5    26       B  0 26
6    64       A 64  0
7    20       A 20  0
8    20       A 20  0
9    22       A 22  0
10   29       B  0 29
11   60       B  0 60
12   41       A 41  0
13   79       A 79  0
14   82       A 82  0
15   11       A 11  0
16   97       A 97  0
17    1       B  0  1
18   29       B  0 29
19   90       B  0 90
20    2       A  2  0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM