How to replace data in current columns using mutate?

Question

I want to group my dataframe by year and standardize certain columns (in this case BioTest, MathExam, and WritingScore), replacing the old data with the new data. Below is an example of my dataframe:

DF:

Var1   Var2   Year  BioTest   MathExam   WritingScore   Var3  Var 4
 X      X     2016   165        140         10           X     X
 X      X     2017   172        128         11           X     X
 X      X     2018   169        115          8           X     X
 X      X     2016   166        139         10           X     X
 X      X     2017   165        140         12           X     X

I have tried variations of the following code:

DF<- DF %>% group_by(Year)%>% mutate(across(BioTest:WritingScore),scale)

DF<- DF %>% group_by(Year)%>% mutate(across(select(BioTest:WritingScore)),scale)

What I get in return is the same DF without any changes. What I want is:

 DF:

 Var1   Var2   Year  BioTest   MathExam   WritingScore   Var3  Var 4
 X      X     2016   NewData     NewData      NewData      X     X
 X      X     2017   NewData     NewData      NewData      X     X
 X      X     2018   NewData     NewData      NewData      X     X
 X      X     2016   NewData     NewData      NewData      X     X
 X      X     2017   NewData     NewData      NewData      X     X

Any help is much appreciated.

Answer 1

The issue could be that dplyr::mutate was masked by plyr::mutate (in addition to the fact that across() is closed before the function is passed). The masking can be reproduced with:

iris %>%
    group_by(Species) %>%
    plyr::mutate(across(where(is.numeric), scale))
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
# 1          5.1         3.5          1.4         0.2 setosa 
# 2          4.9         3            1.4         0.2 setosa 
# 3          4.7         3.2          1.3         0.2 setosa 
# 4          4.6         3.1          1.5         0.2 setosa 
# 5          5           3.6          1.4         0.2 setosa 
# 6          5.4         3.9          1.7         0.4 setosa 
# 7          4.6         3.4          1.4         0.3 setosa 
# 8          5           3.4          1.5         0.2 setosa 
# 9          4.4         2.9          1.4         0.2 setosa 
#10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows

which is the same as the initial 'iris' dataset

Now, check with the correct dplyr::mutate

iris %>% 
   group_by(Species) %>%
   dplyr::mutate(across(where(is.numeric), scale))
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
#              <dbl>           <dbl>            <dbl>           <dbl> <fct>  
# 1           0.267           0.190            -0.357          -0.436 setosa 
# 2          -0.301          -1.13             -0.357          -0.436 setosa 
# 3          -0.868          -0.601            -0.933          -0.436 setosa 
# 4          -1.15           -0.865             0.219          -0.436 setosa 
# 5          -0.0170          0.454            -0.357          -0.436 setosa 
# 6           1.12            1.25              1.37            1.46  setosa 
# 7          -1.15           -0.0739           -0.357           0.512 setosa 
# 8          -0.0170         -0.0739            0.219          -0.436 setosa 
# 9          -1.72           -1.39             -0.357          -0.436 setosa 
#10          -0.301          -0.865             0.219          -1.39  setosa 
# … with 140 more rows

So, in the OP's code, we just need to call dplyr::mutate explicitly (or restart a fresh R session with only dplyr loaded) and pass scale inside across():

DF %>% 
   group_by(Year)%>% 
   dplyr::mutate(across(BioTest:WritingScore, scale))
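
To check whether masking is actually happening in a session, one option is to ask R which package the bare name mutate resolves to. This is a minimal sketch, assuming dplyr is installed; the variable name mutate_owner is only illustrative.

```r
library(dplyr)

# Name of the package whose mutate() is found first on the search path.
# If plyr was attached after dplyr, this would report "plyr" instead.
mutate_owner <- environmentName(environment(mutate))
print(mutate_owner)

# Every attached package that provides an object called mutate,
# in search-path order (the first entry is the one that wins).
print(find("mutate"))
```

If more than one package is listed, calling dplyr::mutate explicitly avoids depending on the load order.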

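scale returns a one-column matrix with attributes (scaled:center and scaled:scale), which is why the column headers above show [,1]. If we only need the numeric vector part, we can wrap the call in either as.vector or as.numeric: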

DF %>% 
   group_by(Year)%>% 
   dplyr::mutate(across(BioTest:WritingScore, ~ as.numeric(scale(.))))
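
As a self-contained illustration of the as.numeric variant, the sketch below standardizes one column within each year and confirms that the result is a plain numeric vector. The data frame scores and its values are made up for the example and are not the OP's full data.

```r
library(dplyr)

# Hypothetical toy data: two rows per year.
scores <- data.frame(
  Year    = c(2016L, 2016L, 2017L, 2017L),
  BioTest = c(165, 166, 172, 165)
)

# Standardize BioTest within each Year, dropping the matrix structure
# that scale() would otherwise leave on the column.
standardized <- scores %>%
  group_by(Year) %>%
  dplyr::mutate(across(BioTest, ~ as.numeric(scale(.x)))) %>%
  ungroup()

print(standardized)
print(is.matrix(standardized$BioTest))
```

With only two rows per group, each pair of values is mapped to plus and minus 1/sqrt(2), so the column keeps its name and position but holds the within-year z-scores.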

NOTE: The select is not needed within across

Answer 2

Maybe try this. The issue is in your across() statement: the function must be passed inside it:

library(dplyr)
#Code
DF %>%
  group_by(Year) %>%
  mutate(across(BioTest:WritingScore,~scale(.)[,1]))

Output:

# A tibble: 5 x 9
# Groups:   Year [3]
  Var1  Var2   Year BioTest[,1] MathExam[,1] WritingScore[,1] Var3  Var   X4   
  <chr> <chr> <int>       <dbl>        <dbl>            <dbl> <chr> <chr> <lgl>
1 X     X      2016      -0.707        0.707          NaN     X     X     NA   
2 X     X      2017       0.707       -0.707           -0.707 X     X     NA   
3 X     X      2018     NaN          NaN              NaN     X     X     NA   
4 X     X      2016       0.707       -0.707          NaN     X     X     NA   
5 X     X      2017      -0.707        0.707            0.707 X     X     NA
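
The NaN entries in this output most likely come from groups where the standard deviation is zero: a year with a single row (2018 here) or a year where every value is identical (WritingScore in 2016). A minimal sketch of that behaviour, using base R only, with illustrative variable names:

```r
# A single observation: centering gives 0 and the scale factor is 0,
# so the standardized value is 0 / 0.
single_row <- as.numeric(scale(169))

# Two identical observations: again every centered value is 0
# and the standard deviation is 0.
constant <- as.numeric(scale(c(10, 10)))

print(single_row)
print(constant)
```

With more rows per year and some variation within each year, these columns would contain ordinary z-scores.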

Some data used:

#Data
DF <- structure(list(Var1 = c("X", "X", "X", "X", "X"), Var2 = c("X", 
"X", "X", "X", "X"), Year = c(2016L, 2017L, 2018L, 2016L, 2017L
), BioTest = c(165L, 172L, 169L, 166L, 165L), MathExam = c(140L, 
128L, 115L, 139L, 140L), WritingScore = c(10L, 11L, 8L, 10L, 
12L), Var3 = c("X", "X", "X", "X", "X"), Var = c("X", "X", "X", 
"X", "X"), X4 = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))

solution1 3 ACCEPTED 2020-09-17 15:41:31

solution2 1 2020-09-17 15:40:43
