How to replace data in current columns using mutate?

I want to group my data frame by Year, standardize certain columns (in this case BioTest, MathExam and WritingScore), and replace the old values with the standardized ones. Below is an example of my data frame:

DF:

Var1   Var2   Year  BioTest   MathExam   WritingScore   Var3  Var 4
 X      X     2016   165        140         10           X     X
 X      X     2017   172        128         11           X     X
 X      X     2018   169        115          8           X     X
 X      X     2016   166        139         10           X     X
 X      X     2017   165        140         12           X     X

I have tried variations of the following code:

DF<- DF %>% group_by(Year)%>% mutate(across(BioTest:WritingScore),scale)

DF<- DF %>% group_by(Year)%>% mutate(across(select(BioTest:WritingScore)),scale)

What I get in return is the same DF without any changes. What I want is:

 DF:

 Var1   Var2   Year  BioTest   MathExam   WritingScore   Var3  Var 4
 X      X     2016   NewData     NewData      NewData      X     X
 X      X     2017   NewData     NewData      NewData      X     X
 X      X     2018   NewData     NewData      NewData      X     X
 X      X     2016   NewData     NewData      NewData      X     X
 X      X     2017   NewData     NewData      NewData      X     X

Any help is much appreciated.

The issue could be that dplyr::mutate was masked by plyr::mutate (for example, because plyr was loaded after dplyr), combined with the fact that across() is closed before any function is supplied to it. It can be reproduced with:

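library(dplyr)
# plyr::mutate stands in for a masked mutate(); across() is also closed before scale
iris %>%
    group_by(Species) %>%
    plyr::mutate(across(where(is.numeric), scale))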
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
# 1          5.1         3.5          1.4         0.2 setosa 
# 2          4.9         3            1.4         0.2 setosa 
# 3          4.7         3.2          1.3         0.2 setosa 
# 4          4.6         3.1          1.5         0.2 setosa 
# 5          5           3.6          1.4         0.2 setosa 
# 6          5.4         3.9          1.7         0.4 setosa 
# 7          4.6         3.4          1.4         0.3 setosa 
# 8          5           3.4          1.5         0.2 setosa 
# 9          4.4         2.9          1.4         0.2 setosa 
#10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows

which is identical to the original 'iris' dataset, i.e. nothing was scaled.
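Why does the data come back unchanged instead of raising an error? plyr::mutate only evaluates named arguments and silently drops unnamed ones, and both arguments in the OP's call (the across() expression and scale) are unnamed. A minimal sketch on a made-up data frame `df`, assuming plyr is installed:

```r
# Sketch: plyr::mutate() only evaluates *named* arguments.
# `df` is a toy data frame made up for illustration.
df <- data.frame(x = 1:3)

# Unnamed argument: silently dropped, so df comes back unchanged.
unnamed <- plyr::mutate(df, x * 10)

# Named argument: evaluated inside df and added as a new column.
named <- plyr::mutate(df, y = x * 10)

identical(unnamed, df)  # TRUE
named$y                 # 10 20 30
```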

Now check with the correct dplyr::mutate:

iris %>% 
   group_by(Species) %>%
   dplyr::mutate(across(where(is.numeric), scale))
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
#              <dbl>           <dbl>            <dbl>           <dbl> <fct>  
# 1           0.267           0.190            -0.357          -0.436 setosa 
# 2          -0.301          -1.13             -0.357          -0.436 setosa 
# 3          -0.868          -0.601            -0.933          -0.436 setosa 
# 4          -1.15           -0.865             0.219          -0.436 setosa 
# 5          -0.0170          0.454            -0.357          -0.436 setosa 
# 6           1.12            1.25              1.37            1.46  setosa 
# 7          -1.15           -0.0739           -0.357           0.512 setosa 
# 8          -0.0170         -0.0739            0.219          -0.436 setosa 
# 9          -1.72           -1.39             -0.357          -0.436 setosa 
#10          -0.301          -0.865             0.219          -1.39  setosa 
# … with 140 more rows
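To confirm that the grouped call standardizes within each Species rather than over the whole column, every scaled column should have a mean of about 0 and a standard deviation of about 1 inside each group. A small check, sketched under the assumption of dplyr 1.0 or later (for across() and where()):

```r
library(dplyr)

# Scale each numeric column separately within each Species.
scaled <- iris %>%
  group_by(Species) %>%
  dplyr::mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

# Per-group mean and sd of every scaled column
# (columns are named like Sepal.Length_mean, Sepal.Length_sd, ...).
check <- scaled %>%
  summarise(across(where(is.numeric), list(mean = mean, sd = sd)))

means <- unlist(select(check, ends_with("_mean")))
sds   <- unlist(select(check, ends_with("_sd")))

all(abs(means) < 1e-8)     # TRUE: centered within each Species
all(abs(sds - 1) < 1e-8)   # TRUE: unit sd within each Species
```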

So, in the OP's code, we need to call dplyr::mutate explicitly (or restart a fresh R session with only dplyr loaded) and pass scale inside across() as its second argument:

DF %>% 
   group_by(Year)%>% 
   dplyr::mutate(across(BioTest:WritingScore, scale))
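To check whether an unqualified mutate() is the masked one, you can ask R which attached package supplies it. A minimal sketch, assuming both plyr and dplyr are installed; plyr is attached after dplyr on purpose, to reproduce the masking:

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)   # attached last, so its mutate() masks dplyr's
})

# find() lists the attached packages that define "mutate", in search-path
# order; the first entry is what an unqualified mutate() call resolves to.
find("mutate")

# Namespace of the function that the bare name `mutate` currently points to.
environmentName(environment(mutate))         # "plyr"

# A namespace-qualified call ignores the search path entirely.
environmentName(environment(dplyr::mutate))  # "dplyr"
```

If both packages are genuinely needed, attaching plyr before dplyr in a fresh session lets dplyr's verbs take precedence; the dplyr::mutate form works regardless of load order.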

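scale returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, which is why the columns above print as Sepal.Length[,1] and so on. If we only need the plain numeric vector, we can wrap the result in either as.vector or as.numeric: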

DF %>% 
   group_by(Year)%>% 
   dplyr::mutate(across(BioTest:WritingScore, ~ as.numeric(scale(.))))
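To see what scale() hands back for a single column, and that as.numeric() keeps the z-scores while dropping the matrix structure, here is a small base-R sketch; the input values are simply the first three BioTest scores from the question:

```r
x <- c(165, 172, 169)

z <- scale(x)
is.matrix(z)            # TRUE: a 3 x 1 matrix
names(attributes(z))    # includes "dim", "scaled:center", "scaled:scale"

# as.numeric() (or as.vector()) drops the dim and the scaling attributes...
v <- as.numeric(z)
attributes(v)           # NULL

# ...and the values are the usual z-scores (x - mean) / sd.
all.equal(v, (x - mean(x)) / sd(x))   # TRUE
```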

NOTE: select() is not needed within across(); pass the column selection (e.g. BioTest:WritingScore) directly.
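across() accepts the same tidyselect expressions that select() does, so a range, a c() of bare names, or a character vector wrapped in all_of() all pick the same columns. A sketch using the numeric columns from the question's example, assuming dplyr 1.0 or later:

```r
library(dplyr)

# Numeric columns of the OP's example data.
DF <- data.frame(
  Year         = c(2016L, 2017L, 2018L, 2016L, 2017L),
  BioTest      = c(165L, 172L, 169L, 166L, 165L),
  MathExam     = c(140L, 128L, 115L, 139L, 140L),
  WritingScore = c(10L, 11L, 8L, 10L, 12L)
)

cols <- c("BioTest", "MathExam", "WritingScore")

# Three equivalent ways to select the columns inside across().
by_range <- DF %>% group_by(Year) %>%
  mutate(across(BioTest:WritingScore, ~ as.numeric(scale(.x))))
by_names <- DF %>% group_by(Year) %>%
  mutate(across(c(BioTest, MathExam, WritingScore), ~ as.numeric(scale(.x))))
by_strings <- DF %>% group_by(Year) %>%
  mutate(across(all_of(cols), ~ as.numeric(scale(.x))))

identical(by_range, by_names)    # TRUE
identical(by_range, by_strings)  # TRUE
```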

Maybe try this. The issue is in your across() call: the function must be passed inside it, as its second argument:

library(dplyr)
#Code
DF %>%
  group_by(Year) %>%
  mutate(across(BioTest:WritingScore,~scale(.)[,1]))

Output:

# A tibble: 5 x 9
# Groups:   Year [3]
  Var1  Var2   Year BioTest[,1] MathExam[,1] WritingScore[,1] Var3  Var   X4   
  <chr> <chr> <int>       <dbl>        <dbl>            <dbl> <chr> <chr> <lgl>
1 X     X      2016      -0.707        0.707          NaN     X     X     NA   
2 X     X      2017       0.707       -0.707           -0.707 X     X     NA   
3 X     X      2018     NaN          NaN              NaN     X     X     NA   
4 X     X      2016       0.707       -0.707          NaN     X     X     NA   
5 X     X      2017      -0.707        0.707            0.707 X     X     NA   
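The NaN cells above come from groups with no spread: scale() divides by the group's standard deviation, which is 0 for the single 2018 row and for WritingScore in 2016 (10 and 10), giving 0 / 0. A sketch of one way to handle this; `safe_scale` is a hypothetical helper (not part of dplyr or base R), and returning 0 for such groups is an assumption that may not suit every analysis:

```r
library(dplyr)

as.numeric(scale(c(10, 10)))   # NaN NaN
as.numeric(scale(169))         # NaN

# Hypothetical helper: z-scores, but 0 (instead of NaN) when a group has
# no spread; missing inputs stay NA.
safe_scale <- function(x) {
  s <- sd(x, na.rm = TRUE)     # NA for a single value, 0 for constant values
  if (is.na(s) || s == 0) {
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - mean(x, na.rm = TRUE)) / s
}

DF <- data.frame(
  Year         = c(2016L, 2017L, 2018L, 2016L, 2017L),
  BioTest      = c(165L, 172L, 169L, 166L, 165L),
  MathExam     = c(140L, 128L, 115L, 139L, 140L),
  WritingScore = c(10L, 11L, 8L, 10L, 12L)
)

res <- DF %>%
  group_by(Year) %>%
  mutate(across(BioTest:WritingScore, safe_scale))

res
```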

Data used:

#Data
DF <- structure(list(Var1 = c("X", "X", "X", "X", "X"), Var2 = c("X", 
"X", "X", "X", "X"), Year = c(2016L, 2017L, 2018L, 2016L, 2017L
), BioTest = c(165L, 172L, 169L, 166L, 165L), MathExam = c(140L, 
128L, 115L, 139L, 140L), WritingScore = c(10L, 11L, 8L, 10L, 
12L), Var3 = c("X", "X", "X", "X", "X"), Var = c("X", "X", "X", 
"X", "X"), X4 = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))
