
R: Scale a subset of multiple columns (with similar names) with dplyr

I recently moved from plain data frame manipulation in R to the tidyverse, and I ran into a problem scaling columns with the scale() function. My data consists of columns, some of which are numerical and some of which are categorical features. The last column is the y value of the data. I want to scale all numerical columns except that last one. With select() I can write a very short line of code that picks all the numerical columns that need scaling by adding the ends_with("...") helper. But I can't make use of that when scaling. There I have to use transmute(feature1=scale(feature1),feature2=scale(feature2)...) and name each feature individually. This works fine but bloats the code. So my question is:

Is there a smart solution to manipulate column by column without the need to address every single column name with transmute?

I imagine something like:

transmute(ends_with("...")=scale(ends_with("..."),featureX,featureZ)

(well aware that this does not work)

Many thanks in advance

library(tidyverse)
data("economics")

# fix the seed before any random draw so the example is reproducible
set.seed(1)

# add variables that are not numeric
economics[7:9] <- sample(LETTERS[1:10], size = nrow(economics), replace = TRUE)

# add a 'y' column (for illustration)
economics$y <- rnorm(n = nrow(economics))

# drop y, scale the numeric columns (non-numeric ones are dropped), then re-attach y
economics_modified <- economics %>%
  select(-y) %>%
  transmute_if(is.numeric, scale) %>%
  add_column(y = economics$y)

If you want to keep the columns that are not numeric, replace transmute_if with modify_if (from purrr, which the tidyverse loads). (There might be a smarter way to exclude column y from being scaled.)
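One possible way to skip y without dropping and re-attaching it is across(), sketched below. This assumes dplyr 1.0.0 or later, where across() supersedes the _if/_at variants. The helper scale_vec is a hypothetical name introduced here only so that each scaled column stays a plain numeric vector, since scale() on its own returns a one-column matrix. The column names come from the ggplot2 economics data, not from the asker's data.

```r
library(dplyr)

# economics ships with ggplot2; copy it so the packaged data stays untouched
df <- ggplot2::economics

set.seed(1)
df <- df %>%
  mutate(group = sample(LETTERS[1:10], size = n(), replace = TRUE),  # non-numeric column
         y     = rnorm(n()))                                         # target column

# hypothetical helper: scale() returns a matrix, so flatten it to a vector
scale_vec <- function(x) as.numeric(scale(x))

# scale every numeric column except y; all other columns are kept as they are
scaled <- df %>%
  mutate(across(where(is.numeric) & !y, scale_vec))

# or select by name pattern plus individual columns, as in the question
by_name <- df %>%
  mutate(across(c(starts_with("u"), pce), scale_vec))
```

Because mutate() is used instead of transmute(), the categorical column and y come through unchanged, and the selection inside across() accepts the same helpers as select(), such as ends_with().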

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.
