

R: Scale a subset of multiple columns (with similar names) with dplyr

I recently moved from standard data frame manipulation in R to the tidyverse, but I have run into a problem scaling columns with the scale() function. My data has some numeric feature columns and some categorical ones, and the last column holds the y values. I want to scale all numeric columns except that last one. With select() I can pick out every numeric column that needs scaling in one short line by adding ends_with("..."), but I can't make use of that when scaling. There I have to write transmute(feature1 = scale(feature1), feature2 = scale(feature2), ...) and name each feature individually. This works, but it bloats the code. So my question is:

Is there a smart way to operate on the columns one by one without having to name every single column in transmute()?

I imagine something like:

transmute(ends_with("...")=scale(ends_with("..."),featureX,featureZ)

(I'm well aware that this does not work.)
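For the syntax imagined above, here is a minimal sketch assuming dplyr 1.0.0 or later, where across() lets mutate() apply one function to every column picked by a select()-style helper such as ends_with(). The tibble, the "_num" suffix and the column names are hypothetical stand-ins for the real features.

```r
library(dplyr)

# Hypothetical toy data: two numeric features sharing the suffix "_num",
# one categorical feature and a response column y
df <- tibble(
  height_num = c(150, 160, 170, 180),
  weight_num = c(50, 60, 70, 80),
  group      = c("a", "b", "a", "b"),
  y          = c(1.2, 0.4, 2.2, 1.9)
)

# Scale every column whose name ends in "_num"; all other columns are kept.
# scale() returns a one-column matrix, so as.vector() turns it back into a plain vector.
df_scaled <- df %>%
  mutate(across(ends_with("_num"), ~ as.vector(scale(.x))))

df_scaled
```

More columns can go into the same selection, e.g. across(c(ends_with("_num"), featureX, featureZ), ...), and transmute() accepts across() too if only the scaled columns should be kept.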

Many thanks in advance.

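library(tidyverse)
data("economics")

# set the seed first so that both random columns below are reproducible
set.seed(1)

# add variables that are not numeric (columns 7 to 9)
economics[7:9] <- sample(LETTERS[1:10], size = nrow(economics), replace = TRUE)

# add a 'y' column (for illustration)
economics$y <- rnorm(n = nrow(economics))

# scale all numeric columns except y, then attach the unscaled y again
# (non-numeric columns such as date are dropped by transmute_if)
economics_modified <- economics %>%
  select(-y) %>%
  transmute_if(is.numeric, scale) %>%
  add_column(y = economics$y)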

If you want to keep the non-numeric columns, replace transmute_if() with modify_if() (from purrr, which tidyverse loads). (There might be a smarter way to exclude column y from being scaled.)
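One possible way to exclude y directly, again assuming dplyr 1.0.0 or later: tidyselect lets across() combine a type test with an exclusion, so where(is.numeric) & !y selects every numeric column except y, while mutate() keeps all other columns as they are. The small tibble below is a hypothetical stand-in for the modified economics data.

```r
library(dplyr)

# Hypothetical toy data mirroring the setup above: a Date column,
# two numeric columns, one categorical column and a response y
set.seed(1)
econ <- tibble(
  date     = as.Date("2000-01-01") + 0:4,
  pce      = c(10, 12, 15, 11, 13),
  unemploy = c(5, 7, 6, 8, 9),
  category = sample(LETTERS[1:10], 5, replace = TRUE),
  y        = rnorm(5)
)

# Scale every numeric column except y in one step.
# Non-numeric columns (date, category) and y itself are left unchanged,
# so there is no need to drop y and add it back afterwards.
econ_scaled <- econ %>%
  mutate(across(where(is.numeric) & !y, ~ as.vector(scale(.x))))

econ_scaled
```

Because mutate() keeps every column, this also covers the modify_if() case without the select(-y) / add_column() round trip.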

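Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.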

 
Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM