
Efficient way of scaling column based on value in other column in R dataframe

I want to scale values in one column of a dataframe based on values in another column. Here is a simple example:

d<-data.frame(x=runif(5,0,10),y=sample(c(1,2),size=5,replace=TRUE))

which gives the output:

         x  y
1 1.0895865 2
2 0.8261554 2
3 5.3503761 2
4 3.3940759 1
5 6.2786637 1

I want to scale the x values based on the y values, so what I want is to have:

(x|y=1 - average(x's | y=1))/std.dev(x's|y=1)

then replace the x values in d with the scaled values, and similarly for the x values with y=2.

What I have done so far is a bit clunky:

d1 <- subset(d, y == 1)
d2 <- subset(d, y == 2)

d1$x <- (d1$x - mean(d1$x)) / sd(d1$x)
d2$x <- (d2$x - mean(d2$x)) / sd(d2$x)

and then binding all the results into one big data frame, but this is tedious since my actual data has 50 different values for y and I'd like to do this for multiple (different) columns.

You can easily do this using group_by and mutate from the dplyr package:

library(dplyr)
d %>%
  group_by(y) %>%
  mutate(x = (x - mean(x)) / sd(x))
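The question also asks about scaling several columns at once. A minimal sketch of how the same grouped mutate could be extended with across() (available in dplyr 1.0.0 and later); the extra column z, the helper name standardise, and the sample data are assumptions made up for illustration, not part of the original question:

```r
library(dplyr)

# Hypothetical data: two numeric columns (x, z) and a grouping column y
set.seed(1)
d <- data.frame(
  x = runif(10, 0, 10),
  z = rnorm(10),
  y = rep(c(1, 2), each = 5)
)

# Hypothetical helper: subtract the mean, divide by the standard deviation
standardise <- function(v) (v - mean(v)) / sd(v)

scaled <- d %>%
  group_by(y) %>%
  mutate(across(c(x, z), standardise)) %>%  # apply to every listed column, per group
  ungroup()
```

Listing the columns once inside across() avoids repeating the formula for each column, and the number of distinct y values does not affect the code.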

This task is usually performed with group_by in dplyr together with scale:

library(dplyr)
# scale() returns a one-column matrix, so convert it back to a plain vector
d %>% group_by(y) %>% mutate(x2 = as.numeric(scale(x)))
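With its default arguments, scale() centers on the mean and divides by the standard deviation, which is the formula from the question. A small check, using the three y = 2 values from the question's sample output (the variable names are only illustrative):

```r
# The three x values with y = 2 from the sample output in the question
x <- c(1.0895865, 0.8261554, 5.3503761)

manual <- (x - mean(x)) / sd(x)  # the formula from the question
scaled <- scale(x)               # default: center = TRUE, scale = TRUE

is.matrix(scaled)                       # TRUE: scale() returns a one-column matrix
all.equal(as.numeric(scaled), manual)   # TRUE: same values once flattened
```

Because the result is a matrix, wrapping it in as.numeric() (or as.vector()) keeps the new column an ordinary numeric vector.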

We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(d)), group by 'y', and assign (:=) the scale of 'x' to 'x2'.

library(data.table)
setDT(d)[, x2 := as.vector(scale(x)), by = y]
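For the multi-column case mentioned in the question, one possible data.table sketch uses .SDcols with lapply(). The column z, the cols vector, and the sample data are assumptions for illustration only:

```r
library(data.table)

# Hypothetical data: two numeric columns (x, z) and a grouping column y
set.seed(1)
dt <- data.table(
  x = runif(10, 0, 10),
  z = rnorm(10),
  y = rep(c(1, 2), each = 5)
)

cols <- c("x", "z")  # hypothetical list of columns to scale

# Overwrite each listed column, by reference, with its within-group standardised values
dt[, (cols) := lapply(.SD, function(v) (v - mean(v)) / sd(v)),
   by = y, .SDcols = cols]
```

This modifies dt in place, so no rebinding of per-group subsets is needed; to keep the original values, assign to new column names instead, for example paste0(cols, "_scaled").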
