Efficient way of scaling column based on value in other column in R dataframe
I want to scale the values in one column of a data frame based on the values in another column. For example, here is a simple example:
d<-data.frame(x=runif(5,0,10),y=sample(c(1,2),size=5,replace=TRUE))
gives the output:
x y
1 1.0895865 2
2 0.8261554 2
3 5.3503761 2
4 3.3940759 1
5 6.2786637 1
I want to scale the x values based on the y values, so what I want is to have:
(x|y=1 - average(x's | y=1))/std.dev(x's|y=1)
and then replace the x values in d with the scaled values, and similarly for the x values with y=2.
What I have done so far is a bit clunky:
d1<-subset(d,y==1)
d2<-subset(d,y==2)
d1$x<-(d1$x-mean(d1$x))/sd(d1$x)
d2$x<-(d2$x-mean(d2$x))/sd(d2$x)
and then binding all the results into one big data frame. This is tedious, since my actual data has 50 different values for y and I'd like to do this for multiple (different) columns.
You can easily do this using group_by and mutate from the dplyr package:
require(dplyr)
d %>%
group_by(y) %>%
mutate(x = (x - mean(x)) / sd(x))
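The question also asks about scaling several columns at once. One way to extend the pipeline above is across(), which is available in dplyr 1.0.0 and later. This is only a sketch: the second column z and the seeded sample data are made up for illustration and are not part of the original question.

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

set.seed(1)
# Hypothetical data: 'z' is an extra column invented for this illustration
d <- data.frame(
  x = runif(10, 0, 10),
  z = runif(10, 0, 100),
  y = rep(c(1, 2), each = 5)
)

# Standardise x and z separately within each level of y
scaled <- d %>%
  group_by(y) %>%
  mutate(across(c(x, z), ~ (.x - mean(.x)) / sd(.x))) %>%
  ungroup()
```

After this, each of x and z should have mean 0 and standard deviation 1 within every group of y. Note that a group with a single row gives NA, because sd() of one value is NA.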
This task is usually performed with group_by in dplyr, together with scale:
library(dplyr)
d %>% group_by(y) %>% mutate(x2=scale(x))
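One thing to be aware of: scale() returns a one-column matrix (with the centre and scale stored as attributes), so x2 can end up as a matrix column rather than a plain numeric vector. If that matters, wrapping the call in as.numeric() is one way around it. A small sketch with made-up, seeded data:

```r
library(dplyr)

set.seed(1)
# Illustrative data in the same shape as the question's data frame
d <- data.frame(x = runif(6, 0, 10), y = rep(c(1, 2), each = 3))

# as.numeric() drops the matrix dimensions and attributes added by scale()
d2 <- d %>%
  group_by(y) %>%
  mutate(x2 = as.numeric(scale(x))) %>%
  ungroup()
```

Here x2 is an ordinary numeric column holding the within-group z-scores of x.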
We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(d)) and, grouped by 'y', assign (:=) the scale of 'x' to 'x2'.
library(data.table)
setDT(d)[, x2 := scale(x), by = y]
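For the multi-column case mentioned in the question, the same idea can probably be extended with .SD and .SDcols. The sketch below assumes a second column z and seeded sample data, both invented for illustration, and overwrites the columns in place instead of creating x2.

```r
library(data.table)

set.seed(1)
# Hypothetical data: 'z' is an extra column invented for this illustration
d <- data.frame(
  x = runif(10, 0, 10),
  z = runif(10, 0, 100),
  y = rep(c(1, 2), each = 5)
)

cols <- c("x", "z")  # columns to standardise
setDT(d)             # convert to data.table by reference

# Replace each listed column with its within-group z-score;
# as.numeric() drops the matrix structure that scale() returns
d[, (cols) := lapply(.SD, function(v) as.numeric(scale(v))),
  by = y, .SDcols = cols]
```

Because := modifies d by reference, no reassignment is needed. To keep the original values, copy(d) first or assign to new column names.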