I am trying to create a set of new variables that measure how far each value lies from the minimum of its variable, where the minimum is computed separately within each group defined by another variable.
Consider the following:
df <- data.frame(
cluster = c("A","B","B","A","A","B"),
x = c(3,4,1,5,2,6),
y = c(4,5,3,1,2,6))
I would like to create two new variables, call them x.var and y.var, each equal to the within-cluster minimum of the underlying variable minus the observed value. Thus, x.var and y.var would hopefully be:
x.var y.var
-1 -3
-3 -2
0 0
-3 0
0 -1
-5 -3
I have tried unsuccessfully to use lapply with an anonymous function to accomplish this:
vars <- lapply(df[,c(2:3)],function(x)
ifelse(df$cluster=="A",
min(df[df$cluster=="A",x])-x,
min(df[df$cluster=="B",x])-x))
I receive the following error:
Error in `[.data.frame`(df, df$cluster == "A", x) : undefined columns selected
Any help would be much appreciated!
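The error comes from how lapply passes arguments: inside the anonymous function, x is the column vector itself (for example c(3,4,1,5,2,6)), not its name, so df[df$cluster=="A", x] treats those data values as column indices, and indices 4, 5 and 6 do not exist. A minimal sketch of a fix that keeps the original ifelse structure is to loop over the column names instead. Like the original attempt, it assumes there are only the two clusters "A" and "B"; the helper names cols and vars are illustrative.

```r
df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

cols <- c("x", "y")

# Iterate over column *names*, not column vectors, so that
# df[rows, nm] selects a column instead of treating the data
# values as column indices. Only clusters "A" and "B" are handled.
vars <- lapply(cols, function(nm) {
  v <- df[[nm]]
  ifelse(df$cluster == "A",
         min(df[df$cluster == "A", nm]) - v,
         min(df[df$cluster == "B", nm]) - v)
})

# Store the results as new columns x.var and y.var
df[paste0(cols, ".var")] <- vars
df
```

Hard-coding the cluster labels does not scale to more groups, which is why the grouped approaches below are preferable.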
Here's an approach using dplyr:
library(dplyr)
df <- data.frame(
cluster = c("A","B","B","A","A","B"),
x = c(3,4,1,5,2,6),
y = c(4,5,3,1,2,6))
df %>%
group_by(cluster) %>%
mutate(x.var = min(x) - x,
y.var = min(y) - y)
#> # A tibble: 6 x 5
#> # Groups: cluster [2]
#> cluster x y x.var y.var
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 A 3 4 -1 -3
#> 2 B 4 5 -3 -2
#> 3 B 1 3 0 0
#> 4 A 5 1 -3 0
#> 5 A 2 2 0 -1
#> 6 B 6 6 -5 -3
Created on 2019-01-01 by the reprex package (v0.2.1)
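When there are many columns, writing one mutate expression per variable gets repetitive. A sketch of the same grouped computation applied to several columns at once uses across(), which assumes dplyr 1.0.0 or later (newer than the version used in the reprex above); the .names glue string produces the x.var and y.var names.

```r
library(dplyr)

df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

res <- df %>%
  group_by(cluster) %>%
  # Within each cluster, compute min(col) - col for every selected column;
  # "{.col}.var" names the outputs x.var and y.var
  mutate(across(c(x, y), ~ min(.x) - .x, .names = "{.col}.var")) %>%
  ungroup()

res
```

Replacing c(x, y) with a tidyselect helper such as -cluster would cover every non-grouping column.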
Here is a base R method that uses ave with lapply. Loop through the columns of the dataset excluding 'cluster', use ave to get the minimum of each column grouped by 'cluster', subtract the column from that minimum, and assign the resulting list of vectors to new columns:
df[paste0(names(df)[-1], ".var")] <- lapply(df[-1], function(x)
ave(x, df$cluster, FUN = min) - x)
df
# cluster x y x.var y.var
#1 A 3 4 -1 -3
#2 B 4 5 -3 -2
#3 B 1 3 0 0
#4 A 5 1 -3 0
#5 A 2 2 0 -1
#6 B 6 6 -5 -3
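To see what ave contributes in the answer above, here is its output for the x column alone. The intermediate name grp_min is only for illustration: ave returns a vector the same length as its input, with each position holding the minimum of that row's cluster, so it lines up with the original rows and can be subtracted directly.

```r
df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

# ave() applies FUN within each cluster and returns a vector
# aligned with the original rows
grp_min <- ave(df$x, df$cluster, FUN = min)
grp_min
# [1] 2 1 1 2 2 1

# Subtracting the column from the group minimum gives x.var
grp_min - df$x
# [1] -1 -3  0 -3  0 -5
```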