
Using lapply to create new variables based on multiple conditions and subsets

I am trying to create a set of new variables that hold each value's deviation from the minimum of its variable, where the minimum is computed within subsets defined by another variable.

Consider the following:

df <- data.frame(
      cluster = c("A","B","B","A","A","B"),
      x = c(3,4,1,5,2,6),
      y = c(4,5,3,1,2,6))

I would like to create two new variables, x.var and y.var, each equal to the minimum of the corresponding variable within the same cluster minus the observation's own value (so every deviation is zero or negative). Thus, x.var and y.var should be:

x.var y.var
-1    -3
-3    -2
 0     0
-3     0
 0    -1
-5    -3
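The expected values follow from the per-cluster minima. As a quick check (a sketch using base R's `aggregate()`; `mins` is just an illustrative name), the minima are x = 2, y = 1 in cluster A and x = 1, y = 3 in cluster B. For example, row 1 (cluster A, x = 3) gives 2 - 3 = -1.

```r
df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

# Minimum of x and y within each cluster
mins <- aggregate(cbind(x, y) ~ cluster, data = df, FUN = min)
print(mins)
#   cluster x y
# 1       A 2 1
# 2       B 1 3
```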

I have tried unsuccessfully to use lapply with an anonymous function to accomplish this:

vars <- lapply(df[,c(2:3)],function(x) 
    ifelse(df$cluster=="A",
    min(df[df$cluster=="A",x])-x,
    min(df[df$cluster=="B",x])-x))

I receive the following error:

Error in `[.data.frame`(df, df$cluster == "A", x) : undefined columns selected 

Any help would be much appreciated!
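The error arises because `lapply()` passes each column's *values* to the anonymous function, not its name. For the `x` column, `x` is `c(3,4,1,5,2,6)`, so `df[df$cluster=="A", x]` asks for columns 3, 4, 1, 5, 2 and 6 of a three-column data frame, hence "undefined columns selected". A minimal fix, sketched below, is to iterate over the column names instead. Note that it still hard-codes the two clusters "A" and "B", so it does not generalize to more groups.

```r
df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

# Iterate over column *names*, so `v` can be used to index df
vars <- lapply(c("x", "y"), function(v)
  ifelse(df$cluster == "A",
         min(df[df$cluster == "A", v]) - df[[v]],   # deviation within cluster A
         min(df[df$cluster == "B", v]) - df[[v]]))  # deviation within cluster B

# Store the two result vectors as new columns x.var and y.var
df[paste0(c("x", "y"), ".var")] <- vars
df
```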

Here's an approach using dplyr.

library(dplyr)

df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

df %>% 
  group_by(cluster) %>%           # compute within each cluster
  mutate(x.var = min(x) - x,      # cluster minimum of x minus x
         y.var = min(y) - y)      # cluster minimum of y minus y

#> # A tibble: 6 x 5
#> # Groups:   cluster [2]
#>   cluster     x     y x.var y.var
#>   <fct>   <dbl> <dbl> <dbl> <dbl>
#> 1 A           3     4    -1    -3
#> 2 B           4     5    -3    -2
#> 3 B           1     3     0     0
#> 4 A           5     1    -3     0
#> 5 A           2     2     0    -1
#> 6 B           6     6    -5    -3

Created on 2019-01-01 by the reprex package (v0.2.1)
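When there are many variables, writing one `mutate()` term per column gets tedious. A sketch that generalizes the same idea uses `across()` with a `.names` glue specification. This assumes dplyr 1.0.0 or later, which introduced `across()` after the reprex above was created; `res` is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

res <- df %>%
  group_by(cluster) %>%
  # For every listed column, compute (cluster minimum - value)
  # and name the result "<column>.var"
  mutate(across(c(x, y), ~ min(.x) - .x, .names = "{.col}.var")) %>%
  ungroup()
res
```

To cover more columns, extend the selection inside `across()`, e.g. with `-cluster` or `where(is.numeric)`.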

Here is a base R method that uses ave with lapply. Loop through the columns of the dataset excluding 'cluster'. For each column, use ave to get the minimum grouped by 'cluster' and subtract the column from it, then assign the resulting list of vectors to new columns.

df[paste0(names(df)[-1], ".var")] <- lapply(df[-1], function(x)
                                         ave(x, df$cluster, FUN = min) - x)
df
#  cluster x y x.var y.var
#1       A 3 4    -1    -3
#2       B 4 5    -3    -2
#3       B 1 3     0     0
#4       A 5 1    -3     0
#5       A 2 2     0    -1
#6       B 6 6    -5    -3
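To see what `ave()` contributes, note that it returns a vector the same length as its input, with each element replaced by its group's summary. The sketch below prints that intermediate step for `x`. It also selects the columns by name rather than with `df[-1]`, so it keeps working if 'cluster' is not the first column (the `vars` vector is an illustrative assumption).

```r
df <- data.frame(
  cluster = c("A","B","B","A","A","B"),
  x = c(3,4,1,5,2,6),
  y = c(4,5,3,1,2,6))

# Each element is replaced by the minimum of x in its cluster
ave(df$x, df$cluster, FUN = min)
#> [1] 2 1 1 2 2 1

# Select the value columns by name instead of relying on position
vars <- c("x", "y")
df[paste0(vars, ".var")] <- lapply(df[vars], function(v)
  ave(v, df$cluster, FUN = min) - v)
df
```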
