
R lapply for list of lists to apply the same function to pre-defined columns

I have a list of lists (two data frames) and want to use lapply to apply the same function to pre-defined columns in each data frame.

In particular, I would like to use the Winsorize function from the DescTools package. At the moment I know how to do this by specifying each column individually inside function(x), which is tedious when there are many columns (see example).

After applying the function, the entire list of data frames (all columns) should be returned, including the transformed variables. Ideally, the transformed variables get the suffix "_w" (e.g. "price_w") or similar to indicate that they are the winsorized variables.

My data look as follows; I want to apply the function only to the pre-defined columns "price" and "quality".

id <- c(1, 5, 7, 9, 12)
country <- c("A", "A", "C", "E", "E")
price <- c(2.1, 4.6, 3.7, 2.9, 1.8)
quality <- c(3.1, 5.2, 3.3, 1.7, 0.9)
df1 <- cbind.data.frame(id, country, price, quality)

id <- c(2, 3, 4, 10, 14)
country <- c("F", "F", "A", "Z", "X")
price <- c(1.8, 5.2, 2.9, 4.6, 3.9)
quality <- c(4.3, 2.5, 6.9, 1.9, 0.8)
df2 <- cbind.data.frame(id, country, price, quality)

my.list <- list(df1, df2)

cols <- c("price", "quality")

This is what I have so far. It only works for a small number of columns because every column has to be written out by hand:

my.list <- lapply(my.list, function(x) {
  x$price_w <- DescTools::Winsorize(x$price, probs = c(.01, .99), na.rm = TRUE)
  x$quality_w <- DescTools::Winsorize(x$quality, probs = c(.01, .99), na.rm = TRUE)
  return(x)
})

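We can use a nested lapply: the outer call iterates over the data frames in the list, and the inner call applies the function to each of the selected columns. Here Winsorize is called with its default limits, which is why the output below is clipped at the 5th and 95th percentiles; extra arguments such as probs = c(.01, .99) and na.rm = TRUE can be passed through the inner lapply. Assign the result back to my.list to keep it.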

lapply(my.list, function(x) {
  x[paste0(cols, '_w')] <- lapply(x[cols], DescTools::Winsorize)
  x
})

#[[1]]
#  id country price quality price_w quality_w
#1  1       A   2.1     3.1    2.10      3.10
#2  5       A   4.6     5.2    4.42      4.82
#3  7       C   3.7     3.3    3.70      3.30
#4  9       E   2.9     1.7    2.90      1.70
#5 12       E   1.8     0.9    1.86      1.06

#[[2]]
#  id country price quality price_w quality_w
#1  2       F   1.8     4.3    2.02      4.30
#2  3       F   5.2     2.5    5.08      2.50
#3  4       A   2.9     6.9    2.90      6.38
#4 10       Z   4.6     1.9    4.60      1.90
#5 14       X   3.9     0.8    3.90      1.02
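If depending on DescTools is a concern (the arguments accepted by Winsorize may differ between package versions), the same nested lapply pattern can be sketched with a small base R helper. The name winsorize_vec is hypothetical and not part of any package; it clamps each column to its own 1st and 99th percentiles, as in the question.

```r
# Hypothetical helper (not from DescTools): clamp x to its lower/upper quantiles.
winsorize_vec <- function(x, probs = c(0.01, 0.99), na.rm = TRUE) {
  limits <- quantile(x, probs = probs, na.rm = na.rm, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Same example data as in the question.
df1 <- data.frame(id = c(1, 5, 7, 9, 12),
                  country = c("A", "A", "C", "E", "E"),
                  price = c(2.1, 4.6, 3.7, 2.9, 1.8),
                  quality = c(3.1, 5.2, 3.3, 1.7, 0.9))
df2 <- data.frame(id = c(2, 3, 4, 10, 14),
                  country = c("F", "F", "A", "Z", "X"),
                  price = c(1.8, 5.2, 2.9, 4.6, 3.9),
                  quality = c(4.3, 2.5, 6.9, 1.9, 0.8))
my.list <- list(df1, df2)
cols <- c("price", "quality")

# Outer lapply: one data frame at a time.
# Inner lapply: one selected column at a time, stored under a "_w" name.
my.list <- lapply(my.list, function(x) {
  x[paste0(cols, "_w")] <- lapply(x[cols], winsorize_vec)
  x
})
```

Because cols is the only place where column names appear, adding another variable only requires extending that vector.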

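One purrr and dplyr option could be:

library(dplyr)
library(purrr)

# map() iterates over the data frames; across() applies the function to
# the columns named in cols and appends "_w" (the name given in the list)
map(.x = my.list,
    ~ .x %>%
     mutate(across(all_of(cols), 
                   list(w = ~ DescTools::Winsorize(., probs = c(.01, .99), na.rm = TRUE)))))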

[[1]]
  id country price quality price_w quality_w
1  1       A   2.1     3.1   2.100     3.100
2  5       A   4.6     5.2   4.564     5.124
3  7       C   3.7     3.3   3.700     3.300
4  9       E   2.9     1.7   2.900     1.700
5 12       E   1.8     0.9   1.812     0.932

[[2]]
  id country price quality price_w quality_w
1  2       F   1.8     4.3   1.844     4.300
2  3       F   5.2     2.5   5.176     2.500
3  4       A   2.9     6.9   2.900     6.796
4 10       Z   4.6     1.9   4.600     1.900
5 14       X   3.9     0.8   3.900     0.844
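The "_w" suffix above comes from the name w given to the function inside the list passed to across. A possible alternative, sketched here, is to set the suffix explicitly with the .names argument of across. To keep the sketch independent of DescTools, it uses a hypothetical base R helper, winsorize_vec, which is an assumption and not part of any package.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (not from DescTools): clamp x to its lower/upper quantiles.
winsorize_vec <- function(x, probs = c(0.01, 0.99), na.rm = TRUE) {
  limits <- quantile(x, probs = probs, na.rm = na.rm, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Reduced version of the question's data (only the columns being transformed).
my.list <- list(
  data.frame(price = c(2.1, 4.6, 3.7, 2.9, 1.8),
             quality = c(3.1, 5.2, 3.3, 1.7, 0.9)),
  data.frame(price = c(1.8, 5.2, 2.9, 4.6, 3.9),
             quality = c(4.3, 2.5, 6.9, 1.9, 0.8))
)
cols <- c("price", "quality")

# .names = "{.col}_w" builds each new column name from the original one,
# so the originals are kept and the winsorized copies are added next to them.
result <- map(my.list,
              ~ mutate(.x, across(all_of(cols), winsorize_vec,
                                  .names = "{.col}_w")))
```

all_of(cols) raises an error if one of the named columns is missing from a data frame, which can be a useful check when the list contains many data frames.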

Here is a data.table solution:

library( data.table )
library( DescTools )
# convert df1 and df2 to data.tables (by reference)
my.list <- lapply( my.list, setDT )
# run the function on the columns in cols; lapply over .SD winsorizes each
# column on its own (passing .SD directly would pool price and quality
# into a single set of quantiles)
# note: probs here are the 10th/90th percentiles, wider than in the question
lapply( my.list, function(x) {
  x[, paste0( (cols), "_w" ) := lapply( .SD, DescTools::Winsorize, 
                                        probs = c(0.1, 0.9), 
                                        na.rm = TRUE ), .SDcols = cols]
})

# 
# [[1]]
#    id country price quality price_w quality_w
# 1:  1       A   2.1     3.1    2.10      3.10
# 2:  5       A   4.6     5.2    4.24      4.44
# 3:  7       C   3.7     3.3    3.70      3.30
# 4:  9       E   2.9     1.7    2.90      1.70
# 5: 12       E   1.8     0.9    1.92      1.22
# 
# [[2]]
#    id country price quality price_w quality_w
# 1:  2       F   1.8     4.3    2.24      4.30
# 2:  3       F   5.2     2.5    4.96      2.50
# 3:  4       A   2.9     6.9    2.90      5.86
# 4: 10       Z   4.6     1.9    4.60      1.90
# 5: 14       X   3.9     0.8    3.90      1.24
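setDT converts its input by reference. Where the original df1 and df2 should stay untouched, one option is as.data.table, which returns a copy. The sketch below also winsorizes each column separately by mapping over .SD with lapply. The helper winsorize_vec and the names dt.list and new_cols are hypothetical and used only for illustration; the limits are the 1st and 99th percentiles from the question.

```r
library(data.table)

# Hypothetical helper (not from DescTools): clamp x to its lower/upper quantiles.
winsorize_vec <- function(x, probs = c(0.01, 0.99), na.rm = TRUE) {
  limits <- quantile(x, probs = probs, na.rm = na.rm, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

df1 <- data.frame(id = c(1, 5, 7, 9, 12),
                  country = c("A", "A", "C", "E", "E"),
                  price = c(2.1, 4.6, 3.7, 2.9, 1.8),
                  quality = c(3.1, 5.2, 3.3, 1.7, 0.9))
df2 <- data.frame(id = c(2, 3, 4, 10, 14),
                  country = c("F", "F", "A", "Z", "X"),
                  price = c(1.8, 5.2, 2.9, 4.6, 3.9),
                  quality = c(4.3, 2.5, 6.9, 1.9, 0.8))
cols <- c("price", "quality")
new_cols <- paste0(cols, "_w")

# as.data.table() copies, so df1 and df2 keep their original class and columns.
dt.list <- lapply(list(df1, df2), as.data.table)

# lapply(.SD, ...) applies the helper to each column in .SDcols separately.
dt.list <- lapply(dt.list, function(x) {
  x[, (new_cols) := lapply(.SD, winsorize_vec), .SDcols = cols]
  x
})
```

Returning x explicitly from the anonymous function makes the result visible and independent of how := returns its value.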
