Applying Magrittr pipes inside lapply() in R

Question

I would like to find a way to run a series of piped functions through an lapply() call and generate multiple data frames as a result. Here is a sample data set:

library(dplyr)  # tibble(), filter(), group_by(), summarise(), ...
library(tidyr)  # spread()

# the data
d <- tibble(
  categorical = c("a", "d", "b", "c", "a", "b", "d", "c"),
  var_1 = c(0, 0, 1, 1, 1, 0, 1, 0),
  var_2 = c(0, 1, 0, 0, 0, 0, 1, 1),
  var_3 = c(0, 0, 1, 1, 1, 1, 1, 1),
  var_4 = c(0, 1, 0, 1, 0, 0, 0, 0)
)

Here is the outcome I want:

$var_1
a  b  c  d
1  1  1  1

$var_2
a  b  c  d
0  0  1  2

$var_3
a  b  c  d
1  2  2  1

$var_4
a  b  c  d
0  0  1  1

I can recreate each list element individually with ease. Here is my sample code with dplyr:

d %>%
  filter(var_1 == 1) %>%
  group_by(categorical, var_1) %>%
  summarise(n = n()) %>%
  select(-var_1) %>%
  rename("var_1" = "n") %>%
  ungroup() %>%
  spread(categorical, var_1)

# A tibble: 1 x 4
      a     b     c     d
  <int> <int> <int> <int>
1     1     1     1     1

But I want to automate the process across all columns and create a list that holds one such row of information per column.

Here is where I started:

lapply(d[,2:5], function (x) d %>%
  filter(x == 1) %>%
  group_by(categorical, x) %>%
  summarise(n = n()) %>%
  select(-x) %>%
  rename("x" = "n") %>%
  ungroup() %>%
  spread(categorical, x))

Any help would be much appreciated!

Answer 1

We can gather into 'long' format, split by 'key', and then, for each piece, take the sum of 'val' grouped by 'categorical' and spread it back to 'wide' format:

library(tidyverse)
gather(d, key, val, -categorical) %>%
     split(.$key) %>%
     map(~ .x %>% 
           group_by(categorical) %>%
           summarise(val = sum(val)) %>%
           spread(categorical, val))
#$var_1
# A tibble: 1 x 4
#      a     b     c     d
#  <dbl> <dbl> <dbl> <dbl>
#1     1     1     1     1

#$var_2
# A tibble: 1 x 4
#      a     b     c     d
#  <dbl> <dbl> <dbl> <dbl>
#1     0     0     1     2

#$var_3
# A tibble: 1 x 4
#      a     b     c     d
#  <dbl> <dbl> <dbl> <dbl>
#1     1     2     2     1

#$var_4
# A tibble: 1 x 4
#      a     b     c     d
#  <dbl> <dbl> <dbl> <dbl>
#1     0     0     1     1
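
For readers on tidyr 1.0.0 or later, where gather() and spread() are superseded, the same steps can probably be written with pivot_longer() and pivot_wider(). This is only a sketch of that swap, not part of the original answer; the object name res is made up for illustration, and the data is repeated so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

# sample data from the question, repeated so the snippet is self-contained
d <- tibble(
  categorical = c("a", "d", "b", "c", "a", "b", "d", "c"),
  var_1 = c(0, 0, 1, 1, 1, 0, 1, 0),
  var_2 = c(0, 1, 0, 0, 0, 0, 1, 1),
  var_3 = c(0, 0, 1, 1, 1, 1, 1, 1),
  var_4 = c(0, 1, 0, 1, 0, 0, 0, 0)
)

# 'res' is a hypothetical name for the resulting list
res <- d %>%
  # long format: one row per (categorical, key) pair
  pivot_longer(-categorical, names_to = "key", values_to = "val") %>%
  # one tibble per original column, named var_1 ... var_4
  split(.$key) %>%
  map(~ .x %>%
        group_by(categorical) %>%
        summarise(val = sum(val)) %>%
        # back to wide format: one column per category
        pivot_wider(names_from = categorical, values_from = val))
```

res should then be a list of four 1 x 4 tibbles, one per var_* column, matching the printed output above.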

Another option is to loop through the column names except the first one, then do the group_by sum and spread to 'wide' format:

map(names(d)[-1], ~ 
          d %>%
           group_by(categorical) %>% 
           summarise(n = sum(!! rlang::sym(.x))) %>% 
           spread(categorical, n))
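
Since the question asks about lapply() specifically, here is a hedged base R sketch along the same lines (not from the original answer). Inside lapply(d[-1], ...), x is the vector of column values rather than a column name, so verbs like select(-x) or rename() cannot treat it as a column of d; summing the values per category with tapply() sidesteps that. The name res is made up for illustration.

```r
# sample data from the question as a plain data.frame, so no packages are needed
d <- data.frame(
  categorical = c("a", "d", "b", "c", "a", "b", "d", "c"),
  var_1 = c(0, 0, 1, 1, 1, 0, 1, 0),
  var_2 = c(0, 1, 0, 0, 0, 0, 1, 1),
  var_3 = c(0, 0, 1, 1, 1, 1, 1, 1),
  var_4 = c(0, 1, 0, 1, 0, 0, 0, 0)
)

# for every column except the first, sum its values within each category;
# lapply() keeps the column names as the names of the list elements
res <- lapply(d[-1], function(x) tapply(x, d$categorical, sum))
```

Each element of res is a named vector with entries a, b, c and d, which is close to the layout asked for in the question.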

Answer 2

Here is an option using data.table::transpose():

aggregate(. ~ categorical, d, sum) %>%
  data.table::transpose(make.names = "categorical") %>%
  split(names(d)[-1])
#> $var_1
#>   a b c d
#> 1 1 1 1 1
#> 
#> $var_2
#>   a b c d
#> 2 0 0 1 2
#> 
#> $var_3
#>   a b c d
#> 3 1 2 2 1
#> 
#> $var_4
#>   a b c d
#> 4 0 0 1 1

Created on 2019-11-04 by the reprex package (v0.3.0)

Answer 1
1 Accepted 2019-08-21 23:05:09

Answer 2
0 2019-11-04 15:20:04
