Using summarise and across in the dplyr package while distinguishing numeric and non-numeric columns

Question

I would like to perform some operations using dplyr on a dataset that looks like this:

data <- data.frame(day = c(rep(1, 15), rep(2, 15)), nweek = rep(rep(1:5, 3),2), 
                   firm = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2), 
                   quant = rnorm(30), price = runif(30) )

where each observation is at the day, week and firm level (there are only 2 days in a week).

I would like to summarise the data, grouping by firm and week, by (1) taking the average across the days of the week for the variables that are numeric (i.e., quant and price), and (2) taking the first entry for the variables that are not numeric. In this example the only non-numeric variable is firm, but my real dataset has several non-numeric variables (Date and character) that may change within a week (nweek), so for all of them I would like to keep only the entry from the first day of the week.

I tried using summarise and across but get an error:

> data %>% group_by(firm, nweek) %>% dplyr::summarise(across(which(sapply(data, is.numeric)), ~ mean(.x, na.rm = TRUE)),
+                           across(which(sapply(data, !(is.numeric))), ~ head(.x, 1))
+ )
Error: Problem with `summarise()` input `..2`.
x invalid argument type
ℹ Input `..2` is `across(which(sapply(data, !(is.numeric))), ~head(.x, 1))`.
Run `rlang::last_error()` to see where the error occurred.

Any help?

Answer 1

I don't know what your expected output should look like, but something like this could get you what you are trying to achieve:

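library(dplyr)

data %>%
  group_by(firm, nweek) %>%
  summarise(
    # numeric columns: average over the days in each firm/week group
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    # non-numeric columns: keep the first entry in each group
    across(!where(is.numeric), ~ head(.x, 1))
  )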

As a side note, instead of using which(sapply(...)), have a look at the where helper for conditional selection of variables inside across, as described in this post.
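On the error itself: as far as I can tell, the "invalid argument type" message comes from `!(is.numeric)`, which tries to negate the function object rather than its result. Here is a minimal base R sketch of that, using a made-up two-column data frame `df` (not the data from the question):

```r
# Negating a function object fails; capture the message instead of stopping
res <- tryCatch(!(is.numeric), error = function(e) conditionMessage(e))
res

# Hypothetical toy data frame, only for illustration
df <- data.frame(x = 1:3, y = letters[1:3])

# Option 1: negate the logical vector that sapply() returns
non_numeric_1 <- !sapply(df, is.numeric)

# Option 2: build a negated predicate with Negate() and apply that
non_numeric_2 <- sapply(df, Negate(is.numeric))

non_numeric_1
non_numeric_2
```

Both options flag `y` as the only non-numeric column. Inside across, `!where(is.numeric)` expresses the same idea without going through column positions.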

Output of the grouped summarise() call:

# A tibble: 15 x 5
# Groups:   firm [3]
   firm  nweek   day   quant price
   <chr> <int> <dbl>   <dbl> <dbl>
 1 a         1   1.5 -0.336  0.903
 2 a         2   1.5  0.0837 0.579
 3 a         3   1.5  0.0541 0.425
 4 a         4   1.5  1.21   0.555
 5 a         5   1.5  0.462  0.806
 6 b         1   1.5  0.0493 0.346
 7 b         2   1.5  0.635  0.596
 8 b         3   1.5  0.406  0.583
 9 b         4   1.5 -0.707  0.205
10 b         5   1.5  0.157  0.816
11 c         1   1.5  0.728  0.271
12 c         2   1.5  0.117  0.775
13 c         3   1.5 -1.05   0.234
14 c         4   1.5 -1.35   0.290
15 c         5   1.5  0.771  0.310
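Note that firm shows up here only because it is a grouping column, and across() leaves grouping columns out of its selection. To see the "first entry" part actually doing something, here is a small sketch with hypothetical extra columns `label` and `date` (both invented for illustration, they are not in the question's data). It also assumes that "first day of the week" means the smallest `day` within each group, so the rows are sorted before summarising:

```r
library(dplyr)

# Hypothetical toy data: `label` and `date` are made-up non-numeric columns
toy <- data.frame(
  day   = c(2, 1, 2, 1),          # deliberately not sorted by day
  nweek = c(1L, 1L, 2L, 2L),
  firm  = "a",
  label = c("w1d2", "w1d1", "w2d2", "w2d1"),
  date  = as.Date("2020-06-01") + c(1, 0, 8, 7),
  quant = c(3, 1, 7, 5)
)

out <- toy %>%
  arrange(firm, nweek, day) %>%   # make day 1 the first row of each group
  group_by(firm, nweek) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    across(!where(is.numeric), ~ head(.x, 1)),
    .groups = "drop"              # return an ungrouped tibble
  )

out
```

Without the arrange() step, head(.x, 1) just returns whichever row happens to come first in each group, which may not be the first day of the week.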

Answer 1: accepted, 2020-06-30 13:29:55
