
r: Summarise for rowSums after group_by

I've searched a number of posts on SO, but I'm not sure what I'm doing wrong here, and I imagine the solution is quite simple. I'm trying to group a data frame by one variable and compute the mean of several variables within that group.

Here is what I am trying:

head(airquality)
target_vars = c("Ozone","Temp","Solar.R")
airquality %>% group_by(Month) %>% select(target_vars) %>% summarise(rowSums(.))

But I get an error that my lengths don't match. I've tried variations using mutate to create the column, or summarise_all, but neither of these seems to work. I need the row sums within group, and then to compute the mean within group (yes, it's nonsensical here).

Also, I want to use select because I'm trying to do this over just certain variables.

I'm sure this could be a duplicate, but I can't find the right one.

EDIT FOR CLARITY Sorry, my original question was not clear. Imagine the grouping variable is the calendar month, and we have v1, v2, and v3. I'd like to know, within month, what the average of the sums of v1, v2, and v3 was. So if we have 12 months, the result would be a 12x1 data frame. Here is an example if we just had 1 month:

Month v1 v2 v3 Sum 
1      1  1  0   2
1      1  1  1   3
1      1  0  0   3

Then the result would be:

Month  Average
1           8/3

This seems to deliver what you want. It's regular R. The sapply function keeps the months separated by "name". The sum function applied to each data frame will not keep the column sums separate. (Correction #2: used only target_vars):

sapply( split( airquality[target_vars], airquality$Month), sum, na.rm=TRUE)
    5     6     7     8     9 
 7541  8343 10849  8974  8242 

If you want the result per variable, divide by the number of variables:

sapply( split( airquality[target_vars], airquality$Month), sum, na.rm=TRUE)/
                                                           (length(target_vars))
       5        6        7        8        9 
2513.667 2781.000 3616.333 2991.333 2747.333 
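If the goal is instead the mean of the per-row sums within each month (as the question's edit describes), the same split/sapply pattern can be adapted. This is a minimal sketch, assuming `target_vars` as defined in the question; the names `by_month` and `mean_row_sums` are introduced here for illustration only.

```r
# Columns to sum across, as defined in the question
target_vars <- c("Ozone", "Temp", "Solar.R")

# One data frame per month, restricted to the target columns
by_month <- split(airquality[target_vars], airquality$Month)

# For each month: sum across columns per row (NAs dropped), then average those row sums
mean_row_sums <- sapply(by_month, function(d) mean(rowSums(d, na.rm = TRUE)))
mean_row_sums
```

The only change from the call above is the function passed to sapply: it divides each month's total by the number of rows rather than returning the raw total.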

You can try:

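library(tidyverse)
# target_vars as defined in the question
target_vars <- c("Ozone", "Temp", "Solar.R")
airquality %>% 
  select(Month, all_of(target_vars)) %>%   # all_of() marks target_vars as an external vector
  gather(key, value, -Month) %>%           # wide to long: one row per (Month, variable, value)
  group_by(Month) %>%
  summarise(n = length(unique(key)),
            Sum = sum(value, na.rm = TRUE)) %>% 
  mutate(Average = Sum / n)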
# A tibble: 5 x 4
  Month     n   Sum  Average
  <int> <int> <int>    <dbl>
1     5     3  7541 2513.667
2     6     3  8343 2781.000
3     7     3 10849 3616.333
4     8     3  8974 2991.333
5     9     3  8242 2747.333

The idea is to convert the data from wide to long using tidyr::gather(), then group by Month and calculate the sum and the average.
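In newer tidyr releases (1.0 and later), gather() is superseded by pivot_longer(). A rough equivalent of the same wide-to-long idea might look like the sketch below; it assumes a recent tidyr and dplyr are installed, and the name `long_summary` is introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Columns to reshape, as defined in the question
target_vars <- c("Ozone", "Temp", "Solar.R")

long_summary <- airquality %>%
  select(Month, all_of(target_vars)) %>%
  # wide to long: one row per (Month, variable, value)
  pivot_longer(cols = all_of(target_vars),
               names_to = "key", values_to = "value") %>%
  group_by(Month) %>%
  summarise(n = n_distinct(key),              # number of variables
            Sum = sum(value, na.rm = TRUE)) %>%
  mutate(Average = Sum / n)

long_summary
```

As with the gather() version, Average here is the month's total divided by the number of variables, not by the number of rows.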

Perhaps this is what you're looking for:

library(dplyr)
library(purrr)
library(tidyr)   # forgot this in original post
airquality %>%
  group_by(Month) %>% 
  nest(Ozone, Temp, Solar.R, .key=newcol) %>%
  mutate(newcol = map_dbl(newcol, ~mean(rowSums(.x, na.rm=TRUE))))

# A tibble: 5 x 2
  # Month   newcol
  # <int>    <dbl>
# 1     5 243.2581
# 2     6 278.1000
# 3     7 349.9677
# 4     8 289.4839
# 5     9 274.7333  
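The same mean-of-row-sums result can probably be reached without nesting. The sketch below is one possible alternative, assuming dplyr 1.0 or later (for across() and all_of()); `row_total` and `result` are names made up for this example.

```r
library(dplyr)

# Columns to sum across, as defined in the question
target_vars <- c("Ozone", "Temp", "Solar.R")

result <- airquality %>%
  # per-row sum over the selected columns, ignoring NAs
  mutate(row_total = rowSums(across(all_of(target_vars)), na.rm = TRUE)) %>%
  group_by(Month) %>%
  # average of those row sums within each month
  summarise(Average = mean(row_total))

result
```

Computing the row sums before grouping keeps rowSums() working on a whole data frame, which avoids the length mismatch from calling it inside summarise().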

I've never encountered a situation where all the answers disagreed. Here's some validation (at least I think) for the 5th month:

airquality %>%
  filter(Month == 5) %>%
  select(Ozone, Temp, Solar.R) %>%
  mutate(newcol = rowSums(., na.rm=TRUE)) %>%
  summarise(sum5 = sum(newcol), mean5 = mean(newcol))

#   sum5    mean5
# 1 7541 243.2581
