How to Use na.rm=TRUE with n() While Using dplyr's group_by and summarise_at

library(tidyverse) 

I'm stuck on something that should be so simple! Using the code below, all I want to do is group and summarise the three "Var" columns. I want counts and sums (so that I can create three percentage columns; bonus if you can include an easy way to accomplish this in your answer). However, I don't want to include the NAs. Removing the NAs from the sum is easy enough with na.rm=TRUE, but I can't figure out how to exclude the NAs from the counts (using n()) while using dplyr::summarise_at.

Am I missing something very simple?

Df %>%
  group_by(Group) %>%
  summarise_at(vars(Var1:Var3), funs(n(), sum(., na.rm = TRUE)))

Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)    
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)

Df<-data.frame(Group,Var1,Var2,Var3)

I think your code was very close to getting the job done. I made some slight changes and have included an example of how you might include the percent calculation in the same step (although I am not sure of your expected output).

library(dplyr)
Df %>% 
  group_by(Group) %>% 
  summarise_all(funs(count = sum(!is.na(.)), 
                     sum = sum(.,na.rm=TRUE),
                     pct = sum(.,na.rm=TRUE)/sum(!is.na(.))))

#> # A tibble: 2 x 10
#>    Group Var1_count Var2_count Var3_count Var1_sum Var2_sum Var3_sum
#>   <fctr>      <int>      <int>      <int>    <dbl>    <dbl>    <dbl>
#> 1  Condo          2          2          2        1        2        1
#> 2  House          5          6          4        4        5        4
#> # ... with 3 more variables: Var1_pct <dbl>, Var2_pct <dbl>,
#> #   Var3_pct <dbl>

I've also used summarise_all instead of summarise_at, since summarise_all works on every variable that isn't a grouping variable.
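As a side note, funs() was deprecated in dplyr 0.8.0, and the scoped verbs (summarise_at, summarise_all) were superseded by across() in dplyr 1.0.0. Below is a rough sketch of the same summary written with across(), assuming dplyr 1.0.0 or later is installed. The result name `res` is just a placeholder chosen here, and the columns come out grouped by variable (Var1_count, Var1_sum, Var1_pct, ...) rather than by function.

```r
library(dplyr)

# Same example data as in the question
Df <- data.frame(
  Group = c("House", "Condo", "House", "House", "House", "House", "House", "Condo"),
  Var1  = c(0, 1, 1, NA, 1, 1, 1, 0),
  Var2  = c(1, 1, 1, 1, 0, 1, 1, 1),
  Var3  = c(1, 1, 1, NA, NA, 1, 1, 0)
)

res <- Df %>%
  group_by(Group) %>%
  summarise(across(
    Var1:Var3,
    list(
      count = ~ sum(!is.na(.x)),                           # non-missing values only
      sum   = ~ sum(.x, na.rm = TRUE),                     # sum ignoring NAs
      pct   = ~ sum(.x, na.rm = TRUE) / sum(!is.na(.x))    # share among non-missing
    ),
    .names = "{.col}_{.fn}"                                # e.g. Var1_count, Var1_sum
  ))

res
```

Because the Var columns are 0/1, the pct expression is the same as mean(.x, na.rm = TRUE), which may read more simply.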

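I think you just need to count the non-missing values with sum(!is.na(.)) and keep the na.rm argument inside the call to sum(). Note that the code below uses mutate_at, so the per-group values are added as new columns on every row rather than collapsed to one row per group. See below: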

Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)    
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)

Df<-data.frame(Group,Var1,Var2,Var3)

out <- Df %>%
  group_by(Group) %>% 
  mutate_at(vars(Var1:Var3), funs(total = sum(!is.na(.)), sum = sum(., na.rm = TRUE))) %>% 
  ungroup()
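To make the underlying issue concrete: n() takes no arguments and returns the number of rows in the current group, so there is no na.rm to pass to it. The sketch below compares it with sum(!is.na(...)) on the question's data for a single column. The names `cmp`, `rows` and `non_missing` are made up for this illustration.

```r
library(dplyr)

# Same example data as in the question
Df <- data.frame(
  Group = c("House", "Condo", "House", "House", "House", "House", "House", "Condo"),
  Var1  = c(0, 1, 1, NA, 1, 1, 1, 0),
  Var2  = c(1, 1, 1, 1, 0, 1, 1, 1),
  Var3  = c(1, 1, 1, NA, NA, 1, 1, 0)
)

cmp <- Df %>%
  group_by(Group) %>%
  summarise(
    rows        = n(),                 # group size, NAs included
    non_missing = sum(!is.na(Var1))    # only rows where Var1 is not NA
  )

cmp
```

For the House group the two columns differ (6 rows, 5 non-missing Var1 values), which is the gap the question runs into.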
