简体   繁体   中英

Filter based on NA in dplyr

This is my df

df <- structure(structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), y = c(NA, NA, NA, NA, 1, NA, NA, NA, 1, 2, NA, NA, 1, 2, 3, NA, 2, 2, 3, 4, NA, 3, 3, 4, 5), x = c(1L, 2L, 3L, 4L,5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("group", "y", "x"), row.names = c(NA, 25L), class = "data.frame"))

> df
   group  y x
1      A NA 1
2      A NA 2
3      A NA 3
4      A NA 4
5      A  1 5
6      B NA 1
7      B NA 2
8      B NA 3
9      B  1 4
10     B  2 5
11     C NA 1
12     C NA 2
13     C  1 3
14     C  2 4
15     C  3 5
16     D NA 1
17     D  2 2
18     D  2 3
19     D  3 4
20     D  4 5
21     E NA 1
22     E  3 2
23     E  3 3
24     E  4 4
25     E  5 5

My goal is to calculate the mean per x value (across groups), using mutate . But first I'd like to filter the data, such that only those values of x remain for which there are at least 3 non-NA values. So in this example I only want to include those entries for which x is at least 3. I can't figure out how to create the filter() , any suggestions?

You could try

df %>% 
   group_by(group) %>% #group_by(x) %>% #as per the OP's clarification
   filter(sum(!is.na(y))>=3) %>% 
   mutate(Mean=mean(x, na.rm=TRUE))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM