
Eliminating categories with a certain number of non-NA values in R

I have a data frame df which looks like this:

g <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6)
m <- c(1, NA, NA, NA, 3, NA, 2, 1, 3, NA, 3, NA, NA, 4, NA, NA, NA, 2, 1, NA, 7, 3, NA, 1)
df <- data.frame(g, m)

where g is the category (1 to 6) and m holds the values in that category. I've managed to find the number of non-NA values per category with:

  aggregate(m ~ g, data=df, function(x) {sum(!is.na(x))}, na.action = NULL)
  g m
1 1 1
2 2 3
3 3 2
4 4 1
5 5 2
6 6 3

I would now like to eliminate the rows (categories) where the number of non-NA values is 1 and keep only those where the number of non-NA values is 2 or more.

The desired outcome would be:

   g  m
5  2  3
6  2 NA
7  2  2
8  2  1
9  3  3
10 3 NA
11 3  3
12 3 NA
17 5 NA
18 5  2
19 5  1
20 5 NA
21 6  7
22 6  3
23 6 NA
24 6  1

Every row with g=1 or g=4 is eliminated because, as shown above, each of those categories has only one non-NA value.

Any suggestions? :)

One can try a dplyr-based solution. Using group_by on g helps to get the desired count. Note that this returns the non-NA count for each qualifying category rather than the filtered rows themselves.

library(dplyr)

df %>% group_by(g) %>%
  filter(!is.na(m)) %>%
  filter(n() >=2) %>%
  summarise(count = n())

#Result
# # A tibble: 4 x 2
#     g   count
#    <dbl> <int>
# 1  2.00     3
# 2  3.00     2
# 3  5.00     2
# 4  6.00     3

If you want base R, then I suggest you reuse your aggregation:

df2 <- aggregate(m ~ g, data=df, function(x) {sum(!is.na(x))}, na.action = NULL)
df[ ! df$g %in% df2$g[df2$m < 2], ]
#    g  m
# 5  2  3
# 6  2 NA
# 7  2  2
# 8  2  1
# 9  3  3
# 10 3 NA
# 11 3  3
# 12 3 NA
# 17 5 NA
# 18 5  2
# 19 5  1
# 20 5 NA
# 21 6  7
# 22 6  3
# 23 6 NA
# 24 6  1
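
As a variation on the base R approach above, here is a minimal sketch that skips the intermediate aggregate table by using ave() to attach each group's non-NA count to every row. This is an alternative to the answer's code, not the same method. The names n_non_na and kept are hypothetical, chosen only for illustration.

```r
# Rebuild the example data from the question
g <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6)
m <- c(1, NA, NA, NA, 3, NA, 2, 1, 3, NA, 3, NA, NA, 4, NA, NA, NA, 2, 1, NA, 7, 3, NA, 1)
df <- data.frame(g, m)

# For every row, compute the number of non-NA values of m within its group g.
# ave() returns a vector as long as its input, so it can index df directly.
n_non_na <- ave(as.numeric(!is.na(df$m)), df$g, FUN = sum)

# Keep only rows whose group has at least two non-NA values
kept <- df[n_non_na >= 2, ]
```

Because ave() returns one value per row, no lookup with %in% is needed. The threshold 2 can be changed to require a different minimum number of non-NA values per category.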

If you want to use dplyr, perhaps:

library(dplyr)
group_by(df, g) %>%
  filter(sum(!is.na(m)) > 1) %>%
  ungroup()
# # A tibble: 16 × 2
#        g     m
#    <dbl> <dbl>
# 1      2     3
# 2      2    NA
# 3      2     2
# 4      2     1
# 5      3     3
# 6      3    NA
# 7      3     3
# 8      3    NA
# 9      5    NA
# 10     5     2
# 11     5     1
# 12     5    NA
# 13     6     7
# 14     6     3
# 15     6    NA
# 16     6     1
