
Remove NAs after pivot_wider to match up rows

I spread a column using pivot_wider so I could compare two groups (var1 vs var2) in an xy plot. But I can't compare them, because every row that has a value in one column has an NA in the other.

Here is an example dataframe:

 df <- data.frame(group = c("a", "a", "b", "b", "c", "c"), var1 = c(3, NA, 1, NA, 2, NA), 
            var2 = c(NA, 2, NA, 4, NA, 8))

I would like it to look like:

df2 <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2), 
            var2 = c( 2,  4, 8))

This solution is a bit more robust. It starts from a slightly more general data.frame:

df <- data.frame(col_1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B"), 
                 col_2 = c(1, 3, NA, NA, NA, NA, 4, NA, NA),
                 col_3 = c(NA, NA, 2, 5, NA, NA, NA, 5, NA),
                 col_4 = c(NA, NA, NA, NA, 5, 6, NA, NA, 7))

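library(dplyr)  # provides %>%; purrr must also be installed

# Within each col_1 group, drop the NAs from every column and keep what is left.
# Every column must keep the same number of values per group for the rows to line up.
df %>% dplyr::group_by(col_1) %>% 
  dplyr::summarise_all(purrr::discard, is.na)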

You can use summarize, but this treats the symptom, not the cause. You may have a column in id_cols that is one-to-one with your variable in values_from, which would put every value on its own row.

library(dplyr)

df %>%
  group_by(group) %>%
  summarize_all(sum, na.rm = TRUE)

# A tibble: 3 x 3
  group  var1  var2
  <fct> <dbl> <dbl>
1 a         3     2
2 b         1     4
3 c         2     8
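
On the id_cols point above, here is a minimal sketch of how the staggered NAs can arise and how restricting id_cols avoids them. The long-format data and the `row_id` column are assumptions made up for illustration; the original question does not show the data before pivot_wider.

```r
library(tidyr)

# Hypothetical long-format input. `row_id` is an assumed column that is
# unique per row, i.e. one-to-one with `value`.
long <- data.frame(
  row_id = 1:6,
  group  = c("a", "a", "b", "b", "c", "c"),
  name   = rep(c("var1", "var2"), times = 3),
  value  = c(3, 2, 1, 4, 2, 8)
)

# By default id_cols is every column not used in names_from/values_from,
# so row_id keeps each value on its own row and the gaps are filled with NA.
staggered <- pivot_wider(long, names_from = name, values_from = value)

# Naming only `group` as the identifier gives one row per group.
aligned <- pivot_wider(long, id_cols = group,
                       names_from = name, values_from = value)
```

If this matches your situation, dropping the extra column before pivoting (or setting id_cols as here) removes the need for a summarize step afterwards.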

Here is a way to do it, assuming each group has exactly two rows and each column has exactly one non-NA value per group:

library(dplyr)
df %>% group_by(group) %>% 
       summarise(var1=max(var1,na.rm=TRUE),
                 var2=max(var2,na.rm=TRUE))

With na.rm=TRUE, max() ignores the NAs, so it is computed over the single non-NA value in each group and simply returns that value.
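
One caveat with max() and sum(): if a group has no non-NA value in a column, max(..., na.rm=TRUE) returns -Inf with a warning and sum(..., na.rm=TRUE) returns 0. As a possible variation (not from the answers above), the sketch below picks the first non-NA value instead and leaves NA where a group has none. The helper name `first_non_na` is hypothetical.

```r
library(dplyr)

df <- data.frame(group = c("a", "a", "b", "b", "c", "c"),
                 var1 = c(3, NA, 1, NA, 2, NA),
                 var2 = c(NA, 2, NA, 4, NA, 8))

# Hypothetical helper: first non-NA element, or NA if there is none.
first_non_na <- function(x) x[!is.na(x)][1]

# across(everything(), ...) applies the helper to every non-grouping column.
collapsed <- df %>%
  group_by(group) %>%
  summarise(across(everything(), first_non_na))
```

This also works for character columns, where sum() would fail.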


 