Combining rows in a data frame based on a common variable using R

Question

I am working with a few survey responses, and the data returned has been formatted awkwardly. Here is an example of what the data looks like:

library(dplyr)  # tibble() is re-exported by dplyr; data_frame() is deprecated
df <- tibble(Person = c("Person1", "Person1", "Person2", "Person2", "Person3", "Person3"),
             Q1 = c(NA, 1, NA, 2, NA, 1), Q2 = c(NA, 3, NA, 2, NA, 4),
             Q3 = c(2, NA, 4, NA, 1, NA), Q4 = c(5, NA, 5, NA, 5, NA))

This is what I am starting with:

Person     Q1    Q2    Q3    Q4
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 Person1    NA    NA     2     5
2 Person1     1     3    NA    NA
3 Person2    NA    NA     4     5
4 Person2     2     2    NA    NA
5 Person3    NA    NA     1     5
6 Person3     1     4    NA    NA

This is what I would like:

Person     Q1    Q2    Q3    Q4
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 Person1     1     3     2     5
2 Person2     2     2     4     5
3 Person3     1     4     1     5

I would like to accomplish this using dplyr, but so far I have not had any luck.

Answer 1

If there is only one non-NA element per column in each group:

library(dplyr)
df %>% 
   group_by(Person) %>%
   summarise_all(na.omit)
# A tibble: 3 x 5
#  Person     Q1    Q2    Q3    Q4
#  <chr>   <dbl> <dbl> <dbl> <dbl>
#1 Person1     1     3     2     5
#2 Person2     2     2     4     5
#3 Person3     1     4     1     5
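
In dplyr 1.0.0 and later, summarise_all() is superseded by across(). The following is a minimal sketch of the same idea under that assumption, not part of the original answer. It still relies on each column having exactly one non-NA value per group, and it rebuilds the question's data so the snippet runs on its own. The name collapsed is only illustrative.

```r
library(dplyr)

# Same example data as in the question
df <- tibble(
  Person = c("Person1", "Person1", "Person2", "Person2", "Person3", "Person3"),
  Q1 = c(NA, 1, NA, 2, NA, 1),
  Q2 = c(NA, 3, NA, 2, NA, 4),
  Q3 = c(2, NA, 4, NA, 1, NA),
  Q4 = c(5, NA, 5, NA, 5, NA)
)

# Keep the non-NA value of every non-grouping column within each Person
collapsed <- df %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)]))
```

Inside a grouped summarise(), everything() selects only the non-grouping columns, so Person is left alone.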

We can also use an aggregate such as mean, min, max, sum, or median with na.rm = TRUE:

df  %>%
     group_by(Person) %>%
      summarise_all(mean, na.rm = TRUE)

Or

df %>%
   group_by(Person) %>%
   summarise_all(min, na.rm = TRUE)

Or

df %>%
   group_by(Person) %>%
   summarise_all(median, na.rm = TRUE)
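
One caveat with the aggregate approach is what happens when a group has no non-NA value at all in some column. The sketch below uses hypothetical data (df_gap, where Person2 never answered Q1) to show what base R's mean() and min() return in that case. It is written with across(), assuming dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: Person2 has no non-NA value for Q1
df_gap <- tibble(
  Person = c("Person1", "Person1", "Person2", "Person2"),
  Q1 = c(NA, 1, NA, NA),
  Q2 = c(3, NA, NA, 2)
)

by_mean <- df_gap %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
# by_mean$Q1 for Person2 is NaN: the mean of zero values is undefined

by_min <- df_gap %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ suppressWarnings(min(.x, na.rm = TRUE))))
# by_min$Q1 for Person2 is Inf; min() also warns, which is suppressed here
```

If such gaps are possible in the real survey data, it may be worth checking the result for NaN or Inf before using it.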

Alternatively, subset out the NA values directly, which leaves the single non-NA element of each column:

df %>%
    group_by(Person) %>%
    summarise_all(list(~.[!is.na(.)]))

If a group has more than one non-NA element in a column, either paste the values into a single string or store them in a list column:

df %>% 
    group_by(Person) %>%
    summarise_all(list(~ toString(.[!is.na(.)])))
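
The code above covers the string option. For the list-column option, a possible sketch is shown here, assuming dplyr 1.0.0 or later for across(). The data frame df2 is hypothetical and gives Person1 two non-NA answers for Q1.

```r
library(dplyr)

# Hypothetical data: Person1 has two non-NA values in Q1
df2 <- tibble(
  Person = c("Person1", "Person1", "Person2"),
  Q1 = c(1, 3, 2),
  Q2 = c(NA, 4, 5)
)

# Wrap the non-NA values in list() so each group still yields one row
res <- df2 %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ list(.x[!is.na(.x)])))

# Each cell is now a numeric vector, e.g. res$Q1[[1]] holds both values
res$Q1[[1]]
```

Unlike toString(), the list column keeps the values numeric, so they can be unpacked later (for example with tidyr::unnest()).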

Answer 2

You can get the first non-NA value of each column within each group with coalesce. There is no real reason to prefer that over na.omit, though, unless a group has more than one non-NA value.

library(tidyverse)

df %>% 
  group_by(Person) %>% 
  summarise_all(reduce, coalesce)

# A tibble: 3 x 5
#   Person     Q1    Q2    Q3    Q4
#   <chr>   <dbl> <dbl> <dbl> <dbl>
# 1 Person1     1     3     2     5
# 2 Person2     2     2     4     5
# 3 Person3     1     4     1     5
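
To illustrate the case with more than one non-NA value, here is a sketch that takes the first non-NA element by plain subsetting instead of reduce and coalesce. It is an alternative formulation rather than the answer's method, assumes dplyr 1.0.0 or later for across(), and uses hypothetical data (df_multi).

```r
library(dplyr)

# Hypothetical data: Person1 has two non-NA values in Q1
df_multi <- tibble(
  Person = c("Person1", "Person1", "Person1", "Person2"),
  Q1 = c(NA, 7, 9, 2)
)

# Drop the NA values, then take the first remaining element per group
first_non_na <- df_multi %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)][1]))
```

Indexing with [1] returns NA when a group has no non-NA value, so every group still produces exactly one row.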

2 solutions

Solution 1
1 2019-05-22 17:26:53

Solution 2
1 Accepted 2019-05-22 17:51:23
