Using R to combine rows in a data frame based on a common variable
I am working with a few survey responses, and the data returned has been formatted awkwardly. Here is an example of what the data looks like:
library(dplyr)  # provides tibble(); data_frame() is deprecated
df <- tibble(Person = c("Person1", "Person1", "Person2", "Person2", "Person3", "Person3"),
             Q1 = c(NA, 1, NA, 2, NA, 1), Q2 = c(NA, 3, NA, 2, NA, 4),
             Q3 = c(2, NA, 4, NA, 1, NA), Q4 = c(5, NA, 5, NA, 5, NA))
This is what I am starting with:
Person Q1 Q2 Q3 Q4
<chr> <dbl> <dbl> <dbl> <dbl>
1 Person1 NA NA 2 5
2 Person1 1 3 NA NA
3 Person2 NA NA 4 5
4 Person2 2 2 NA NA
5 Person3 NA NA 1 5
6 Person3 1 4 NA NA
This is what I would like:
Person Q1 Q2 Q3 Q4
<chr> <dbl> <dbl> <dbl> <dbl>
1 Person1 1 3 2 5
2 Person2 2 2 4 5
3 Person3 1 4 1 5
I would like to be able to accomplish this using dplyr, but so far I have not had any luck.
If there is exactly one non-NA element per column in each group:
library(dplyr)
df %>%
group_by(Person) %>%
summarise_all(na.omit)
# A tibble: 3 x 5
# Person Q1 Q2 Q3 Q4
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Person1 1 3 2 5
#2 Person2 2 2 4 5
#3 Person3 1 4 1 5
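The na.omit approach depends on that one-value-per-group assumption. If a group held two or more non-NA values in a column, na.omit would return more than one value for that group and the summary would not come out as one clean row per person. Here is a minimal sketch for checking the assumption first. It assumes dplyr 1.0.0 or later for across(), and the names counts and one_each are only illustrative:

```r
library(dplyr)

df <- tibble(
  Person = c("Person1", "Person1", "Person2", "Person2", "Person3", "Person3"),
  Q1 = c(NA, 1, NA, 2, NA, 1),
  Q2 = c(NA, 3, NA, 2, NA, 4),
  Q3 = c(2, NA, 4, NA, 1, NA),
  Q4 = c(5, NA, 5, NA, 5, NA)
)

# Count the non-NA values per person for every question column
counts <- df %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))

# TRUE only if every person has exactly one answer per question
one_each <- all(as.matrix(select(counts, -Person)) == 1)
one_each
```

If one_each is FALSE, one of the options further down (first non-NA, toString, or a list column) is probably the safer choice.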
Because each group has a single non-NA value per column, any summary function that ignores NAs (mean, min, max, sum, median, etc.) returns that value:
df %>%
group_by(Person) %>%
summarise_all(mean, na.rm = TRUE)
Or
df %>%
group_by(Person) %>%
summarise_all(min, na.rm = TRUE)
Or
df %>%
group_by(Person) %>%
summarise_all(median, na.rm = TRUE)
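The scoped verbs such as summarise_all() are superseded in dplyr 1.0.0 and later, so the same idea can also be written with across(). This is only a sketch of that spelling, using max as the example function; res is an illustrative name:

```r
library(dplyr)

df <- tibble(
  Person = c("Person1", "Person1", "Person2", "Person2", "Person3", "Person3"),
  Q1 = c(NA, 1, NA, 2, NA, 1),
  Q2 = c(NA, 3, NA, 2, NA, 4),
  Q3 = c(2, NA, 4, NA, 1, NA),
  Q4 = c(5, NA, 5, NA, 5, NA)
)

# across(everything(), ...) applies the function to every non-grouping column
res <- df %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ max(.x, na.rm = TRUE)))
res
```

One caveat: if a group has only NAs in a column, mean returns NaN, and min and max return Inf or -Inf with a warning, so these shortcuts fit best when every group has at least one answer per question.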
Also, any function that removes the NAs and returns the first non-NA element:
df %>%
group_by(Person) %>%
summarise_all(list(~ .[!is.na(.)][1]))
If there is more than one non-NA element, either paste them into a single string or keep them in a list column:
df %>%
group_by(Person) %>%
summarise_all(list(~ toString(.[!is.na(.)])))
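For the list-column alternative, a rough sketch could look like the following. It assumes dplyr 1.0.0 or later for across(). The extra row with Q1 = 9 for Person1 is made up purely to give one group two answers, and df2 and res are illustrative names:

```r
library(dplyr)

df <- tibble(
  Person = c("Person1", "Person1", "Person2", "Person2", "Person3", "Person3"),
  Q1 = c(NA, 1, NA, 2, NA, 1),
  Q2 = c(NA, 3, NA, 2, NA, 4),
  Q3 = c(2, NA, 4, NA, 1, NA),
  Q4 = c(5, NA, 5, NA, 5, NA)
)

# Hypothetical extra response so that Person1 has two non-NA values in Q1;
# bind_rows() fills the columns missing from the new row with NA
df2 <- bind_rows(df, tibble(Person = "Person1", Q1 = 9))

# Wrapping the result in list() keeps all non-NA values in one cell per group
res <- df2 %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ list(.x[!is.na(.x)])))

res$Q1[[1]]     # all of Person1's Q1 answers
lengths(res$Q1) # number of Q1 answers per person
```

Unlike toString, the list column keeps the values numeric, which may be more convenient if they need further processing (for example with tidyr::unnest).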
You can get the first non-NA value for each column within each group with coalesce. There is no real reason to prefer that over na.omit, though, unless you have more than one non-NA value.
library(tidyverse)
df %>%
group_by(Person) %>%
summarise_all(reduce, coalesce)
# A tibble: 3 x 5
# Person Q1 Q2 Q3 Q4
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Person1 1 3 2 5
# 2 Person2 2 2 4 5
# 3 Person3 1 4 1 5
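To illustrate the case where it does matter, here is a small sketch with a made-up group that has two non-NA values. It uses across() (dplyr 1.0.0 or later, an assumption on my part) instead of summarise_all; df2 and first_non_na are hypothetical names:

```r
library(dplyr)
library(purrr)

# Hypothetical data: Person1 answered Q1 twice (1, then 9)
df2 <- tibble(
  Person = c("Person1", "Person1", "Person1", "Person2"),
  Q1     = c(NA, 1, 9, 2)
)

# reduce() folds coalesce() over the values in each group,
# so the first non-NA value wins
first_non_na <- df2 %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ reduce(.x, coalesce)))
first_non_na
```

Here Person1 ends up with Q1 = 1 and the later 9 is dropped, so this is only appropriate if keeping the first answer is really what you want.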