I have a data frame:
a <- c('A','A','B','B','A')
b <- c(1,1,1,1,2)
c <- c(NA,60,NA,100,NA)
d <- c(10,NA,10,NA,100)
frame <- data.frame(a,b,c,d)
> frame
a b c d
1 A 1 NA 10
2 A 1 60 NA
3 B 1 NA 10
4 B 1 100 NA
5 A 2 NA 100
I want to aggregate it by a and b, so that the result looks like this:
> frame2
a b c d
1 A 1 60 10
3 B 1 100 10
5 A 2 NA 100
I tried several things, like aggregate() and group_by() from dplyr, but somehow it never works. I guess the NAs are the problem.
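With aggregate, we may need to set na.action so that rows containing NA are not dropped before grouping. Note that sum with na.rm = TRUE returns 0, not NA, for a group whose values are all NA: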
aggregate(.~ a + b, frame, sum, na.rm = TRUE, na.action = 'na.pass')
# a b c d
#1 A 1 60 10
#2 B 1 100 10
#3 A 2 0 100
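For context on why na.action matters here: the formula method of aggregate defaults to na.action = na.omit, which removes every row containing an NA before grouping. The sketch below rebuilds the data frame from the question (the name complete_rows is only illustrative) and shows that no row would survive that filter.

```r
# Rebuild the data frame from the question
frame <- data.frame(
  a = c('A', 'A', 'B', 'B', 'A'),
  b = c(1, 1, 1, 1, 2),
  c = c(NA, 60, NA, 100, NA),
  d = c(10, NA, 10, NA, 100)
)

# Every row has an NA in either c or d, so na.omit would leave nothing to aggregate
complete_rows <- sum(complete.cases(frame))
complete_rows
# [1] 0

# With na.pass the rows are kept, and the NAs are dropped inside sum() via na.rm = TRUE
res <- aggregate(. ~ a + b, frame, sum, na.rm = TRUE, na.action = na.pass)
res$c
# [1]  60 100   0
```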
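If we intend to subset the rows instead, we can move the non-NA values to the top of each column within each group and then keep the first row: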
library(dplyr)
frame %>%
group_by(a, b) %>%
mutate_at(vars(-group_cols()), ~ .[order(is.na(.))]) %>%
slice(1)
# A tibble: 3 x 4
# Groups: a, b [3]
# a b c d
# <fct> <dbl> <dbl> <dbl>
#1 A 1 60 10
#2 A 2 NA 100
#3 B 1 100 10
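The key step in the pipeline above is the expression .[order(is.na(.))]. As a small sketch on plain vectors (x and all_na are made-up examples, not part of the original data), ordering by is.na() puts the non-NA values first, so the first element of each column is a real value whenever the group has one:

```r
x <- c(NA, 60)

# is.na(x) is TRUE, FALSE; FALSE sorts before TRUE, so position 2 comes first
idx <- order(is.na(x))
idx
# [1] 2 1

reordered <- x[idx]
reordered
# [1] 60 NA

# If a group has only NAs, the first element stays NA
all_na <- c(NA_real_, NA_real_)
first_all_na <- all_na[order(is.na(all_na))][1]
first_all_na
# [1] NA
```

This is why slice(1) returns NA for column c in the group a = A, b = 2.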
Using data.table and hablar::sum_:
library(data.table)
setDT(frame)[,.(c = as.numeric(hablar::sum_(c)),
d = as.numeric(hablar::sum_(d))), .(a,b)]
#> a b c d
#> 1: A 1 60 10
#> 2: B 1 100 10
#> 3: A 2 NA 100
Or in base R, we can define our own function and use it with aggregate, as in the aggregate answer above:
sum__ <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
aggregate(.~ a + b, frame, sum__, na.action = 'na.pass')
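To make the difference from a plain sum concrete, here is a small sketch that redefines the same helper and compares it with sum(..., na.rm = TRUE) on an all-NA vector. The data frame is rebuilt from the question so the block runs on its own; plain_sum, custom_all_na, custom_mixed and res are illustrative names.

```r
# Same helper as above: NA if everything is NA, otherwise the NA-free sum
sum__ <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

plain_sum <- sum(c(NA, NA), na.rm = TRUE)   # the sum of nothing is 0
custom_all_na <- sum__(c(NA, NA))           # NA is preserved
custom_mixed <- sum__(c(NA, 60))            # 60

frame <- data.frame(
  a = c('A', 'A', 'B', 'B', 'A'),
  b = c(1, 1, 1, 1, 2),
  c = c(NA, 60, NA, 100, NA),
  d = c(10, NA, 10, NA, 100)
)

# na.pass keeps the rows with NAs so that sum__ can deal with them per group
res <- aggregate(. ~ a + b, frame, sum__, na.action = na.pass)
res$c
# [1]  60 100  NA
```

With this helper, the group a = A, b = 2 keeps NA in column c instead of getting 0, which matches the output the question asks for.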
In addition to the use of aggregate() shown by @akrun, you can also use the non-formula interface, which does not drop rows with NA:
aggregate(frame[-(1:2)], frame[1:2], sum, na.rm = TRUE)
which gives:
> aggregate(frame[-(1:2)], frame[1:2], sum, na.rm = TRUE)
a b c d
1 A 1 60 10
2 B 1 100 10
3 A 2 0 100
Using dplyr and tidyr, you can reshape the data into a long format, filter out the NA rows, then reshape back to wide. This basically combines the c and d values, and retains the NA you have in column c.
library(dplyr)
library(tidyr)
frame %>%
pivot_longer(c:d) %>%
filter(!is.na(value)) %>%
arrange(name) %>%
pivot_wider(names_from = name)
#> # A tibble: 3 x 4
#> a b c d
#> <fct> <dbl> <dbl> <dbl>
#> 1 A 1 60 10
#> 2 B 1 100 10
#> 3 A 2 NA 100
A minor annoyance IMO is that, unlike the older tidyr::spread, tidyr::pivot_wider keeps the order of your data; if you don't call arrange, you'll get column d, then c, because of the order of observations after filtering.