I need to aggregate multiple variables of different classes at once.
test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6)
It currently looks like this:
# name party text numberofwords score1 score2
#1 anna red hey there 2 1 4
#2 joe blue we ate an apple 4 2 5
#3 anna red i took a walk 4 3 6
I want to aggregate score1, score2, numberofwords and text by name and party, summing the numeric columns and pasting the text together.
The desired result is:
# name party text numberofwords score1 score2
#1 anna red hey there i took a walk 6 4 10
#2 joe blue we ate an apple 4 2 5
Using recent versions of dplyr with across():
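library(dplyr)
test %>%
  group_by(name, party) %>%
  summarize(
    # collapse the text of each group into a single string
    across(text, ~ paste(.x, collapse = " ")),
    # sum every numeric column
    across(where(is.numeric), sum),
    .groups = "drop"
  )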
# # A tibble: 2 x 6
# name party text numberofwords score1 score2
# <chr> <chr> <chr> <dbl> <int> <int>
# 1 anna red hey there i took a walk 6 4 10
# 2 joe blue we ate an apple 4 2 5
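If you have dplyr 1.1.0 or later, the same summary can also be written without group_by() by using the per-call `.by` argument of summarise(). This is a minimal sketch assuming that dplyr version (`.by` does not exist in earlier releases) and the `test` data frame from the question; a `.by` result is always returned ungrouped, so no `.groups` argument is needed.

```r
library(dplyr)

test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6)

# .by (dplyr >= 1.1.0) groups only for this single summarise() call
res <- test %>%
  summarise(
    # paste all text of a group into one string
    across(text, ~ paste(.x, collapse = " ")),
    # sum every numeric column
    across(where(is.numeric), sum),
    .by = c(name, party)
  )
res
```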
Older version, grouping by name only and keeping the first party value:
test %>%
  group_by(name) %>%
  summarize(
    # keep the first party value seen for each name
    across(party, first),
    across(text, ~ paste(.x, collapse = " ")),
    across(where(is.numeric), sum)
  )
# # A tibble: 2 x 6
# name party text numberofwords score1 score2
# <chr> <chr> <chr> <dbl> <int> <int>
# 1 anna red hey there i took a walk 6 4 10
# 2 joe blue we ate an apple 4 2 5
We can do a conditional summarise in dplyr based on the class of each column.
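library(dplyr)
test %>%
  # factors take the first() branch below; data.frame() only creates
  # them by default before R 4.0.0, so convert explicitly
  mutate_at(c("name", "party"), as.factor) %>%
  mutate_at("text", as.character) %>%
  group_by(name) %>%
  summarise_all(list(~ if (is.numeric(.)) sum(., na.rm = TRUE)
                     else if (is.factor(.)) first(.)
                     else paste(., collapse = " ")))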
#> # A tibble: 2 x 6
#> name party text numberofwords score1 score2
#> <fct> <fct> <chr> <dbl> <int> <int>
#> 1 anna red hey there i took a walk 6 4 10
#> 2 joe blue we ate an apple 4 2 5
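In current dplyr, summarise_all() is superseded by across(). A sketch of the same class-based rule with across(), assuming dplyr 1.0.0 or later; `combine_col` is a hypothetical helper name introduced for this example. Because this version groups by both name and party, the factor branch is no longer needed and the result does not depend on whether the columns are factors or character.

```r
library(dplyr)

test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6)

# Hypothetical helper: sum numeric columns, paste everything else together
combine_col <- function(x) {
  if (is.numeric(x)) sum(x, na.rm = TRUE) else paste(x, collapse = " ")
}

res <- test %>%
  group_by(name, party) %>%
  # everything() skips the grouping columns name and party
  summarise(across(everything(), combine_col), .groups = "drop")
res
```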
In base R, we can do this with aggregate and merge:
out1 <- aggregate(cbind(numberofwords, score1, score2) ~ name + party, test, sum)
out2 <- aggregate(text ~ name + party, test, paste, collapse=' ')
merge(out1, out2)
Output:
# name party numberofwords score1 score2 text
#1 anna red 6 4 10 hey there i took a walk
#2 joe blue 4 2 5 we ate an apple
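The two aggregate() calls can also be collapsed into one by using the data-frame method of aggregate(), which applies FUN to each column separately within each group. This is a sketch assuming the `test` data frame from the question; the anonymous function passed as FUN is written for this example, and the row order of the result may differ from the merge() output above.

```r
test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6)

# aggregate.data.frame() calls FUN on every column of x within each
# name/party group: numeric columns are summed, text is pasted together
res <- aggregate(test[c("text", "numberofwords", "score1", "score2")],
                 by = test[c("name", "party")],
                 FUN = function(x) if (is.numeric(x)) sum(x) else paste(x, collapse = " "))
res
```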
Try this approach: aggregate the text variable first, then the continuous variables, and then join the two results. Here is the code using dplyr:
library(dplyr)
# Data
test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3, score2 = 4:6,
                   stringsAsFactors = FALSE)
# Aggregate the text first, then the continuous variables, and join the two
new <- test %>%
  group_by(name, party) %>%
  summarise(text = paste0(text, collapse = " ")) %>%
  left_join(
    test %>%
      select(-text) %>%
      group_by(name, party) %>%
      summarise_all(sum, na.rm = TRUE),
    by = c("name", "party")
  )
Output:
# A tibble: 2 x 6
# Groups: name [2]
name party text numberofwords score1 score2
<chr> <chr> <chr> <dbl> <int> <int>
1 anna red hey there i took a walk 6 4 10
2 joe blue we ate an apple 4 2 5