
Aggregate numeric and character columns at once

I need to aggregate multiple variables of different classes at once.

test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6)

It currently looks like this:

#   name    party      text           numberofwords score1 score2
#1  anna    red       hey there             2         1      4
#2  joe     blue    we ate an apple         4         2      5
#3  anna    red      i took a walk          4         3      6

I want to group by name and party, summing score1, score2 and numberofwords and pasting the text values of each group together.

The desired result is:

#   name  party            text                  numberofwords score1 score2
#1  anna  red           hey there i took a walk       6           4      10
#2   joe  blue           we ate an apple              4           2      5

With dplyr 1.0.0 or later, use across():

library(dplyr)

test %>%
  group_by(name, party) %>%
  summarize(
    across(text, paste, collapse = " "),
    across(where(is.numeric), sum)
  )
# # A tibble: 2 x 6
#   name  party text                    numberofwords score1 score2
#   <chr> <chr> <chr>                           <dbl>  <int>  <int>
# 1 anna  red   hey there i took a walk             6      4     10
# 2 joe   blue  we ate an apple                     4      2      5   
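In dplyr 1.1.0 and later, passing extra arguments such as collapse through the ... of across() is deprecated, and summarize() over two grouping variables leaves the result grouped by name. A minimal sketch of the same call written with an anonymous function and .groups = "drop", which should work on dplyr 1.0.0 or later (the object name result is just for illustration):

```r
library(dplyr)

test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6,
                   stringsAsFactors = FALSE)

result <- test %>%
  group_by(name, party) %>%
  summarise(
    # Paste text within each group; an anonymous function replaces collapse = " " via ...
    across(text, function(x) paste(x, collapse = " ")),
    # text is already character here, so where(is.numeric) only picks the original numeric columns
    across(where(is.numeric), sum),
    # Return an ungrouped tibble instead of one still grouped by name
    .groups = "drop"
  )
```

Because .groups = "drop" is set, the printed tibble has no "# Groups:" line.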

Older version, grouping by name only and keeping the first party value:

test %>%
  group_by(name) %>%
  summarize(
    across(party, first),
    across(text, paste, collapse = " "),
    across(where(is.numeric), sum)
  )
# # A tibble: 2 x 6
#   name  party text                    numberofwords score1 score2
#   <chr> <chr> <chr>                           <dbl>  <int>  <int>
# 1 anna  red   hey there i took a walk             6      4     10
# 2 joe   blue  we ate an apple                     4      2      5   
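Keeping first(party) silently drops any other party a name appears with, so this version only matches the grouped one when each name has a single party. A quick sketch of a check for that assumption using n_distinct() (party_check and one_party_per_name are illustrative names):

```r
library(dplyr)

test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6,
                   stringsAsFactors = FALSE)

# Count how many distinct parties each name appears with
party_check <- test %>%
  group_by(name) %>%
  summarise(n_party = n_distinct(party))

# TRUE only if every name maps to exactly one party, i.e. first(party) loses nothing
one_party_per_name <- all(party_check$n_party == 1)
```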

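We can also do a conditional summarise in dplyr based on the class of each column. This relies on party being a factor, which data.frame() produced by default before R 4.0.0; with R 4.0.0 or later party is character, so its values would be pasted together instead.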

library(dplyr)

test %>% 
  mutate_at("text", as.character) %>% 
  group_by(name) %>% 
  summarise_all(list(~if(is.numeric(.)) sum(., na.rm = TRUE)  
                      else if(is.factor(.)) first(.) 
                      else paste(., collapse = " ")))

#> # A tibble: 2 x 6
#>   name  party text                    numberofwords score1 score2
#>   <fct> <fct> <chr>                           <dbl>  <int>  <int>
#> 1 anna  red   hey there i took a walk             6      4     10
#> 2 joe   blue  we ate an apple                     4      2      5
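summarise_all() and mutate_at() are superseded by across(). A sketch of the same class-based dispatch with across(), assuming dplyr 1.0.0 or later; it groups on both name and party so that it no longer depends on party being a factor (result is an illustrative name):

```r
library(dplyr)

test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6,
                   stringsAsFactors = FALSE)

result <- test %>%
  group_by(name, party) %>%
  summarise(
    # everything() excludes the grouping columns; choose the summary by column class
    across(everything(), function(x) {
      if (is.numeric(x)) sum(x, na.rm = TRUE) else paste(x, collapse = " ")
    }),
    .groups = "drop"
  )
```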

In base R, we can do this with aggregate() and merge():

out1 <- aggregate(cbind(numberofwords, score1, score2) ~ name + party, test, sum)
out2 <- aggregate(text ~ name + party, test, paste, collapse=' ')
merge(out1, out2)

Output:

# name party numberofwords score1 score2                    text
#1 anna   red             6      4     10 hey there i took a walk
#2  joe  blue             4      2      5         we ate an apple
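The same result can also be sketched in base R in a single pass, without the merge step, by splitting the rows per name/party combination and collapsing each piece by column class. This is an alternative to the aggregate() approach above, not part of it; keys, collapse_group and result are illustrative names:

```r
test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3,
                   score2 = 4:6,
                   stringsAsFactors = FALSE)

keys <- c("name", "party")

# One data frame per name/party combination (empty combinations dropped)
groups <- split(test, list(test$name, test$party), drop = TRUE)

# Collapse one group to a single row: keep keys, sum numbers, paste the rest
collapse_group <- function(df) {
  out <- lapply(names(df), function(col) {
    x <- df[[col]]
    if (col %in% keys) {
      x[1]
    } else if (is.numeric(x)) {
      sum(x, na.rm = TRUE)
    } else {
      paste(x, collapse = " ")
    }
  })
  names(out) <- names(df)
  as.data.frame(out, stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(groups, collapse_group))
# split() orders groups by factor level interaction, so sort by the keys explicitly
result <- result[order(result$name, result$party), ]
rownames(result) <- NULL
```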

Try this approach: aggregate the text variable first, then the numeric variables, and join the two results. Here is the code using dplyr:

library(dplyr)
#Data
test <- data.frame(name = c("anna", "joe", "anna"),
                   party = c("red", "blue", "red"),
                   text = c("hey there", "we ate an apple", "i took a walk"),
                   numberofwords = c(2, 4, 4),
                   score1 = 1:3, score2 = 4:6, stringsAsFactors = FALSE)
#First aggregate text, then aggregate the numeric variables, and join on name and party
new <- test %>%
  group_by(name, party) %>% summarise(text = paste0(text, collapse = ' ')) %>%
  left_join(
    test %>% select(-text) %>%
      group_by(name, party) %>%
      summarise_all(sum, na.rm = TRUE),
    by = c("name", "party")
  )

Output:

# A tibble: 2 x 6
# Groups:   name [2]
  name  party text                    numberofwords score1 score2
  <chr> <chr> <chr>                           <dbl>  <int>  <int>
1 anna  red   hey there i took a walk             6      4     10
2 joe   blue  we ate an apple                     4      2      5

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM