Sum columns row-wise with similar names

Question

I have a dataframe that has lots of columns that are something like this:

data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

I'd like a result with columns that sum, row by row, the variables that share the same prefix. In this example, I want to return a dataframe with a = c(9, 12, 15, 18, 21) and bt = c(11, 13, 15, 17, 19).

My real data set is quite a bit more complicated (I want to combine page view counts for web pages with different utm parameters) but a solution for this case should put me on the right track.

Answer 1

Here is a solution with base R:

> prefixes = unique(sub("\\..*", "", colnames(data)))
> # drop = FALSE keeps a data frame even when only one column matches a prefix
> sapply(prefixes, function(x) rowSums(data[, startsWith(colnames(data), x), drop = FALSE]))
      a bt
[1,]  9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19
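
Note that startsWith(colnames(data), x) also matches longer names that merely begin with the same letters, so a prefix a would pick up a hypothetical column ab.1 as well. A minimal sketch of a variant that groups on the exact text before the first dot, assuming that is how the prefix is defined. Here split.default() splits the data frame column-wise into one sub-frame per prefix:

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

# Text before the first dot of every column name.
prefix <- sub("\\..*", "", names(data))

# split.default() treats the data frame as a list of columns,
# so each list element is a data frame holding one prefix group.
by_prefix <- split.default(data, prefix)

# Row-wise sum within each group; sapply() binds the results into a matrix.
result <- sapply(by_prefix, rowSums)
result
```

Because each group stays a data frame, this should also work when a prefix occurs in only one column.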

Answer 2

How about a one-liner approach using base R's rowsum function:

> t(rowsum(t(data), group = sub("\\..*", "", colnames(data))))
      a bt
[1,]  9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19

The idea is to transpose the data so that the columns become rows, then apply the rowsum function to sum up these rows indexed by the same group label. Transposing again returns the data to its original form, now with the columns with the same labels summed up.
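
The same steps can be sketched one at a time, which may make the intermediate shapes easier to inspect. The variable names m, grp and summed are only illustrative:

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

# Step 1: transpose, so every original column becomes a row of a matrix.
m <- t(data)

# Step 2: one group label per row, taken from the text before the first dot.
grp <- sub("\\..*", "", rownames(m))

# Step 3: rowsum() adds up the rows that share a label (one row per prefix).
summed <- rowsum(m, group = grp)

# Step 4: transpose back, so the prefixes become columns again.
result <- t(summed)
result
```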

Answer 3

You can try:

library(tidyverse)
data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11) %>% 
  rownames_to_column() %>% 
  gather(k, v, -rowname) %>% 
  separate(k, letters[1:2]) %>% 
  group_by(rowname, a) %>% 
  summarise(Sum=sum(v)) %>% 
  spread(a, Sum)
#> # A tibble: 5 x 3
#> # Groups:   rowname [5]
#>   rowname     a    bt
#>   <chr>   <int> <int>
#> 1 1           9    11
#> 2 2          12    13
#> 3 3          15    15
#> 4 4          18    17
#> 5 5          21    19

Created on 2018-04-16 by the reprex package (v0.2.0).
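
For readers without the tidyverse, the same long-format idea (reshape to long, group by row and prefix, sum, spread back to wide) can be sketched in base R. This is not the code from the answer above, only an assumed equivalent; the column names row, column, value and prefix are made up for illustration:

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

# Long format: one row per (original row, original column) pair.
long <- data.frame(
  row    = rep(seq_len(nrow(data)), times = ncol(data)),
  column = rep(names(data), each = nrow(data)),
  value  = unlist(data, use.names = FALSE)
)

# Prefix is the text before the first dot of the original column name.
long$prefix <- sub("\\..*", "", long$column)

# Sum within each (row, prefix) cell; tapply() returns a rows-by-prefix matrix.
result <- tapply(long$value, list(long$row, long$prefix), sum)
result
```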

Answer 4

Here's another tidyverse solution:

library(tidyverse)

t(data) %>%
  data.frame() %>%
  group_by(., id = gsub('\\..*', '', rownames(.))) %>%
  summarise_all(sum) %>%
  data.frame() %>%
  column_to_rownames(var = 'id') %>%
  t()

Answer 5

data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
i <- grepl("a.", names(data), fixed = TRUE)
result <- data.frame(a=rowSums(data[, i]), bt=rowSums(data[, !i]))
result
# > result
#    a bt
# 1  9 11
# 2 12 13
# 3 15 15
# 4 18 17
# 5 21 19

If you have more than two prefixes you can do something like:

prefs <- c(a = "a.", bt = "bt.")  # the names become the column names of the result
as.data.frame(lapply(prefs, function(p) rowSums(data[, startsWith(names(data), p), drop = FALSE])))

Answer 6

Another solution is to use a matrix product:

data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
as.matrix(data) %*% sapply(c("a","b"), function(a,b){startsWith(b,a)}, colnames(data))

Result:

      a  b
[1,]  9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19

Here sapply(c("a","b"), function(a,b){startsWith(b,a)}, colnames(data)) is

         a     b
[1,]  TRUE FALSE
[2,]  TRUE FALSE
[3,]  TRUE FALSE
[4,] FALSE  TRUE
[5,] FALSE  TRUE

denoting how the columns should be combined. Note that in this way you can easily keep the row names of your data.

Here sapply is used to keep the column names; otherwise you can simply use outer(colnames(data), c("a","b"), startsWith) and then set the column names yourself.
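
A minimal sketch of the outer() variant, with the prefixes derived from the column names instead of typed by hand. It assumes the prefix is everything before the first dot and compares prefixes with "==" rather than startsWith, so that a short prefix cannot match a longer one; the names col_prefix, prefixes and indicator are only illustrative:

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

# Prefix of each column, and the distinct prefixes in order of appearance.
col_prefix <- sub("\\..*", "", colnames(data))
prefixes <- unique(col_prefix)

# Indicator matrix: entry [i, j] is 1 when column i belongs to prefix j.
# Multiplying by 1 turns the logical matrix into a numeric one.
indicator <- 1 * outer(col_prefix, prefixes, "==")
colnames(indicator) <- prefixes

# Each output column is the sum of the input columns flagged for that prefix.
result <- as.matrix(data) %*% indicator
result
```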

6 answers

solution1
5 2018-04-16 14:09:12

solution2
2 2021-10-10 13:57:22

solution3
1 2018-04-16 14:15:13

solution4
1 2018-04-16 14:17:57

solution5
1 2018-04-16 14:18:31

solution6
0 2023-01-11 14:45:35
