I have a dataframe that has lots of columns that are something like this:
data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
I'd like a result with columns that sum the variables that have the same prefix. In this example, I want to return a dataframe with a = c(9, 12, 15, 18, 21) and bt = c(11, 13, 15, 17, 19).
My real data set is quite a bit more complicated (I want to combine page view counts for web pages with different utm parameters) but a solution for this case should put me on the right track.
Here is a solution with base R:
> prefixes = unique(sub("\\..*", "", colnames(data)))
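> # paste0(x, ".") stops prefix "a" from also matching e.g. "ab.1"; drop = FALSE keeps single-column groups as data frames
> sapply(prefixes, function(x) rowSums(data[, startsWith(colnames(data), paste0(x, ".")), drop = FALSE]))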
a bt
[1,] 9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19
How about a one-liner approach using base R's rowsum function:
> t(rowsum(t(data), group = sub("\\..*", "", colnames(data))))
a bt
[1,] 9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19
The idea is to transpose the data so that the columns become rows, then apply the rowsum function to sum the rows that share a group label. Transposing again returns the data to its original orientation, now with the columns that share a label summed together.
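To make the three steps easier to follow, here is the same one-liner unrolled into intermediate objects. This is only a sketch of the idea described above; the names grp, tr, summed and result are made up for illustration.

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

# Group label for each column: everything before the first dot
grp <- sub("\\..*", "", colnames(data))

# Step 1: transpose, so each original column becomes a row
tr <- t(data)

# Step 2: sum the rows that share a group label (one row per label)
summed <- rowsum(tr, group = grp)

# Step 3: transpose back, so the group labels are columns again
result <- t(summed)
result
```

Inspecting tr and summed along the way shows where each original column ends up.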
You can try
library(tidyverse)
data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11) %>%
rownames_to_column() %>%
gather(k, v, -rowname) %>%
separate(k, letters[1:2]) %>%
group_by(rowname, a) %>%
summarise(Sum=sum(v)) %>%
spread(a, Sum)
#> # A tibble: 5 x 3
#> # Groups: rowname [5]
#> rowname a bt
#> <chr> <int> <int>
#> 1 1 9 11
#> 2 2 12 13
#> 3 3 15 15
#> 4 4 18 17
#> 5 5 21 19
Created on 2018-04-16 by the reprex package (v0.2.0).
Here's another tidyverse solution:
library(tidyverse)
t(data) %>%
data.frame() %>%
group_by(., id = gsub('\\..*', '', rownames(.))) %>%
summarise_all(sum) %>%
data.frame() %>%
column_to_rownames(var = 'id') %>%
t()
Result:
a bt
X1 9 11
X2 12 13
X3 15 15
X4 18 17
X5 21 19
data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
i <- grepl("a.", names(data), fixed = TRUE)
result <- data.frame(a=rowSums(data[, i]), bt=rowSums(data[, !i]))
result
# > result
# a bt
# 1 9 11
# 2 12 13
# 3 15 15
# 4 18 17
# 5 21 19
If you have more than two prefixes you can do something like:
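# Name the prefixes so the resulting columns are labelled a and bt
prefs <- c(a = "a.", bt = "bt.")
# startsWith matches only at the start of the name; drop = FALSE handles single-column groups
as.data.frame(lapply(prefs, function(p) rowSums(data[, startsWith(names(data), p), drop = FALSE])))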
Another solution is to use matrix product:
data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
as.matrix(data) %*% sapply(c("a","b"), function(a,b){startsWith(b,a)}, colnames(data))
Result:
a b
[1,] 9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19
Here sapply(c("a","b"), function(a,b){startsWith(b,a)}, colnames(data)) is
a b
[1,] TRUE FALSE
[2,] TRUE FALSE
[3,] TRUE FALSE
[4,] FALSE TRUE
[5,] FALSE TRUE
which indicates which columns contribute to each sum. Note that in this way you can easily keep the row names of your data.
Here sapply is used to keep the column names; otherwise you can simply use outer(colnames(data), c("a","b"), startsWith) and then set the column names yourself.
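For completeness, a sketch of the outer variant with the column names set by hand. The names prefixes and indicator are made up for illustration, and the prefixes include the trailing dot (an assumption about the naming scheme) so that "a." cannot accidentally match a column such as "ab.1".

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

prefixes <- c("a.", "bt.")

# Logical matrix: one row per column of data, one column per prefix
indicator <- outer(colnames(data), prefixes, startsWith)

# outer() does not label the columns, so set them manually (prefix without the dot)
colnames(indicator) <- sub("\\.$", "", prefixes)

# Each column of the product is the row-wise sum of the matching columns of data
result <- as.matrix(data) %*% indicator
result
```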