Sum columns row-wise with similar names

I have a data frame with lots of columns that look something like this:

data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

I'd like a result with columns that sum the variables that have the same prefix. In this example, I want to return a data frame with a = (9, 12, 15, 18, 21) and bt = (11, 13, 15, 17, 19).

My real data set is quite a bit more complicated (I want to combine page view counts for web pages with different utm parameters), but a solution for this case should put me on the right track.

Here is a solution using base R:

> prefixes <- unique(sub("\\..*", "", colnames(data)))
> # drop = FALSE keeps a data frame even when only one column matches a prefix
> sapply(prefixes, function(x) rowSums(data[, startsWith(colnames(data), x), drop = FALSE]))
      a bt
[1,]  9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19

How about a one-liner using base R's rowsum function:

> t(rowsum(t(data), group = sub("\\..*", "", colnames(data))))
      a bt
[1,]  9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19

The idea is to transpose the data so that the columns become rows, then apply rowsum to add up the rows that share a group label. Transposing again restores the original orientation, now with the columns that share a label summed.
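
The three steps described above can be written out one at a time. This is only a sketch of the same idea; the intermediate names (groups, transposed, summed) are illustrative and not part of the original answer, and the final as.data.frame call is an optional extra for getting a data frame back instead of a matrix.

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)

# Group label for each column: everything before the first dot.
groups <- sub("\\..*", "", colnames(data))

# Step 1: transpose, so each original column becomes a row.
transposed <- t(data)

# Step 2: rowsum() adds up the rows that share a group label.
summed <- rowsum(transposed, group = groups)

# Step 3: transpose back and (optionally) convert to a data frame.
result <- as.data.frame(t(summed))
result
```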

You can try a tidyverse approach:

library(tidyverse)
data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11) %>% 
  rownames_to_column() %>% 
  gather(k, v, -rowname) %>% 
  separate(k, letters[1:2]) %>% 
  group_by(rowname, a) %>% 
  summarise(Sum=sum(v)) %>% 
  spread(a, Sum)
#> # A tibble: 5 x 3
#> # Groups:   rowname [5]
#>   rowname     a    bt
#>   <chr>   <int> <int>
#> 1 1           9    11
#> 2 2          12    13
#> 3 3          15    15
#> 4 4          18    17
#> 5 5          21    19

Created on 2018-04-16 by the reprex package (v0.2.0).

Here's another tidyverse solution:

library(tidyverse)

t(data) %>%
  data.frame() %>%
  group_by(., id = gsub('\\..*', '', rownames(.))) %>%
  summarise_all(sum) %>%
  data.frame() %>%
  column_to_rownames(var = 'id') %>%
  t()

Result:

    a bt
X1  9 11
X2 12 13
X3 15 15
X4 18 17
X5 21 19
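
With only two prefixes, a logical index is enough:

data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
# TRUE for columns whose name starts with "a."; everything else is treated as "bt."
i <- startsWith(names(data), "a.")
result <- data.frame(a = rowSums(data[, i, drop = FALSE]), bt = rowSums(data[, !i, drop = FALSE]))
result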
# > result
#    a bt
# 1  9 11
# 2 12 13
# 3 15 15
# 4 18 17
# 5 21 19

If you have more than two prefixes you can do something like:

# The names of prefs become the column names of the result
prefs <- c(a = "a.", bt = "bt.")
as.data.frame(lapply(prefs, function(p) rowSums(data[, startsWith(names(data), p), drop = FALSE])))
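
If the prefixes are not known in advance, they can be derived from the column names instead of being listed by hand. The sketch below uses split.default, a different technique from the answer above, to split the columns by their exact prefix. The sample data is made up for illustration and includes a column ab.2 to show that the prefix a does not also absorb ab.

```r
# Illustrative data: "ab.2" must not be counted under prefix "a".
data <- data.frame(a.1 = 1:5, ab.2 = 3:7, a.5 = 5:9, bt.16 = 4:8)

# Exact prefix of each column: everything before the first dot.
prefix <- sub("\\..*", "", names(data))

# split.default() treats the data frame as a list of columns and
# returns one smaller data frame per distinct prefix.
groups <- split.default(data, prefix)

# Sum each group row-wise; the list names become the column names.
result <- as.data.frame(lapply(groups, rowSums))
result
```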

Another solution is to use a matrix product:

data <- data.frame (a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
as.matrix(data) %*% sapply(c("a","b"), function(a,b){startsWith(b,a)}, colnames(data))

Result:

      a  b
[1,]  9 11
[2,] 12 13
[3,] 15 15
[4,] 18 17
[5,] 21 19

Here sapply(c("a","b"), function(a,b){startsWith(b,a)}, colnames(data)) is

         a     b
[1,]  TRUE FALSE
[2,]  TRUE FALSE
[3,]  TRUE FALSE
[4,] FALSE  TRUE
[5,] FALSE  TRUE

denoting how the columns should be combined. Note that this way you can easily keep the row names of your data.

Here sapply is used to keep the column names; otherwise you can simply use outer(colnames(data), c("a","b"), startsWith) and then set the column names yourself.
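
A minimal sketch of the outer variant, including the row-name behaviour mentioned above. The row names page1 to page5 are made up for illustration, and the variable names prefixes and indicator are not from the original answer.

```r
data <- data.frame(a.1 = 1:5, a.2b = 3:7, a.5 = 5:9, bt.16 = 4:8, bt.12342 = 7:11)
rownames(data) <- paste0("page", 1:5)  # hypothetical row names

prefixes <- c("a", "b")

# Indicator matrix: entry [i, j] is TRUE when column i of data
# starts with prefix j.
indicator <- outer(colnames(data), prefixes, startsWith)
colnames(indicator) <- prefixes  # outer() does not set these here

# Each output column is the sum of the data columns flagged TRUE;
# the row names of data carry over to the product.
result <- as.matrix(data) %*% indicator
result
```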
