
Aggregating columns based on column names in R

I have this data frame in R:

Party Pro2005 Anti2005 Pro2006 Anti2006 Pro2007 Anti2007
R       1       18       0        7       2       13   
R       1       19       0        7       1       14   

D      13        7       3        4      10        5 
D      12        8       3        4       9        6  

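I want to aggregate it so that, for each Party, all the Pro columns are summed into one total and all the Anti columns into another.

For example (the numbers are only placeholders):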

Party ProSum AntiSum
R.     234.   245
D.     234.   245

How would I do this in R?

I would suggest a tidyverse approach: reshape the data to long format, then compute the sum of the values:

library(tidyverse)
#Data
df <- structure(list(Party = c("R", "R", "D", "D"), Pro2005 = c(1L, 
1L, 13L, 12L), Anti2005 = c(18L, 19L, 7L, 8L), Pro2006 = c(0L, 
0L, 3L, 3L), Anti2006 = c(7L, 7L, 4L, 4L), Pro2007 = c(2L, 1L, 
10L, 9L), Anti2007 = c(13L, 14L, 5L, 6L)), class = "data.frame", row.names = c(NA, 
-4L)) 

Code:

df %>% pivot_longer(cols = -1) %>%
  #Format strings
  mutate(name=gsub('\\d+','',name)) %>%
  #Aggregate
  group_by(Party,name) %>% summarise(value=sum(value,na.rm=TRUE)) %>%
  pivot_wider(names_from = name,values_from=value)

Output:

# A tibble: 2 x 3
# Groups:   Party [2]
  Party  Anti   Pro
  <chr> <int> <int>
1 D        34    50
2 R        78     5
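
The reshape-then-aggregate idea above can also be traced step by step. The following is only a sketch in base R (stack() and xtabs() are swapped in here for pivot_longer() and pivot_wider(); they are not what the answer itself uses), and the names long and res are made up for illustration:

```r
# Same data as in the answer above
df <- data.frame(
  Party    = c("R", "R", "D", "D"),
  Pro2005  = c(1L, 1L, 13L, 12L),
  Anti2005 = c(18L, 19L, 7L, 8L),
  Pro2006  = c(0L, 0L, 3L, 3L),
  Anti2006 = c(7L, 7L, 4L, 4L),
  Pro2007  = c(2L, 1L, 10L, 9L),
  Anti2007 = c(13L, 14L, 5L, 6L)
)

# Step 1: long format, one row per (Party, original column) pair.
# stack() goes column by column, so Party is repeated once per column.
long <- data.frame(
  Party = rep(df$Party, times = ncol(df) - 1),
  stack(df[-1])
)

# Step 2: strip the year so "Pro2005", "Pro2006", ... all become "Pro"
long$name <- gsub("\\d+", "", as.character(long$ind))

# Step 3: sum the values per Party and name, spread into a Party x name table
res <- xtabs(values ~ Party + name, data = long)
res
# D: Anti 34, Pro 50; R: Anti 78, Pro 5
```

The result is a contingency table rather than a tibble, so use as.data.frame.matrix(res) if a plain data frame is needed.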

Split the data by party with by, loop over Pro/Anti with sapply to compute each sum, and finally combine the results with rbind:

res <- data.frame(Party=sort(unique(d$Party)), do.call(rbind, by(d, d$Party, function(x) 
  sapply(c("Pro", "Anti"), function(y) sum(x[grep(y, names(x))])))))
res
#   Party Pro Anti
# D     D  50   34
# R     R   5   78

An outer solution also works:

t(outer(c("Pro", "Anti"), c("R", "D"), 
      Vectorize(function(x, y) sum(d[d$Party %in% y, grep(x, names(d))]))))
#      [,1] [,2]
# [1,]    5   78
# [2,]   50   34

Data:

d <- read.table(header=T, text="Party Pro2005 Anti2005 Pro2006 Anti2006 Pro2007 Anti2007
R       1       18       0        7       2       13   
R       1       19       0        7       1       14   

D      13        7       3        4      10        5 
D      12        8       3        4       9        6  ")
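
The outer result above comes back as an unlabelled matrix, so it is easy to lose track of which row is which party. A minimal sketch of the same call with dimnames attached (the helper sum_for and the vectors parties and prefixes are names introduced here for illustration only):

```r
d <- read.table(header = TRUE, text = "Party Pro2005 Anti2005 Pro2006 Anti2006 Pro2007 Anti2007
R 1 18 0 7 2 13
R 1 19 0 7 1 14
D 13 7 3 4 10 5
D 12 8 3 4 9 6")

parties  <- c("R", "D")
prefixes <- c("Pro", "Anti")

# Sum every column whose name contains the prefix, for the rows of one party
sum_for <- function(prefix, party) {
  sum(d[d$Party %in% party, grep(prefix, names(d))])
}

# outer() gives prefixes in rows and parties in columns; t() flips that
m <- t(outer(prefixes, parties, Vectorize(sum_for)))
dimnames(m) <- list(Party = parties, Side = prefixes)
m
# R: Pro 5, Anti 78; D: Pro 50, Anti 34
```

Vectorize is needed because outer passes whole vectors of prefix/party combinations to the function in one call.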

You can use:

library(tidyverse)
df %>% 
  pivot_longer(-Party,
               names_to = c(".value", NA),
               names_pattern = "([a-zA-Z]*)([0-9]*)") %>% 
  group_by(Party) %>% 
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)))

# A tibble: 2 x 3
  Party   Pro  Anti
  <chr> <int> <int>
1 D        50    34
2 R         5    78
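
To see what names_pattern is doing in the call above, the same regular expression can be tried on the column names directly. This is just an illustration with base R's sub(), not how tidyr works internally; nms, prefix and year are made-up names:

```r
nms <- c("Pro2005", "Anti2005", "Pro2006", "Anti2006", "Pro2007", "Anti2007")
pattern <- "([a-zA-Z]*)([0-9]*)"

# First capture group: the letters. With names_to = c(".value", NA) this part
# becomes the name of the output column.
prefix <- sub(pattern, "\\1", nms)
prefix
# "Pro" "Anti" "Pro" "Anti" "Pro" "Anti"

# Second capture group: the digits. The NA in names_to tells pivot_longer()
# to drop this part.
year <- sub(pattern, "\\2", nms)
year
# "2005" "2005" "2006" "2006" "2007" "2007"
```

Because the year is discarded, the long data ends up with one Pro and one Anti column per row, which is why a plain group_by(Party) followed by a sum is enough.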

