
Aggregating columns based on column names in R

I have this data frame in R:

Party Pro2005 Anti2005 Pro2006 Anti2006 Pro2007 Anti2007
R       1       18       0        7       2       13
R       1       19       0        7       1       14
D      13        7       3        4      10        5
D      12        8       3        4       9        6

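I want to aggregate it so that, for each Party, all the Pro columns are summed together and all the Anti columns are summed together.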

For example, a result of this shape:

Party ProSum AntiSum
R      234    245
D      234    245

How would I do this in R?

I would suggest a tidyverse approach: reshape the data to long format, then compute the sums of the values:

library(tidyverse)
#Data
df <- structure(list(Party = c("R", "R", "D", "D"), Pro2005 = c(1L, 
1L, 13L, 12L), Anti2005 = c(18L, 19L, 7L, 8L), Pro2006 = c(0L, 
0L, 3L, 3L), Anti2006 = c(7L, 7L, 4L, 4L), Pro2007 = c(2L, 1L, 
10L, 9L), Anti2007 = c(13L, 14L, 5L, 6L)), class = "data.frame", row.names = c(NA, 
-4L)) 

Code:

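df %>%
  # Reshape to long format: one row per Party/column pair
  pivot_longer(cols = -1) %>%
  # Strip the year digits so Pro2005, Pro2006, ... all become "Pro"
  mutate(name = gsub('\\d+', '', name)) %>%
  # Aggregate
  group_by(Party, name) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = name, values_from = value)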

Output:

# A tibble: 2 x 3
# Groups:   Party [2]
  Party  Anti   Pro
  <chr> <int> <int>
1 D        34    50
2 R        78     5
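To make the reshape-then-sum idea concrete without depending on the tidyverse, here is a minimal base-R sketch of the same three steps. It is not the answer's code: unlist()/rep() stand in for pivot_longer(), and xtabs() stands in for group_by() + summarise() + pivot_wider(). The names long and totals are illustrative only, and the data are typed as doubles rather than integers (the sums are the same).

```r
# Example data from the question
df <- data.frame(
  Party    = c("R", "R", "D", "D"),
  Pro2005  = c(1, 1, 13, 12),
  Anti2005 = c(18, 19, 7, 8),
  Pro2006  = c(0, 0, 3, 3),
  Anti2006 = c(7, 7, 4, 4),
  Pro2007  = c(2, 1, 10, 9),
  Anti2007 = c(13, 14, 5, 6)
)

# Step 1: long format, one row per (Party, column) cell.
# unlist() walks the columns in order, so Party repeats per column
# and each column name repeats once per row.
value_cols <- names(df)[-1]
long <- data.frame(
  Party = rep(df$Party, times = length(value_cols)),
  name  = rep(value_cols, each = nrow(df)),
  value = unlist(df[value_cols], use.names = FALSE)
)

# Step 2: strip the year digits so Pro2005, Pro2006, ... all become "Pro"
long$name <- gsub("\\d+", "", long$name)

# Step 3: xtabs() with a numeric left-hand side sums value for every
# Party x name combination, giving a Party-by-prefix table
totals <- xtabs(value ~ Party + name, data = long)
print(totals)
```

The result is a table with Party in the rows and Anti/Pro in the columns, holding the same totals as the tibble above.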

Split the data by Party with by(), use sapply() to loop over Pro/Anti and sum the matching columns, and finally combine the pieces with rbind:

res <- data.frame(Party=sort(unique(d$Party)), do.call(rbind, by(d, d$Party, function(x) 
  sapply(c("Pro", "Anti"), function(y) sum(x[grep(y, names(x))])))))
res
#   Party Pro Anti
# D     D  50   34
# R     R   5   78
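The by() answer relies on grep() picking columns whose names contain "Pro" or "Anti". The sketch below shows which positions are selected and how one party's sum is formed. It also shows an anchored pattern ("^Pro") as a stricter alternative, using a hypothetical column name NonPro2005 that an unanchored "Pro" would also match; the question's data has no such column.

```r
# Data from the question
d <- data.frame(
  Party    = c("R", "R", "D", "D"),
  Pro2005  = c(1, 1, 13, 12),
  Anti2005 = c(18, 19, 7, 8),
  Pro2006  = c(0, 0, 3, 3),
  Anti2006 = c(7, 7, 4, 4),
  Pro2007  = c(2, 1, 10, 9),
  Anti2007 = c(13, 14, 5, 6)
)

# grep() returns the positions of the names that contain the pattern
pro_idx  <- grep("Pro", names(d))
anti_idx <- grep("Anti", names(d))

# Unanchored "Pro" also matches a hypothetical "NonPro2005";
# "^Pro" only matches names that start with the prefix
loose  <- grep("Pro", "NonPro2005")
strict <- grep("^Pro", "NonPro2005")

# What the inner sapply() computes for one group: all Pro cells of party R
r_rows <- d[d$Party == "R", ]
r_pro  <- sum(r_rows[grep("^Pro", names(r_rows))])
print(r_pro)
```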

An outer() solution also works:

t(outer(c("Pro", "Anti"), c("R", "D"), 
      Vectorize(function(x, y) sum(d[d$Party %in% y, grep(x, names(d))]))))
#      [,1] [,2]
# [1,]    5   78
# [2,]   50   34
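The outer() result above has no row or column labels, so you have to remember that rows follow c("R", "D") and columns follow c("Pro", "Anti"). A minimal sketch, assuming you want labelled output, runs the same computation and then sets dimnames; the variable names parties, sides and res are illustrative.

```r
# Data from the question
d <- data.frame(
  Party    = c("R", "R", "D", "D"),
  Pro2005  = c(1, 1, 13, 12),
  Anti2005 = c(18, 19, 7, 8),
  Pro2006  = c(0, 0, 3, 3),
  Anti2006 = c(7, 7, 4, 4),
  Pro2007  = c(2, 1, 10, 9),
  Anti2007 = c(13, 14, 5, 6)
)

parties <- c("R", "D")
sides   <- c("Pro", "Anti")

# Same computation as the outer() answer: cell [i, j] sums the columns
# matching sides[i] over the rows of parties[j]; t() puts parties in rows
res <- t(outer(sides, parties,
               Vectorize(function(x, y) sum(d[d$Party %in% y, grep(x, names(d))]))))

# Attach the labels that the bare outer() result lacks
dimnames(res) <- list(Party = parties, Side = sides)
print(res)
```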

Data:

d <- read.table(header = TRUE, text = "Party Pro2005 Anti2005 Pro2006 Anti2006 Pro2007 Anti2007
R       1       18       0        7       2       13
R       1       19       0        7       1       14
D      13        7       3        4      10        5
D      12        8       3        4       9        6")

You can use:

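library(tidyverse)
df %>% 
  # Split each name into a letter part (kept as an output column via ".value")
  # and a digit part (NA = discarded)
  pivot_longer(-Party,
               names_to = c(".value", NA),
               names_pattern = "([a-zA-Z]*)([0-9]*)") %>% 
  group_by(Party) %>% 
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)))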

# A tibble: 2 x 3
  Party   Pro  Anti
  <chr> <int> <int>
1 D        50    34
2 R         5    78
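The names_pattern regex does the real work in this answer: its first capture group becomes the output column (".value") and its second is dropped (NA in names_to). As a rough illustration using base R's sub() rather than tidyr itself, this shows what each group captures for the question's column names.

```r
# Column names from the question
nms <- c("Pro2005", "Anti2005", "Pro2006", "Anti2006", "Pro2007", "Anti2007")
pattern <- "([a-zA-Z]*)([0-9]*)"

# First capture group: the letters, which ".value" turns into column names
value_part <- sub(pattern, "\\1", nms)

# Second capture group: the year, which NA in names_to discards
year_part <- sub(pattern, "\\2", nms)

print(value_part)
print(year_part)
```

Because every year collapses into the same "Pro" or "Anti" column, summing those columns per Party gives the totals shown above.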

