简体   繁体   中英

How do I sum all observations row-wise for all columns that have the same name?

I'm looking at census data for Ontario, Canada and there are columns that have the same column name (they have the same name because they represent different subdivisions of the census regions). I want to sum row-wise for any columns that have the same column name but have run into trouble. In my sample data there are only duplicate column names, but in the actual data there are several columns with the same name. Is there a vectorized way in R to do this?

  TORONTO HALTON  PEEL YORK BRANT HALDIMAND-NORFOLK HAMILTON MUSKOKA NIAGARA 
  20855   4011 11178 8138   996               739     3835     305    2923            
  23281   3997 11770 8417   961               684     4095     343    2970            
  24130   3900 11810 8306   972               732     4168     334    2985            
 TORONTO HALTON  PEEL  YORK BRANT HALDIMAND-NORFOLK HAMILTON MUSKOKA NIAGARA 
  39924   7863 21415 15714  1947              1428     7320     646    5675    
  44357   7820 22340 16261  1861              1369     7755     697    5775            
  46016   7679 22577 16260  1971              1447     7883     717    5868 

I attempted it with ifelse statement with no luck. Something like this pseudo-code:

# where i is the column name
for every column with name i(sum rows of each column with name == i)

Would appreciate any guidance!!

We can split the dataset based on the names of the dataset and apply the rowSums on the list of datasets with same name

do.call(cbind, lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE))
#    BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA  PEEL TORONTO  YORK
#[1,]  2943              2167  11874    11155     951    8598 32593   60779 23852
#[2,]  2822              2053  11817    11850    1040    8745 34110   67638 24678
#[3,]  2943              2179  11579    12051    1051    8853 34387   70146 24566

Or as @thelatemail mentioned, if we need a data.frame output, wrap the list output with data.frame

data.frame(lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE)) 

Or using tidyverse

library(tidyverse)
dfN %>% 
   split.default(names(.))  %>% 
   map_df(reduce, `+`)
# A tibble: 3 x 9
#  BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA  PEEL TORONTO  YORK
#  <int>             <int>  <int>    <int>   <int>   <int> <int>   <int> <int>
#1  2943              2167  11874    11155     951    8598 32593   60779 23852
#2  2822              2053  11817    11850    1040    8745 34110   67638 24678
#3  2943              2179  11579    12051    1051    8853 34387   70146 24566

data

dfN <- structure(list(TORONTO = c(20855L, 23281L, 24130L), HALTON = c(4011L, 
3997L, 3900L), PEEL = c(11178L, 11770L, 11810L), YORK = c(8138L, 
8417L, 8306L), BRANT = c(996L, 961L, 972L), HALDIMAND.NORFOLK = c(739L, 
684L, 732L), HAMILTON = c(3835L, 4095L, 4168L), MUSKOKA = c(305L, 
343L, 334L), NIAGARA = c(2923L, 2970L, 2985L), TORONTO = c(39924L, 
44357L, 46016L), HALTON = c(7863L, 7820L, 7679L), PEEL = c(21415L, 
22340L, 22577L), YORK = c(15714L, 16261L, 16260L), BRANT = c(1947L, 
1861L, 1971L), HALDIMAND.NORFOLK = c(1428L, 1369L, 1447L), HAMILTON = c(7320L, 
7755L, 7883L), MUSKOKA = c(646L, 697L, 717L), NIAGARA = c(5675L, 
5775L, 5868L)), class = "data.frame", row.names = c(NA, -3L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM