I'm looking at census data for Ontario, Canada, in which several columns share the same name because they represent different subdivisions of the same census region. I want to sum row-wise across all columns that have the same name, but have run into trouble. In my sample data each name appears only twice, but in the actual data a name can appear several times. Is there a vectorized way to do this in R?
TORONTO HALTON PEEL YORK BRANT HALDIMAND-NORFOLK HAMILTON MUSKOKA NIAGARA
20855 4011 11178 8138 996 739 3835 305 2923
23281 3997 11770 8417 961 684 4095 343 2970
24130 3900 11810 8306 972 732 4168 334 2985
TORONTO HALTON PEEL YORK BRANT HALDIMAND-NORFOLK HAMILTON MUSKOKA NIAGARA
39924 7863 21415 15714 1947 1428 7320 646 5675
44357 7820 22340 16261 1861 1369 7755 697 5775
46016 7679 22577 16260 1971 1447 7883 717 5868
I attempted it with an ifelse statement, with no luck. Something like this pseudo-code:
# where i is the column name
for every column with name i(sum rows of each column with name == i)
Would appreciate any guidance!!
We can split the dataset by its column names and apply rowSums to each element of the resulting list of datasets that share a name:
do.call(cbind, lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE))
#     BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA  PEEL TORONTO  YORK
#[1,]  2943              2167  11874    11155     951    8598 32593   60779 23852
#[2,]  2822              2053  11817    11850    1040    8745 34110   67638 24678
#[3,]  2943              2179  11579    12051    1051    8853 34387   70146 24566
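To see what the intermediate steps produce, here is a minimal sketch on a made-up data frame (toy, parts and res are illustrative names, not part of the original data). It assumes all columns are numeric.

```r
# Toy data frame with a repeated column name;
# check.names = FALSE stops data.frame() from renaming the duplicate
toy <- data.frame(A = 1:3, B = 4:6, A = 7:9, check.names = FALSE)

# split.default treats the data frame as a list of columns
# and groups those columns by the supplied names
parts <- split.default(toy, names(toy))
names(parts)   # one list element per distinct name: "A" "B"

# Sum across the columns inside each group, then bind the
# per-group vectors into a matrix with one column per name
res <- do.call(cbind, lapply(parts, rowSums, na.rm = TRUE))
res
```

Each element of parts is itself a small data frame holding every column that shares a name, so rowSums works on it whether the name occurs once, twice or many times.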
Or, as @thelatemail mentioned, if we need a data.frame output, wrap the list output with data.frame:
data.frame(lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE))
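Note that split.default converts the names to a factor, so the groups come back in alphabetical order (which is why BRANT is the first column above). If the original column order matters, one possible variation, closer to the pseudo-code in the question, is to loop over the distinct names directly. This is only a sketch on a made-up data frame; toy, nms and out are hypothetical names.

```r
# Toy data frame whose first column name sorts last alphabetically
toy <- data.frame(Z = 1:3, A = 4:6, Z = 7:9, check.names = FALSE)

# Distinct names, in order of first appearance
nms <- unique(names(toy))

# For every name i, sum the rows of all columns whose name == i
out <- sapply(nms, function(i) rowSums(toy[names(toy) == i], na.rm = TRUE))

# sapply returns a matrix; convert it if a data.frame is needed
out <- as.data.frame(out)
out
```

Here the result keeps Z before A, whereas the split.default approach would return A first.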
Or using tidyverse:
library(tidyverse)
dfN %>%
    split.default(names(.)) %>%
    map_df(reduce, `+`)
# A tibble: 3 x 9
#  BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA  PEEL TORONTO  YORK
#  <int>             <int>  <int>    <int>   <int>   <int> <int>   <int> <int>
#1  2943              2167  11874    11155     951    8598 32593   60779 23852
#2  2822              2053  11817    11850    1040    8745 34110   67638 24678
#3  2943              2179  11579    12051    1051    8853 34387   70146 24566
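One difference worth keeping in mind: adding columns with `+` has no na.rm option, so a missing value in any column of a group gives NA for that row, whereas rowSums(..., na.rm = TRUE) skips it. A small sketch of the difference, using base R's Reduce as a stand-in for purrr's reduce (an assumption made only to keep the example free of dependencies; toy is made up):

```r
# Toy data with one missing value in the first "A" column
toy <- data.frame(A = c(1, NA, 3), B = 4:6, A = 7:9, check.names = FALSE)
parts <- split.default(toy, names(toy))

# rowSums with na.rm = TRUE ignores the missing value
by_rowsums <- sapply(parts, rowSums, na.rm = TRUE)

# Element-wise addition propagates the missing value
by_reduce <- sapply(parts, function(d) Reduce(`+`, d))

by_rowsums[2, "A"]   # 8  (NA skipped, only the 8 is counted)
by_reduce[2, "A"]    # NA
```

If the real census data contain missing cells, the choice between the two approaches decides whether those rows end up as partial sums or as NA.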
Data (dfN) used in the examples above:
dfN <- structure(list(TORONTO = c(20855L, 23281L, 24130L), HALTON = c(4011L,
3997L, 3900L), PEEL = c(11178L, 11770L, 11810L), YORK = c(8138L,
8417L, 8306L), BRANT = c(996L, 961L, 972L), HALDIMAND.NORFOLK = c(739L,
684L, 732L), HAMILTON = c(3835L, 4095L, 4168L), MUSKOKA = c(305L,
343L, 334L), NIAGARA = c(2923L, 2970L, 2985L), TORONTO = c(39924L,
44357L, 46016L), HALTON = c(7863L, 7820L, 7679L), PEEL = c(21415L,
22340L, 22577L), YORK = c(15714L, 16261L, 16260L), BRANT = c(1947L,
1861L, 1971L), HALDIMAND.NORFOLK = c(1428L, 1369L, 1447L), HAMILTON = c(7320L,
7755L, 7883L), MUSKOKA = c(646L, 697L, 717L), NIAGARA = c(5675L,
5775L, 5868L)), class = "data.frame", row.names = c(NA, -3L))