
How to calculate the sum of periods over each column for each row and divide it by a total minimum value in R

I have a calculation that accumulates the values for each flower across the year columns in the table below. I would like to extend the same calculation so that it also divides the accumulated values by each flower's total minimum value (total_min).

(Data Frame)

dat <- data.frame(
  stringsAsFactors = FALSE,
  flower = c("lily", "rose", "daisy"),
  x1902 = c(23L, 50L, 30L),
  x1950 = c(0L, 60L, 7L),
  x2010 = c(0L, 5L, 10L),
  x2012 = c(8L, 16L, 2L),
  x2021 = c(5L, 0L, 0L),
  total_min = c(5L, 5L, 2L)
)

The code that I used to do the accumulation is as follows:

dat[, 2:6] <- t(apply(dat[,2:6], 1, cumsum))

Now I need to divide the accumulated amount for each year by the total_min of each flower type. I tried the following code, but it does not give the correct results:

dat[,2:6] <- t(apply(dat[,2:6], 1, cumsum)/dat$total_min)

(Table 1 - DataFrame)

flower  x1902  x1950  x2010  x2012  x2021  total_min
lily       23      0      0      8      5          5
rose       50     60      5     16      0          5
daisy      30      7     10      2      0          2

Calculating the cumulative sum for each flower across the years gives me:

(Table 2 - Accumulated results )

flower  x1902  x1950  x2010  x2012  x2021
lily       23     23     23     31     36
rose       50    110    115    131    131
daisy      30     37     47     49     49

The final result should look like Table 3:

(Table 3 - expected results)

flower  x1902  x1950  x2010  x2012  x2021
lily      4.6    4.6    4.6    6.2    7.2
rose     10.0   22.0   23.0   26.2   26.2
daisy    15.0   18.5   23.5   24.5   24.5

The problems with the code in the question are:

  • the final ) is in the wrong place: the division has to be applied after t(), not inside it, so that each element of total_min lines up with the corresponding row.
  • although overwriting dat is not wrong, it would be less error-prone to store the result under a different name.
  • it would also be a bit more robust to select the columns whose names start with x rather than hard-coding 2:6.
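
The first point comes down to R's recycling rule, which can be checked directly. The following is only a sketch: it rebuilds the dat0 data from the Note so that it runs on its own, and the names m, wrong, and right are illustrative, not part of the original code.

```r
# Same data as dat0 in the Note, rebuilt so this snippet is self-contained.
dat0 <- data.frame(
  stringsAsFactors = FALSE,
  flower = c("lily", "rose", "daisy"),
  x1902 = c(23L, 50L, 30L), x1950 = c(0L, 60L, 7L), x2010 = c(0L, 5L, 10L),
  x2012 = c(8L, 16L, 2L), x2021 = c(5L, 0L, 0L),
  total_min = c(5L, 5L, 2L)
)

# apply(..., 1, cumsum) returns one COLUMN per input row: a 5 x 3 matrix.
m <- apply(dat0[2:6], 1, cumsum)
dim(m)

# Dividing before t(): total_min is recycled down the year dimension,
# so the divisors no longer line up with the flowers.
wrong <- t(m / dat0$total_min)

# Dividing after t(): the matrix is 3 x 5, and recycling down each column
# pairs row i with total_min[i].
right <- t(m) / dat0$total_min
right
```

Because 15 elements is a multiple of 3, the misaligned version recycles silently without a warning, which is why the mistake is easy to miss.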

1) Using dat0 shown in the Note at the end we have:

xnms <- startsWith(names(dat0), "x")
dat2 <- replace(dat0, xnms, t(apply(dat0[xnms], 1, cumsum)) / dat0$total_min)
dat2
##   flower x1902 x1950 x2010 x2012 x2021 total_min
## 1   lily   4.6   4.6   4.6   6.2   7.2         5
## 2   rose  10.0  22.0  23.0  26.2  26.2         5
## 3  daisy  15.0  18.5  23.5  24.5  24.5         2
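
If relying on implicit recycling feels fragile, the row-wise division can be spelled out with sweep(). This is a variation on approach 1 rather than part of the original answer; it rebuilds dat0 from the Note, and the names cum and dat3 are hypothetical.

```r
# Same data as dat0 in the Note, rebuilt so this snippet is self-contained.
dat0 <- data.frame(
  stringsAsFactors = FALSE,
  flower = c("lily", "rose", "daisy"),
  x1902 = c(23L, 50L, 30L), x1950 = c(0L, 60L, 7L), x2010 = c(0L, 5L, 10L),
  x2012 = c(8L, 16L, 2L), x2021 = c(5L, 0L, 0L),
  total_min = c(5L, 5L, 2L)
)

xnms <- startsWith(names(dat0), "x")

# Row-wise cumulative sums as a 3 x 5 matrix (one row per flower).
cum <- t(apply(dat0[xnms], 1, cumsum))

# MARGIN = 1 states explicitly that total_min is applied per row.
dat3 <- replace(dat0, xnms, sweep(cum, 1, dat0$total_min, "/"))
dat3
```

The result should match dat2 above; the only difference is that the direction of the division is stated in the call instead of being implied by the shape of the matrix.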

2) Another approach uses Reduce, which adds the columns one after another and keeps every intermediate sum:

replace(dat0, xnms, 
  as.data.frame(Reduce("+", dat0[xnms], accumulate = TRUE)) / dat0$total_min)
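
To make the accumulating behaviour of Reduce concrete, here is a small sketch. The toy list cols and the names running and running_dat are made up for illustration; dat0 is rebuilt from the Note so the snippet runs on its own.

```r
# Toy input: each list element plays the role of one year column.
cols <- list(c(1, 2), c(10, 20), c(100, 200))

# With accumulate = TRUE, Reduce returns every partial sum, not just the last:
# element k is the element-wise sum of the first k vectors.
running <- Reduce(`+`, cols, accumulate = TRUE)
running

# The same idea on the year columns of dat0 (data from the Note).
dat0 <- data.frame(
  stringsAsFactors = FALSE,
  flower = c("lily", "rose", "daisy"),
  x1902 = c(23L, 50L, 30L), x1950 = c(0L, 60L, 7L), x2010 = c(0L, 5L, 10L),
  x2012 = c(8L, 16L, 2L), x2021 = c(5L, 0L, 0L),
  total_min = c(5L, 5L, 2L)
)
xnms <- startsWith(names(dat0), "x")
running_dat <- Reduce(`+`, dat0[xnms], accumulate = TRUE)
running_dat[[5]]  # cumulative totals per flower through x2021
```

Since a data frame is a list of columns, Reduce works column by column here, so no transpose is needed before dividing by total_min.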

Note

dat0 <-
structure(list(flower = c("lily", "rose", "daisy"), x1902 = c(23L, 
50L, 30L), x1950 = c(0L, 60L, 7L), x2010 = c(0L, 5L, 10L), x2012 = c(8L, 
16L, 2L), x2021 = c(5L, 0L, 0L), total_min = c(5L, 5L, 2L)),
class = "data.frame", row.names = c(NA, -3L))

dat0
##   flower x1902 x1950 x2010 x2012 x2021 total_min
## 1   lily    23     0     0     8     5         5
## 2   rose    50    60     5    16     0         5
## 3  daisy    30     7    10     2     0         2
