简体   繁体   中英

Getting a cumulative count of rows in each column excluding NAs in R

I have a dataframe that is structured as follows:

structure(list(CT_CW.QA.RWL.H1A1Y = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A1Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
CT_CW.QA.RWL.H1A2Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A2Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A3Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A3Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A4Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A4Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A5Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A5Z = c(1.07, 0.41, 0.87, 1.21, 0.99, 0.77, 
0.73, 0.77, 0.61, 0.89), CT_CW.QA.RWL.H1A6Y = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A6Z = c(NA, 
NA, 0.92, 0.64, 0.63, 0.48, 0.17, 0.28, 0.32, 0.64), CT_CW.QA.RWL.H1A7Y = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A7Z = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A8Y = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_)), row.names = c("1812", "1813", 
"1814", "1815", "1816", "1817", "1818", "1819", "1820", "1821"
), class = "data.frame")

What I would like to do is, for each column, get the cumulative count of the number of rows excluding NAs (it's necessary to leave the NAs in at this point).

I've tried the following (where test is the dataframe above):

test_count = cumsum(colSums(!is.na(test)))

But this seems to be continuing the count across columns, whereas I need a unique cumulative count of the number of rows in each column, so that the result is a dataframe that looks like (this is for visual reference only, the numbers are made up):

Row.Name    CT_CW.QA.RWL.H1A1Y    CT_CW.QA.RWL.H1A2Y
  1812              0                      0
  1813              0                      1
  1814              1                      2

...indicating that the original dataframe ( test ) had NAs in the first two rows of columns CT_CW.QA.RWL.H1A1Y, and an NA in the first row of the column CT.CW.QA.RWL.H1A2Y (again, this hypothetical doesn't represent the values in the data above, it's just to illustrate the structure of what I'm looking for)

We can loop over the column with lapply , convert to logical, do the cumsum and assign the output back to the original or a copy of the original object. Make sure to use [] to preserve the attributes

test1 <- test
test1[] <- lapply(test, function(x) cumsum(!is.na(x)))

-output

head(test1)
     CT_CW.QA.RWL.H1A1Y CT_CW.QA.RWL.H1A1Z CT_CW.QA.RWL.H1A2Y CT_CW.QA.RWL.H1A2Z CT_CW.QA.RWL.H1A3Y CT_CW.QA.RWL.H1A3Z CT_CW.QA.RWL.H1A4Y
1812                  0                  0                  0                  0                  0                  0                  0
1813                  0                  0                  0                  0                  0                  0                  0
1814                  0                  0                  0                  0                  0                  0                  0
1815                  0                  0                  0                  0                  0                  0                  0
1816                  0                  0                  0                  0                  0                  0                  0
1817                  0                  0                  0                  0                  0                  0                  0
     CT_CW.QA.RWL.H1A4Z CT_CW.QA.RWL.H1A5Y CT_CW.QA.RWL.H1A5Z CT_CW.QA.RWL.H1A6Y CT_CW.QA.RWL.H1A6Z CT_CW.QA.RWL.H1A7Y CT_CW.QA.RWL.H1A7Z
1812                  0                  0                  1                  0                  0                  0                  0
1813                  0                  0                  2                  0                  0                  0                  0
1814                  0                  0                  3                  0                  1                  0                  0
1815                  0                  0                  4                  0                  2                  0                  0
1816                  0                  0                  5                  0                  3                  0                  0
1817                  0                  0                  6                  0                  4                  0                  0
     CT_CW.QA.RWL.H1A8Y
1812                  0
1813                  0
1814                  0
1815                  0
1816                  0
1817                  0
....

There is a colCumsums from matrixStats

library(matrixStats)
test1[] <- colCumsums(!is.na(test))

The issue with cumsum returning a vector is because colSums is a single vector of one observation per each column, and cumsum on it returns the cumulative sum of column sums instead of cumulative sum inside each column

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM