简体   繁体   中英

How to count number of rows in data.table using same condition on multiple columns in R

If I have the below data table in R

dt <- data.table(x = c("No", "No", "Yes")
              ,  y = c("No", "Yes", "Yes")
              ,  z = c("No", "No", "No")
              ,  q = c("Hi", "Where, "When")
              ,  AA = c(1, 1,1)
              ,  ID = c(999, 789, 567)
              , bop = c(2345, 2222,2222))

and I want to get the count of "No" in each variable(x, y, z) in below format

Variable Type Count

 x     No      2

 y     No     1

 z     No     3

I am not able to pass the same filter to the columns I want. I can only pass filter to one column

dt[, .(.N), by = .(x,y,z)]


We can use melt to reshape into 'long' format, and then do a group by 'variable' (ie the column name column), get the number of 'No' by sum ming the logical vector ( value == 'No' )

library(data.table)
melt(dt, id.var = 'q')[, .(Type = 'No', Count = sum(value == 'No')), .(variable)]
#   variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

For the new example

melt(dt[, .(x, y, z)], measure = c('x', 'y', 'z'))[, 
   .(Type = 'No', Count = sum(value == 'No')), .(variable)]
#    variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

Or specify the logical condition in i and use .N

melt(dt[, .(x, y, z)], measure = c('x', 'y', 'z'))[value == 'No',
       .(Type = first(value), Count = .N), variable]
#   variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

Or without melt ing we can loop over the Subset of Data.table ( .SD ), create a logical vector and reduce the list to a single vector with Reduce

dt[, .(variable = names(.SD), Type = 'No', 
     Count = Reduce(`+`, lapply(.SD, `==`, 'No'))), .SDcols = x:z]
#  variable Type Count
#1:        x   No     3
#2:        y   No     2
#3:        z   No     1

Or using rowSums

dt[, .(Variable = names(.SD), Type = 'No',
        Count = rowSums(.SD == 'No')), .SDcols = x:z]

In the OP's code, the grouping columns were taken as 'x', 'y', 'z'. Then, if we do .N , it is looking for the number of rows for each unique set of values across those columns and this will be 1 for each case because there is only 1 row for No No No or No Yes No or Yes Yes No

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM