
Subset rows based on column values when the names and number of columns are unknown

I am sure this is a very basic question, but I have searched without finding out how to subset (i.e., get the row numbers of) a data frame or matrix that can have any number of columns, with column names that change all the time. I want to find only the rows (indexes) of the data frame in which any of the columns is greater than 0. Since the column names and the number of columns are unknown, I do not know how to do this...

An example:

# these are the terms I am looking for
terms <- c("beats", "revs", "revenue", "earnings")
# dict <- Dictionary(terms)
# dictStudy <- inspect(DocumentTermMatrix(mydata.corpus.tmp, list(dictionary = dict)))

dictStudy <- data.frame(beats=c(0, 0, 0, 1, 0, 2), revs=c(0, 0, 0, 1, 0, 1), revenue=c(0, 0, 0, 0, 0, 0), earnings=c(1, 0, 0, 1, 0, 1)) 
ss <- expression(terms > 0)
dictStudy.matching <- subset(dictStudy, eval(ss))

I was hoping that expression() and eval() would save me, but I cannot figure this out.
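For what it's worth, the expression/eval idea can work if the condition is built from the data frame's actual column names at run time rather than from `terms`. A minimal sketch, not the accepted answer: it assumes no column name contains a backtick (each name is wrapped in backticks so names like `net income` still parse), and `cond` and `keep` are illustrative names.

```r
dictStudy <- data.frame(beats = c(0, 0, 0, 1, 0, 2), revs = c(0, 0, 0, 1, 0, 1),
                        revenue = c(0, 0, 0, 0, 0, 0), earnings = c(1, 0, 0, 1, 0, 1))

# Build the text "`beats` > 0 | `revs` > 0 | ..." from whatever columns exist,
# then parse it into an unevaluated R expression.
cond <- parse(text = paste0("`", names(dictStudy), "` > 0", collapse = " | "))[[1]]

# Evaluate the expression with the data frame's columns in scope.
keep <- eval(cond, dictStudy)

which(keep)          # row indexes
dictStudy[keep, ]    # the matching rows themselves
```

Building code as text works, but the purely functional approaches avoid parsing altogether.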

How to find only rows in a data frame that have any of the columns > 0?

I'm assuming you mean you want the rows where at least one element of that row is greater than zero (i.e., any of the columns is greater than zero).

> which(apply(dictStudy,1,function(x) any(x > 0)))
[1] 1 4 6
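A vectorized alternative (a different technique from the `apply` call above, offered here as a sketch) compares the whole data frame with 0 at once, which gives a logical matrix, and then counts the `TRUE` values in each row with `rowSums`. Like the `apply` version, it assumes every column is numeric; `hits` and `idx` are illustrative names.

```r
dictStudy <- data.frame(beats = c(0, 0, 0, 1, 0, 2), revs = c(0, 0, 0, 1, 0, 1),
                        revenue = c(0, 0, 0, 0, 0, 0), earnings = c(1, 0, 0, 1, 0, 1))

# dictStudy > 0 is a logical matrix with one column per data-frame column;
# rowSums counts how many columns are positive in each row.
hits <- rowSums(dictStudy > 0)

idx <- which(hits > 0)
unname(idx)               # row indexes 1, 4 and 6 (unname drops row-name labels)
dictStudy[hits > 0, ]     # the matching rows
```

This avoids calling an R function once per row, which tends to matter on large document-term matrices.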

As Tommy points out below, this assumes that all your columns are in fact numeric. You can sidestep this by subsetting your data frame to pull out only the numeric columns:

> which(apply(dictStudy[, sapply(dictStudy, is.numeric), drop = FALSE], 1, function(x) any(x > 0)))
[1] 1 4 6
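The numeric-only filter can also be combined with the vectorized `rowSums` idea. A sketch under stated assumptions: the `ticker` character column is invented here purely to mimic a mixed-type data frame, `drop = FALSE` keeps `num` a data frame even when only one numeric column survives, and `na.rm = TRUE` makes an `NA` count as "not greater than 0".

```r
# Hypothetical mixed-type data: 'ticker' is an invented character column.
dictStudy <- data.frame(ticker = c("A", "B", "C", "D", "E", "F"),
                        beats = c(0, 0, 0, 1, 0, 2), revs = c(0, 0, 0, 1, 0, 1),
                        revenue = c(0, 0, 0, 0, 0, 0), earnings = c(1, 0, 0, 1, 0, 1))

# Keep only the numeric columns; drop = FALSE guards the one-column case.
num <- dictStudy[, vapply(dictStudy, is.numeric, logical(1)), drop = FALSE]

# TRUE for rows where at least one numeric column is positive.
keep <- rowSums(num > 0, na.rm = TRUE) > 0

dictStudy[keep, ]   # full rows, including the non-numeric ticker column
```

Selecting on `num` but indexing the original `dictStudy` returns the complete rows, not just their numeric parts.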

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM