
Subset rows based on values of columns of unknown names and number of columns

I'm sure this is a very basic question, but I'm frustrated after searching for a way to subset (i.e. get the row numbers of) a data frame/matrix that can have any number of columns and whose column names change all the time. I would like to find only the rows (indexes) of the data frame in which any of the columns is greater than 0. Since the column names and the number of columns are unknown, I do not know how to do this...

An example:

# these are the terms I am looking in
terms <- c("beats", "revs", "revenue", "earnings")
# dict <- Dictionary(terms)
# dictStudy <- inspect(DocumentTermMatrix(mydata.corpus.tmp, list(dictionary = dict)))

dictStudy <- data.frame(beats=c(0, 0, 0, 1, 0, 2), revs=c(0, 0, 0, 1, 0, 1), revenue=c(0, 0, 0, 0, 0, 0), earnings=c(1, 0, 0, 1, 0, 1)) 
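# attempt that does not work: `terms` resolves to the character vector above,
# so `terms > 0` compares the strings to "0" instead of testing the columns
ss <- expression(terms > 0)
dictStudy.matching <- subset(dictStudy, eval(ss))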

I was hoping that expression and eval would save me, but I cannot figure this out.
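The expression/eval idea can be made to work if the condition is built from the column names as symbols rather than from the strings themselves. A minimal sketch, assuming every entry of `terms` is a column of `dictStudy`: base R's `bquote()` turns each name into a call such as `beats > 0`, `Reduce()` joins those calls with `|`, and `eval()` is given the data frame as its environment so the symbols resolve to columns.

```r
# Example data from the question
dictStudy <- data.frame(beats    = c(0, 0, 0, 1, 0, 2),
                        revs     = c(0, 0, 0, 1, 0, 1),
                        revenue  = c(0, 0, 0, 0, 0, 0),
                        earnings = c(1, 0, 0, 1, 0, 1))
terms <- c("beats", "revs", "revenue", "earnings")  # assumed to be column names

# One call per column name, e.g. `beats > 0`
conds <- lapply(terms, function(nm) bquote(.(as.name(nm)) > 0))

# Join the calls with `|`: beats > 0 | revs > 0 | revenue > 0 | earnings > 0
ss <- Reduce(function(a, b) bquote(.(a) | .(b)), conds)

# Evaluate with the data frame as the environment so the names refer to columns
keep <- eval(ss, dictStudy)
dictStudy.matching <- dictStudy[keep, ]
which(keep)
```

If `terms` could contain names that are not columns, filtering it first with `intersect(terms, names(dictStudy))` would keep `eval()` from looking those names up outside the data frame.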

How do I find only the rows of a data frame in which any of the columns is > 0?

I'm assuming you mean you want the rows where at least one element of the row is greater than zero (i.e. any of the columns is greater than zero).

> which(apply(dictStudy,1,function(x) any(x > 0)))
[1] 1 4 6
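A vectorized alternative to `apply()`, swapped in here rather than taken from the answer: comparing the whole data frame with `0` gives a logical matrix, and `rowSums()` counts the `TRUE` values in each row, so a count above zero means "at least one column > 0". A minimal sketch, assuming (as the answer does) that every column is numeric:

```r
dictStudy <- data.frame(beats    = c(0, 0, 0, 1, 0, 2),
                        revs     = c(0, 0, 0, 1, 0, 1),
                        revenue  = c(0, 0, 0, 0, 0, 0),
                        earnings = c(1, 0, 0, 1, 0, 1))

# Logical matrix (one column per data-frame column), then count TRUEs per row
hits <- rowSums(dictStudy > 0) > 0

# Plain integer row indices (useNames = FALSE drops any row names)
idx <- which(hits, useNames = FALSE)

# The matching rows themselves
dictStudy.matching <- dictStudy[hits, ]
```

This avoids calling an R function once per row, which may help for a large document-term matrix; it selects the same rows as the `apply()` version.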

As Tommy points out below, this assumes that all your columns are in fact numeric. You could sidestep this by subsetting your data frame to pull out only the columns that are numeric:

> which(apply(dictStudy[, sapply(dictStudy, is.numeric), drop = FALSE], 1, function(x) any(x > 0)))
[1] 1 4 6
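The same vectorized idea extends to mixed-type data. In this sketch the `doc` column is hypothetical, added only to show a non-numeric column; base R's `Filter(is.numeric, ...)` keeps just the numeric columns and always returns a data frame (even when a single numeric column remains), so `rowSums()` keeps working.

```r
dictStudy <- data.frame(doc      = c("d1", "d2", "d3", "d4", "d5", "d6"),  # hypothetical text column
                        beats    = c(0, 0, 0, 1, 0, 2),
                        revs     = c(0, 0, 0, 1, 0, 1),
                        revenue  = c(0, 0, 0, 0, 0, 0),
                        earnings = c(1, 0, 0, 1, 0, 1))

# Keep only the numeric columns; the result is still a data frame
num <- Filter(is.numeric, dictStudy)

# TRUE for rows where at least one numeric column is > 0
hits <- rowSums(num > 0) > 0
idx  <- which(hits, useNames = FALSE)

# Subset the original data frame, so the non-numeric column is retained
dictStudy.matching <- dictStudy[hits, ]
```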

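Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM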