Conditionally subset data frame in R

I have a data frame with 10 columns and 510 rows. I'm trying to create a subset of it in which a row is discarded entirely if the sum of its first 5 columns equals 0. I've read posts on this site saying that you can't simply delete rows in R, so I've tried the following:

    data_sub <- data[!sum(data[, 1:5]==0), ]

However, data_sub ends up being a copy of data, and I'm really not sure why. Please advise. This data frame has no Inf or NaN values, only integers.

Try the following:

ind <- apply(data, 1, function(x) sum(x[1:5]) != 0)
data_sub <- data[ind, ]

or

data_sub <- data[rowSums(data[,1:5]) != 0, ]
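A minimal sketch of why the attempt in the question cannot filter row by row: sum() collapses the whole block of columns into one number, so the index is a single TRUE or FALSE that R recycles across every row, giving either all rows or none. rowSums() returns one value per row instead. The data frame `toy` and its column names are made up for illustration.

```r
# Hypothetical toy data: rows 1 has zeros in all of its first 5 columns
toy <- data.frame(a = c(0L, 1L, 0L), b = c(0L, 2L, 0L), c = c(0L, 0L, 3L),
                  d = c(0L, 0L, 0L), e = c(0L, 1L, 0L), f = c(7L, 8L, 9L))

# sum() counts every zero in the 3 x 5 block, so this is ONE logical value
scalar_index <- !sum(toy[, 1:5] == 0)

# rowSums() gives one sum per row, so this is one logical value PER ROW
row_index <- rowSums(toy[, 1:5]) != 0

length(scalar_index)  # a single value, recycled over all rows when used as an index
length(row_index)     # same length as nrow(toy)

toy_sub <- toy[row_index, ]
```

Here only the first row of `toy` is dropped, because it is the only one whose first five columns sum to zero.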

This is what you want:

reprex[rowSums(reprex[,1:5])!=0,]

returns a data set meeting your criteria. This applies to arrays or data frames. Notice, however, that the original has not changed, nor should it.
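As a rough illustration of the same filter on a matrix rather than a data frame, here is a small sketch. The matrix `m` is invented for the example. The `drop = FALSE` argument is an addition of mine, not part of the answer: without it, a matrix result with a single surviving row is simplified to a plain vector.

```r
# Hypothetical 3 x 6 matrix; only the second row has a nonzero sum in columns 1:5
m <- matrix(c(0, 0, 0, 0, 0, 1,
              2, 0, 0, 1, 0, 5,
              0, 0, 0, 0, 0, 9), nrow = 3, byrow = TRUE)

# Keep rows whose first 5 columns do not sum to zero;
# drop = FALSE keeps the result a matrix even when one row is left
m_sub <- m[rowSums(m[, 1:5]) != 0, , drop = FALSE]

dim(m_sub)  # dimensions of the filtered copy
dim(m)      # the original is untouched
```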

In the future, consider including a reproducible example like the one in the code below. It doesn't have to be complex, but I think you'll find that the act of making one clarifies your thinking. It does for me!

# emily example

# sample_column: each entry has a 50% chance of being zero
# and a 50% chance of being a random uniform value
set.seed(152)
sample_column<-function(col_length) {
  ifelse(runif(col_length)<0.5,0,runif(col_length))
}

# produce some columns of random numbers.  Spike it with 
# zeroes to make the filter actually catch some.

make_reprex<-function(nrows,ncols) {
  id=1:nrows
  colnames=paste0('x',1:ncols)
  data=matrix(nrow=nrows,ncol=ncols)
  rownames(data)=id
  colnames(data)=colnames
  for (j in 1:ncols) {
    data[,j]=sample_column(nrows)
  }
  return(data)
}

reprex=make_reprex(510,15)
# desired expression: keep rows whose first 5 columns do not sum to zero
reprex[rowSums(reprex[,1:5])!=0,]

If you wish to subset the data as though in place, you'll need to make another assignment.

reprex=reprex[rowSums(reprex[,1:5])!=0,]

I advise against this kind of in-place substitution. There are some cases where it is necessary, but far fewer than you might think.

The reason?

If you avoid destructive subsetting and something goes wrong, you can easily return to the data frame as you originally loaded it.
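A small sketch of that workflow, assuming a made-up data frame `raw`: the filtered result goes into a separate name, so a mistaken filter can be redone from the untouched original without reloading anything.

```r
# Hypothetical data as originally loaded; names are for illustration only
raw <- data.frame(x1 = c(0, 3, 0), x2 = c(0, 0, 0), x3 = c(0, 1, 0),
                  x4 = c(0, 0, 2), x5 = c(0, 0, 0), y = c(10, 20, 30))

# First attempt uses the wrong column range (1:3 instead of 1:5)
# and wrongly drops the third row
filtered <- raw[rowSums(raw[, 1:3]) != 0, ]

# raw was never overwritten, so the filter can simply be rerun correctly
filtered <- raw[rowSums(raw[, 1:5]) != 0, ]

nrow(raw)       # still every original row
nrow(filtered)  # rows whose first 5 columns do not sum to zero
```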
