Conditionally subset data frame in R

I have a data frame that has 10 columns and 510 rows. I'm trying to create a subset of it wherein if the row sum of the first 5 columns equals 0, the entire row is discarded. I've read posts on this site saying that you can't simply delete rows in R, so I've tried the following:

    data_sub <- data[!sum(data[, 1:5]==0), ]

However, data_sub ends up being a copy of data, and I'm really not sure why. Please advise. This data frame has no Inf or NaN values, only integers.

Try the following:

    ind <- apply(data, 1, function(x) sum(x[1:5]) != 0)
    data_sub <- data[ind, ]

or

    data_sub <- data[rowSums(data[, 1:5]) != 0, ]
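To see why a per-row sum matters here, the sketch below compares `sum()` with `rowSums()` on a small made-up data frame (`toy` and its values are hypothetical, not the asker's data). `sum()` reduces the whole block to one number, so any condition built on it, including the question's `!sum(data[, 1:5] == 0)`, is a single TRUE or FALSE that R recycles across all rows.

```r
# Hypothetical toy data (not from the question): 4 rows, 6 integer columns.
toy <- data.frame(
  a = c(0L, 1L, 0L, 2L),
  b = c(0L, 0L, 0L, 3L),
  c = c(0L, 4L, 0L, 0L),
  d = c(0L, 0L, 0L, 1L),
  e = c(0L, 0L, 0L, 0L),
  f = c(9L, 8L, 7L, 6L)
)

# sum() collapses the whole 4 x 5 block into one number ...
total <- sum(toy[, 1:5])
# ... so the condition is a single TRUE/FALSE, recycled across every row:
# you keep all rows or none of them.
all_or_nothing <- toy[total != 0, ]

# rowSums() returns one value per row, so the condition is a logical
# vector with one entry per row.
keep <- rowSums(toy[, 1:5]) != 0
toy_sub <- toy[keep, ]
```

Here `all_or_nothing` still has all four rows, while `toy_sub` keeps only the two rows whose first five columns do not sum to zero.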

This is what you want:

    reprex[rowSums(reprex[, 1:5]) != 0, ]

It returns the rows meeting your criteria, and it works for matrices and data frames alike. Notice, however, that the original HAS NOT CHANGED, nor should it.

In the future, consider including a reproducible example like the one in the code below. It doesn't have to be complex, but I think you'll find that the act of making one will clarify your thinking. It does for me!

    # emily example

    set.seed(152)

    # sample_column() returns a vector in which each entry has a 50% chance
    # of being zero and a 50% chance of being a random uniform draw
    sample_column = function(col_length) {
      ifelse(runif(col_length) < 0.5, 0, runif(col_length))
    }

    # Produce some columns of random numbers, spiked with zeroes
    # so that the filter actually catches some rows.
    make_reprex = function(nrows, ncols) {
      id = 1:nrows
      colnames = paste0('x', 1:ncols)
      data = matrix(nrow = nrows, ncol = ncols)
      rownames(data) = id
      colnames(data) = colnames
      for (j in 1:ncols) {
        data[, j] = sample_column(nrows)
      }
      return(data)
    }

    reprex = make_reprex(510, 15)
    # desired expression: keep rows whose first five columns do not sum to zero
    reprex[rowSums(reprex[, 1:5]) != 0, ]

If you wish to subset the data as though in place, you'll need to make another assignment.

    reprex = reprex[rowSums(reprex[, 1:5]) != 0, ]

I advise against this kind of in-place substitution. There are some cases where it is necessary, but rarely as often as you might think.

The reason? If you avoid destructive subsetting and something goes wrong, you can easily return to the data frame as you originally loaded it.
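As a minimal sketch of that workflow (the names `raw`, `filtered`, and `stricter` and the values are made up for illustration), the filtered rows get their own name, so the loaded data stays available for a second attempt:

```r
# Hypothetical stand-in for the data as originally loaded.
raw <- data.frame(
  x1 = c(0, 2, 0, 1),
  x2 = c(0, 0, 0, 1),
  x3 = c(0, 1, 0, 0),
  x4 = c(0, 0, 0, 0),
  x5 = c(0, 0, 0, 3),
  y  = c(5, 6, 7, 8)
)

# Non-destructive: the filtered rows get a new name, raw is left alone.
filtered <- raw[rowSums(raw[, 1:5]) != 0, ]

# Because raw is intact, a different rule can be tried from the same
# starting point, e.g. keeping only rows whose first five columns
# sum to at least 4.
stricter <- raw[rowSums(raw[, 1:5]) >= 4, ]
```

Had `raw` been overwritten by the first filter, trying the second rule would mean reloading the data.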
