
How to subset a data frame by removing all rows from columns with a given string, and value less than X?

I am trying to subset a data frame in R. I would like to remove every row that has a value >0 in any column whose name contains the word 'Blank'.

Example input (df):

ID OTU1 OTU2 Blank1 Blank2
1  5    0    0      2
2  3    3    0      0
3  0    9    5      0
4  2    0    0      0

Desired output

ID OTU1 OTU2 Blank1 Blank2
2  3    3    0      0
4  2    0    0      0

I can do this individually by column with df2=subset(df, subset=!(Blank1>0 | Blank2>0))

I would like to generalize this so that it finds every column whose name contains 'Blank' and then removes the rows that have a value greater than 0 in any of those columns.

I am trying df2=subset(df, subset=!((grepl("Blank",colnames(df)))>0)) but it does not work correctly.

Consider using dput next time to provide a reproducible example. Given that, this code should work (but I did not test it):

df <- df[rowSums(df[, grepl("Blank", colnames(df))]) > 0, ]

Edit: The line above does the exact opposite of what was asked (it keeps the rows that should be removed). Here you go:

df[rowSums(df[, grepl("Blank", colnames(df))]) == 0, ]
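Since the question did not include dput output, here is a minimal sketch that rebuilds the example data by hand and runs the corrected line on it. The names blank_cols and df2 are only illustrative, and drop = FALSE is an addition not in the answer above: it is assumed here as a safeguard so that rowSums still receives a data frame when only one column matches.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  ID     = 1:4,
  OTU1   = c(5, 3, 0, 2),
  OTU2   = c(0, 3, 9, 0),
  Blank1 = c(0, 0, 5, 0),
  Blank2 = c(2, 0, 0, 0)
)

# Logical vector marking the columns whose name contains "Blank"
blank_cols <- grepl("Blank", colnames(df))

# Keep the rows whose Blank columns sum to zero.
# drop = FALSE (an added safeguard) keeps a data frame even when a
# single column matches; rowSums() needs at least two dimensions.
df2 <- df[rowSums(df[, blank_cols, drop = FALSE]) == 0, ]
print(df2)
```

With the example data this should leave only the rows with ID 2 and 4. Note that comparing the sum to 0 assumes the Blank columns never contain negative values.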

This should also work:

df[!(apply(df[,c("Blank1","Blank2")] > 0,1,sum) > 0),]
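The line above hard-codes the two column names. As a sketch of the same idea without the hard-coding, the columns can be picked with grepl and the > 0 test counted per row with rowSums. The data below is made up for illustration (the third row, with a negative value, is not from the question) to show that counting the cells that are > 0 is not fooled by positive and negative values cancelling out.

```r
# Illustrative data; row 3 is hypothetical and not part of the question
df <- data.frame(
  ID     = 1:3,
  OTU1   = c(5, 3, 2),
  Blank1 = c(0, 0, -1),
  Blank2 = c(2, 0, 1)
)

# Columns whose name contains "Blank"
blank_cols <- grepl("Blank", names(df))

# Logical matrix: TRUE wherever a Blank cell is greater than 0
is_pos <- df[, blank_cols, drop = FALSE] > 0

# rowSums() counts the TRUEs in each row; keep rows that have none
df2 <- df[rowSums(is_pos) == 0, ]
print(df2)
```

Row 3 sums to zero across its Blank columns, so a plain sum == 0 test would keep it, while the count of values > 0 removes it.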

Using the grepl function I would use the following code:

df2 <- df[apply(df[,grepl("Blank",names(df))],1,sum)==0,]

To break that up...

apply applies a function over either the rows or the columns; the 1 argument tells it to work row by row. By applying sum I get something non-zero if any of the Blank values in that row is non-zero (assuming the values cannot be negative). If there's a possibility of negative values, change sum to function(x){sum(abs(x))}, which takes the absolute value of each cell before summing so that positive and negative values cannot cancel out.

Once I've applied the sum function I just check to grab only those values which are 0 :)

We wrap all that into the row argument for 'df' and we get returned only those rows that we want.
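The steps described above can be sketched one at a time as follows, using the example data from the question and the abs variant. The intermediate names (blank_cols, row_totals, keep) are only illustrative, and drop = FALSE is an added safeguard for the case where a single column matches.

```r
# Example data from the question
df <- data.frame(
  ID     = 1:4,
  OTU1   = c(5, 3, 0, 2),
  OTU2   = c(0, 3, 9, 0),
  Blank1 = c(0, 0, 5, 0),
  Blank2 = c(2, 0, 0, 0)
)

# Step 1: find the columns whose name contains "Blank"
blank_cols <- grepl("Blank", names(df))

# Step 2: for each row (MARGIN = 1), sum the absolute values
# of the Blank cells
row_totals <- apply(df[, blank_cols, drop = FALSE], 1,
                    function(x) sum(abs(x)))

# Step 3: a row is kept only if that total is exactly 0
keep <- row_totals == 0

# Step 4: use the logical vector as the row index
df2 <- df[keep, ]
print(df2)
```

Keeping the intermediate vectors makes it easy to inspect row_totals and keep when the result is not what you expect.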

Good luck!
