
Filter & Subset if a String Contains Certain Characters (in R)

I want to divide a data frame into subsets for training and testing. Its columns contain different items, and some contain sub-items (Aisle01, Aisle02, etc.). I am getting tripped up filtering on a partial string across multiple columns.

Data sample:

Column1   Column2  Column3

Wall01    Wall04   45.6
Wall04    Aisle02  65.7
Aisle06   Wall01   45.0
Aisle01   Wall01   33.3
Wall01    Wall04   21.1

My data frame (x) has two columns that each contain multiple versions of "Aisle". I want to pull out every row in which either column contains "Aisle". Is the line below somewhat on the right track?

filter(x, column1 & column2 == grep(x$column1 & x$column2, "Aisle"))

Desired result:

Column1  Column2  Column3

Wall04   Aisle02  65.7
Aisle06  Wall01   45.0
Aisle01  Wall01   33.3

Thank you in advance.

The easiest solution I can see would be this:

x <- x[grepl("Aisle", x[["column1"]]) | grepl("Aisle", x[["column2"]]), ]

Using grepl instead of grep returns a logical vector (one TRUE/FALSE per row), so you can use the | operator to select your rows. I also wanted to quickly go over a few places in your code that may be giving you trouble.

  1. The x$column1 & x$column2 at the start of your grep call applies the & operation element-wise to the entries of column1 and column2. Since these are characters and not logicals, R stops with an error (& only works on numeric, logical, or complex values).

  2. In grep the pattern comes before the vector you are searching, so it should be grep("Aisle", columnValue), not the other way around. Running ?functionName opens the help page for a function, so you don't have to recall the argument order from memory.

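  3. With only base R loaded, filter is stats::filter, which applies linear filtering to time series ( ts ) objects and is not meant for subsetting data frames. The filter that subsets data frame rows is dplyr::filter, which is available only after running library(dplyr), so unless dplyr is loaded your call does not do what you expect.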
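To make the points above concrete, here is a minimal base R sketch that rebuilds the sample data from the question and applies the grepl approach. The lowercase column names (column1, column2, column3) are an assumption chosen to match the code in this answer (the sample table shows them capitalized), and the variable names in_col1, in_col2, idx_col1 and aisle_rows are made up for illustration.

```r
# Sample data from the question. Lowercase column names are an assumption
# made to match the code in the answer.
x <- data.frame(
  column1 = c("Wall01", "Wall04", "Aisle06", "Aisle01", "Wall01"),
  column2 = c("Wall04", "Aisle02", "Wall01", "Wall01", "Wall04"),
  column3 = c(45.6, 65.7, 45.0, 33.3, 21.1),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per element, so results can be combined with |.
in_col1 <- grepl("Aisle", x[["column1"]])
in_col2 <- grepl("Aisle", x[["column2"]])

# grep() returns the integer positions of the matches instead.
idx_col1 <- grep("Aisle", x[["column1"]])

# Keep the rows where either column mentions "Aisle".
aisle_rows <- x[in_col1 | in_col2, ]
print(aisle_rows)
```

This keeps the three rows shown in the desired result. If the remaining rows are needed for the other half of a split, negating the same condition, x[!(in_col1 | in_col2), ], should return them.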

Best of luck. Comment if you want anything clarified.
