
Remove rows from a data frame if there is a “,” in a specified column

I would like to remove some rows from my data.frame. Let's start with an example:

> tbl_EOD[20:40,]
   AGI.identifier              location_subacon
20    AT1G11360.4                       plastid
21    AT1G11650.2                       nucleus
22    AT1G11930.2                       cytosol
23    AT1G12010.1                    peroxisome
24    AT1G12080.2                       nucleus
25    AT1G12140.1               plasma membrane
26    AT1G12250.2               cytosol,nucleus ## row which I want to delete
27    AT1G12520.2                    peroxisome
28    AT1G13320.2                       cytosol
29    AT1G13930.3                       nucleus
30    AT1G14250.1 extracellular,plasma membrane ## row which I want to delete
31    AT1G15340.2                       nucleus
32    AT1G15470.1                       cytosol
33    AT1G16460.4                       cytosol
34    AT1G16820.2         cytosol,mitochondrion ## row which I want to delete
35    AT1G17150.1                 extracellular
36    AT1G17330.1                       cytosol
37    AT1G17470.2                       cytosol
38    AT1G17890.3                       cytosol
39    AT1G19730.1                       cytosol
40    AT1G20060.1                       nucleus

As the example shows, I just want to remove the rows that have two localizations separated by a comma.

You can use grepl for this.

tbl_EOD <- tbl_EOD[!grepl(",", tbl_EOD$location_subacon), ]

Explanation: grepl searches a character vector, call it S, for a pattern. It returns a logical vector of the same length, with TRUE where the corresponding element of S contains the pattern and FALSE otherwise. In this case, the pattern is ",". What you actually want are the rows without commas, so you put "!" in front of grepl, which turns every TRUE into FALSE and vice versa.
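To see the intermediate logical vector, here is a minimal sketch on a small made-up data frame. The name toy and its four rows are hypothetical and only mirror the structure of tbl_EOD. The fixed = TRUE argument is optional here; it tells grepl to treat "," as a literal string rather than a regular expression.

```r
# Hypothetical toy data frame mirroring the structure of tbl_EOD
toy <- data.frame(
  AGI.identifier   = c("AT1G12140.1", "AT1G12250.2", "AT1G12520.2", "AT1G14250.1"),
  location_subacon = c("plasma membrane", "cytosol,nucleus", "peroxisome",
                       "extracellular,plasma membrane"),
  stringsAsFactors = FALSE
)

# TRUE where the value contains a comma; fixed = TRUE matches "," literally
has_comma <- grepl(",", toy$location_subacon, fixed = TRUE)
print(has_comma)  # FALSE TRUE FALSE TRUE

# Negate the vector to keep only the rows without a comma
toy_single <- toy[!has_comma, ]
print(toy_single)
```

The trailing comma inside the square brackets matters: it selects rows and keeps all columns.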


If you want to keep all rows but remove everything after the comma, you could use gsub.

tbl_EOD$location_subacon <- gsub("(.*),.*", "\\1", tbl_EOD$location_subacon)

Explanation: gsub searches a character vector S for a pattern and replaces every occurrence of that pattern with the replacement. In this case, the pattern is "(.*),.*" and the replacement is "\\1". The pattern is a regular expression meaning roughly "zero or more characters, followed by a comma, followed by zero or more characters". The parentheses capture the enclosed portion so that you can refer to it later. The replacement is just that captured portion, written as "\\1" in an R string (a backreference to the first group). Because .* is greedy, a value containing several commas would lose only the text after its last comma.
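The greedy behaviour is easy to check on a few values. The sketch below uses a hypothetical vector loc, whose last element has two commas (no such value appears in the question's data), and compares the gsub call with one possible alternative, sub(",.*", "", x), which cuts at the first comma instead.

```r
# Hypothetical values; the last one has two commas to show the difference
loc <- c("cytosol", "cytosol,nucleus", "extracellular,plasma membrane,nucleus")

# Greedy capture: removes only what follows the LAST comma
greedy <- gsub("(.*),.*", "\\1", loc)
print(greedy)
# "cytosol" "cytosol" "extracellular,plasma membrane"

# Alternative: delete from the FIRST comma to the end of the string
first_only <- sub(",.*", "", loc)
print(first_only)
# "cytosol" "cytosol" "extracellular"
```

For data like the example above, where each value has at most one comma, both calls give the same result.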
