
Remove rows from a data frame if there is a "," in a specified column

I would like to remove some rows from my data.frame. Let's start with an example:

> tbl_EOD[20:40,]
   AGI.identifier              location_subacon
20    AT1G11360.4                       plastid
21    AT1G11650.2                       nucleus
22    AT1G11930.2                       cytosol
23    AT1G12010.1                    peroxisome
24    AT1G12080.2                       nucleus
25    AT1G12140.1               plasma membrane
26    AT1G12250.2               cytosol,nucleus ## row which I want to delete
27    AT1G12520.2                    peroxisome
28    AT1G13320.2                       cytosol
29    AT1G13930.3                       nucleus
30    AT1G14250.1 extracellular,plasma membrane ## row which I want to delete
31    AT1G15340.2                       nucleus
32    AT1G15470.1                       cytosol
33    AT1G16460.4                       cytosol
34    AT1G16820.2         cytosol,mitochondrion ## row which I want to delete
35    AT1G17150.1                 extracellular
36    AT1G17330.1                       cytosol
37    AT1G17470.2                       cytosol
38    AT1G17890.3                       cytosol
39    AT1G19730.1                       cytosol
40    AT1G20060.1                       nucleus

As the example shows, I just want to remove the rows that have two localizations separated by a comma.

You can use grepl for this.

tbl_EOD <- tbl_EOD[!grepl(",", tbl_EOD$location_subacon), ]

Explanation: grepl searches a character vector, call it S, for a pattern. It returns a logical vector of the same length, with TRUE where the corresponding element of S contains the pattern and FALSE otherwise. In this case, the pattern is ",". What you really want are the rows that don't contain commas, so you put "!" in front of grepl, which turns every TRUE into FALSE and vice versa.
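To make the intermediate logical vector visible, here is a minimal sketch on a made-up four-row data frame. The name toy and its contents are hypothetical stand-ins for tbl_EOD, and fixed = TRUE is an optional addition that makes grepl treat "," as a literal string instead of a regular expression.

```r
# Hypothetical toy data mirroring a few rows of tbl_EOD from the question
toy <- data.frame(
  AGI.identifier   = c("AT1G12140.1", "AT1G12250.2", "AT1G12520.2", "AT1G14250.1"),
  location_subacon = c("plasma membrane", "cytosol,nucleus", "peroxisome",
                       "extracellular,plasma membrane"),
  stringsAsFactors = FALSE
)

# One logical value per row: TRUE where the location contains a comma
has_comma <- grepl(",", toy$location_subacon, fixed = TRUE)
print(has_comma)
# [1] FALSE  TRUE FALSE  TRUE

# Negate the vector to keep only the rows without a comma
kept <- toy[!has_comma, ]
print(kept$AGI.identifier)
# [1] "AT1G12140.1" "AT1G12520.2"
```

Storing the result of grepl in its own variable is only for illustration; the one-line version above does the same thing.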


If you want to keep all rows, but remove everything after the comma, you could use gsub.

tbl_EOD$location_subacon <- gsub("(.*),.*", "\\1", tbl_EOD$location_subacon)

Explanation: gsub searches a character vector S for a pattern and replaces every occurrence of that pattern with the replacement. In this case, the pattern is "(.*),.*" and the replacement is "\\1". The pattern is a regular expression that says something like "(zero or more characters) followed by a comma followed by zero or more characters". Here, the parentheses capture the enclosed portion so that you can refer to it later. The replacement is simply the captured portion in this case, and it's denoted by "\\1" (the backslash is doubled in R source code, so the regex engine sees \1).
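One detail worth checking on a small input: the first ".*" is greedy, so when an entry contains more than one comma, the capture runs up to the last comma. The sketch below uses a made-up vector x (the "a,b,c" entry is hypothetical and does not appear in the question's data) and compares the pattern above with sub(",.*", "", x), an alternative that cuts at the first comma.

```r
# Hypothetical inputs: one comma, no comma, and two commas
x <- c("cytosol,nucleus", "plastid", "a,b,c")

# Pattern from the answer: the greedy capture stops at the LAST comma
after_last <- gsub("(.*),.*", "\\1", x)
print(after_last)
# [1] "cytosol" "plastid" "a,b"

# Alternative: delete from the FIRST comma to the end of the string
after_first <- sub(",.*", "", x)
print(after_first)
# [1] "cytosol" "plastid" "a"
```

For the data shown in the question, where each entry has at most one comma, both versions give the same result.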
