Remove rows from a data frame if there is a "," in a specified column
I would like to remove some rows from my data.frame. Let's start with an example:
> tbl_EOD[20:40,]
AGI.identifier location_subacon
20 AT1G11360.4 plastid
21 AT1G11650.2 nucleus
22 AT1G11930.2 cytosol
23 AT1G12010.1 peroxisome
24 AT1G12080.2 nucleus
25 AT1G12140.1 plasma membrane
26 AT1G12250.2 cytosol,nucleus ## row which I want to delete
27 AT1G12520.2 peroxisome
28 AT1G13320.2 cytosol
29 AT1G13930.3 nucleus
30 AT1G14250.1 extracellular,plasma membrane ## row which I want to delete
31 AT1G15340.2 nucleus
32 AT1G15470.1 cytosol
33 AT1G16460.4 cytosol
34 AT1G16820.2 cytosol,mitochondrion ## row which I want to delete
35 AT1G17150.1 extracellular
36 AT1G17330.1 cytosol
37 AT1G17470.2 cytosol
38 AT1G17890.3 cytosol
39 AT1G19730.1 cytosol
40 AT1G20060.1 nucleus
As the example shows, I just want to remove the rows that have two localizations separated by a comma.
You can use grepl for this.
tbl_EOD <- tbl_EOD[!grepl(",", tbl_EOD$location_subacon), ]
Explanation: grepl searches a character vector, call it S, for a pattern. It returns a logical vector of the same length, with TRUE where the corresponding element of S contains the pattern and FALSE otherwise. In this case, the pattern is ",". What you really want are the rows where there aren't commas, so you can tack on the "!" in front of grepl, which turns all values that are TRUE into FALSE and vice versa.
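To make the intermediate logical vector visible, here is a minimal sketch on a made-up four-row stand-in for tbl_EOD. The names tbl, has_comma and single_loc are hypothetical and exist only for this illustration; fixed = TRUE is an optional addition that makes grepl treat "," as a literal string rather than a regular expression.

```r
# Hypothetical miniature of tbl_EOD with the same two columns.
tbl <- data.frame(
  AGI.identifier   = c("AT1G12140.1", "AT1G12250.2", "AT1G12520.2", "AT1G14250.1"),
  location_subacon = c("plasma membrane", "cytosol,nucleus", "peroxisome",
                       "extracellular,plasma membrane"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose location contains a comma.
has_comma <- grepl(",", tbl$location_subacon, fixed = TRUE)
print(has_comma)
# FALSE TRUE FALSE TRUE

# Negate the vector to keep only the rows without a comma.
single_loc <- tbl[!has_comma, ]
print(single_loc)
```

Only the rows for AT1G12140.1 and AT1G12520.2 remain in single_loc. The original row names (1 and 3 here) are kept, which is also why the filtered tbl_EOD would show gaps in its row numbering.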
If you want to keep all rows, but remove everything after the commas, you could use gsub.
tbl_EOD$location_subacon <- gsub("(.*),.*", "\\1", tbl_EOD$location_subacon)
Explanation: gsub searches a character vector S for a pattern and replaces every occurrence of that pattern with the replacement. In this case, the pattern is "(.*),.*" and the replacement is "\\1". The pattern is a regular expression that says something like "(zero or more characters) followed by a comma followed by zero or more characters". Here, the parentheses capture the enclosed portion so that you can refer to it later. The replacement is simply the captured portion in this case, and it's denoted by \\1.
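One detail worth checking on a small vector: because .* is greedy, the captured group runs up to the last comma, so a value with three localizations keeps its first two. The sketch below uses a made-up vector loc (the three-location entry is hypothetical and does not appear in the data shown above) and compares the gsub call with sub(",.*", "", x), an alternative not used in the answer that cuts at the first comma instead.

```r
# Hypothetical values; the third one has two commas.
loc <- c("cytosol,nucleus", "peroxisome", "cytosol,nucleus,plastid")

# Pattern from the answer: the greedy group captures up to the LAST comma.
greedy <- gsub("(.*),.*", "\\1", loc)
print(greedy)
# "cytosol" "peroxisome" "cytosol,nucleus"

# Alternative: delete from the FIRST comma to the end of the string.
first_only <- sub(",.*", "", loc)
print(first_only)
# "cytosol" "peroxisome" "cytosol"
```

For data with at most one comma per value, as in the rows shown in the question, both calls give the same result.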