How to remove certain rows in a data.frame?
I have a data.frame which looks like this:
20021 K08975 K09735 0.929
20022 K08979 K09735 0.934
20023 K09140 K09735 0.901
20024 K09142 K09735 0.938
20025 K09152 K09735 0.947
20026 K09482 K09735 0.919
20027 K09716 K09735 0.944
20028 K09723 K09735 0.949
20029 K09726 K09735 0.915
20030 K06875 K09736 0.905
20031 K09149 K09736 0.901
20032 K09721 K09736 0.903
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906
20035 K00952 K09738 0.904
20036 K01622 K09738 0.907
20037 K06875 K09738 0.912
20038 K06963 K09738 0.923
20039 K07060 K09738 0.934
There are three columns: var1, var2 and corr.
var1 and var2 can take the values "KOXXXX" or "OTUXXXX".
I would like to keep only the rows where var1 and var2 are of different types, that is, the rows where a KOXXXX is paired with an OTUXXXX or an OTUXXXX with a KOXXXX.
Maybe this is naive, but it could help:
# keep only the rows where the first two characters of var1 and var2
# are different
df[substr(df$var1,1,2) != substr(df$var2,1,2),]
var1 var2 corr
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906
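The two-character comparison works here because "K0" and "OT" happen to differ. A slightly more general sketch, assuming every ID is a run of letters followed by digits (an assumption about the data, not something the question states), strips the trailing digits and compares what is left. The small df below reuses a few rows from the question, and the helper name id_type is made up for illustration.

```r
# Toy data in the same shape as the question (values copied from it)
df <- data.frame(
  var1 = c("K09721", "OTU0001", "OTU0095", "K00952"),
  var2 = c("K09736", "K09738", "K09738", "K09738"),
  corr = c(0.903, 0.908, 0.906, 0.904),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the trailing digits, leaving only the ID type
# ("K09721" becomes "K", "OTU0001" becomes "OTU")
id_type <- function(x) sub("[0-9]+$", "", x)

# Keep the rows whose two IDs are of different types
res <- df[id_type(df$var1) != id_type(df$var2), ]
```

This does not depend on how many leading characters distinguish the two ID families, but it would need adjusting if an ID ever ended in a letter.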
Probably something like this:
subset(df, grepl("^K0", var1) & grepl("^OTU", var2) |
grepl("^OTU", var1) & grepl("^K0", var2))
# var1 var2 corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
Or using startsWith:
subset(df, startsWith(var1, "K0") & startsWith(var2, "OTU") |
startsWith(var1, "OTU") & startsWith(var2, "K0"))
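One caveat worth checking: startsWith expects character vectors, whereas grepl coerces its input. If var1 and var2 were read in as factors (the default for data.frame and read.table before R 4.0), converting them first should avoid an error. The following is a minimal sketch under that assumption; the two-row df is made up from rows in the question.

```r
# Toy data with factor columns, as older R versions would create by default
df <- data.frame(
  var1 = factor(c("K09721", "OTU0001")),
  var2 = factor(c("K09736", "K09738")),
  corr = c(0.903, 0.908)
)

# Convert to character before calling startsWith
v1 <- as.character(df$var1)
v2 <- as.character(df$var2)

# Explicit parentheses make the precedence of & over | visible
keep <- (startsWith(v1, "K0") & startsWith(v2, "OTU")) |
  (startsWith(v1, "OTU") & startsWith(v2, "K0"))

res <- df[keep, ]
```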
Or, with dplyr, we can use grepl or str_detect inside filter. Using str_detect:
library(dplyr)
library(stringr)
df %>%
filter(str_detect(var1, "^K0") & str_detect(var2, "^OTU") |
str_detect(var1, "^OTU") & str_detect(var2, "^K0"))
Data
df <- structure(list(var1 = c("K08975", "K08979", "K09140", "K09142",
"K09152", "K09482", "K09716", "K09723", "K09726", "K06875", "K09149",
"K09721", "OTU0001", "OTU0095", "K00952", "K01622", "K06875",
"K06963", "K07060"), var2 = c("K09735", "K09735", "K09735", "K09735",
"K09735", "K09735", "K09735", "K09735", "K09735", "K09736", "K09736",
"K09736", "K09738", "K09738", "K09738", "K09738", "K09738", "K09738",
"K09738"), corr = c(0.929, 0.934, 0.901, 0.938, 0.947, 0.919,
0.944, 0.949, 0.915, 0.905, 0.901, 0.903, 0.908, 0.906, 0.904,
0.907, 0.912, 0.923, 0.934)), row.names = 20021:20039, class =
"data.frame")
We can also do this in base R as
df[Reduce(`!=`, lapply(df[1:2], substr, 1, 2)),]
# var1 var2 corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
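For readers unfamiliar with Reduce, the one-liner above can be unpacked into two steps. This is only an illustrative sketch on a made-up two-row df; the intermediate names prefixes and keep are not part of the original answer.

```r
# Toy data in the same shape as the question
df <- data.frame(
  var1 = c("K09721", "OTU0001"),
  var2 = c("K09736", "K09738"),
  corr = c(0.903, 0.908),
  stringsAsFactors = FALSE
)

# Step 1: take the first two characters of each of the first two columns;
# the result is a list with one character vector per column
prefixes <- lapply(df[1:2], substr, 1, 2)

# Step 2: Reduce applies `!=` to the two vectors element by element,
# which is the same as prefixes$var1 != prefixes$var2
keep <- Reduce(`!=`, prefixes)

res <- df[keep, ]
```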