简体   繁体   English

如何删除data.frame中的某些行?

[英]How to remove certain rows in a data.frame?

I have a data.frame which looks like this : 我有一个data.frame看起来像这样:

20021  K08975 K09735 0.929
20022  K08979 K09735 0.934
20023  K09140 K09735 0.901
20024  K09142 K09735 0.938
20025  K09152 K09735 0.947
20026  K09482 K09735 0.919
20027  K09716 K09735 0.944
20028  K09723 K09735 0.949
20029  K09726 K09735 0.915
20030  K06875 K09736 0.905
20031  K09149 K09736 0.901
20032  K09721 K09736 0.903
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906
20035  K00952 K09738 0.904
20036  K01622 K09738 0.907
20037  K06875 K09738 0.912
20038  K06963 K09738 0.923
20039  K07060 K09738 0.934

There are three columns : var1 , var2 & corr 一共有三列: var1var2corr

var1 & var2 can take the values "KOXXXX" or "OTUXXXX" . var1var2可以使用值“ KOXXXX”或“ OTUXXXX”。

I would like to keep the rows where var1 and var2 are different, I mean only the rows where appears KOXXXX OTUXXXX or OTUXXXX KOXXXX 我想保留var1var2不同的行,我的意思是仅出现KOXXXX OTUXXXXOTUXXXX KOXXXX

Maybe this is naive, but could help: 也许这很幼稚,但可以帮助您:

# here you take only the rows where the first two character of var1 and var2
# are different
df[substr(df$var1,1,2) != substr(df$var2,1,2),]

         var1   var2  corr
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906

Probably, something like 大概是这样

subset(df, grepl("^K0", var1) & grepl("^OTU", var2) | 
           grepl("^OTU", var1) & grepl("^K0", var2))

#         var1   var2  corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906

Or using startsWith 或使用startsWith

subset(df, startsWith(var1, "K0") & startsWith(var2, "OTU") | 
           startsWith(var1, "OTU") & startsWith(var2, "K0"))

Or using dplyr we can use grepl / str_detect with filter 或者使用dplyr我们可以将grepl / str_detectfilter str_detect使用

library(dplyr)
library(stringr)

df %>%
  filter(str_detect(var1, "^K0") & str_detect(var2, "^OTU") | 
         str_detect(var1, "^OTU") & str_detect(var2, "^K0"))

data 数据

df <- structure(list(var1 = c("K08975", "K08979", "K09140", "K09142", 
"K09152", "K09482", "K09716", "K09723", "K09726", "K06875", "K09149", 
"K09721", "OTU0001", "OTU0095", "K00952", "K01622", "K06875", 
"K06963", "K07060"), var2 = c("K09735", "K09735", "K09735", "K09735", 
"K09735", "K09735", "K09735", "K09735", "K09735", "K09736", "K09736", 
"K09736", "K09738", "K09738", "K09738", "K09738", "K09738", "K09738", 
"K09738"), corr = c(0.929, 0.934, 0.901, 0.938, 0.947, 0.919, 
0.944, 0.949, 0.915, 0.905, 0.901, 0.903, 0.908, 0.906, 0.904, 
0.907, 0.912, 0.923, 0.934)), row.names = 20021:20039, class = 
"data.frame")

We can also do this in base R as 我们也可以在base R

df[Reduce(`!=`, lapply(df[1:2], substr, 1, 2)),]
#          var1   var2  corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM