How to delete "a lot" of rows from a data frame in R

Question

I went through all the similar posts, but none of the answers worked for me. I want to delete 8,500+ rows (by row name only) from a data frame with 27,000+ rows. The other columns are completely different, but the smaller dataset was derived from the larger one, and spot checks show that every name I look up from the smaller df is present in the larger df. I could of course do this manually (busy work for sure!), but it seems like there should be a simple computational answer.

I have tried:

fordel<-df2[1,]

df3<-df1[!rownames(df1) %in% fordel

l1<- as.vector(df2[1,])

df3<- df1[1-c(l1),]

and lots of other crazy ideas! Here is a smallish example, df1:

Ent_gene_id clone57_RNA clone43_RNA_2   clone67_RNA clone55_RNA
ENSMUSG00000000001.4    10634   6954    6835    6510
ENSMUSG00000000003.15   0       0       0       0
ENSMUSG00000000028.14   559     1570    807     1171
ENSMUSG00000000031.15   5748    174     4103    146
ENSMUSG00000000037.16   37      194     49      96
ENSMUSG00000000049.11   0       3       1       0
ENSMUSG00000000056.7    1157    1125    806     947
ENSMUSG00000000058.6    75      304     123     169
ENSMUSG00000000078.6    4012    4391    5637    3854
ENSMUSG00000000085.16   381     560     482     368
ENSMUSG00000000088.6    2667    4777    3483    3450
ENSMUSG00000000093.6    3       48      41      22
ENSMUSG00000000094.12   23      201     102     192

df2:

structure(list(base_mean = c(7962.408875, 947.1240794, 43.76698418 ), log2foldchange = c(-0.363434063, -0.137403759, -0.236463207 ), lfcSE = c(0.096816743, 0.059823215, 0.404929452), stat = c(-3.753834854, -2.296830066, -0.583961493)), row.names = c("ENSMUSG00000000001.4", "ENSMUSG00000000056.7", "ENSMUSG00000000093.6"), class = "data.frame")

I want to delete from df1 the rows corresponding to the row names in df2. I tried to format the example, but the formatting seems to have been lost... oh well.

Suggestions are really appreciated!

Answer 1

You mentioned row names, but your data does not include them, so I'll assume that they really don't matter (or don't exist). Also, your df2 has more column headers than columns; I'm not sure what's going on there, so I'll ignore it.

Data

df1 <- structure(list(Ent_gene_id = c("ENSMUSG00000000001.4", "ENSMUSG00000000003.15", 
"ENSMUSG00000000028.14", "ENSMUSG00000000031.15", "ENSMUSG00000000037.16", 
"ENSMUSG00000000049.11", "ENSMUSG00000000056.7", "ENSMUSG00000000058.6", 
"ENSMUSG00000000078.6", "ENSMUSG00000000085.16", "ENSMUSG00000000088.6", 
"ENSMUSG00000000093.6", "ENSMUSG00000000094.12"), clone57_RNA = c(10634L, 
0L, 559L, 5748L, 37L, 0L, 1157L, 75L, 4012L, 381L, 2667L, 3L, 
23L), clone43_RNA_2 = c(6954L, 0L, 1570L, 174L, 194L, 3L, 1125L, 
304L, 4391L, 560L, 4777L, 48L, 201L), clone67_RNA = c(6835L, 
0L, 807L, 4103L, 49L, 1L, 806L, 123L, 5637L, 482L, 3483L, 41L, 
102L), clone55_RNA = c(6510L, 0L, 1171L, 146L, 96L, 0L, 947L, 
169L, 3854L, 368L, 3450L, 22L, 192L)), class = "data.frame", row.names = c(NA, 
-13L))
df2 <- structure(list(Ent_gene_id = c("ENSMUSG00000000001.4", "ENSMUSG00000000056.7", 
"ENSMUSG00000000093.6"), base_mean = c(7962.408875, 947.1240794, 
43.76698418), log2foldchange = c(-0.36343406, -0.137403759, -0.236463207
), pvalue = c(0.00017415, 0.021628466, 0.55924622)), class = "data.frame", row.names = c(NA, 
-3L))

Base

df1[!df1$Ent_gene_id %in% df2$Ent_gene_id,]
#              Ent_gene_id clone57_RNA clone43_RNA_2 clone67_RNA clone55_RNA
# 2  ENSMUSG00000000003.15           0             0           0           0
# 3  ENSMUSG00000000028.14         559          1570         807        1171
# 4  ENSMUSG00000000031.15        5748           174        4103         146
# 5  ENSMUSG00000000037.16          37           194          49          96
# 6  ENSMUSG00000000049.11           0             3           1           0
# 8   ENSMUSG00000000058.6          75           304         123         169
# 9   ENSMUSG00000000078.6        4012          4391        5637        3854
# 10 ENSMUSG00000000085.16         381           560         482         368
# 11  ENSMUSG00000000088.6        2667          4777        3483        3450
# 13 ENSMUSG00000000094.12          23           201         102         192

dplyr

dplyr::anti_join(df1, df2, by = "Ent_gene_id")
#              Ent_gene_id clone57_RNA clone43_RNA_2 clone67_RNA clone55_RNA
# 1  ENSMUSG00000000003.15           0             0           0           0
# 2  ENSMUSG00000000028.14         559          1570         807        1171
# 3  ENSMUSG00000000031.15        5748           174        4103         146
# 4  ENSMUSG00000000037.16          37           194          49          96
# 5  ENSMUSG00000000049.11           0             3           1           0
# 6   ENSMUSG00000000058.6          75           304         123         169
# 7   ENSMUSG00000000078.6        4012          4391        5637        3854
# 8  ENSMUSG00000000085.16         381           560         482         368
# 9   ENSMUSG00000000088.6        2667          4777        3483        3450
# 10 ENSMUSG00000000094.12          23           201         102         192
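
One visible difference between the two outputs above: base subsetting keeps the original row labels (2, 3, 4, ...), while the anti_join result is numbered from 1. If consecutive numbering is wanted after the base approach, resetting the row names is one way to get it. A minimal sketch on made-up toy data (the names toy, drop_ids and res are hypothetical, not from the question):

```r
# Toy data, invented for illustration only.
toy <- data.frame(id = c("a", "b", "c", "d"), n = 1:4)
drop_ids <- c("b", "c")

# Same logical-index pattern as the base solution above.
res <- toy[!toy$id %in% drop_ids, ]
rownames(res)            # "1" "4"  (original labels are kept)

# Setting row names to NULL resets them to 1..nrow(res).
rownames(res) <- NULL
rownames(res)            # "1" "2"
```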

Edit: same thing, but with row names:

# update my df1 to change Ent_gene_id from a column to rownames
rownames(df1) <- df1$Ent_gene_id
df1$Ent_gene_id <- NULL
# use your updated df2 (from dput)
# df2 <- structure(...)
df1[ !rownames(df1) %in% rownames(df2), ]
#                       clone57_RNA clone43_RNA_2 clone67_RNA clone55_RNA
# ENSMUSG00000000003.15           0             0           0           0
# ENSMUSG00000000028.14         559          1570         807        1171
# ENSMUSG00000000031.15        5748           174        4103         146
# ENSMUSG00000000037.16          37           194          49          96
# ENSMUSG00000000049.11           0             3           1           0
# ENSMUSG00000000058.6           75           304         123         169
# ENSMUSG00000000078.6         4012          4391        5637        3854
# ENSMUSG00000000085.16         381           560         482         368
# ENSMUSG00000000088.6         2667          4777        3483        3450
# ENSMUSG00000000094.12          23           201         102         192
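
As far as I can tell, the attempts in the question failed because df2[1, ] extracts the first row of df2 (its values), not its row names, so there are no gene IDs for %in% to match against. The sketch below shows the distinction on made-up toy data and adds a row-count sanity check that may be useful when dropping thousands of rows. The names big, small, ids and kept are hypothetical stand-ins, not objects from the question.

```r
# Toy stand-ins for df1 (big) and df2 (small); the data is invented.
big <- data.frame(x = 1:5, row.names = c("a", "b", "c", "d", "e"))
small <- data.frame(y = c(0.1, 0.2), z = c(10, 20), row.names = c("b", "d"))

first_row <- small[1, ]   # one-row data frame holding the values of y and z
ids <- rownames(small)    # character vector of row names: "b" "d"

# Keep the rows of big whose row names are NOT in ids.
# drop = FALSE keeps a data frame even though big has a single column.
kept <- big[!rownames(big) %in% ids, , drop = FALSE]

# Sanity check: rows removed should equal the number of ids found in big.
stopifnot(nrow(big) - nrow(kept) == sum(ids %in% rownames(big)))

rownames(kept)            # "a" "c" "e"
```

The drop = FALSE argument only matters when the larger data frame has a single column; with several columns, as in df1, the result stays a data frame either way.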
