
How can I delete “a lot” of rows from a dataframe in R

I tried all the similar posts, but none of the answers seemed to work for me. I want to delete 8,500+ rows (by row name only) from a dataframe with 27,000+ rows. The other columns are completely different, but the smaller dataset was derived from the larger one, and every name I look up from the smaller df is present in the larger df. I could of course do this manually (busy work for sure!), but it seems like there should be a simple computational answer.

I have tried:

fordel<-df2[1,]

df3<-df1[!rownames(df1) %in% fordel,]

l1<- as.vector(df2[1,])

df3<- df1[1-c(l1),]

and lots of other crazy ideas! Here is a smallish example, df1:

Ent_gene_id clone57_RNA clone43_RNA_2   clone67_RNA clone55_RNA
ENSMUSG00000000001.4    10634   6954    6835    6510
ENSMUSG00000000003.15   0       0       0       0
ENSMUSG00000000028.14   559     1570    807     1171
ENSMUSG00000000031.15   5748    174     4103    146
ENSMUSG00000000037.16   37      194     49      96
ENSMUSG00000000049.11   0       3       1       0
ENSMUSG00000000056.7    1157    1125    806     947
ENSMUSG00000000058.6    75      304     123     169
ENSMUSG00000000078.6    4012    4391    5637    3854
ENSMUSG00000000085.16   381     560     482     368
ENSMUSG00000000088.6    2667    4777    3483    3450
ENSMUSG00000000093.6    3       48      41      22
ENSMUSG00000000094.12   23      201     102     192

df2:

structure(list(base_mean = c(7962.408875, 947.1240794, 43.76698418 ), log2foldchange = c(-0.363434063, -0.137403759, -0.236463207 ), lfcSE = c(0.096816743, 0.059823215, 0.404929452), stat = c(-3.753834854, -2.296830066, -0.583961493)), row.names = c("ENSMUSG00000000001.4", "ENSMUSG00000000056.7", "ENSMUSG00000000093.6"), class = "data.frame")

I want to delete from df1 the rows corresponding to the row names in df2. I tried to format it, but it seems to be no longer formatted... oh well.

Suggestions really appreciated!

You mentioned row names but your data does not include them, so I'll assume that they really don't matter (or exist). Also, your df2 has more column headers than columns; I'm not sure what's going on there, so I'll ignore it.

Data

df1 <- structure(list(Ent_gene_id = c("ENSMUSG00000000001.4", "ENSMUSG00000000003.15", 
"ENSMUSG00000000028.14", "ENSMUSG00000000031.15", "ENSMUSG00000000037.16", 
"ENSMUSG00000000049.11", "ENSMUSG00000000056.7", "ENSMUSG00000000058.6", 
"ENSMUSG00000000078.6", "ENSMUSG00000000085.16", "ENSMUSG00000000088.6", 
"ENSMUSG00000000093.6", "ENSMUSG00000000094.12"), clone57_RNA = c(10634L, 
0L, 559L, 5748L, 37L, 0L, 1157L, 75L, 4012L, 381L, 2667L, 3L, 
23L), clone43_RNA_2 = c(6954L, 0L, 1570L, 174L, 194L, 3L, 1125L, 
304L, 4391L, 560L, 4777L, 48L, 201L), clone67_RNA = c(6835L, 
0L, 807L, 4103L, 49L, 1L, 806L, 123L, 5637L, 482L, 3483L, 41L, 
102L), clone55_RNA = c(6510L, 0L, 1171L, 146L, 96L, 0L, 947L, 
169L, 3854L, 368L, 3450L, 22L, 192L)), class = "data.frame", row.names = c(NA, 
-13L))
df2 <- structure(list(Ent_gene_id = c("ENSMUSG00000000001.4", "ENSMUSG00000000056.7", 
"ENSMUSG00000000093.6"), base_mean = c(7962.408875, 947.1240794, 
43.76698418), log2foldchange = c(-0.36343406, -0.137403759, -0.236463207
), pvalue = c(0.00017415, 0.021628466, 0.55924622)), class = "data.frame", row.names = c(NA, 
-3L))

Base

df1[!df1$Ent_gene_id %in% df2$Ent_gene_id,]
#              Ent_gene_id clone57_RNA clone43_RNA_2 clone67_RNA clone55_RNA
# 2  ENSMUSG00000000003.15           0             0           0           0
# 3  ENSMUSG00000000028.14         559          1570         807        1171
# 4  ENSMUSG00000000031.15        5748           174        4103         146
# 5  ENSMUSG00000000037.16          37           194          49          96
# 6  ENSMUSG00000000049.11           0             3           1           0
# 8   ENSMUSG00000000058.6          75           304         123         169
# 9   ENSMUSG00000000078.6        4012          4391        5637        3854
# 10 ENSMUSG00000000085.16         381           560         482         368
# 11  ENSMUSG00000000088.6        2667          4777        3483        3450
# 13 ENSMUSG00000000094.12          23           201         102         192
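
The base approach rests on `%in%` producing one TRUE/FALSE per element of its left-hand side, which is then negated and used as a logical row index. Here is a minimal sketch of that mechanism on toy vectors; the names `ids`, `remove`, `keep`, and `kept_ids` are hypothetical and not part of the question's data.

```r
# Toy vectors (hypothetical, not the asker's data)
ids    <- c("a", "b", "c", "d", "e")
remove <- c("b", "d")

# %in% returns one logical value per element of the left-hand side
is_listed <- ids %in% remove

# `!` has lower precedence than %in%, so this is parsed as !(ids %in% remove)
keep <- !ids %in% remove

# Logical indexing keeps only the elements marked TRUE
kept_ids <- ids[keep]
print(kept_ids)
```

Because the index is a logical vector of the same length as the data, the same pattern should scale from this toy case to thousands of IDs without any loop.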

dplyr

dplyr::anti_join(df1, df2, by = "Ent_gene_id")
#              Ent_gene_id clone57_RNA clone43_RNA_2 clone67_RNA clone55_RNA
# 1  ENSMUSG00000000003.15           0             0           0           0
# 2  ENSMUSG00000000028.14         559          1570         807        1171
# 3  ENSMUSG00000000031.15        5748           174        4103         146
# 4  ENSMUSG00000000037.16          37           194          49          96
# 5  ENSMUSG00000000049.11           0             3           1           0
# 6   ENSMUSG00000000058.6          75           304         123         169
# 7   ENSMUSG00000000078.6        4012          4391        5637        3854
# 8  ENSMUSG00000000085.16         381           560         482         368
# 9   ENSMUSG00000000088.6        2667          4777        3483        3450
# 10 ENSMUSG00000000094.12          23           201         102         192

Edit: same thing but with row names:

# update my df1 to change Ent_gene_id from a column to rownames
rownames(df1) <- df1$Ent_gene_id
df1$Ent_gene_id <- NULL
# use your updated df2 (from dput)
# df2 <- structure(...)
df1[ !rownames(df1) %in% rownames(df2), ]
#                       clone57_RNA clone43_RNA_2 clone67_RNA clone55_RNA
# ENSMUSG00000000003.15           0             0           0           0
# ENSMUSG00000000028.14         559          1570         807        1171
# ENSMUSG00000000031.15        5748           174        4103         146
# ENSMUSG00000000037.16          37           194          49          96
# ENSMUSG00000000049.11           0             3           1           0
# ENSMUSG00000000058.6           75           304         123         169
# ENSMUSG00000000078.6         4012          4391        5637        3854
# ENSMUSG00000000085.16         381           560         482         368
# ENSMUSG00000000088.6         2667          4777        3483        3450
# ENSMUSG00000000094.12          23           201         102         192
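
A likely reason the attempts in the question did not work is that `df2[1,]` selects the first row of `df2` (its values), whereas the IDs live in `rownames(df2)`. The sketch below illustrates the difference and adds a quick sanity check on the row counts. It uses a cut-down stand-in for the question's data, and the names `ids_to_drop`, `all_present`, `df3`, and `rows_removed` are illustrative assumptions.

```r
# Cut-down stand-ins for the question's data; row names carry the gene IDs
df1 <- data.frame(
  clone57_RNA = c(10634L, 0L, 559L, 1157L, 3L),
  clone55_RNA = c(6510L, 0L, 1171L, 947L, 22L),
  row.names = c("ENSMUSG00000000001.4", "ENSMUSG00000000003.15",
                "ENSMUSG00000000028.14", "ENSMUSG00000000056.7",
                "ENSMUSG00000000093.6")
)
df2 <- data.frame(
  base_mean = c(7962.408875, 947.1240794, 43.76698418),
  log2foldchange = c(-0.363434063, -0.137403759, -0.236463207),
  row.names = c("ENSMUSG00000000001.4", "ENSMUSG00000000056.7",
                "ENSMUSG00000000093.6")
)

# df2[1, ] is a one-row data frame of values, not a vector of IDs
first_row <- df2[1, ]

# rownames(df2) is the character vector of IDs to remove
ids_to_drop <- rownames(df2)

# Check that every ID to drop really exists in df1
all_present <- all(ids_to_drop %in% rownames(df1))

# Keep only the rows whose names are not listed
df3 <- df1[!rownames(df1) %in% ids_to_drop, , drop = FALSE]

# If all IDs were present and unique, this equals nrow(df2)
rows_removed <- nrow(df1) - nrow(df3)
print(rows_removed)
```

The logical-index form also behaves sensibly when nothing matches, whereas `df1[-which(cond), ]` returns zero rows in that case, since `which()` yields an empty index.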
