
Is there any function to extract several (non-contiguous) rows from a data frame in R?

I am trying to remove several non-consecutive rows from a data frame. The ranges of rows to remove are stored in another data frame.

I tried to remove the rows with a for loop, but unfortunately only the last range is removed.

This is the line I used inside the for loop, where i is the loop variable:

new_df <- main_df[-(erase_df$starts[i]:erase_df$stops[i]),]

For example, this is the data frame I want to change (main_df):

> main_df
   v1  v2     v3
1   1 bla blabla
2   2 bla blabla
3   3 bla blabla
4   4 bla blabla
5   5 bla blabla
6   6 bla blabla
7   7 bla blabla
8   8 bla blabla
9   9 bla blabla
10 10 bla blabla
11 11 bla blabla
12 12 bla blabla
13 13 bla blabla
14 14 bla blabla
15 15 bla blabla

This is the data frame (erase_df) that holds the ranges of rows I want to remove. The starts column gives the first row of each range and the stops column gives the last row of that range; both ends are removed.

> erase_df
  starts stops
1      3     5
2      9    10
3     12    14
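To make the example reproducible, the two data frames can be rebuilt in base R. This is a minimal sketch; the column types (integer v1, character v2 and v3) are assumed from the printout rather than stated in the question.

```r
# Rebuild the example data shown above
# (assumption: v1 is integer, v2 and v3 are character columns)
main_df <- data.frame(
  v1 = 1:15,
  v2 = "bla",      # length-1 values are recycled to 15 rows
  v3 = "blabla",
  stringsAsFactors = FALSE
)

# Each row of erase_df is one inclusive range of row positions to drop
erase_df <- data.frame(
  starts = c(3L, 9L, 12L),
  stops  = c(5L, 10L, 14L)
)
```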

So the new data frame should look like this:

> new_df
   v1  v2     v3
1   1 bla blabla
2   2 bla blabla
6   6 bla blabla
7   7 bla blabla
8   8 bla blabla
11 11 bla blabla
15 15 bla blabla

I expected the output to look like new_df above, but instead only the last range from erase_df was removed (starts = 12, stops = 14).
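The loop behaves this way because every iteration builds new_df from the untouched main_df, so each pass overwrites the previous result and only the last range survives. Subsetting new_df repeatedly would not help either, since row positions shift after each removal. The sketch below reproduces the problem and shows one loop-based fix (collect all positions first, then subset once); the variable names rows_to_remove and fixed_df are illustrative.

```r
main_df <- data.frame(v1 = 1:15, v2 = "bla", v3 = "blabla")
erase_df <- data.frame(starts = c(3L, 9L, 12L), stops = c(5L, 10L, 14L))

# Loop from the question: each pass starts again from main_df,
# so new_df is overwritten and only the last range (12:14) is dropped
for (i in seq_len(nrow(erase_df))) {
  new_df <- main_df[-(erase_df$starts[i]:erase_df$stops[i]), ]
}

# Fix: gather every row position first, then subset main_df once
rows_to_remove <- integer(0)
for (i in seq_len(nrow(erase_df))) {
  rows_to_remove <- c(rows_to_remove, erase_df$starts[i]:erase_df$stops[i])
}
fixed_df <- main_df[-rows_to_remove, ]
```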

If you Map the seq function over erase_df to build a sequence of row numbers for each range, then unlist them into a single vector, you can subset main_df with the negative of that vector to drop every row in the given ranges.

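# One integer sequence per range (3:5, 9:10, 12:14), flattened into one vector
# (named rows_to_remove to avoid masking base::remove)
rows_to_remove <- unlist(Map(seq, erase_df$starts, erase_df$stops))

main_df[-rows_to_remove,]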
#    v1  v2     v3
# 1:  1 bla blabla
# 2:  2 bla blabla
# 3:  6 bla blabla
# 4:  7 bla blabla
# 5:  8 bla blabla
# 6: 11 bla blabla
# 7: 15 bla blabla
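One caveat with negative indexing: if erase_df has no rows, the index vector is empty, and main_df[-x, ] with an empty x returns zero rows instead of all of them. A logical mask avoids that edge case. This is a sketch in base R on a plain data.frame; keep and kept_all are illustrative names.

```r
main_df <- data.frame(v1 = 1:15, v2 = "bla", v3 = "blabla")
erase_df <- data.frame(starts = c(3L, 9L, 12L), stops = c(5L, 10L, 14L))

rows_to_remove <- unlist(Map(seq, erase_df$starts, erase_df$stops))

# TRUE for rows to keep; still keeps every row if rows_to_remove is empty
keep <- !(seq_len(nrow(main_df)) %in% rows_to_remove)
new_df <- main_df[keep, ]

# Edge case: no ranges at all, so nothing should be removed
empty_erase <- erase_df[0, ]
none_removed <- unlist(Map(seq, empty_erase$starts, empty_erase$stops))
kept_all <- main_df[!(seq_len(nrow(main_df)) %in% none_removed), ]
```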

Or, a more complex option that could be more efficient with larger data (I haven't tested this; it's just a guess):

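library(data.table)
setDT(main_df)
setDT(erase_df)

# foverlaps() needs the lookup table keyed on its interval columns
setkey(erase_df, starts, stops)

# Treat each row as a zero-width interval [v1, v1]
# (this relies on v1 equalling the row number, as in the example)
main_df[, v0 := v1]
for_anti <- 
  foverlaps(main_df, erase_df, by.x = c('v0', 'v1'), type = 'within',
            nomatch = NULL)

# Anti-join: keep only the rows of main_df that fall inside no range
main_df[!for_anti, on = .(v1)]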
#    v1  v2     v3 v0
# 1:  1 bla blabla  1
# 2:  2 bla blabla  2
# 3:  6 bla blabla  6
# 4:  7 bla blabla  7
# 5:  8 bla blabla  8
# 6: 11 bla blabla 11
# 7: 15 bla blabla 15
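If you'd rather stay in base R, findInterval() can test each row position against the ranges without expanding every range into a full index vector. This is a sketch under two assumptions: the ranges are sorted by starts and do not overlap. It works on row positions rather than on v1, and it has not been benchmarked against the foverlaps approach.

```r
main_df <- data.frame(v1 = 1:15, v2 = "bla", v3 = "blabla")
erase_df <- data.frame(starts = c(3L, 9L, 12L), stops = c(5L, 10L, 14L))

# Assumption: ranges are sorted by `starts` and do not overlap
rows <- seq_len(nrow(main_df))

# For each row, index of the last range whose start is <= that row (0 if none)
idx <- findInterval(rows, erase_df$starts)

# A row is removed if such a range exists and the row is <= that range's stop
in_range <- idx > 0 & rows <= erase_df$stops[pmax(idx, 1L)]

new_df <- main_df[!in_range, ]
```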
