Remove the first N rows from each factor level in an R data.frame

Question

With the dat below, how can I make a new data frame that includes all values except the first five rows for each IndID? Said differently, I want a new data frame with the first 5 rows for each IndID excluded.

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each  = 10),
                  Number = sample(1:100,40))

I have seen a number of SO posts that select data, but I am not sure how to remove rows as described above.

Answer 1

We can use dplyr's slice():

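library(dplyr)

# Keep rows 6 through the last row of each IndID group
# (assumes every group has at least 6 rows, as in dat)
dat %>% 
    group_by(IndID) %>% 
    slice(6:n())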
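
The slice(6:n()) call relies on every group having at least six rows. For a smaller group, 6:n() counts downward (for example 6:3), so rows that should be dropped can be kept. A minimal sketch of a variant without that assumption, using filter() with row_number(); the data frame small is made up for illustration and is not part of the question:

```r
library(dplyr)

# Hypothetical data: group "A" has 7 rows, group "B" has only 3
small <- data.frame(IndID = rep(c("A", "B"), times = c(7, 3)),
                    Number = 1:10)

# Keep only rows whose position within the group is greater than 5
res <- small %>%
  group_by(IndID) %>%
  filter(row_number() > 5) %>%
  ungroup()
```

Here group "B" contributes no rows at all, which is the expected result when a group has five rows or fewer.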

Answer 2

In base R, tapply() is handy when applied to a sequence of row numbers together with tail().

# tail(x, -5) drops the first 5 row numbers within each IndID group
idx <- unlist(tapply(seq_len(nrow(dat)), dat$IndID, tail, -5))
dat[idx, ]

Note that this will be more efficient with use.names = FALSE in unlist().
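
As a sketch of that suggestion, the same tapply() approach can be written with use.names = FALSE. The intermediate name per_group is introduced here only for readability:

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# One vector of row numbers per IndID; tail(x, -5) drops the first 5 of each
per_group <- tapply(seq_len(nrow(dat)), dat$IndID, tail, -5)

# use.names = FALSE skips building element names for the flattened vector
idx <- unlist(per_group, use.names = FALSE)
res <- dat[idx, ]
```

The index vector is the same as before; only the names that unlist() would otherwise build are omitted, which matters mostly for large inputs.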

With data.table, you can do the following with tail().

library(data.table)

setDT(dat)[dat[, tail(.I, -5), by=IndID]$V1]

Answer 3

If the data is sorted by IndID and you are guaranteed to have at least n rows per group...

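n <- 5
# Row number of the first occurrence of each IndID
w <- match(unique(dat$IndID), dat$IndID)
# Drop rows w, w + 1, ..., w + n - 1 for every group
dat[- (rep(w, each = n) + 1:n - 1L), ]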
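
If neither condition can be guaranteed, one alternative (a sketch, not taken from the answers here) is to compute each row's position within its group with ave() and filter on that. The names pos and res are illustrative:

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))
n <- 5

# Position of each row within its IndID group (1, 2, 3, ...), in original row order
pos <- ave(seq_len(nrow(dat)), dat$IndID, FUN = seq_along)

# Keep only rows that come after the first n rows of their group
res <- dat[pos > n, ]
```

Because pos is computed per group, this does not depend on the rows of a group being contiguous, and a group with n rows or fewer simply drops out.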

Answer 4

You can use split() in base R to split dat by IndID, remove the first 5 rows of each subgroup, and then rbind() the pieces back together.

do.call(rbind, lapply(split(dat,as.character(dat$IndID)), function(x) x[-(1:5),]))
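
One side effect worth knowing: because the list returned by split() is named, rbind() typically builds row names from the group name and the original row number (such as "AAA.6"). A minimal sketch that resets them afterwards, with pieces and res as illustrative names:

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# Split by IndID and drop the first 5 rows of each piece
pieces <- lapply(split(dat, as.character(dat$IndID)), function(x) x[-(1:5), ])

# Bind the pieces back together, then reset row names to 1, 2, 3, ...
res <- do.call(rbind, pieces)
rownames(res) <- NULL
```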

4 answers

Answer 1: score 19, accepted, 2017-02-14 23:16:50
Answer 2: score 7, 2017-02-15 00:33:50
Answer 3: score 6, 2017-02-15 01:01:57
Answer 4: score 3, 2017-02-14 23:24:33
