Remove consecutive duplicates from a data frame

Question

I have a data frame from which I want to remove consecutive duplicate rows (in base R). I know rle may be helpful here, but I can't think of how to use it. The example output should illustrate what I'm asking for.

Generate sample data:

set.seed(12)
samps <- sample(1:5, 20, replace = TRUE)
dat <- data.frame(v1=LETTERS[samps], v2=month.abb[samps])
dat[10, 2] <- "Mar"

Sample data:

   v1  v2
1   A Jan
2   E May
3   E May
4   B Feb
5   A Jan
6   A Jan
7   A Jan
8   D Apr
9   A Jan
10  A Mar
11  B Feb
12  E May
13  B Feb
14  B Feb
15  B Feb
16  C Mar
17  C Mar
18  C Mar
19  D Apr
20  A Jan

Desired outcome:

   v1  v2
1   A Jan
3   E May
4   B Feb
7   A Jan
8   D Apr
10  A Mar
11  B Feb
12  E May
15  B Feb
18  C Mar
19  D Apr
20  A Jan

Answer 1

Here's a way, not with rle, but a way nonetheless:

dat[with(dat, c(TRUE, diff(as.numeric(interaction(v1, v2))) != 0)), ]

This assumes you're using factor columns, as your sample data implies.
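The comparison with the previous row can also be spelled out directly, so that it does not depend on the column types. A minimal sketch, assuming a small made-up data frame `toy` with no NA values; `first_of_runs` is a hypothetical helper name, not something from base R. It keeps the first row of every run, comparing all columns:

```r
# Toy data, made up for illustration (character columns, no NAs)
toy <- data.frame(
  v1 = c("A", "E", "E", "B", "A", "A"),
  v2 = c("Jan", "May", "May", "Feb", "Jan", "Jan"),
  stringsAsFactors = FALSE
)

# first_of_runs() is a hypothetical helper, not part of base R
first_of_runs <- function(df) {
  n <- nrow(df)
  if (n < 2) return(df)
  # TRUE where at least one column differs from the row above
  changed <- Reduce(`|`, lapply(df, function(col) col[-1] != col[-n]))
  # The first row always starts a run
  df[c(TRUE, changed), , drop = FALSE]
}

res <- first_of_runs(toy)
```

Here rows 3 and 6 repeat the row above them, so `res` holds rows 1, 2, 4 and 5.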

Answer 2

Here is a fast solution using filter:

dat[(filter(dat,c(-1,1))!= 0)[,1],]
     v1   v2
1     A  Jan
3     E  May
4     B  Feb
7     A  Jan
8     D  Apr
10    A  Mar
11    B  Feb
12    E  May
15    B  Feb
18    C  Mar
19    D  Apr
NA <NA> <NA>

The last row comes back as NA, so you need to add the last row of the original data to the result.
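One way to patch up that trailing NA, sketched on a small made-up data frame `toy` and looking at v1 only, as the call above does. The filter is called as `stats::filter` so that a same-named function from an attached package cannot mask it. The names `codes`, `d` and `keep` are just for illustration:

```r
# Toy data, made up for illustration (factor columns)
toy <- data.frame(
  v1 = c("A", "E", "E", "B", "A", "A"),
  v2 = c("Jan", "May", "May", "Feb", "Jan", "Jan"),
  stringsAsFactors = TRUE
)

# Integer codes of the factor levels
codes <- as.integer(toy$v1)

# With the default sides = 2, a c(-1, 1) filter yields the difference
# between each code and the next one; the last value is NA
d <- as.numeric(stats::filter(codes, c(-1, 1)))

keep <- d != 0
keep[is.na(keep)] <- TRUE  # the final row always closes a run

res <- toy[keep, ]
```

This keeps the last row of each run, so `res` holds rows 1, 3, 4 and 6.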

Answer 3

Using rle, I came up with this:

ind <- cumsum(rle(as.character(dat$v1))$lengths)
dat[ind, ]

ind holds the index of the last row of each run of consecutive entries.

EDIT:

A simple solution to Matthew's comment would be:

dat[15, 2] <- "May"
dat[cumsum(rle(paste0(dat$v1, dat$v2))$lengths), ]
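To make the run boundaries explicit, here is a minimal sketch on a small made-up data frame `toy`. It derives both the first and the last row index of each run from the rle lengths. The "|" separator is an assumption (it must not occur in the data); it is there because pasting without a separator can merge distinct pairs such as ("ab", "c") and ("a", "bc"):

```r
# Toy data, made up for illustration
toy <- data.frame(
  v1 = c("A", "E", "E", "B", "A", "A"),
  v2 = c("Jan", "May", "May", "Feb", "Jan", "Jan"),
  stringsAsFactors = FALSE
)

# One key per row; "|" is assumed not to appear in either column
key <- paste(toy$v1, toy$v2, sep = "|")
runs <- rle(key)

ends <- cumsum(runs$lengths)        # last row of each run
starts <- ends - runs$lengths + 1   # first row of each run

last_rows <- toy[ends, ]
first_rows <- toy[starts, ]
```

With this data `ends` is 1, 3, 4, 6 and `starts` is 1, 2, 4, 5, so indexing by `starts` gives the first-of-run variant.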

3 answers

Answer 1 (9, accepted) 2012-12-27 14:34:57

Answer 2 (4) 2012-12-27 15:33:22

Answer 3 (3) 2012-12-27 14:40:02
