Removing repeated rows within each interval from a data.frame in R

I need help removing the repeated rows within each interval of identical consecutive values in a data.frame.

For example, I have a data.frame like this:

Time                X   Y   Z
01/01/2011 00:00    101 200 302
01/01/2011 00:05    101 200 302
01/01/2011 00:10    101 200 302
01/01/2011 00:20    100 200 303
01/01/2011 00:25    100 200 303
01/01/2011 00:30    100 200 303
01/01/2011 00:35    101 200 302
01/01/2011 00:40    100 200 303
01/01/2011 00:45    100 200 303

After removing the repeated rows (by X, Y, Z), keeping only the first row of each consecutive run, I want a result like this:

Time                X   Y   Z
01/01/2011 00:00    101 200 302
01/01/2011 00:20    100 200 303
01/01/2011 00:35    101 200 302
01/01/2011 00:40    100 200 303

I have tried the unique() and duplicated() functions, but they give a different result.

For example:

eliminate <- data[!duplicated(data[, c("X", "Y", "Z")]), ]

This code deletes every duplicated row across the whole data.frame, not just the consecutive repeats.
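To reproduce the problem, here is a minimal sketch that types the sample data in by hand (storing Time as text is an assumption about the original data). duplicated() flags a row as a repeat if the same X/Y/Z combination appeared anywhere above it, so only the first 101/200/302 row and the first 100/200/303 row survive:

```r
# Sample data from the question (assumption: Time stored as text)
data <- data.frame(
  Time = sprintf("01/01/2011 00:%02d", c(0, 5, 10, 20, 25, 30, 35, 40, 45)),
  X = c(101, 101, 101, 100, 100, 100, 101, 100, 100),
  Y = 200,
  Z = c(302, 302, 302, 303, 303, 303, 302, 303, 303),
  stringsAsFactors = FALSE
)

# duplicated() compares against every earlier row, not just the previous one
eliminate <- data[!duplicated(data[, c("X", "Y", "Z")]), ]
eliminate$Time   # "01/01/2011 00:00" "01/01/2011 00:20"
```

The rows at 00:35 and 00:40 are lost because their values already occurred earlier in the table, even though they start new runs.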

Can somebody help me find a solution?

Thanks in advance.
Regards,

Yougyz

Probably not the most elegant way:

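# Collapse X, Y, Z into one key; the "_" separator avoids collisions such as 1,11 vs 11,1
data <- within(data, C <- paste(X, Y, Z, sep = "_"))
# Lengths of the runs of identical consecutive keys
rl <- rle(data$C)$lengths
# Keep the first row of each run and drop the helper column C (within() appends it last)
data <- data[c(1, cumsum(rl)[-length(rl)] + 1), 1:(ncol(data) - 1)]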
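To see why the rle() answer works, here is a sketch that runs it step by step on the sample data (typed in by hand; key, runs and starts are illustrative names, not part of the answer). rle() reports how long each run of identical keys is, and the first row of each run sits one position after the cumulative length of all earlier runs:

```r
# Sample data from the question (assumption: Time stored as text)
data <- data.frame(
  Time = sprintf("01/01/2011 00:%02d", c(0, 5, 10, 20, 25, 30, 35, 40, 45)),
  X = c(101, 101, 101, 100, 100, 100, 101, 100, 100),
  Y = 200,
  Z = c(302, 302, 302, 303, 303, 303, 302, 303, 303),
  stringsAsFactors = FALSE
)

# One key per row, then the lengths of runs of identical consecutive keys
key <- paste(data$X, data$Y, data$Z, sep = "_")
runs <- rle(key)
runs$lengths    # 3 3 1 2

# Each run starts one row after the total length of the runs before it
starts <- c(1, cumsum(runs$lengths)[-length(runs$lengths)] + 1)
starts          # 1 4 7 8
data[starts, ]
```

The selected rows are the ones at 00:00, 00:20, 00:35 and 00:40, which matches the desired output in the question.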

The following code combines your three columns of interest into a single vector. Then I compare that vector with itself offset by one: wherever two consecutive elements differ, you have moved on to a new XYZ combination.

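# ss is the data.frame from the question
n <- nrow(ss)
# One key per row; the "_" separator prevents collisions between values
xyz <- with(ss, paste(X, Y, Z, sep = "_"))
# TRUE where a row differs from the row before it
sel <- xyz[-1] != xyz[-n]
ss[c(TRUE, sel), ]  # the first row is always kept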
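A variation on the same shift-by-one idea, sketched here as an assumption rather than either answerer's code, skips building strings and compares the columns directly: a row is kept when any of X, Y, Z differs from the row above. cols, v, changed and keep are illustrative names. Comparing the values themselves also sidesteps the key-collision risk of pasting numbers together:

```r
# Sample data from the question (assumption: Time stored as text)
data <- data.frame(
  Time = sprintf("01/01/2011 00:%02d", c(0, 5, 10, 20, 25, 30, 35, 40, 45)),
  X = c(101, 101, 101, 100, 100, 100, 101, 100, 100),
  Y = 200,
  Z = c(302, 302, 302, 303, 303, 303, 302, 303, 303),
  stringsAsFactors = FALSE
)

cols <- c("X", "Y", "Z")
v <- as.matrix(data[, cols])
n <- nrow(v)

# TRUE where at least one of X, Y, Z differs from the row above
# (NA values are not handled in this sketch)
changed <- rowSums(v[-1, , drop = FALSE] != v[-n, , drop = FALSE]) > 0
keep <- c(TRUE, changed)   # the first row always starts a new run
data[keep, ]
```

drop = FALSE keeps the slices as matrices, so the sketch also works for a data.frame with a single row.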

This is about 3x faster than Julius's answer. The advantage should become greater as the dataset grows.
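If you want to check the speed claim on your own machine, here is a rough sketch. It builds a larger random data set (the size, seed and value ranges are arbitrary assumptions), wraps each answer in a hypothetical function (rle_answer and shift_answer, both using the "_" separator), confirms they select the same rows, and times them with system.time(). No timings are quoted here because they depend on the machine and the data:

```r
set.seed(42)
n <- 1e5
# Hypothetical large input with few distinct values, so runs change often
ss <- data.frame(
  Time = seq_len(n),                        # stand-in for timestamps
  X = sample(100:101, n, replace = TRUE),
  Y = 200,
  Z = sample(302:303, n, replace = TRUE)
)

# Wrapper around the rle() answer
rle_answer <- function(data) {
  data <- within(data, C <- paste(X, Y, Z, sep = "_"))
  rl <- rle(data$C)$lengths
  data[c(1, cumsum(rl)[-length(rl)] + 1), 1:(ncol(data) - 1)]
}

# Wrapper around the shift-by-one answer
shift_answer <- function(ss) {
  n <- nrow(ss)
  xyz <- with(ss, paste(X, Y, Z, sep = "_"))
  sel <- xyz[-1] != xyz[-n]
  ss[c(TRUE, sel), ]
}

a <- rle_answer(ss)
b <- shift_answer(ss)
stopifnot(identical(rownames(a), rownames(b)))  # both keep the same rows

print(system.time(for (i in 1:10) rle_answer(ss)))
print(system.time(for (i in 1:10) shift_answer(ss)))
```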
