Removing consecutive repeated rows within each interval from a data.frame in R

I need help removing consecutive repeated rows from a data.frame, keeping only the first row of each run of identical values.

For example, I have a data.frame like this:

Time                X   Y   Z
01/01/2011 00:00    101 200 302
01/01/2011 00:05    101 200 302
01/01/2011 00:10    101 200 302
01/01/2011 00:20    100 200 303
01/01/2011 00:25    100 200 303
01/01/2011 00:30    100 200 303
01/01/2011 00:35    101 200 302
01/01/2011 00:40    100 200 303
01/01/2011 00:45    100 200 303

After removing the rows whose (X, Y, Z) values repeat those of the previous row, I would get this result:

Time                X   Y   Z
01/01/2011 00:00    101 200 302
01/01/2011 00:20    100 200 303
01/01/2011 00:35    101 200 302
01/01/2011 00:40    100 200 303

I have tried the unique and duplicated functions, but they give a different result. For example:

eliminate <- data[!duplicated(data[,c("X","Y","Z")]),]

This code deletes every duplicated (X, Y, Z) combination across the whole data.frame, not just the consecutive repeats.
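To make the problem reproducible, here is a minimal sketch that rebuilds the sample data and runs the attempt above. The object name `data` and the column names come from the question; keeping Time as a plain character column is an assumption made only to keep the example short.

```r
# Sample data from the question; Time is kept as plain character here.
data <- data.frame(
  Time = c("01/01/2011 00:00", "01/01/2011 00:05", "01/01/2011 00:10",
           "01/01/2011 00:20", "01/01/2011 00:25", "01/01/2011 00:30",
           "01/01/2011 00:35", "01/01/2011 00:40", "01/01/2011 00:45"),
  X = c(101, 101, 101, 100, 100, 100, 101, 100, 100),
  Y = rep(200, 9),
  Z = c(302, 302, 302, 303, 303, 303, 302, 303, 303),
  stringsAsFactors = FALSE
)

# duplicated() flags a row if the same (X, Y, Z) appeared ANYWHERE earlier,
# so the later runs starting at 00:35 and 00:40 are dropped as well.
eliminate <- data[!duplicated(data[, c("X", "Y", "Z")]), ]
eliminate$Time
# [1] "01/01/2011 00:00" "01/01/2011 00:20"
```

Only two rows survive, whereas the desired result keeps four.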

Can somebody help me find a solution?

Thanks in advance. Regards,

Yougyz

Probably not the most elegant way:

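# One key per row; the separator avoids collisions such as ("1", "11") vs ("11", "1")
key <- paste(data$X, data$Y, data$Z, sep = "|")
rl <- rle(key)$lengths
# Keep the first row of each run of identical consecutive keys
data <- data[c(1, cumsum(rl)[-length(rl)] + 1), ]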
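The index arithmetic in this answer is easier to follow on the question's sample data. The sketch below is only an illustration: the vector `key` is typed in by hand to stand for the combined X/Y/Z values of the nine sample rows, and the names `ends` and `starts` are introduced here for readability.

```r
# Combined X/Y/Z key for the nine sample rows in the question.
key <- c("101 200 302", "101 200 302", "101 200 302",
         "100 200 303", "100 200 303", "100 200 303",
         "101 200 302",
         "100 200 303", "100 200 303")

rl <- rle(key)$lengths                    # run lengths: 3 3 1 2
ends <- cumsum(rl)                        # last row of each run: 3 6 7 9
starts <- c(1, ends[-length(ends)] + 1)   # first row of each run: 1 4 7 8
starts
```

Rows 1, 4, 7 and 8 are exactly the four rows in the desired result (00:00, 00:20, 00:35, 00:40).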

The following code collapses your three columns of interest into a single vector, then compares that vector with itself offset by one. Wherever the two differ, a new XYZ run has started.

n <- nrow(ss)
xyz <- with(ss, paste(X, Y, Z, sep = "|"))  # separator avoids ambiguous keys
sel <- xyz[-1] != xyz[-n]                   # TRUE where a row differs from the previous one
ss[c(TRUE, sel), ]                          # the first row is always kept

This is about 3x faster than Julius's answer, and the advantage should grow with the size of the dataset.
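The same offset-by-one comparison can also be done column by column, without pasting values into strings. The sketch below is a variation, not code from either answer: the helper name `keep_run_starts` and its `cols` argument are hypothetical, and it assumes the key columns contain no NA values.

```r
# Hypothetical helper (not from the answers above): keep the first row of each
# run of consecutive rows that agree on every column named in `cols`.
# Assumes the key columns contain no NA values.
keep_run_starts <- function(df, cols) {
  n <- nrow(df)
  if (n < 2) return(df)
  # For each key column, TRUE where row i differs from row i - 1
  diffs <- lapply(cols, function(col) df[[col]][-1] != df[[col]][-n])
  # A row starts a new run if ANY key column changed
  changed <- Reduce(`|`, diffs)
  df[c(TRUE, changed), , drop = FALSE]
}

# Sample data from the question (Time kept as character for brevity)
ss <- data.frame(
  Time = sprintf("01/01/2011 00:%02d", c(0, 5, 10, 20, 25, 30, 35, 40, 45)),
  X = c(101, 101, 101, 100, 100, 100, 101, 100, 100),
  Y = rep(200, 9),
  Z = c(302, 302, 302, 303, 303, 303, 302, 303, 303),
  stringsAsFactors = FALSE
)

res <- keep_run_starts(ss, c("X", "Y", "Z"))
res$Time
# [1] "01/01/2011 00:00" "01/01/2011 00:20" "01/01/2011 00:35" "01/01/2011 00:40"
```

Comparing the columns directly sidesteps the case where different rows paste to the same string, for example X = 1, Y = 11 versus X = 11, Y = 1 when no separator is used.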
