Here is an example of my data set;
Date Time(GMT)Depth Temp Salinity Density Phosphate
24/06/2002 1000 1 33.855 0.01
24/06/2002 1000 45 33.827 0.01
01/07/2002 1000 10 13.26 33.104 24.873 0.06
01/07/2002 1000 30 12.01 33.787 25.646 0.13
08/07/2002 1000 5 13.34 33.609 25.248 0.01
08/07/2002 1000 40 12.01 34.258 26.011 1.33
15/07/2002 1000 30 12.04 34.507 26.199 0.01
22/07/2002 1000 5 13.93 33.792 25.269 0.01
22/07/2002 1000 30 11.9 34.438 26.172 0.08
29/07/2002 1000 5 13.23 34.09 25.642 0.01
I want to delete duplicate rows so that I only have one row per date, I want to do this based on the Depth, I would like to keep the row with the greatest (deepest) depth. Any ideas?
Lets say you have data in df
df = df[order(df[,'Date'],-df[,'Depth']),]
df = df[!duplicated(df$Date),]
Introducing a data.table
solution which will be the fastest way to solve this (assuming data
is your data set)
library(data.table)
unique(setDT(data)[order(Date, -Depth)], by = "Date")
Just another way:
setDT(data)[data[, .I[which.max(Depth)], by=Date]$V1]
Here's one way to do it in a single dplyr call:
# Remove any duplicates
df <- df %>%
arrange(Date, -Depth) %>%
filter(duplicated(Date) == FALSE)
This might be not the fastest approach if your data frame is large, but a fairly strightforward one. This might change the order of your data frame and you might need to reorder by eg date afterwards. Instead of deleting we split the data by date, in each chunk pick a row with the maximum date and finally join the result back into a data frame
data = split(data, data$Date)
data = lapply(data, function(x) x[which.max(x$Depth), , drop=FALSE])
data = do.call("rbind", data)
You might also use dplyr's arrange()
instead of order (I find it more intuitive):
df <- arrange(df, Date, -Depth)
df <- df[!duplicated(df$Date),]
# First find the maxvalues
maxvals = aggregate(df$Depth~df$Date, FUN=max)
#Now use apply to find the matching rows and separate them out
out = df[apply(maxvals,1,FUN=function(x) which(paste(df$Date,df$Depth) == paste(x[1],x[2]))),]
Does that work for you?
An alternative approach using dplyr's distinct
function:
library(dplyr)
df %>% distinct(column, .keep_all = TRUE)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.