
Using approx function within tapply or by in R

I have temperature profiler (tp) data with date, depth and temperature. The depths are not exactly the same for each date, so I need to unify them to a common set of depths and estimate the temperature at each of those depths by linear interpolation. I was able to do this with a loop using the 'approx' function (see the first part of the enclosed code), but I expect there is a better way without a loop, considering that I will have about 600,000 rows. I tried the 'by' function but could not transform the result (a list) into a data frame or matrix (see the second part of the code). Keep in mind that the number of rounded depths is not always the same for each date, as in the example. The rounded depth is in the Depth2 column and the interpolated temperature is put in Temp2. What is the 'right' way to solve this?

# create df manually
tp <- data.frame(Date=double(31), Depth=double(31), Temperature=double(31))
tp$Date[1:11] <- '2009-12-17' ; tp$Date[12:22] <- '2009-12-18'; tp$Date[23:31] <- '2009-12-19' 
tp$Depth <- c(24.92,25.50,25.88,26.33,26.92,27.41,27.93,28.37,28.82,29.38,29.92,25.07,25.56,26.06,26.54,27.04,27.53,28.03,28.52,29.02,29.50,30.01,25.05,25.55,26.04,26.53,27.02,27.52,28.01,28.53,29.01)
tp$Temperature <- c(19.08,19.06,19.06,18.87,18.67,17.27,16.53,16.43,16.30,16.26,16.22,17.62,17.43,17.11,16.72,16.38,16.28,16.20,16.15,16.13,16.11,16.08,17.54,17.43,17.32,17.14,16.89,16.53,16.28,16.20,16.13)

# create rounded depth column
tp$Depth2 <- round(tp$Depth)

# loop on date to calculate linear approximation for rounded depth
dtgrp <- tp[!duplicated(tp[,1]),1]
for (i in dtgrp) {
  x1 <- tp[tp$Date == i, "Depth"]  
  y1 <- tp[tp$Date == i, "Temperature"]
  x2 <- tp[tp$Date == i, "Depth2"]
  tpa <- approx(x=x1,y=y1,xout=x2, rule=2)
  tp[tp$Date == i, "Temp2"] <- tpa$y
}
# reduce result to rounded depth
tp1 <- tp[!duplicated(tp[,-c(2:3)]),-c(2:3)]

# not part of the question, but the end need is a matrix, so this completes it:
library(reshape2)
tpbydt <- acast(tp1, Date~Depth2, value.var="Temp2")

# second part: I tried to use the by function (instead of the loop) but got lost when trying to convert the result to a data frame or matrix
rdpth <- function(x1,y1,x2) {
  tpa <- approx(x=x1,y=y1,xout=x2, rule=2)
  return(tpa)
}
tp2 <- by(tp, tp$Date,function(tp) rdpth(tp$Depth,tp$Temperature,tp$Depth2), simplify = TRUE)

You were very close with the by call, but remember that it returns a list of objects. Therefore, consider building a list of data frames, one per date, and row-binding them at the very end:

df_list <- by(tp, tp$Date, function(sub) {
  tpa <- approx(x=sub$Depth, y=sub$Temperature, xout=sub$Depth2, rule=2)

  df <- unique(data.frame(Date = sub$Date, 
                          Depth2 = sub$Depth2,
                          Temp2 = tpa$y,
                          stringsAsFactors = FALSE))
  return(df)
})    

tp2 <- do.call(rbind, unname(df_list))

tp2
#          Date Depth2    Temp2
# 1  2009-12-17     25 19.07724
# 2  2009-12-17     26 19.00933
# 5  2009-12-17     27 18.44143
# 7  2009-12-17     28 16.51409
# 9  2009-12-17     29 16.28714
# 11 2009-12-17     30 16.22000
# 12 2009-12-18     25 17.62000
# 21 2009-12-18     26 17.14840
# 4  2009-12-18     27 16.40720
# 6  2009-12-18     28 16.20480
# 8  2009-12-18     29 16.13080
# 10 2009-12-18     30 16.08059
# 13 2009-12-19     25 17.54000
# 22 2009-12-19     26 17.32898
# 41 2009-12-19     27 16.90020
# 61 2009-12-19     28 16.28510
# 81 2009-12-19     29 16.13146
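
One detail in this output that may be worth spelling out: on 2009-12-18 the shallowest measurement is at 25.07, so the rounded depth 25 lies outside the measured range. With rule = 2, approx returns the value at the nearest end of the data instead of NA, which is why that row shows 17.62. A minimal sketch using the first three 2009-12-18 readings from the question (the variable names clamped and strict are only illustrative):

```r
# First three readings for 2009-12-18, copied from the question's data
x <- c(25.07, 25.56, 26.06)   # measured depths
y <- c(17.62, 17.43, 17.11)   # measured temperatures

# rule = 2: a target outside range(x) gets the value at the closest end point
clamped <- approx(x = x, y = y, xout = c(25, 26), rule = 2)$y

# rule = 1 (the default): a target outside range(x) yields NA
strict <- approx(x = x, y = y, xout = c(25, 26), rule = 1)$y

print(clamped)
print(strict)
```

Whether holding the end value constant is acceptable depends on how far the rounded depth sits from the nearest measurement; rule = 1 keeps such points as NA if you would rather filter them out later.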

And if you reset the row names, the result is identical to your tp1 output:

identical(data.frame(tp1, row.names = NULL),
          data.frame(tp2, row.names = NULL))
# [1] TRUE
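
Since the question mentions tapply and the end need is a Date by Depth2 matrix, here is a hedged base R sketch that goes from the raw rows to that matrix without reshape2. It is a variation, not the code above: it uses split and lapply instead of by, interpolates only once per distinct rounded depth, and runs on a small made-up data set. The helper name interp_one and the toy values are assumptions for illustration only.

```r
# Toy profile data (hypothetical values, not the measurements from the question)
tp <- data.frame(
  Date        = rep(c("2009-12-17", "2009-12-18"), times = c(4, 3)),
  Depth       = c(24.9, 25.6, 26.4, 27.1, 25.2, 25.8, 26.3),
  Temperature = c(19.0, 18.8, 18.0, 17.0, 17.6, 17.3, 17.0),
  stringsAsFactors = FALSE
)
tp$Depth2 <- round(tp$Depth)

# Hypothetical helper: interpolate one date's profile at its distinct rounded depths
interp_one <- function(sub) {
  target <- sort(unique(sub$Depth2))
  fit <- approx(x = sub$Depth, y = sub$Temperature, xout = target, rule = 2)
  data.frame(Date = sub$Date[1], Depth2 = fit$x, Temp2 = fit$y,
             stringsAsFactors = FALSE)
}

# One data frame per date, row-bound at the end
tp2 <- do.call(rbind, lapply(split(tp, tp$Date), interp_one))
rownames(tp2) <- NULL

# Date x Depth2 matrix; each cell holds a single value, so mean() just returns it,
# and Date/Depth2 combinations that do not occur are filled with NA
tpbydt <- tapply(tp2$Temp2, list(Date = tp2$Date, Depth2 = tp2$Depth2), mean)
print(tpbydt)
```

In this toy data the second date has no rounded depth of 27, so that cell is NA, which is the same situation as the dates in the question that stop at depth 29.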
