
Linear interpolation among columns in R

I am working with temperature data measured at certain depths, e.g. 0.9 m, 2.5 m and 5 m. I would like to interpolate these values to obtain the temperature at every meter, e.g. 1 m, 2 m and 3 m. The original data looks like this:

df
# A tibble: 5 x 3
  date                d_0.9 d_2.5  
  <dttm>              <dbl> <dbl> 
1 2004-01-05 03:00:00  7     8        
2 2004-01-05 04:00:00  7.5   9      
3 2004-01-05 05:00:00  7     8        
4 2004-01-05 06:00:00  6.92  NA      

What I would like to get is something like:

df_int
# A tibble: 5 x 5
  date                 d_0.9   d_1      d_2      d_2.5  
  <dttm>              <dbl>   <dbl>     <dbl>    <dbl>
1 2004-01-05 03:00:00  7       7.0625   7.6875   8     
2 2004-01-05 04:00:00  7.5     7.59375  8.53125  9      
3 2004-01-05 05:00:00  7       7.0625   7.6875   8  
4 2004-01-05 06:00:00  6.92    NA       NA       NA 

I have to do this for a very large data frame. Is there an efficient way of doing it?

Many thanks in advance

One option is to convert the data to long format, use a join to add rows for the depths we want to interpolate at, and then use approx for the interpolation:

library(tidyverse)

# Data
df = tibble(date=seq(as.POSIXct("2004-01-05 03:00:00"),
                     as.POSIXct("2004-01-05 06:00:00"),
                     by="1 hour"),
            d_0.9 = c(7,7.5,7,6.92),
            d_2.5 = c(8,NA,8,NA),
            d_5.0 = c(10,10.5,9.4,NA))

# Create a data frame with all of the times and depths we want to interpolate at
depths = sort(unique(c(c(0.9, 2.5, 5), seq(ceiling(0.9), floor(5), 1))))
depths = crossing(date=unique(df$date), depth = depths)

# Convert data to long format, join to add interpolation depths, then interpolate
df.interp = df %>% 
  gather(depth, value, -date) %>% 
  mutate(depth = as.numeric(gsub("d_", "", depth))) %>% 
  full_join(depths, by = c("date", "depth")) %>% 
  arrange(date, depth) %>% 
  group_by(date) %>% 
  mutate(value.interp = if(length(na.omit(value)) > 1) {
    approx(depth, value, xout=depth)$y
  } else {
    value
  })

In the code above, the if statement is included because approx throws an error when a given date has fewer than two non-missing values. For those dates the original values are kept unchanged.
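To see what approx does for a single timestamp, here is a minimal base R sketch using the first row of the question's data (7 at 0.9 m, 8 at 2.5 m). The variable names y_in, y_out and res are only for illustration.

```r
# Known temperatures at 0.9 m and 2.5 m, interpolated at 1 m and 2 m
y_in <- approx(x = c(0.9, 2.5), y = c(7, 8), xout = c(1, 2))$y
print(y_in)   # 7.0625 7.6875

# A depth outside the observed range gives NA under the default rule = 1
y_out <- approx(x = c(0.9, 2.5), y = c(7, 8), xout = 3)$y
print(y_out)  # NA

# With only one non-missing value approx() stops with an error;
# tryCatch() captures the error message instead of aborting
res <- tryCatch(
  approx(x = c(0.9, 2.5), y = c(6.92, NA), xout = 1)$y,
  error = function(e) conditionMessage(e)
)
print(res)
```

The first result matches the d_1 and d_2 values the question asks for, and the last call shows why the guard on the number of non-missing values is needed.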

df.interp
   date                depth value value.interp
 1 2004-01-05 03:00:00   0.9  7.00     7.000000
 2 2004-01-05 03:00:00   1.0    NA     7.062500
 3 2004-01-05 03:00:00   2.0    NA     7.687500
 4 2004-01-05 03:00:00   2.5  8.00     8.000000
 5 2004-01-05 03:00:00   3.0    NA     8.400000
 6 2004-01-05 03:00:00   4.0    NA     9.200000
 7 2004-01-05 03:00:00   5.0 10.00    10.000000
 8 2004-01-05 04:00:00   0.9  7.50     7.500000
 9 2004-01-05 04:00:00   1.0    NA     7.573171
10 2004-01-05 04:00:00   2.0    NA     8.304878
11 2004-01-05 04:00:00   2.5    NA     8.670732
12 2004-01-05 04:00:00   3.0    NA     9.036585
13 2004-01-05 04:00:00   4.0    NA     9.768293
14 2004-01-05 04:00:00   5.0 10.50    10.500000
15 2004-01-05 05:00:00   0.9  7.00     7.000000
16 2004-01-05 05:00:00   1.0    NA     7.062500
17 2004-01-05 05:00:00   2.0    NA     7.687500
18 2004-01-05 05:00:00   2.5  8.00     8.000000
19 2004-01-05 05:00:00   3.0    NA     8.280000
20 2004-01-05 05:00:00   4.0    NA     8.840000
21 2004-01-05 05:00:00   5.0  9.40     9.400000
22 2004-01-05 06:00:00   0.9  6.92     6.920000
23 2004-01-05 06:00:00   1.0    NA           NA
24 2004-01-05 06:00:00   2.0    NA           NA
25 2004-01-05 06:00:00   2.5    NA           NA
26 2004-01-05 06:00:00   3.0    NA           NA
27 2004-01-05 06:00:00   4.0    NA           NA
28 2004-01-05 06:00:00   5.0    NA           NA
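If you need a wide result like df_int in the question, one alternative is to skip the reshaping and call approx once per row on a plain numeric matrix. The following is only a sketch in base R, not benchmarked on large data. The names temps, obs_depths, out_depths, interp_row and wide are made up for this example, and the depths are typed in by hand rather than parsed from the column names.

```r
# Depths where temperatures were measured, and depths wanted in the result
obs_depths <- c(0.9, 2.5, 5)
out_depths <- c(0.9, 1, 2, 2.5, 3, 4, 5)

# Same values as the example data: one row per timestamp, one column per depth
temps <- rbind(c(7,    8,  10),
               c(7.5,  NA, 10.5),
               c(7,    8,  9.4),
               c(6.92, NA, NA))

# Hypothetical helper: interpolate one row of temperatures.
# With fewer than two observed values, approx() would fail, so the
# observed values are kept at their own depths and the rest stays NA.
interp_row <- function(y, obs, at) {
  if (sum(!is.na(y)) < 2) {
    res <- rep(NA_real_, length(at))
    keep <- obs %in% at
    res[match(obs[keep], at)] <- y[keep]
    return(res)
  }
  approx(obs, y, xout = at)$y
}

# apply() returns one column per input row, so transpose back
wide <- t(apply(temps, 1, interp_row, obs = obs_depths, at = out_depths))
colnames(wide) <- paste0("d_", out_depths)

# Attach the timestamps to get one row per date, one column per depth
dates <- seq(as.POSIXct("2004-01-05 03:00:00", tz = "UTC"),
             by = "1 hour", length.out = nrow(temps))
df_int <- data.frame(date = dates, wide, check.names = FALSE)
print(df_int)
```

The values agree with the value.interp column above, only arranged with one column per depth (d_0.9, d_1, d_2, ...). The time zone is set to UTC here just to make the example reproducible.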
