
Define a gridbox from spatial (lat/long) data and extract average value in R

I would like to compute the spatial average over a region that I define as a longitude/latitude gridbox.

The data I have is ECMWF sea-ice data: spatio-temporal data on a 0.75 x 0.75 lon/lat grid over the whole Northern Hemisphere. I've converted the data from NetCDF format into an R data frame, so head(var.df) looks like this, with columns for date, longitude, latitude, and value:

            date_time lon   lat ci
1 2016-01-01 18:00:00   0 87.75  1
2 2016-01-02 18:00:00   0 87.75  1
3 2016-01-03 18:00:00   0 87.75  1
4 2016-01-04 18:00:00   0 87.75  1
5 2016-01-05 18:00:00   0 87.75  1
6 2016-01-06 18:00:00   0 87.75  1

There is therefore a value for each lon/lat coordinate across the Northern Hemisphere (the data frame is ordered by date rather than by lon, for some reason).

How would I extract the spatial area that I want, i.e.

BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)

and then average all the values that fall within that area, for each timestep (day)? So I'd have the mean of a gridbox that I define.
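In base R, those two steps can be sketched as a subset followed by a per-day aggregate. This is a minimal sketch assuming the columns shown in head(var.df); the box bounds (30-105E, 70-80N) are illustrative:

```r
# Keep only the grid points inside the box
BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)

# One mean per timestep, ignoring missing values
BK_mean <- aggregate(ci ~ date_time, data = BK, FUN = mean, na.rm = TRUE)
head(BK_mean)
```

aggregate() with a formula returns one row per date_time here, which is the single regional mean per day the question asks for.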

Thanks in advance, I hope this wasn't phrased terribly.

Update

Using GGamba's suggested code below, I got the following output, with multiple values for the same day, so it hadn't averaged over the whole region per timestep.

             date_time  binlat  binlon      mean
                <dttm>  <fctr>  <fctr>     <dbl>
1  2016-01-01 18:00:00 [80,90)  [0,10) 0.4200042
2  2016-01-01 18:00:00 [80,90) [10,20) 0.4503899
3  2016-01-01 18:00:00 [80,90) [20,30) 0.5614429
4  2016-01-01 18:00:00 [80,90) [30,40) 0.6118528
5  2016-01-01 18:00:00 [80,90) [40,50) 0.5809092
6  2016-01-01 18:00:00 [80,90) [50,60) 0.5617919
7  2016-01-01 18:00:00 [80,90) [60,70) 0.6071370
8  2016-01-01 18:00:00 [80,90) [70,80) 0.6011818
9  2016-01-01 18:00:00 [80,90) [80,90] 0.5442770
10 2016-01-01 18:00:00 [80,90)      NA 0.4120862
# ... with 610 more rows

I also had to add na.rm = TRUE to the mean() function at the end, as the averages were NA.

Using dplyr we can do:

library(dplyr)
df %>% 
    # bin lon and lat into 0.75-degree intervals (the grid spacing)
    mutate(binlon = cut(lon, seq(from = min(lon), to = max(lon), by = .75), include.lowest = TRUE, right = FALSE),
           binlat = cut(lat, seq(from = min(lat), to = max(lat), by = .75), include.lowest = TRUE, right = FALSE)) %>% 
    # one group per timestep per bin
    group_by(date_time, binlat, binlon) %>% 
    summarise(mean = mean(ci))

Data:

structure(list(date_time = structure(1:6, .Label = c("2016-01-01 18:00:00", 
"2016-01-02 18:00:00", "2016-01-03 18:00:00", "2016-01-04 18:00:00", 
"2016-01-05 18:00:00", "2016-01-06 18:00:00"), class = "factor"), 
    lon = c(0L, 0L, 0L, 0L, 0L, 90L), lat = c(0, 87.75, 87.75, 
    87.75, 87.75, 90), ci = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("date_time", 
"lon", "lat", "ci"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Results:

#             date_time      binlat    binlon  mean
#                <fctr>      <fctr>    <fctr> <dbl>
# 1 2016-01-01 18:00:00    [0,0.75)  [0,0.75)     1
# 2 2016-01-02 18:00:00 [87.8,88.5)  [0,0.75)     1
# 3 2016-01-03 18:00:00 [87.8,88.5)  [0,0.75)     1
# 4 2016-01-04 18:00:00 [87.8,88.5)  [0,0.75)     1
# 5 2016-01-05 18:00:00 [87.8,88.5)  [0,0.75)     1
# 6 2016-01-06 18:00:00   [89.2,90] [89.2,90]     1

This creates two new columns binning lat and lon into the bins defined in the cut function.
Then group by date_time and the new columns, and calculate the mean of ci within each group.

Of course, you should adapt the cut function to suit your needs.
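Regarding the update: the binning above produces one mean per grid cell per day, not one mean per region. To get a single value per timestep over a fixed box, filter to the region first and group by date_time alone. A sketch assuming the same column names, with the box bounds being illustrative:

```r
library(dplyr)

df %>%
    filter(lon >= 30, lon <= 105, lat >= 70, lat <= 80) %>%  # keep only the box
    group_by(date_time) %>%                                   # one group per timestep
    summarise(mean_ci = mean(ci, na.rm = TRUE))               # regional daily average
```

na.rm = TRUE is included because, as noted in the update, the averages come out as NA otherwise.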
