
Define a gridbox from spatial (lat/long) data and extract average value in R

I would like to compute the spatial average over a region that I define as a longitude/latitude gridbox.

The data I have is ECMWF sea-ice data: spatio-temporal data on a 0.75 x 0.75 lon/lat grid over the whole Northern Hemisphere. I've converted the data from NetCDF format into an R data frame, so head(var.df) looks like this, with columns for date, longitude, latitude, and value:

            date_time lon   lat ci
1 2016-01-01 18:00:00   0 87.75  1
2 2016-01-02 18:00:00   0 87.75  1
3 2016-01-03 18:00:00   0 87.75  1
4 2016-01-04 18:00:00   0 87.75  1
5 2016-01-05 18:00:00   0 87.75  1
6 2016-01-06 18:00:00   0 87.75  1

There is therefore a value for each lon/lat coordinate across the Northern Hemisphere (the data frame is ordered by date rather than by lon, for some reason).

How would I extract the spatial area that I want, i.e.

BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)

and then average all the values that fall within that area, for each timestep (day)? So I'd have the mean of a gridbox that I define.
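In base R, those two steps can be sketched as a subset followed by a per-day aggregate. This is a minimal sketch assuming the columns shown in head(var.df); the box bounds (30-105E, 70-80N) are illustrative:

```r
# Keep only the grid points inside the box
BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)

# One mean per timestep, ignoring missing values
BK_mean <- aggregate(ci ~ date_time, data = BK, FUN = mean, na.rm = TRUE)
head(BK_mean)
```

aggregate() with a formula returns one row per date_time here, which is the single regional mean per day the question asks for.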

Thanks in advance, I hope this wasn't phrased terribly.

Update

Using GGamba's suggested code below, I got the following output, with multiple values for the same day, so it hadn't averaged over the whole region per timestep.

             date_time  binlat  binlon      mean
                <dttm>  <fctr>  <fctr>     <dbl>
1  2016-01-01 18:00:00 [80,90)  [0,10) 0.4200042
2  2016-01-01 18:00:00 [80,90) [10,20) 0.4503899
3  2016-01-01 18:00:00 [80,90) [20,30) 0.5614429
4  2016-01-01 18:00:00 [80,90) [30,40) 0.6118528
5  2016-01-01 18:00:00 [80,90) [40,50) 0.5809092
6  2016-01-01 18:00:00 [80,90) [50,60) 0.5617919
7  2016-01-01 18:00:00 [80,90) [60,70) 0.6071370
8  2016-01-01 18:00:00 [80,90) [70,80) 0.6011818
9  2016-01-01 18:00:00 [80,90) [80,90] 0.5442770
10 2016-01-01 18:00:00 [80,90)      NA 0.4120862
# ... with 610 more rows

I also had to add na.rm = TRUE to the mean() function at the end, as the averages were NA.

Using dplyr we can do:

library(dplyr)
df %>% 
    # bin lon and lat into 0.75-degree intervals (the grid spacing)
    mutate(binlon = cut(lon, seq(from = min(lon), to = max(lon), by = .75), include.lowest = TRUE, right = FALSE),
           binlat = cut(lat, seq(from = min(lat), to = max(lat), by = .75), include.lowest = TRUE, right = FALSE)) %>% 
    # one group per timestep per bin
    group_by(date_time, binlat, binlon) %>% 
    summarise(mean = mean(ci))

Data:

structure(list(date_time = structure(1:6, .Label = c("2016-01-01 18:00:00", 
"2016-01-02 18:00:00", "2016-01-03 18:00:00", "2016-01-04 18:00:00", 
"2016-01-05 18:00:00", "2016-01-06 18:00:00"), class = "factor"), 
    lon = c(0L, 0L, 0L, 0L, 0L, 90L), lat = c(0, 87.75, 87.75, 
    87.75, 87.75, 90), ci = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("date_time", 
"lon", "lat", "ci"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Results:

#             date_time      binlat    binlon  mean
#                <fctr>      <fctr>    <fctr> <dbl>
# 1 2016-01-01 18:00:00    [0,0.75)  [0,0.75)     1
# 2 2016-01-02 18:00:00 [87.8,88.5)  [0,0.75)     1
# 3 2016-01-03 18:00:00 [87.8,88.5)  [0,0.75)     1
# 4 2016-01-04 18:00:00 [87.8,88.5)  [0,0.75)     1
# 5 2016-01-05 18:00:00 [87.8,88.5)  [0,0.75)     1
# 6 2016-01-06 18:00:00   [89.2,90] [89.2,90]     1

This creates two new columns binning lat and lon into the bins defined in the cut function.
Then group by date_time and the new columns, and calculate the mean of ci within each group.

Of course, you should adapt the cut function to suit your needs.
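Regarding the update: the binning above produces one mean per grid cell per day, not one mean per region. To get a single value per timestep over a fixed box, filter to the region first and group by date_time alone. A sketch assuming the same column names, with the box bounds being illustrative:

```r
library(dplyr)

df %>%
    filter(lon >= 30, lon <= 105, lat >= 70, lat <= 80) %>%  # keep only the box
    group_by(date_time) %>%                                   # one group per timestep
    summarise(mean_ci = mean(ci, na.rm = TRUE))               # regional daily average
```

na.rm = TRUE is included because, as noted in the update, the averages come out as NA otherwise.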
