I would like to compute the spatial average over a region of data that I define via a longitude/latitude grid box.
The data are ECMWF sea-ice data, i.e. spatio-temporal values on a 0.75 x 0.75 degree lon/lat grid over the whole Northern Hemisphere. I've converted the data from NetCDF into an R data frame, so head(var.df) looks like this, with columns date_time, lon, lat and ci (the value):
date_time lon lat ci
1 2016-01-01 18:00:00 0 87.75 1
2 2016-01-02 18:00:00 0 87.75 1
3 2016-01-03 18:00:00 0 87.75 1
4 2016-01-04 18:00:00 0 87.75 1
5 2016-01-05 18:00:00 0 87.75 1
6 2016-01-06 18:00:00 0 87.75 1
There is therefore a value for each lon/lat coordinate across the Northern Hemisphere (the data frame is ordered by date rather than by lon, for some reason).
How would I extract the spatial area that I want, e.g.
BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)
and then average all the values that fall within that area, for each timestep (day)? I'd then have the mean of a grid box that I define.
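For concreteness, here is a base-R sketch of what I mean. The data frame below is a tiny made-up stand-in for var.df (the real one has the same columns); the box limits (30E-105E, 70N-80N) are the region I want:

```r
# Toy stand-in for var.df (same columns as the real data frame)
var.df <- data.frame(
  date_time = rep(c("2016-01-01 18:00:00", "2016-01-02 18:00:00"), each = 4),
  lon = rep(c(30, 60, 105, 120), 2),
  lat = rep(75, 8),
  ci  = c(0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.9)
)

# Keep only the grid cells inside the box (30E-105E, 70N-80N)
BK <- subset(var.df, lon >= 30 & lon <= 105 & lat >= 70 & lat <= 80)

# One mean per timestep over everything inside the box
daily_mean <- aggregate(ci ~ date_time, data = BK, FUN = mean, na.rm = TRUE)
```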
Thanks in advance, I hope this wasn't phrased terribly.
Update
Using GGamba's suggested code below, I got the following output, with multiple values for the same day, so it hadn't averaged the whole region for each timestep.
date_time binlat binlon mean
<dttm> <fctr> <fctr> <dbl>
1 2016-01-01 18:00:00 [80,90) [0,10) 0.4200042
2 2016-01-01 18:00:00 [80,90) [10,20) 0.4503899
3 2016-01-01 18:00:00 [80,90) [20,30) 0.5614429
4 2016-01-01 18:00:00 [80,90) [30,40) 0.6118528
5 2016-01-01 18:00:00 [80,90) [40,50) 0.5809092
6 2016-01-01 18:00:00 [80,90) [50,60) 0.5617919
7 2016-01-01 18:00:00 [80,90) [60,70) 0.6071370
8 2016-01-01 18:00:00 [80,90) [70,80) 0.6011818
9 2016-01-01 18:00:00 [80,90) [80,90] 0.5442770
10 2016-01-01 18:00:00 [80,90) NA 0.4120862
# ... with 610 more rows
I also had to add na.rm = TRUE to the mean() call at the end, as the averages were otherwise NA.
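For reference, the following dplyr sketch (again with made-up toy data in the same shape as var.df) subsets the box first and then groups only by date_time, which gives a single mean per day instead of one per lat/lon bin:

```r
library(dplyr)

# Toy data in the same shape as var.df, including some missing cells
var.df <- data.frame(
  date_time = rep(c("2016-01-01 18:00:00", "2016-01-02 18:00:00"), each = 3),
  lon = rep(c(30, 67.5, 105), 2),
  lat = rep(75, 6),
  ci  = c(0.2, NA, 0.8, 0.4, 0.6, NA)
)

BK_daily <- var.df %>%
  filter(lon >= 30, lon <= 105, lat >= 70, lat <= 80) %>%
  group_by(date_time) %>%                       # one group per day, no lat/lon bins
  summarise(mean_ci = mean(ci, na.rm = TRUE))   # na.rm drops missing cells
```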
Using dplyr we can do:
library(dplyr)
df %>%
    mutate(binlon = cut(lon, seq(from = min(lon), to = max(lon), by = .75),
                        include.lowest = TRUE, right = FALSE),
           binlat = cut(lat, seq(from = min(lat), to = max(lat), by = .75),
                        include.lowest = TRUE, right = FALSE)) %>%
    group_by(date_time, binlat, binlon) %>%
    summarise(mean = mean(ci))
With this sample data:
structure(list(date_time = structure(1:6, .Label = c("2016-01-01 18:00:00",
"2016-01-02 18:00:00", "2016-01-03 18:00:00", "2016-01-04 18:00:00",
"2016-01-05 18:00:00", "2016-01-06 18:00:00"), class = "factor"),
lon = c(0L, 0L, 0L, 0L, 0L, 90L), lat = c(0, 87.75, 87.75,
87.75, 87.75, 90), ci = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("date_time",
"lon", "lat", "ci"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
# date_time binlat binlon mean
# <fctr> <fctr> <fctr> <dbl>
# 1 2016-01-01 18:00:00 [0,0.75) [0,0.75) 1
# 2 2016-01-02 18:00:00 [87.8,88.5) [0,0.75) 1
# 3 2016-01-03 18:00:00 [87.8,88.5) [0,0.75) 1
# 4 2016-01-04 18:00:00 [87.8,88.5) [0,0.75) 1
# 5 2016-01-05 18:00:00 [87.8,88.5) [0,0.75) 1
# 6 2016-01-06 18:00:00 [89.2,90] [89.2,90] 1
This creates two new columns, binning lat and lon into the bins defined in the cut function. We then group by date_time and the two new columns, and calculate the mean of ci within each group.
Of course, you should adapt the cut call to suit your needs.
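For example, here is how cut behaves with coarser bins; the 5-degree breakpoints below are purely illustrative:

```r
# Bin some latitudes into 5-degree intervals
lats <- c(70.5, 74.25, 78.0, 80.0)
bins <- cut(lats, seq(70, 80, by = 5), include.lowest = TRUE, right = FALSE)

# With right = FALSE the intervals are left-closed, e.g. [70,75);
# include.lowest = TRUE additionally closes the last interval, [75,80],
# so the boundary value 80 is not dropped as NA.
```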