How to change the resolution (or regrid) data in R

I have a dataset consisting of lon, lat and a monthly mean variable (e.g. temperature or precipitation) covering 1961 to 1970. The dataset has a resolution of 0.5 by 0.5 degrees lon/lat, covers the whole globe, and was downloaded as a .nc (NetCDF) file, from which I extracted the data in R using:

library(ncdf)
f <- open.ncdf("D:/CRU/cru_ts3.21.1961.1970.tmp.dat.nc")
A <- get.var.ncdf(nc=f,varid="tmp")
B <- get.var.ncdf(nc=f,varid="lon")
C <- get.var.ncdf(nc=f,varid="lat")
D <- cbind(expand.grid(B, C))
E <- expand.grid(A)

The expanded grid (E) is a data table with 31,104,000 rows of the variable, and the expanded grid (D) is a data table with 259,200 rows of lon/lat. Multiplying 259,200 * 10 years * 12 months gives 31,104,000, so table E can be split into monthly values using:

Month <- 1
Start <- (Month-1)*(259200)+1
Finish <- (Month*259200)
G <- E[Start:Finish,]
H <- expand.grid(G)
I <- cbind(D,H) 

Therefore I is now a data table for the first month (i.e. January 1961) consisting of lon, lat and the variable. An example of the data is given below:

        lon    lat tmp
49184 -68.25 -55.75 7.5
49185 -67.75 -55.75 7.6
49186 -67.25 -55.75 7.6
49899 -70.75 -55.25 6.8
49900 -70.25 -55.25 7.0
49901 -69.75 -55.25 6.9
49902 -69.25 -55.25 7.1
49903 -68.75 -55.25 6.8
49904 -68.25 -55.25 7.6
49905 -67.75 -55.25 8.2
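
As an aside on the monthly slicing above: the array returned by the NetCDF reader can usually be indexed by month directly, which avoids computing row offsets by hand. The following is a minimal sketch, assuming the variable is stored with dimensions in the order lon, lat, time; the toy coordinates and values are made up for illustration and stand in for B, C and A.

```r
# Toy stand-in for the real data: 4 lons x 3 lats x 2 months (values are made up)
lon <- c(0.25, 0.75, 1.25, 1.75)
lat <- c(0.25, 0.75, 1.25)
A <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2))

# lon varies fastest in both expand.grid() and the array's storage order,
# so the rows of the grid line up with the flattened monthly slice
grid <- expand.grid(lon = lon, lat = lat)

# Month 1 as a lon/lat/value table, taken straight from the array
month <- 1
jan <- cbind(grid, tmp = as.vector(A[, , month]))
head(jan)
```

Changing `month` selects any other time step without recomputing Start and Finish.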

Now for my question. The current resolution of the grid is 0.5 * 0.5 degrees, and I would like to "regrid" the data so that the resolution is 0.25 * 0.25 degrees. I don't want to do anything particularly clever with the data: each 0.5 * 0.5 cell contains four 0.25 * 0.25 cells, and I just want those four cells to take the value of the 0.5 * 0.5 cell they sit in.

I've looked at the raster package but haven't been able to get anywhere with it.

Here is a way to do it using plyr::ddply(). It will probably be a bit slow for your table size, depending on how often you want to regrid. I will have a think about a way to do it with data.table, which should be faster:

library(plyr)
# make an example data frame (2000 rows)
I <- data.frame(lat = seq(0.5, 1000, 0.5), lon = 1, tmp = sample(1:100, 2000, replace = TRUE))

# make an adjustment grid: lat offsets, lon offsets, and 0 for tmp
k <- expand.grid(c(0, 0.25), c(0, 0.25), 0)

# use plyr::ddply() to expand each entry into the corresponding 4 entries
new_I <- ddply(I, .(lat, lon), function(x) as.list(x) + k)
colnames(new_I) <- c("lat", "lon", "newlat", "newlon", "tmp")

head(new_I)

  lat lon newlat newlon tmp
1 0.5   1   0.50   1.00  64
2 0.5   1   0.75   1.00  64
3 0.5   1   0.50   1.25  64
4 0.5   1   0.75   1.25  64
5 1.0   1   1.00   1.00  31
6 1.0   1   1.25   1.00  31

Actually, thinking about it, here is a better way from a time perspective. It's a bit of a hack and gives you less control over any additional data processing you may wish to do in future, but it takes about 6.5 seconds to expand 2M rows into 8M rows.

# make your data frame
I<-data.frame(lat=seq(0.5,1000000,0.5),lon=1,tmp=sample(1:100,2000000,replace=T))

# make an adjustment vector
v<-rep(0.25,times=2000000)

# make 3 new tables, apply the vector appropriately, and rbind
I_latshift<-I
I_lonshift<-I
I_bothshift<-I

I_latshift$lat<-I_latshift$lat+v
I_lonshift$lon<-I_lonshift$lon+v
I_bothshift$lat<-I_bothshift$lat+v
I_bothshift$lon<-I_bothshift$lon+v

I<-rbind(I,I_bothshift,I_latshift,I_lonshift)

# sort it for neatness
I<-I[with(I, order(lat, lon)), ]


head(I)

         lat  lon tmp
1       0.50 1.00   3
6000001 0.50 1.25   3
4000001 0.75 1.00   3
2000001 0.75 1.25   3
2       1.00 1.00  88
6000002 1.00 1.25  88
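
One caveat worth checking: the coordinates in the question's sample (e.g. lon -68.25, lat -55.75) look like cell centres rather than cell corners. If that is the case, the four 0.25-degree sub-cell centres lie 0.125 either side of each original centre, rather than at offsets of 0 and 0.25. A minimal sketch under that assumption, using a cross join in base R (the two-row `coarse` table is a made-up excerpt):

```r
# Assumption: lon/lat are the centres of the 0.5-degree cells
coarse <- data.frame(lon = c(-68.25, -67.75),
                     lat = c(-55.75, -55.75),
                     tmp = c(7.5, 7.6))

# Offsets from a coarse cell centre to the centres of its four sub-cells
offsets <- expand.grid(dlon = c(-0.125, 0.125), dlat = c(-0.125, 0.125))

# merge() with by = NULL returns the Cartesian product:
# every coarse cell is paired with every offset
fine <- merge(coarse, offsets, by = NULL)
fine$lon <- fine$lon + fine$dlon
fine$lat <- fine$lat + fine$dlat

# keep the original columns and sort for neatness
fine <- fine[order(fine$lat, fine$lon), c("lon", "lat", "tmp")]
fine
```

Each row of `coarse` yields four rows in `fine` carrying the same `tmp` value.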

There is a solution using the R package raster. It goes as follows:

library("ncdf4")
library("raster")
nc <- nc_open("my_file.nc")
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")
dname <- "pre"        ## "pre" is the short name for precipitation
nlon <- dim(lon)
nlat <- dim(lat)
nt <- dim(time)
lonlat <- expand.grid(lon, lat)    # make grid of given longitude and latitude 
pr.array <- ncvar_get(nc, dname)
dlname <- ncatt_get(nc, dname, "long_name")
dunits <- ncatt_get(nc, dname, "units")
fillvalue <- ncatt_get(nc, dname, "_FillValue")

pr.vec.long <- as.vector(pr.array)
pr.mat <- matrix(pr.vec.long, nrow = nlon * nlat, ncol = nt)
pr.df <- data.frame(cbind(lonlat, pr.mat))

pr_c <- pr.df[ ,-c(1:2)]
## A specific region is clipped out of the global data file by selecting
## a lon and lat range, and the data are regridded to 1 x 1 degree
## resolution.

x0 <- seq(67.5, 98.5, by = 1) ## choose different resolution, eg. by = 0.5 
y0 <- seq(6.5, 37.5, by = 1)


m <- cbind(x0, y0)
m <- as.data.frame(m)
s <- rasterFromXYZ(m)
pts <- expand.grid(x0, y0)
pos <- pr.df[ ,c(1:2)]
l_pr <- apply(pr_c, 2, function(x) cbind(pos, x))
colnm = c("x","y","z")
for (j in seq_along(l_pr)){
  colnames(l_pr[[j]]) <- colnm
}

pr_rstr <- lapply(l_pr, function(x) rasterFromXYZ(x))
## Use resample command to regrid the data, here nearest neighbor method can also be chosen by setting method = "ngb"
pr_bn <- lapply(pr_rstr, function(x) resample(x, s, method = "bilinear"))
pr_extr <- lapply(pr_bn, function(x) extract(x, pts))
df_pr <- do.call("cbind", lapply(pr_extr, data.frame))
## write dataframe in csv format
write.csv(df_pr, "my_data_regridded_1.csv")

I hope this serves the purpose.
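
For the simpler case described in the question, where each finer cell just inherits the value of the coarse cell it sits in, the raster package also provides disaggregate(). The following is a minimal sketch on a made-up 2 x 2 grid rather than the asker's file; the extent and values are assumptions chosen only for illustration.

```r
library(raster)

# Toy 2 x 2 grid at 0.5-degree resolution (extent and values are made up)
r <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 1, ymn = 0, ymx = 1)
values(r) <- 1:4

# Split every cell into 2 x 2 sub-cells; with the default method the
# sub-cells simply copy the value of their parent cell
fine <- disaggregate(r, fact = 2)

res(fine)
as.matrix(fine)
```

The same call should apply to a multi-layer object (for example one layer per month), though that is not shown here.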

This is not an R solution, but just to point out that you can use CDO to regrid NetCDF files very easily from the command line in a Linux/macOS environment. From your description it sounds as if you want nearest neighbour interpolation, which for a 0.25 degree regular grid would be:

cdo remapnn,r1440x720 in.nc out.nc

However, you can also use first- or second-order conservative remapping. For example, for first order:

cdo remapcon,r1440x720 in.nc out.nc

You can then read the regridded field into R in the same way you are currently doing.
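
For comparison, when the fine cells nest exactly inside the coarse cells (an idealised case, and not necessarily how CDO lays out its target grid), the nearest-neighbour idea can be sketched in base R by repeating array indices. The coordinates and values below are made up for illustration.

```r
# Toy coarse field: 4 lons x 3 lats at 0.5-degree spacing (values are made up)
lon <- c(0.25, 0.75, 1.25, 1.75)
lat <- c(0.25, 0.75, 1.25)
coarse <- matrix(seq_len(12), nrow = 4, ncol = 3)

# Repeat every row and column index twice:
# each coarse cell becomes a 2 x 2 block of identical values
fine <- coarse[rep(seq_along(lon), each = 2), rep(seq_along(lat), each = 2)]

# Assuming lon/lat are cell centres, the 0.25-degree centres sit
# 0.125 either side of each coarse centre
lon_fine <- rep(lon, each = 2) + c(-0.125, 0.125)
lat_fine <- rep(lat, each = 2) + c(-0.125, 0.125)

dim(fine)
```

Because this only indexes the matrix, it avoids building a long lon/lat table before regridding.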
