
R create NetCDF from .CSV

I am trying to create a NetCDF file from a .csv file. I have read several tutorials here and elsewhere and still have some doubts.

I have a table like this:

lat,long,time,rh,temp
41,-109,6,1,1
40,-107,18,2,2
39,-105,6,3,3
41,-103,18,4,4
40,-109,6,5,2
39,-107,18,6,4

I create the NetCDF file using the ncdf4 package in R.

xvals <- data$lon
yvals <- data$lat 
nx <- length(xvals)
ny <- length(yvals)
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
mv <- -999 #missing value to use

var_temp <- ncvar_def("temperatura", "celsius", list(lon1, lat2, time), longname="Temp. da superfície", mv) 

var_rh <- ncvar_def("humidade", "%", list(lon1, lat2, time), longname = "humidade relativa", mv )

ncnew <- nc_create(filename, list(var_temp, var_rh))
ncvar_put(ncnew, var_temp, dadostemp, start=c(1,1,1), count=c(nx,ny,nt))

When I follow this procedure, it reports that the NetCDF file expects 3 times the amount of data that I have. I understand why: one matrix for each dimension, since I declared that the variables depend on longitude, latitude and time.

So, how would I import this kind of data, where I already have one lon, lat, time and the other variables for each data acquisition?

Could someone shed some light?

PS: The data used here is not my real data, just an example I was using for the tutorials.

I think there is more than one problem in your code. Step by step:

Create dimensions

In an nc file, dimensions don't work as key-value pairs; each one is just a vector of values defining what each position along that axis of a variable array means. This means you should create your dimensions from the unique, sorted coordinate values, like this:

library(ncdf4)

# One entry per distinct coordinate value, in ascending order
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
time_d <- ncdim_def("time", "h", unique(time))
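As a rough illustration of why the dimensions need unique values, the sketch below rebuilds the sample table from the question in base R (no NetCDF file is involved) and counts the cells of the resulting grid. The name tvals is only used for this illustration.

```r
# Sample table from the question
data <- data.frame(lat  = c(41, 40, 39, 41, 40, 39),
                   long = c(-109, -107, -105, -103, -109, -107),
                   time = c(6, 18, 6, 18, 6, 18),
                   rh   = c(1, 2, 3, 4, 5, 6),
                   temp = c(1, 2, 3, 4, 2, 4))

xvals <- sort(unique(data$long))  # -109 -107 -105 -103
yvals <- sort(unique(data$lat))   # 39 40 41
tvals <- unique(data$time)        # 6 18

# The variable is a full lon x lat x time grid, so it has more cells
# than there are observations; the rest will hold the missing value
n_cells <- length(xvals) * length(yvals) * length(tvals)  # 24
n_obs   <- nrow(data)                                     # 6
```

With six observations and 24 grid cells, most of the array stays at the missing value.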

Where I work, we use unlimited dimensions as mere indexes, while a 1-D variable with the same name as the dimension holds the values. I'm not sure how unlimited dimensions work in R. Since you didn't ask about it, I'll leave this out :-)

Define variables

mv <- -999  # missing value to use
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon1, lat2, time_d),
                      missval = mv, longname = "Temp. da superfície")
var_rh <- ncvar_def("humidade", "%",
                    list(lon1, lat2, time_d),
                    missval = mv, longname = "humidade relativa")

Add data

Create an nc file (f is the output file name): ncnew <- nc_create(f, list(var_temp, var_rh))

When adding values, the object holding the data is flattened to a 1-D array and a sequential write starts at the position specified by start. How many values are written along each dimension is controlled by the values in count. If you have data like this:

long, lat, time, t
   1,   1,    1, 1
   2,   1,    1, 2
   1,   2,    1, 3
   2,   2,    1, 4

The command ncvar_put(ncnew, var_temp, data$t, count=c(2,2,1)) would give you what you (probably) expect.
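A minimal base R sketch (no file is written) of why the row order of that small table matters: R fills arrays with the first index varying fastest, which is also the order in which ncvar_put consumes a flat vector. The names d and a are illustrative only.

```r
# The small table above: long varies fastest, then lat, then time
d <- data.frame(long = c(1, 2, 1, 2),
                lat  = c(1, 1, 2, 2),
                time = c(1, 1, 1, 1),
                t    = c(1, 2, 3, 4))

# Reshape the flat vector into a (long, lat, time) array;
# the first dimension varies fastest
a <- array(d$t, dim = c(2, 2, 1))

a[2, 1, 1]  # value stored at long = 2, lat = 1
a[1, 2, 1]  # value stored at long = 1, lat = 2
```

If the rows were sorted differently (for example lat varying fastest), the same flat write would put values in the wrong cells, which is why the general case below builds the array from explicit indexes.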

For your data, the first step is to create the indexes for the dimensions:

data$idx_lon <- match(data$long,xvals)
data$idx_lat <- match(data$lat,yvals)
data$idx_time <- match(data$time,unique(time))

Then create an array with the dimensions appropriate for your data, in the same (lon, lat, time) order used in ncvar_def:

m <- array(mv, dim = c(length(xvals), length(yvals), length(unique(time))))

Then fill the array with your values:

for (i in seq_len(nrow(data))) {
  m[data$idx_lon[i], data$idx_lat[i], data$idx_time[i]] <- data$temp[i]
}

If speed is a concern, you could compute the array index in a vectorised way and use it for the assignment instead of the loop.

Write the data
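One possible way to vectorise the fill, sketched in base R on the sample data from the question. It uses R's matrix indexing (one row of indexes per observation) rather than computing the linear index by hand; the names idx and tvals are illustrative, and the array is laid out as (lon, lat, time).

```r
# Sample data from the question
data <- data.frame(lat  = c(41, 40, 39, 41, 40, 39),
                   long = c(-109, -107, -105, -103, -109, -107),
                   time = c(6, 18, 6, 18, 6, 18),
                   temp = c(1, 2, 3, 4, 2, 4))
mv <- -999
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
tvals <- unique(data$time)

# One row of (lon, lat, time) positions per observation
idx <- cbind(match(data$long, xvals),
             match(data$lat,  yvals),
             match(data$time, tvals))

# Start from an array full of the missing value, then assign all
# observations at once via the index matrix (no explicit loop)
m <- array(mv, dim = c(length(xvals), length(yvals), length(tvals)))
m[idx] <- data$temp
```

The result should be the same array the loop produces, as long as both use the same dimension order.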

ncvar_put(ncnew, var_temp,m)

Note that you don't need start and count here.

Finally, close the nc file to write the data to disk: nc_close(ncnew). Optionally, I would recommend the ncdump console command to check your file.

Edit

Regarding your question whether to write a complete array or use start and count: I believe both methods work reliably. Which one to prefer depends on your data and your personal preferences.

I think the method of building an array, adding the values and then writing it as a whole is easier to understand. However, which one is more efficient depends on the data. If your data is big and has many NA values, I believe using multiple writes with start and count could be faster. If NAs are rare, creating one array and doing a single write would be faster. If your data is so big that creating an extra array would exceed your available memory, you have to combine both methods.
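For comparison, here is a hedged sketch of the multiple-writes variant using the sample data from the question: each observation is written on its own, with start selecting the cell and count set to 1 along every dimension. The temporary file path and the names lon_d, lat_d, tvals and back are assumptions made for this illustration, not part of the original code.

```r
library(ncdf4)

# Sample data from the question
data <- data.frame(lat  = c(41, 40, 39, 41, 40, 39),
                   long = c(-109, -107, -105, -103, -109, -107),
                   time = c(6, 18, 6, 18, 6, 18),
                   temp = c(1, 2, 3, 4, 2, 4))
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
tvals <- unique(data$time)

lon_d  <- ncdim_def("longitude", "degrees_east", xvals)
lat_d  <- ncdim_def("latitude", "degrees_north", yvals)
time_d <- ncdim_def("time", "h", tvals)
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon_d, lat_d, time_d), missval = -999)

f <- tempfile(fileext = ".nc")  # illustrative output path
nc <- nc_create(f, var_temp)

# One write per observation; no intermediate array is built
for (i in seq_len(nrow(data))) {
  ncvar_put(nc, var_temp, data$temp[i],
            start = c(match(data$long[i], xvals),
                      match(data$lat[i],  yvals),
                      match(data$time[i], tvals)),
            count = c(1, 1, 1))
}
nc_close(nc)

# Read the variable back to check where the values ended up
nc <- nc_open(f)
back <- ncvar_get(nc, "temperatura")
nc_close(nc)
```

Cells that were never written are expected to come back as missing, but that relies on the library's default fill behaviour, so it is worth checking on your own file (for example with ncdump).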
