Convert a netCDF time variable to an R date object

I have a netCDF file with a time series, and the time variable has the following typical metadata:

    double time(time) ;
            time:standard_name = "time" ;
            time:bounds = "time_bnds" ;
            time:units = "days since 1979-1-1 00:00:00" ;
            time:calendar = "standard" ;
            time:axis = "T" ;

Inside R, I want to convert the time into an R date object. At the moment I do this in a hardwired way: I read the units attribute, split the string, and use the third entry as my origin (thus assuming the unit is "days" and the time is 00:00, etc.):

require("ncdf4")
f1<-nc_open("file.nc")
time<-ncvar_get(f1,"time")
tunits<-ncatt_get(f1,"time",attname="units")
tustr<-strsplit(tunits$value, " ")
dates<-as.Date(time,origin=unlist(tustr)[3])

This hardwired solution works for my specific example, but I was hoping there might be an R package that handles the UNIDATA netCDF date conventions for time units and converts them safely to an R date object?
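For reference, the hardwired approach above can be generalised a little in base R, without any extra package. The sketch below is only an illustration, not a full implementation of the CF conventions: the function name `cf_time_to_posixct` is hypothetical, it understands only seconds, minutes, hours and days, and it assumes UTC and the standard calendar.

```r
# Hypothetical helper (not from any package): convert numeric offsets plus a
# CF-style units string such as "days since 1979-1-1 00:00:00" to POSIXct.
# Assumptions: UTC, standard (Gregorian) calendar, fixed-length units only.
cf_time_to_posixct <- function(values, units_string) {
    parts <- strsplit(trimws(units_string), "[[:space:]]+")[[1]]
    if (length(parts) < 3 || tolower(parts[2]) != "since") {
        stop("Expected units of the form '<unit> since <date> [<time>]'")
    }
    # Seconds per unit; calendar-dependent units (months, years) are rejected
    secs_per_unit <- c(seconds = 1, second = 1, minutes = 60, minute = 60,
                       hours = 3600, hour = 3600, days = 86400, day = 86400)
    unit <- tolower(parts[1])
    if (!unit %in% names(secs_per_unit)) stop("Unsupported time unit: ", unit)
    # Use the time of day when present, otherwise assume midnight
    clock <- if (length(parts) >= 4) parts[4] else "00:00:00"
    origin <- as.POSIXct(paste(parts[3], clock), tz = "UTC",
                         format = "%Y-%m-%d %H:%M:%S")
    if (is.na(origin)) stop("Could not parse origin: ", paste(parts[3], clock))
    origin + values * secs_per_unit[[unit]]
}

# Made-up offsets, using the units string from the metadata above
dates <- cf_time_to_posixct(c(0, 1, 365), "days since 1979-1-1 00:00:00")
format(dates, "%Y-%m-%d")
```

Rejecting "months since" and "years since" outright is a deliberate choice here: those units do not have a fixed length in seconds, so a simple multiplier would silently give wrong dates.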

I have just discovered (two years after posting the question!) that there is a package called ncdf.tools which has the function:

convertDateNcdf2R

which

converts a time vector from a netCDF file or a vector of Julian days (or seconds, minutes, hours) since a specified origin into a POSIXct R vector.

Usage:

convertDateNcdf2R(time.source, units = "days", origin = as.POSIXct("1800-01-01", 
    tz = "UTC"), time.format = c("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", 
    "%Y-%m-%d %H:%M", "%Y-%m-%d %Z %H:%M", "%Y-%m-%d %Z %H:%M:%S"))

Arguments:

time.source

numeric vector or netCDF connection: either a number of time units since origin or a netCDF file connection. In the latter case, the time vector is extracted from the netCDF file. This file, and especially the time variable, has to follow the CF netCDF conventions.

units

character string: units of the time source. If the source is a netCDF file, this value is ignored and is read from that file.

origin

POSIXct object: Origin or day/hour zero of the time source. If the source is a netCDF file, this value is ignored and is read from that file.

Thus it is enough to pass the netCDF connection as the first argument, and the function handles the rest. Caveat: this will only work if the netCDF file follows CF conventions (e.g. if your units are "years since" instead of "seconds since" or "days since", it will fail).

More details on the function are available here: https://rdrr.io/cran/ncdf.tools/man/convertDateNcdf2R.html

There is not, that I know of. I have this handy function using lubridate, which is basically identical to yours.

getNcTime <- function(nc) {
    require(lubridate)
    require(ncdf4)
    ncdims <- names(nc$dim) # get netCDF dimension names
    timevar <- ncdims[ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date")] # find time variable
    # Check before reading: indexing with [1] first would give NA, never length 0
    if (length(timevar) == 0) stop("Could not identify the correct time variable")
    timevar <- timevar[1]
    times <- ncvar_get(nc, timevar)
    timeatt <- ncatt_get(nc, timevar) # get attributes
    timedef <- strsplit(timeatt$units, " ")[[1]] # e.g. "days" "since" "1979-1-1" "00:00:00"
    timeunit <- timedef[1]
    tz <- timedef[5] # NA when the units string carries no timezone
    # Compare the start time as numbers, not strings; unparseable parts become NA
    timestart <- suppressWarnings(as.numeric(strsplit(timedef[4], ":")[[1]]))
    if (length(timestart) != 3 || anyNA(timestart) || any(timestart < 0) ||
        timestart[1] > 24 || timestart[2] > 60 || timestart[3] > 60) {
        warning(paste(timedef[4], "is not a valid start time. Assuming 00:00:00"))
        timedef[4] <- "00:00:00"
    }
    if (! tz %in% OlsonNames()) {
        warning(paste(tz, "is not a valid timezone. Assuming UTC"))
        tz <- "UTC"
    }
    timestart <- ymd_hms(paste(timedef[3], timedef[4]), tz=tz)
    f <- switch(tolower(timeunit), # find the lubridate period function matching the unit
        seconds=seconds, second=seconds, sec=seconds,
        minutes=minutes, minute=minutes, min=minutes,
        hours=hours,     hour=hours,     h=hours,
        days=days,       day=days,       d=days,
        months=months,   month=months,   m=months,
        years=years,     year=years,     yr=years,
        NULL
    )
    if (is.null(f)) stop("Could not understand the time unit format")
    timestart + f(times)
}

EDIT: One might also want to take a look at ncdf4.helpers::nc.get.time.series

EDIT2: note that the awesome stars package, newly proposed and currently in development, will handle dates automatically; see the first blog post for an example.

EDIT3: another way is to use the units package directly, which is what stars uses. One could do something like this (it still does not handle the calendar correctly, and I'm not sure units can):

getNcTime <- function(nc) { ## NEW VERSION, with the units package
    require(units)
    require(ncdf4)
    options(warn=1) # show warnings as they occur
    if (is.character(nc)) nc <- nc_open(nc)
    ncdims <- names(nc$dim) # get netCDF dimension names
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))] # find time variable(s)
    if (length(timevar) > 1) {
        warning(paste("Found more than one time var. Using the first:", timevar[1]))
        timevar <- timevar[1]
    }
    if (length(timevar) != 1) stop("Could not identify the correct time variable")
    times <- ncvar_get(nc, timevar) # get time data
    timeatt <- ncatt_get(nc, timevar) # get attributes
    timeunit <- timeatt$units
    units(times) <- as_units(timeunit) # as_units() supersedes the older make_unit()
    as.POSIXct(times) # convert the values, not the base function `time`
}

I couldn't get @AF7's function to work with my files, so I wrote my own. The function below creates a POSIXct vector of dates, for which the start date, time interval, unit and length are read from the nc file. It works with nc files of many (but probably not every...) shapes or forms.

ncdate <- function(nc) {
    ncdims <- names(nc$dim) # Extract dimension names
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime",
                                          "date", "Date"))[1]] # Pick the time dimension
    ntstep <- nc$dim[[timevar]]$len # Number of time steps
    tm <- ncvar_get(nc, timevar) # Extract the time values
    tunits <- ncatt_get(nc, timevar, "units") # Extract the units string
    tspace <- tm[2] - tm[1] # Time period between two time steps, for the "by" argument
    tstr <- strsplit(tunits$value, " ") # Split the units string into its components
    a <- unlist(tstr[1])
    uname <- a[which(a %in% c("seconds", "hours", "days"))[1]] # Isolate the unit, i.e. seconds, hours or days
    if (is.na(uname)) stop("Unsupported time unit in: ", tunits$value)
    startd <- as.POSIXct(gsub(paste(uname, 'since '), '', tunits$value), format="%Y-%m-%d %H:%M:%S") # Extract the start / origin date
    tmulti <- 3600 # Hourly multiplier (seconds per hour)
    if (uname == "days") tmulti <- 86400 # Daily multiplier (seconds per day)
    ## Rename "seconds" to "secs" for the "by" argument and change the multiplier.
    if (uname == "seconds") {
        uname <- "secs"
        tmulti <- 1
    }
    byt <- paste(tspace, uname) # Define the "by" argument
    ## If the unit is "days" but the step is one hour (1/24 day), switch to hours:
    ## seq() does not accept a fractional "by" such as "0.0417 days".
    if (uname == "days" && abs(tspace - 1/24) < 1e-6) {
        byt <- "1 hour"
    }
    datev <- seq(from=startd + tm[1]*tmulti, by=byt, length.out=ntstep)
    datev # Return the vector visibly
}

Edit

To address the flaw highlighted by @AF7's comment, namely that the above code only works for regularly spaced files, datev could be calculated as

 datev <- as.POSIXct(tm*tmulti,origin=startd)
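As a small worked example of why this handles irregular spacing: each offset is scaled to seconds and added to the origin on its own, so nothing assumes a constant step. The values below are made up for illustration, and `tz = "UTC"` is an added assumption so that the printed result does not depend on the local timezone.

```r
# Illustrative values only: offsets in days with uneven gaps
tm <- c(0, 1, 3.5, 10)
tmulti <- 86400                                # seconds per day
startd <- as.POSIXct("1979-01-01 00:00:00", tz = "UTC")

# Each element is converted independently, so uneven gaps are preserved
datev <- as.POSIXct(tm * tmulti, origin = startd, tz = "UTC")
format(datev, "%Y-%m-%d %H:%M")
# "1979-01-01 00:00" "1979-01-02 00:00" "1979-01-04 12:00" "1979-01-11 00:00"
```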
