stack geotiff with stars 'along' when 'band' dimension contains band + time information

I have a time series of GeoTIFF files I'd like to stack in R using stars. Here are the first two:

urls <- paste0("/vsicurl/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

stars::read_stars(urls, along="time")

This errors with:

Error in c.stars_proxy(`3` = list(gep01.t00z.pgrb2a.0p50.f003.tif = "/vsicurl/https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/cogs/gefs.20221201/gep01.t00z.pgrb2a.0p50.f003.tif"),  : 
  don't know how to merge arrays: please specify parameter along

Context: bands contain both time + band info

This fails because the dimensions do not match, which happens because the files have concatenated temporal information into the band names:

x <- lapply(urls, read_stars)


stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                                       Min.  1st Qu. Median     Mean  3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f003.ti...  50026.01 98094.81 101138 98347.42 101845.2 104605.2
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:3 hour fcst,...,DLWRF:surface:0-3 hour ave fcst    

stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                                       Min.  1st Qu.   Median     Mean 3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f006.ti...  50029.83 98101.83 101170.6 98337.52  101825 104588.2
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:6 hour fcst,...,DLWRF:surface:0-6 hour ave fcst    

Note that the band names would align were it not for the timestamp tacked on, e.g. PRES:surface:3 hour fcst vs PRES:surface:6 hour fcst.
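
To make the mismatch concrete, here is a minimal base-R sketch that splits such labels into their fields. The label vectors are typed in by hand from the two printouts above (only the first and last band are visible there), and split_label is a hypothetical helper for illustration, not part of stars.

```r
# Band labels copied by hand from the two printouts above (first and last band only).
labels_f003 <- c("PRES:surface:3 hour fcst", "DLWRF:surface:0-3 hour ave fcst")
labels_f006 <- c("PRES:surface:6 hour fcst", "DLWRF:surface:0-6 hour ave fcst")

# Hypothetical helper: split "VAR:level:period" into its three fields.
split_label <- function(labels) {
  parts <- strsplit(labels, ":", fixed = TRUE)
  data.frame(
    variable = vapply(parts, `[`, character(1), 1),
    level    = vapply(parts, `[`, character(1), 2),
    period   = vapply(parts, `[`, character(1), 3)
  )
}

a <- split_label(labels_f003)
b <- split_label(labels_f006)

# The full labels differ between files, but the variable and level parts agree.
identical(labels_f003, labels_f006)   # FALSE
identical(a$variable, b$variable)     # TRUE
identical(a$level, b$level)           # TRUE
```

Only the third field carries the forecast period, so it is the part that would have to move out of the band labels and into a separate time dimension.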

How can I best read in these files so that I have dimensions of x, y, band, and time in my stars object?

alternatives: terra?

How about terra? Note that terra is happy to read these files in directly, but treats them as 16 unique bands. Can I re-align that so that I have the original 8 bands along a new "time" dimension? (I recognize that stars emphasizes 'spatio-temporal' data; maybe such a cube is out of scope for terra?) Also note that terra for some reason mangles the timestamp in these band names:

x <- terra::rast(urls)
class       : SpatRaster 
dimensions  : 361, 720, 16  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat Coordinate System imported from GRIB file 
sources     : gep01.t00z.pgrb2a.0p50.f003.tif  (8 layers) 
              gep01.t00z.pgrb2a.0p50.f006.tif  (8 layers) 
names       : PRES:~ fcst, TMP:2~ fcst, RH:2 ~ fcst, UGRD:~ fcst, VGRD:~ fcst, APCP:~ fcst, .
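
As a rough illustration of the re-alignment being asked for, the sketch below uses plain base-R arrays rather than terra or stars objects. It assumes the 16 layers are ordered file by file (8 bands for f003, then 8 bands for f006), as the "sources" lines above suggest; the tiny 2 x 3 grid and the band1..band8 labels are made up for the example.

```r
# Stand-in for the 16-layer stack: a tiny 2 x 3 grid instead of 361 x 720.
n_band <- 8
n_time <- 2
layers <- array(seq_len(2 * 3 * n_band * n_time), dim = c(2, 3, n_band * n_time))

# Layers are assumed to be ordered file by file: 8 bands for f003, then 8 for f006.
# Because R arrays are column-major, that ordering maps onto (band, time) directly.
cube <- layers
dim(cube) <- c(2, 3, n_band, n_time)
dimnames(cube) <- list(NULL, NULL, paste0("band", 1:n_band), c("f003", "f006"))

# Band 2 at the second time step is layer 8 + 2 = 10 of the original stack.
identical(unname(cube[, , 2, 2]), layers[, , 10])  # TRUE
```

This only shows the index bookkeeping; it says nothing about how terra or stars store the data internally.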

Just wanted to share some additional possible solutions for comparison. With larger numbers of files, some of these differences become more relevant. This expands a bit beyond my original question.


Prof Hijmans gives a very nice solution in terra. He also asked about the original upstream sources, which I didn't explain properly: these are originally GRIB files from the NOAA GEFS forecast.

Notably, we can work directly from the GRIB files. GEFS is a 35-day forecast, so let's try going more than 6 hours into the future:


# original GRIB sources, AWS mirror
gribs <- paste0("/vsicurl/https://noaa-gefs-pds.s3.amazonaws.com/gefs.20220314/00/atmos/pgrb2ap5/geavg.t00z.pgrb2a.0p50.f",
                stringr::str_pad(seq(3,240,by=3), 3, pad="0"))

cube <- terra::sds(gribs)

cube[1,63] |> plot()

[Figure: world map of forecast tmp]

Very nice!


gdalcubes is another package that can also leverage the GDAL virtual filesystem when working with these large-ish remote files. It also lets us define an abstract cube at a potentially different resolution in space and time than the original sources (by averaging or interpolating). Lazy operations mean this may run a bit faster(?)

library(gdalcubes)

# NOTE: this date should match the forecast cycle in the GRIB URLs below
# (gefs.20220314); as written, the two differ.
date <- as.Date("2023-01-26")
date_time <- date + lubridate::hours(seq(3, 240, by = 3))

# USA box
v <- cube_view(srs = "EPSG:4326",
               extent = list(left = -125, right = -66, top = 49, bottom = 25,
                             t0 = as.character(min(date_time)), t1 = as.character(max(date_time))),
               dx = 0.5, dy = 0.5, dt = "PT3H")

gribs <- paste0("/vsicurl/https://noaa-gefs-pds.s3.amazonaws.com/gefs.20220314/00/atmos/pgrb2ap5/geavg.t00z.pgrb2a.0p50.f",
                stringr::str_pad(seq(3, 240, by = 3), 3, pad = "0"))

# one image per forecast step, each tagged with its timestamp
cube <- create_image_collection(gribs, date_time = date_time)

raster_cube(cube, v) |>
  select_bands("band63") |> # temperature
  animate(col = viridisLite::viridis, nbreaks = 50, fps = 10, save_as = "temp.gif")

[Figure: animated GIF of forecast tmp]


I didn't work out a full stars example, but here at least is the band-name correction; it is a bit more cumbersome than the examples above.

urls <- paste0("/vsicurl/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(stars)

# stars::read_stars(urls, along = "time") # no luck!

## grab unstacked proxy object for each geotiff
x <- lapply(urls, read_stars)

# extract the variable part of each band name,
# e.g. "PRES" from "PRES:surface:3 hour fcst"
band_names <- st_get_dimension_values(x[[1]], "band") |>
  stringr::str_extract("([A-Z]+):") |>
  stringr::str_remove(":")

# apply corrected band-names
x1 <- lapply(x, st_set_dimensions, "band", band_names)

# at last, we can stack into a cube:
x1 <- do.call(c, c(x1, along="time"))

# and add correct date timestamps to the new time dimension
dates <- as.Date("2022-12-01") + lubridate::hours(c(3,6))
x1 <- st_set_dimensions(x1, "time", dates)
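
The forecast offsets c(3, 6) above are typed in by hand. With many files it may be easier to derive them from the file names; the base-R sketch below is one way to do that. It assumes the fNNN part of the name is the lead time in hours and that the forecast starts at 2022-12-01 00:00 UTC (read off "gefs.20221201" and "t00z"), neither of which is confirmed by the file metadata here.

```r
# File names as used above; the forecast lead time is encoded as f003, f006, ...
files <- c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif")

# Pull out the three digits after ".f" and convert them to hours.
lead_hours <- as.integer(sub("^.*\\.f([0-9]{3})\\.tif$", "\\1", files))

# Assumed forecast start: 2022-12-01 00:00 UTC. Valid time = start + lead time.
start <- as.POSIXct("2022-12-01 00:00:00", tz = "UTC")
valid_time <- start + lead_hours * 3600

lead_hours                                        # 3 6
format(valid_time, "%Y-%m-%d %H:%M", tz = "UTC")  # "2022-12-01 03:00" "2022-12-01 06:00"
```

The resulting valid_time vector could then be passed wherever the hand-written dates are used.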

With terra it is pretty easy to make a time series for each variable, as I show below.

urls <- paste0("/vsicurl/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(terra)
r <- rast(urls)

Extract two variables of interest

nms <- names(r)
tmp <- r[[grep("TMP", nms)]]
rh <- r[[grep("RH", nms)]]

# set time
tm <- as.POSIXct("2022-12-01", tz="GMT") + c(3,6) * 3600
time(rh) <- tm 
time(tmp) <- tm

And you could combine them into a SpatRasterDataset like this:

s <- sds(list(tmp=tmp, rh=rh))

An alternative path to get to the same point would be to start with a SpatRasterDataset and subset it.

sd <- sds(urls)
nl <- 1:length(sd)
nms <- names(sd[1])

tmp2 <- rast(sd[nl, grep("TMP", nms)])
time(tmp2) <- tm

rh2 <- rast(sd[nl, grep("RH", nms)])
time(rh2) <- tm

I made the subsetting work a little nicer in terra version 1.7-5:

urls <- paste0("/vsicurl/",
"noaa/gefs-v12/cogs/gefs.20221201/", c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

#terra 1.7.5
sd <- sds(urls)
tmp <- sd[,2]

#class       : SpatRaster 
#dimensions  : 361, 720, 2  (nrow, ncol, nlyr)
#resolution  : 0.5, 0.5  (x, y)
#extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat Coordinate System imported from GRIB file 
#sources     : gep01.t00z.pgrb2a.0p50.f003.tif  
#              gep01.t00z.pgrb2a.0p50.f006.tif  
#names       : TMP:2 m above g~Temperature [C], TMP:2 m above g~Temperature [C] 
#unit        :                               C,                               C 
#time        : 2022-12-01 03:00:00 to 2022-12-01 06:00:00 UTC 

As for the layer names containing the forecast time, that is just because that is what is in the tif metadata. It looks like that was a decision made when they were created from the original GRIB files.

The latitude extent going beyond the north and south poles is an interesting feature of this dataset.
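
A possible explanation, sketched in base R below, is that the grid points are cell centres running from 90 to -90 degrees, so the cell edges fall a quarter degree past each pole. That the values are centre-registered is an assumption here, not something confirmed by the file metadata.

```r
res <- 0.5

# Assumption: grid points sit at cell centres from 90 to -90 degrees latitude.
lat_centres <- seq(90, -90, by = -res)
length(lat_centres)                          # 361, matching nrow above

# Cell edges lie half a cell beyond the outermost centres.
range(lat_centres) + c(-1, 1) * res / 2      # -90.25  90.25
```

Under that assumption the numbers agree with the reported 361 rows and the -90.25 to 90.25 extent.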

