Stack GeoTIFFs with stars 'along' when the 'band' dimension contains band + time information

I have a time series of GeoTIFF files I'd like to stack in R using stars. Here are the first two:

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(stars)
stars::read_stars(urls, along="time")

This errors with:

Error in c.stars_proxy(`3` = list(gep01.t00z.pgrb2a.0p50.f003.tif = "/vsicurl/https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/cogs/gefs.20221201/gep01.t00z.pgrb2a.0p50.f003.tif"),  : 
  don't know how to merge arrays: please specify parameter along

Context: bands contain both time + band info

This fails because the dimensions do not match, which happens because the files have concatenated temporal information into the band names:

x<- lapply(urls, read_stars)
x

produces:

[[1]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                                       Min.  1st Qu. Median     Mean  3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f003.ti...  50026.01 98094.81 101138 98347.42 101845.2 104605.2
dimension(s):
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:3 hour fcst,...,DLWRF:surface:0-3 hour ave fcst    

[[2]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                                       Min.  1st Qu.   Median     Mean 3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f006.ti...  50029.83 98101.83 101170.6 98337.52  101825 104588.2
dimension(s):
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:6 hour fcst,...,DLWRF:surface:0-6 hour ave fcst    

Note that the band names would align were it not for the timestamp tacked onto each one, e.g. PRES:surface:3 hour fcst vs PRES:surface:6 hour fcst.

How can I best read in these files so that I have dimensions of x, y, band, and time in my stars object?

Alternatives: terra?

How about terra? Note that terra is happy to read these files in directly, but treats them as 16 unique bands. Can I re-align that so that I have the original 8 bands along a new "time" dimension? (I recognize stars emphasizes 'spatio-temporal'; maybe such a cube is out of scope for terra?) Also note that terra for some reason mangles the timestamp in these band names:

x <- terra::rast(urls)
x
class       : SpatRaster 
dimensions  : 361, 720, 16  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat Coordinate System imported from GRIB file 
sources     : gep01.t00z.pgrb2a.0p50.f003.tif  (8 layers) 
              gep01.t00z.pgrb2a.0p50.f006.tif  (8 layers) 
names       : PRES:~ fcst, TMP:2~ fcst, RH:2 ~ fcst, UGRD:~ fcst, VGRD:~ fcst, APCP:~ fcst, .

Just wanted to share some additional possible solutions for comparison. With larger numbers of files some of these differences become more relevant. This expands a bit beyond my original question.

terra

Prof. Hijmans gives a very nice solution in terra. He also asked about the original upstream sources, which I didn't explain properly: these are originally GRIB files from the NOAA GEFS forecast.

Notably, we can work directly from the GRIB files. GEFS is a 35-day forecast, so let's try going more than 6 hours into the future:

library(terra)

# original GRIB sources, AWS mirror
gribs <- paste0("/vsicurl/https://noaa-gefs-pds.s3.amazonaws.com/gefs.20220314/00/atmos/pgrb2ap5/geavg.t00z.pgrb2a.0p50.f",
                stringr::str_pad(seq(3,240,by=3), 3, pad="0"))

bench::bench_time({
  cube <- terra::sds(gribs)
})

cube[1,63] |> plot()

[Figure: world map of forecast temperature]

Very nice!

gdalcubes

gdalcubes is another package that can also leverage the GDAL virtual filesystem when working with these large-ish remote files. It also lets us define an abstract cube at potentially a different resolution in space and time than the original sources (averaging or interpolating). Lazy operations mean this may run a bit faster(?)

library(gdalcubes)
date <- as.Date("2022-03-14") # forecast start date, matching gefs.20220314 in the URLs below
date_time <- date + lubridate::hours(seq(3,240,by=3))

# USA box
v <- cube_view(srs = "EPSG:4326", 
               extent = list(left = -125, right = -66,top = 49, bottom = 25,
                             t0= as.character(min(date_time)), t1=as.character(max(date_time))),
               dx = 0.5, dy = 0.5, dt = "PT3H")

gribs <- paste0("/vsicurl/https://noaa-gefs-pds.s3.amazonaws.com/gefs.20220314/00/atmos/pgrb2ap5/geavg.t00z.pgrb2a.0p50.f",
                stringr::str_pad(seq(3,240,by=3), 3, pad="0"))

bench::bench_time({
  cube <- gdalcubes::create_image_collection(gribs, date_time = date_time)
})

bench::bench_time({
  raster_cube(cube, v) |>
    select_bands("band63") |> # temperature
    animate(col = viridisLite::viridis, nbreaks=50, fps=10, save_as = "temp.gif")
})

[Figure: animated GIF of forecast temperature]

stars

I didn't translate a full stars example, but here at least is the band-name correction; it is a bit more cumbersome than the examples above.

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(stars)
#stars::read_stars(urls, along="time") # no luck!


## grab unstacked proxy object for each geotiff
x <- lapply(urls, read_stars)

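# extract the variable part of each band name (e.g. "PRES" from "PRES:surface:3 hour fcst")
band_names <- st_get_dimension_values(x[[1]], "band") |> 
  stringr::str_extract("([A-Z]+):") |>
  stringr::str_remove(":") # namespace prefix needed: stringr is not attached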
# apply corrected band-names
x1 <- lapply(x, st_set_dimensions, "band", band_names)

# at last, we can stack into a cube:
x1 <- do.call(c, c(x1, along="time"))

# and add correct date timestamps to the new time dimension
dates <- as.Date("2022-12-01") + lubridate::hours(c(3,6))
x1 <- st_set_dimensions(x1, "time", dates)
x1
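
The band-name correction above keeps only the leading variable code. As a minimal base-R sketch of the same idea, the snippet below splits band names into their colon-separated parts and checks that the variable codes line up across files once the forecast horizon is set aside. The helper `split_band_name` is hypothetical (it is not part of stars or terra), only the two band names visible in the printed output above are used, and it assumes every band name has exactly three colon-separated fields.

```r
# Band names copied from the read_stars() output above (first and last band only)
band_f003 <- c("PRES:surface:3 hour fcst", "DLWRF:surface:0-3 hour ave fcst")
band_f006 <- c("PRES:surface:6 hour fcst", "DLWRF:surface:0-6 hour ave fcst")

# Hypothetical helper: split "VARIABLE:level:horizon" into three columns.
# Assumes exactly three colon-separated fields per name.
split_band_name <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  data.frame(
    variable = vapply(parts, `[`, character(1), 1),
    level    = vapply(parts, `[`, character(1), 2),
    horizon  = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

a <- split_band_name(band_f003)
b <- split_band_name(band_f006)

# The variable codes agree across files; only the horizon differs
same_variables <- identical(a$variable, b$variable)
```

Checking `same_variables` before stacking is a cheap way to catch files whose band order differs, which would otherwise be silently mislabelled by reusing the names from the first file.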

With terra it is pretty easy to make a time series for each variable, as I show below.

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(terra)
r <- rast(urls)

Extract two variables of interest

nms <- names(r)
tmp <- r[[grep("TMP", nms)]]
rh <- r[[grep("RH", nms)]]

# set time
tm <- as.POSIXct("2022-12-01", tz="GMT") + c(3,6) * 3600
time(rh) <- tm 
time(tmp) <- tm
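
With more than two files, hard-coding `c(3,6)` gets tedious. The following is a minimal base-R sketch of deriving the timestamps from the file names instead. It assumes the `fNNN` part of each file name is the forecast horizon in hours and that the forecast starts at 2022-12-01 00:00 UTC (inferred from `gefs.20221201` and `t00z` in the URLs); the name `tm_from_files` is made up for illustration.

```r
# File names as used in the URLs above
files <- c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif")

# Pull the three-digit forecast horizon out of ".fNNN.tif" (assumed to be hours)
horizon_hours <- as.integer(sub(".*\\.f([0-9]{3})\\.tif$", "\\1", files))

# Assumed forecast start: 2022-12-01 00:00 UTC
reference <- as.POSIXct("2022-12-01 00:00:00", tz = "UTC")

# One timestamp per file: start time plus horizon in seconds
tm_from_files <- reference + horizon_hours * 3600
```

The resulting vector could be used in place of `tm` above, provided the files are passed to `rast()` in the same order.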

And you could combine them into a SpatRasterDataset like this:

s <- sds(list(tmp=tmp, rh=rh))

An alternative path to get to the same point would be to start with a SpatRasterDataset and subset it.

sd <- sds(urls)
nl <- 1:length(sd)
nms <- names(sd[1])

tmp2 <- rast(sd[nl, grep("TMP", nms)])
time(tmp2) <- tm

rh2 <- rast(sd[nl, grep("RH", nms)])
time(rh2) <- tm
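
To picture what this regrouping does, here is a small base-R sketch with a plain array standing in for the raster data. It does not use terra; the grid size and cell values are made up, and it assumes the 16 layers are ordered file by file (all 8 bands of the first file, then all 8 bands of the second), as in the `rast(urls)` output shown earlier.

```r
nx <- 4; ny <- 3      # toy grid size (made up)
nband <- 8; ntime <- 2

# 16 layers ordered file by file: bands 1..8 at time 1, then bands 1..8 at time 2
layers <- array(0, dim = c(nx, ny, nband * ntime))
for (t in seq_len(ntime)) {
  for (b in seq_len(nband)) {
    layers[, , b + nband * (t - 1)] <- 10 * t + b  # value encodes time and band
  }
}

# Reinterpret the 16 layers as band x time; no data is moved
# because the band index varies fastest in the layer order
cube <- layers
dim(cube) <- c(nx, ny, nband, ntime)

# Time series of band 2 at one cell
band2_series <- cube[1, 1, 2, ]  # 12 22
```

Selecting one band across all time steps is then a single index on the third dimension, which is what picking the same layer from every sub-dataset achieves.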

I made the subsetting work a little nicer in terra version 1.7-5:

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/", c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(terra)
#terra 1.7.5
sd <- sds(urls)
tmp <- sd[,2]

tmp
#class       : SpatRaster 
#dimensions  : 361, 720, 2  (nrow, ncol, nlyr)
#resolution  : 0.5, 0.5  (x, y)
#extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat Coordinate System imported from GRIB file 
#sources     : gep01.t00z.pgrb2a.0p50.f003.tif  
#              gep01.t00z.pgrb2a.0p50.f006.tif  
#names       : TMP:2 m above g~Temperature [C], TMP:2 m above g~Temperature [C] 
#unit        :                               C,                               C 
#time        : 2022-12-01 03:00:00 to 2022-12-01 06:00:00 UTC 

As for the layer names containing the forecast time, that is simply what is in the tif metadata. It looks like that was a decision made when the files were created from the original GRIB files.

The latitude extent going beyond the north and south poles is an interesting feature of this dataset.
