Significantly different speeds on raster (netCDF) calculations in R

I have some WRF output data that was subsetted and masked using Python's xarray module.

I'm now performing calculations on raster bricks using R's raster package and finding very different speeds for very similar files.

Knowns:

  1. There are 3 netCDF files, all exactly the same size (9.47 GB), each containing 9 variables
  2. They all have exactly the same dimensions (nrow 327, ncol 348, nlayer 365)
  3. All calculations are on individual files (layer calculations)
  4. All calculations are on the same variable with the same values (except for the second file, which is masked)

system.time(sum(d97[[1:365]]))
user  system elapsed 
5.428   2.771   8.840

The second file is the same file with a portion masked, and all the masked values converted to NaN.

system.time(sum(masked_d97[[1:365]]))
user  system elapsed 
10.784   2.157  13.052 

The last file is a slightly modified version (daily values rather than cumulative values) of the first file. It was modified using xarray in Python.

 system.time(sum(mod_d97[[1:365]]))
 user  system elapsed 
 22.015   1.773  24.474

What on earth is happening here? I'm happy to provide more details (code, ncdumps, etc) as requested.

EDIT: added str() of files

d97 <- brick(files[8], varname = "TMIN")
masked_97 <- brick(files[3], varname = "TMIN")
d03 <- brick(files[11], varname = "TMIN")

str(d97)

Formal class 'RasterBrick' [package "raster"] with 12 slots
..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
.. .. ..@ name        : chr "/Users/charlesbecker/Desktop/Data/Project Data/Shiny/WY1997_yearly_stats.nc"
.. .. ..@ datanotation: chr "FLT4S"
.. .. ..@ byteorder   : chr "little"
.. .. ..@ nodatavalue : num NaN
.. .. ..@ NAchanged   : logi FALSE
.. .. ..@ nbands      : int 365
.. .. ..@ bandorder   : chr "BIL"
.. .. ..@ offset      : int 0
.. .. ..@ toptobottom : logi TRUE
.. .. ..@ blockrows   : int 0
.. .. ..@ blockcols   : int 0
.. .. ..@ driver      : chr "netcdf"
.. .. ..@ open        : logi FALSE
..@ data    :Formal class '.MultipleRasterData' [package "raster"] with 14 slots
.. .. ..@ values    : logi[0 , 0 ] 
.. .. ..@ offset    : num 0
.. .. ..@ gain      : num 1
.. .. ..@ inmemory  : logi FALSE
.. .. ..@ fromdisk  : logi TRUE
.. .. ..@ nlayers   : int 365
.. .. ..@ dropped   : NULL
.. .. ..@ isfactor  : logi FALSE
.. .. ..@ attributes: list()
.. .. ..@ haveminmax: logi FALSE
.. .. ..@ min       : num [1:365] Inf Inf Inf Inf Inf ...
.. .. ..@ max       : num [1:365] -Inf -Inf -Inf -Inf -Inf ...
.. .. ..@ unit      : chr "K"
.. .. ..@ names     : chr [1:365] "X1" "X2" "X3" "X4" ...
..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
.. .. ..@ type      : chr(0) 
.. .. ..@ values    : logi(0) 
.. .. ..@ color     : logi(0) 
.. .. ..@ names     : logi(0) 
.. .. ..@ colortable: logi(0) 
..@ title   : chr "TMIN"
..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
.. .. ..@ xmin: num 0.5
.. .. ..@ xmax: num 348
.. .. ..@ ymin: num 0.5
.. .. ..@ ymax: num 328
..@ rotated : logi FALSE
..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
.. .. ..@ geotrans: num(0) 
.. .. ..@ transfun:function ()  
..@ ncols   : int 348
..@ nrows   : int 327
..@ crs     :Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr NA
..@ history : list()
..@ z       :List of 1
.. ..$ : int [1:365] 1 2 3 4 5 6 7 8 9 10 ...

str(masked_97)

Formal class 'RasterBrick' [package "raster"] with 12 slots
..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
.. .. ..@ name        : chr "/Users/charlesbecker/Desktop/Data/Project Data/Shiny/AVA_WY1997_yearly_stats.nc"
.. .. ..@ datanotation: chr "FLT4S"
.. .. ..@ byteorder   : chr "little"
.. .. ..@ nodatavalue : num NaN
.. .. ..@ NAchanged   : logi FALSE
.. .. ..@ nbands      : int 365
.. .. ..@ bandorder   : chr "BIL"
.. .. ..@ offset      : int 0
.. .. ..@ toptobottom : logi TRUE
.. .. ..@ blockrows   : int 0
.. .. ..@ blockcols   : int 0
.. .. ..@ driver      : chr "netcdf"
.. .. ..@ open        : logi FALSE
..@ data    :Formal class '.MultipleRasterData' [package "raster"] with 14 slots
.. .. ..@ values    : logi[0 , 0 ] 
.. .. ..@ offset    : num 0
.. .. ..@ gain      : num 1
.. .. ..@ inmemory  : logi FALSE
.. .. ..@ fromdisk  : logi TRUE
.. .. ..@ nlayers   : int 365
.. .. ..@ dropped   : NULL
.. .. ..@ isfactor  : logi FALSE
.. .. ..@ attributes: list()
.. .. ..@ haveminmax: logi FALSE
.. .. ..@ min       : num [1:365] Inf Inf Inf Inf Inf ...
.. .. ..@ max       : num [1:365] -Inf -Inf -Inf -Inf -Inf ...
.. .. ..@ unit      : chr ""
.. .. ..@ names     : chr [1:365] "X1" "X2" "X3" "X4" ...
..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
.. .. ..@ type      : chr(0) 
.. .. ..@ values    : logi(0) 
.. .. ..@ color     : logi(0) 
.. .. ..@ names     : logi(0) 
.. .. ..@ colortable: logi(0) 
..@ title   : chr "TMIN"
..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
.. .. ..@ xmin: num 0.5
.. .. ..@ xmax: num 348
.. .. ..@ ymin: num 0.5
.. .. ..@ ymax: num 328
..@ rotated : logi FALSE
..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
.. .. ..@ geotrans: num(0) 
.. .. ..@ transfun:function ()  
..@ ncols   : int 348
..@ nrows   : int 327
..@ crs     :Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr NA
..@ history : list()
..@ z       :List of 1
.. ..$ : int [1:365] 1 2 3 4 5 6 7 8 9 10 ...

str(d03)

Formal class 'RasterBrick' [package "raster"] with 12 slots
..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
.. .. ..@ name        : chr "/Users/charlesbecker/Desktop/Data/Project Data/Shiny/WY2003_yearly_stats.nc"
.. .. ..@ datanotation: chr "FLT4S"
.. .. ..@ byteorder   : chr "little"
.. .. ..@ nodatavalue : num NaN
.. .. ..@ NAchanged   : logi FALSE
.. .. ..@ nbands      : int 365
.. .. ..@ bandorder   : chr "BIL"
.. .. ..@ offset      : int 0
.. .. ..@ toptobottom : logi TRUE
.. .. ..@ blockrows   : int 0
.. .. ..@ blockcols   : int 0
.. .. ..@ driver      : chr "netcdf"
.. .. ..@ open        : logi FALSE
..@ data    :Formal class '.MultipleRasterData' [package "raster"] with 14 slots
.. .. ..@ values    : logi[0 , 0 ] 
.. .. ..@ offset    : num 0
.. .. ..@ gain      : num 1
.. .. ..@ inmemory  : logi FALSE
.. .. ..@ fromdisk  : logi TRUE
.. .. ..@ nlayers   : int 365
.. .. ..@ dropped   : NULL
.. .. ..@ isfactor  : logi FALSE
.. .. ..@ attributes: list()
.. .. ..@ haveminmax: logi FALSE
.. .. ..@ min       : num [1:365] Inf Inf Inf Inf Inf ...
.. .. ..@ max       : num [1:365] -Inf -Inf -Inf -Inf -Inf ...
.. .. ..@ unit      : chr "K"
.. .. ..@ names     : chr [1:365] "X1" "X2" "X3" "X4" ...
..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
.. .. ..@ type      : chr(0) 
.. .. ..@ values    : logi(0) 
.. .. ..@ color     : logi(0) 
.. .. ..@ names     : logi(0) 
.. .. ..@ colortable: logi(0) 
..@ title   : chr "TMIN"
..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
.. .. ..@ xmin: num 0.5
.. .. ..@ xmax: num 348
.. .. ..@ ymin: num 0.5
.. .. ..@ ymax: num 328
..@ rotated : logi FALSE
..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
.. .. ..@ geotrans: num(0) 
.. .. ..@ transfun:function ()  
..@ ncols   : int 348
..@ nrows   : int 327
..@ crs     :Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr NA
..@ history : list()
..@ z       :List of 1
.. ..$ : int [1:365] 1 2 3 4 5 6 7 8 9 10 ...

system.time(sum(d97[[1:365]]))
user  system elapsed 
5.569   2.219   8.048

system.time(sum(masked_97[[1:365]]))
user  system elapsed 
11.887   2.342  14.569

system.time(sum(d03[[1:365]]))
user  system elapsed 
22.253   1.772  24.879

The most likely difference is that data in your new netCDF file is now compressed differently. Two forms of compression are common with netCDF files:

  • scale/offset encoding, e.g., decoding from int16 via a formula like scale_factor * values + add_offset.
  • zlib compression on individual chunks of the array (only supported with netCDF4 files).
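
To make the two forms concrete, here is a minimal sketch using only the Python standard library. It is not how netCDF or xarray implement these steps; the values, the scale_factor of 0.01 and the add_offset of 273.15 are made up for illustration. The point is that both forms add work at read time: packed integers have to be converted back to floats, and zlib chunks have to be inflated before any value can be used.

```python
import struct
import zlib


def decode_packed(raw, scale_factor, add_offset):
    """Unpack little-endian int16 values and apply scale_factor * value + add_offset."""
    count = len(raw) // 2  # two bytes per int16
    packed = struct.unpack(f"<{count}h", raw)
    return [scale_factor * value + add_offset for value in packed]


# Form 1: scale/offset encoding (hypothetical temperatures in kelvin, 0.01 K steps).
scale_factor, add_offset = 0.01, 273.15
packed_bytes = struct.pack("<4h", -500, 0, 250, 1200)
temps = decode_packed(packed_bytes, scale_factor, add_offset)

# Form 2: zlib compression of one chunk of float32 values.
chunk = struct.pack("<1000f", *([280.5] * 1000))
compressed = zlib.compress(chunk, 4)   # smaller on disk
restored = zlib.decompress(compressed)  # extra CPU work on every read
```

A file written without either form skips both steps on read, so two files holding the same values can take quite different amounts of CPU time to process.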

If you don't slice or manipulate your variables, xarray will preserve compression settings via the encoding attribute, but this is generally dropped by xarray operations. See the xarray docs on reading/writing encoded data for more details.
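
One way to act on this is to copy the storage-related settings from the original variable and pass them back explicitly when writing the modified file. The sketch below is an assumption-laden illustration rather than a definitive recipe: the helper storage_encoding and the STORAGE_KEYS list are hypothetical names, the example dictionary is made up, and the set of keys accepted may differ between xarray versions and backends. The xarray calls appear only in comments, so the block runs without xarray installed.

```python
# Keys assumed to describe on-disk storage; other entries that xarray keeps in
# .encoding (such as "source" or "original_shape") are bookkeeping and are skipped.
STORAGE_KEYS = (
    "dtype", "scale_factor", "add_offset", "_FillValue",
    "zlib", "complevel", "shuffle", "chunksizes",
)


def storage_encoding(source_encoding):
    """Return only the storage-related entries of an encoding dictionary."""
    return {key: source_encoding[key] for key in STORAGE_KEYS if key in source_encoding}


# Made-up example of what an .encoding dictionary might contain.
original = {
    "zlib": True,
    "complevel": 4,
    "shuffle": True,
    "chunksizes": (1, 327, 348),
    "dtype": "float32",
    "source": "WY1997_yearly_stats.nc",
    "original_shape": (365, 327, 348),
}
enc = storage_encoding(original)

# With xarray installed, the same idea would look roughly like this (not run here):
#   import xarray as xr
#   ds = xr.open_dataset("WY1997_yearly_stats.nc")
#   enc = storage_encoding(ds["TMIN"].encoding)
#   ... modify the data ...
#   modified.to_netcdf("modified.nc", encoding={"TMIN": enc})
```

Comparing the encoding dictionaries of the fast and slow files (or their ncdump -hs headers) should show whether the compression settings actually differ before any rewriting is attempted.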
