Execution and parallelization

Lazy evaluation

Creating data cubes from image collections or deriving data cubes from existing data cubes with the built-in operations is computationally cheap, because no pixel values are read and no expensive computations are started. Instead, gdalcubes uses proxy objects that only know the shape of the data cube, which operations they refer to, and where the data comes from. Actual computations are started only when users explicitly read pixels from a data cube, e.g., by calling plot, write_ncdf, write_tif, as_array, or similar operations.

This concept is sometimes referred to as lazy evaluation and allows for a few optimizations when operations on data cubes are chained. For example, when plotting RGB bands only, it is possible to avoid reading any files that relate to other bands.
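
For illustration, the following sketch chains several operations on a hypothetical Landsat 8 collection (the collection file L8.db and the cube view parameters are placeholders); everything up to the last line only builds proxy objects, and pixel values are read only when write_ncdf() is called.

library(gdalcubes)

# chaining operations only builds proxy objects; nothing is computed yet
L8.col = image_collection(file.path(tempdir(), "L8.db"))   # hypothetical collection file
v = cube_view(extent = L8.col, srs = "EPSG:32618", nx = 500, ny = 500, dt = "P1M")
ndvi = raster_cube(L8.col, v) |>
  select_bands(c("B04", "B05")) |>
  apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") |>
  reduce_time("median(NDVI)")

# only now are pixel values read and the computations executed
write_ncdf(ndvi, file.path(tempdir(), "NDVI_median.nc"))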

Chaining data cube operations creates a so-called process graph that can be converted to a JSON format. The format itself is not particularly important for users, but it allows saving virtual data cube objects (without any pixel data) to disk (see here).

Below you can find an example process graph for computing the median NDVI of pixel time series over some area in the Brazilian Amazon forest.

{
  "cube_type": "reduce_time",
  "in_cube": {
    "band_names": [
      "NDVI"
    ],
    "cube_type": "apply_pixel",
    "expr": [
      "(b05-b04)/(b05+b04)"
    ],
    "in_cube": {
      "bands": [
        "B04",
        "B05"
      ],
      "cube_type": "select_bands",
      "in_cube": {
        "chunk_size": [
          1,
          256,
          256
        ],
        "cube_type": "image_collection",
        "file": "/tmp/RtmpacmRAy/file11af57b771e8.sqlite",
        "view": {
          "aggregation": "median",
          "resampling": "bilinear",
          "space": {
            "bottom": -550000.0,
            "left": -6180000.0,
            "nx": 2000,
            "ny": 2000,
            "right": -6080000.0,
            "srs": "EPSG:3857",
            "top": -450000.0
          },
          "time": {
            "dt": "P1Y",
            "t0": "2014",
            "t1": "2018"
          }
        },
        "warp_args": []
      }
    },
    "keep_bands": false
  },
  "reducer_bands": [
    [
      "median",
      "NDVI"
    ]
  ]
}
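
In R, such a JSON representation can be obtained and saved to disk, for example with as_json(); the sketch below assumes that as_json() returns the JSON as a character string and reuses the ndvi proxy object from the earlier example (the output file path is a placeholder).

# serialize the process graph of a virtual data cube (no pixel data) to JSON
json = as_json(ndvi)
writeLines(json, file.path(tempdir(), "NDVI_median_cube.json"))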

Multidimensional chunking

To avoid running out of memory and to allow for parallel processing, gdalcubes divides data cubes into smaller chunks. A chunk is defined by its size in the temporal and spatial dimensions and always contains all bands / variables. Internally, a chunk is simply stored as a four-dimensional array of double-precision floating-point numbers.
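
Although chunks are an internal concept, the same four-dimensional layout shows up when a (small) cube is read into R as an array; the sketch below reuses the ndvi proxy object from the lazy evaluation example above.

# as_array() returns a four-dimensional array with dimensions (band, time, y, x)
x = as_array(ndvi)
dim(x)   # here: 1 band, 1 time slice, and 500 x 500 pixels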

Empty chunks that do not contain any data in most cases do not consume any memory. Data cubes often contain quite a large number of empty chunks, for example when no satellite image in a collection intersects with the spatiotemporal extent of a chunk. Most data cube operations propagate empty chunks, i.e., they check whether a chunk of the input cube is empty and simply forward it without performing expensive computations. As a result, the data model can be seen as a semi-sparse array. One important consequence is that in many cases using a higher temporal resolution (e.g., daily for Sentinel-2 data with a 5-day revisit time) does not make computations take significantly longer.

While a data cube is being processed, gdalcubes simply iterates over its chunks. If the cube is the result of chained operations, the process graph is traversed backwards automatically, so that for each chunk it is known which chunks of the input cubes, or which images, must be read.

Defining custom chunk sizes in R

A custom chunk size for data cubes can be set either as an additional argument chunking of the raster_cube() function, or as a global package default using gdalcubes_options(). Sizes are defined as an integer vector with three elements, representing the size in the t, y, and x dimensions (in this order). For gdalcubes_options(), it is also possible to provide a function that calculates the chunk size from the size of the input cube.

The following examples show how to define a custom chunk size either for a specific cube, or as a global package variable.

L8.col = image_collection(file.path(tempdir(), "L8.db"))
v = cube_view(extent = L8.col, srs = "EPSG:32618", nx = 497, ny = 526, dt = "P1M")

# set the chunk size for a specific cube ...
raster_cube(L8.col, v, chunking = c(1, 512, 512))

# ... or as a package-wide default for all subsequently created cubes
gdalcubes_options(default_chunksize = c(1, 512, 512))
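
The following is a minimal sketch of the function variant; it assumes that the function is called with the total size of a cube as an integer vector in t, y, x order and must return a chunk size in the same order (the exact calling convention is described in the gdalcubes_options() documentation).

# hypothetical sketch: derive the chunk size from the cube size (t, y, x)
gdalcubes_options(default_chunksize = function(size) {
  c(1, min(size[2], 512), min(size[3], 512))
})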

Parallel execution

gdalcubes has built-in support for processing data cubes in parallel using multiple CPU cores. Users can define the maximum number of worker processes as a global package variable (see below). Chunks of the data cube will then be distributed among the worker processes.

gdalcubes_options(parallel = 8)   # use up to 8 parallel worker processes
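
Any subsequent operation that materializes pixel data will then distribute chunks among the workers; for example, writing the median NDVI cube from the earlier sketch to a netCDF file (the output path is a placeholder):

# chunks are now read and processed by up to 8 worker processes
write_ncdf(ndvi, file.path(tempdir(), "NDVI_median_parallel.nc"))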

Practical considerations

The size of data cube chunks has a strong effect on computation times and memory consumption. There is no golden rule to find an optimal chunk size, because it depends on

  • available hardware,
  • input imagery (data format, tiling), and
  • particular analysis (how the data is accessed).

By default, gdalcubes tries to find an appropriate size considering the size of the cube and the number of worker processes. However, below are some practical considerations that might be helpful to find a better chunk size:

  1. The larger the chunks and the more parallel processes are being used, the more memory is consumed. If you run into out-of-memory issues / swapping, try decreasing one or both.

  2. If you find that the number of processes actually used is lower than the configured value, the data cube might have fewer chunks than worker processes. You can try to decrease the chunk size to improve parallelization.

  3. If the computations are I/O-bound (most time is spent on disk reads / writes or downloads), try to find out the block size of the image data (e.g., with the gdalinfo utility) and use it as the spatial chunk size. You might also want to try different GDAL configuration options that control how downloads and/or raster blocks are cached (see the sketch after this list).

  4. There are only very few cases where a temporal chunk size greater than one improves computation times, so there is usually no need to change it.
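
To illustrate the third point, the sketch below inspects the internal block size of an input image with the gdalinfo command-line utility and sets a few common GDAL configuration options (as environment variables) that control caching of remote reads and raster blocks; the file path and the values are placeholders and reasonable settings depend on the data source.

# inspect the internal block / tile size of an input image
# (placeholder path; the grep requires a Unix-like shell)
system("gdalinfo /path/to/some_scene_B04.tif | grep Block=")

# GDAL reads configuration options from environment variables, e.g.:
Sys.setenv(VSI_CACHE = "TRUE",             # cache reads from remote (/vsicurl/) files
           VSI_CACHE_SIZE = "100000000",   # cache size in bytes (about 100 MB)
           GDAL_CACHEMAX = "512")          # raster block cache size in MB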