Data cube operations
gdalcubes includes some built-in functions to process data cubes. The table below lists available operations that take one (or more) data cube(s) and produce a derived data cube as output. Depending on the operation and its parameters, the output data cube may have a different shape.
Built-in operations
Operation | Description |
---|---|
aggregate_space | Reduce spatial resolution of a cube by applying a spatial aggregation function. |
aggregate_time | Aggregate and/or regularize time series. |
apply_pixel | Apply an arithmetic expression to all data cube pixels. |
crop | Extract a rectangular spatial / temporal / spatiotemporal window. |
fill_time | Fill missing values of a data cube by simple time series interpolation. |
filter_geom | Filter pixels by a spatial polygon. |
filter_pixel | Filter pixels by a logical expression on band values. |
join_bands | Combine bands of two identically shaped data cubes. |
reduce_space | Apply a reducer function to all spatial slices of a data cube. |
reduce_time | Apply a reducer function to all pixel time series. |
select_bands | Select specific bands of a data cube. |
select_time | Select irregular time slices of a data cube. |
slice_space | Select a single time series of a data cube. |
slice_time | Select a single time slice of a data cube. |
window_time | Apply a moving window aggregate or convolution kernel to all pixel time series. |
Notice that the results of these operations are still data cubes. There are other functions, e.g., for extracting values at irregular points from data cubes or computing zonal statistics, that produce a different output type (see Reference).
Creating data cubes and calling operations on data cubes is very fast, because no image data is read and actual computations are delayed. Instead, operations return a proxy object (a virtual data cube) without running any expensive computations on the data (see Execution).
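For example, the following sketch (with a placeholder collection path) creates a data cube and prints it; this returns immediately, because only a proxy object is built and no pixel values are read yet.
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
x = raster_cube(col, v)
x # printing shows dimensions and bands without reading any imagery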
Chaining of operations
Operations can be chained and still simply produce proxy objects without any expensive computations involved. Chained operations produce a process graph that simply memorizes which operations need to be executed and where the data comes from. Actual computations are delayed until users call a method that reads data of the resulting data cube (such as plotting, or exporting a data cube as a file; see Data cube export). Below, you can find some examples of how data cube operations can be chained programmatically.
Examples
In R, chaining operations can be written nicely using the pipe symbol |> (or %>% from the magrittr package), as used in the following examples.
Median NDVI over time series
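The code for this example is not included above; the following is a sketch of what the chain could look like, analogous to the examples below (the collection path and cube view are placeholders):
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
reduce_time("median(NDVI)")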
Simple RGB composite for a polygon
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
raster_cube(col, v) |>
select_bands(c("B02", "B03", "B04")) |>
filter_geom(sf::read_sf("polygon_geometry.gpkg")$geometry) |>
reduce_time(c("median(B02)", "median(B03)", "median(B04)")
Count large NDVI values within a forest area
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)","NDVI") |>
filter_pixel("NDVI > 0.8") |>
filter_geom(sf::read_sf("forest_boundaries.gpkg")$geometry) |>
reduce_space("count(NDVI)")
Data cube export
Data cubes (including the output of operations, see above) can be exported as single netCDF files, or as a collection of (possibly cloud-optimized) GeoTIFF files.
NetCDF export
Produced netCDF files use the enhanced netCDF-4 data model, follow the CF-1.6 conventions, and include the bands of the data cube as variables. Dimensions are in the order time, y, x, or time, lat, lon, respectively.
Users can pass additional options to control properties of the resulting netCDF file:
- DEFLATE compression can be specified by integer values between 0 (no compression) and 9 (maximum compression).
- The inclusion of boundary variables in the output netCDF file can be enabled or disabled. These variables give the exact lower and upper limits of dimension values (e.g., the start and end time of a pixel).
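As a minimal sketch (reusing col and v from the examples above), a cube can be written to a netCDF file with write_ncdf(); the argument name compression_level is an assumption here and should be checked against the reference:
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
write_ncdf("cube.nc", compression_level = 5) # DEFLATE level between 0 and 9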
GeoTIFF export
Exporting a data cube as a collection of GeoTIFF files will produce one multiband GeoTIFF file per time slice of the data cube. Files will be named by a user-defined prefix and the date/time of the slice, such as “cube_2018-01.tif”.
Users can pass additional options to control properties of the resulting GeoTIFF files:
- Setting overviews = TRUE will generate GDAL overviews (image pyramids) for the resulting files. Overview levels reduce the pixel size by powers of 2, until width and height are lower than or equal to 256. The resampling algorithm for generating overviews can be selected from what GDAL offers (see here; default is nearest neighbor).
- Setting COG = TRUE will produce cloud-optimized GeoTIFF (COG) files and implies overviews = TRUE.
- Additional GDAL creation options can be passed, e.g., to define LZW or DEFLATE compression (see here).
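A minimal sketch (reusing col and v from above) that writes cloud-optimized GeoTIFFs with DEFLATE compression; the argument names dir, prefix, COG, and creation_options are assumptions and should be checked against the reference:
raster_cube(col, v) |>
select_bands(c("B02", "B03", "B04")) |>
write_tif(dir = "output", prefix = "cube_", COG = TRUE, creation_options = list("COMPRESS" = "DEFLATE"))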
Packing data values
Both GeoTIFF and netCDF export support packing the data, i.e., reducing the size of the output file(s) by converting double values to smaller integer types using an affine transformation of the original values with an offset and scale, where values are transformed according to the formula:
\[ \hat{x} = \frac{x - offset}{scale} \]
By conversion to integer, packing values reduces the precision of the data. Packing in gdalcubes requires users to pass
- the output data type (one of uint8, uint16, int16, uint32, int32, float32), and
- scale, offset, and nodata values (unless the type is float32).
Nodata values of data cubes will automatically be converted to the provided nodata values in the target data type. Scale, offset, and nodata values can be specified either once (applied to all bands) or individually per band. Packing to type float32 will ignore any offset, scale, and nodata values and simply cast 8-byte doubles to 4-byte float values.
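For illustration, with scale 0.0001 and offset 0, an NDVI value of 0.7342 would be stored as the integer 7342. A minimal sketch of packing during export could look as follows; the pack argument and its list fields (type, offset, scale, nodata) are assumptions about the interface and should be checked against the reference:
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
write_ncdf("ndvi.nc", pack = list(type = "int16", offset = 0, scale = 0.0001, nodata = -32768))
# NDVI in [-1, 1] maps to integers in [-10000, 10000], which fits into int16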
Saving virtual data cubes as JSON files
Data cube and image collection objects cannot be saved and restored across different R sessions using save() and load(), because they simply point to a memory address that is destroyed when the R session ends (internally, they are external pointers). However, since the creation of these objects is relatively cheap, it is possible to serialize and store the process graph as a lightweight JSON file using as_json() and to recreate a cube from such a file with json_cube().
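A minimal sketch of this round trip (reusing col and v from above); whether as_json() accepts a file path as its second argument is an assumption here and should be checked against the reference:
cube = raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI")
as_json(cube, "ndvi_cube.json") # store the process graph as JSON
# later, possibly in a new R session:
cube2 = json_cube("ndvi_cube.json") # recreate the virtual data cube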