Data cube operations
gdalcubes includes some built-in functions to process data cubes. The table below lists available operations that take one (or more) data cube(s) and produce a derived data cube as output. Depending on the operation and its parameters, the output data cube may have a different shape.
Built-in operations
Operation | Description |
---|---|
aggregate_space | Reduce spatial resolution of a cube by applying a spatial aggregation function. |
aggregate_time | Aggregate and/or regularize time series. |
apply_pixel | Apply an arithmetic expression to all data cube pixels. |
crop | Extract a rectangular spatial / temporal / spatiotemporal window. |
fill_time | Fill missing values of a data cube by simple time series interpolation. |
filter_geom | Filter pixels by a spatial polygon. |
filter_pixel | Filter pixels by a logical expression on band values. |
join_bands | Combine bands of two identically shaped data cubes. |
reduce_space | Apply a reducer function to all spatial slices of a data cube. |
reduce_time | Apply a reducer function to all pixel time series. |
select_bands | Select specific bands of a data cube. |
select_time | Select irregular time slices of a data cube. |
slice_space | Select a single time series of a data cube. |
slice_time | Select a single time slice of a data cube. |
window_time | Apply a moving window aggregate or convolution kernel to all pixel time series. |
Notice that the results of these operations are still data cubes. There are other functions, e.g., for extracting values at irregular points from data cubes or computing zonal statistics, that produce a different output type (see Reference).
Creating data cubes and calling operations on data cubes is very fast, because no image data is read and actual computations are delayed. Instead, operations return a proxy object (a virtual data cube) without running any expensive computations on the data (see Execution).
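For example, the following sketch (with a placeholder collection path) creates a data cube and prints it; this returns immediately, because only a proxy object is built and no pixel values are read yet.
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
x = raster_cube(col, v)
x # printing shows dimensions and bands without reading any imagery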
Chaining of operations
Operations can be chained and still simply produce proxy objects without any expensive computations involved. Chained operations produce a process graph that simply memorizes which operations need to be executed and where the data comes from. Actual computations are delayed until users call a method that reads data of the resulting data cube (such as plotting, or exporting a data cube as a file; see Data cube export). Below, you can find some examples of how data cube operations can be chained programmatically.
Examples
In R, chaining operations can be written nicely using the pipe symbol |> (or %>% from the magrittr package), as used in the following examples.
Median NDVI over time series
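The code for this example is not included above; the following is a sketch of what the chain could look like, analogous to the examples below (the collection path and cube view are placeholders):
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
reduce_time("median(NDVI)")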
Simple RGB composite for a polygon
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
raster_cube(col, v) |>
select_bands(c("B02", "B03", "B04")) |>
filter_geom(sf::read_sf("polygon_geometry.gpkg")$geometry) |>
reduce_time(c("median(B02)", "median(B03)", "median(B04)")
Count large NDVI values within a forest area
col = image_collection("/path/to/collection.db")
v = cube_view(srs = "EPSG:4326", extent = col, dx = 0.01, dy = 0.01, dt = "P5D")
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)","NDVI") |>
filter_pixel("NDVI > 0.8") |>
filter_geom(sf::read_sf("forest_boundaries.gpkg")$geometry) |>
reduce_space("count(NDVI)")
Data cube export
Data cubes (including the output of operations, see above) can be exported as single netCDF files, or as a collection of (possibly cloud-optimized) GeoTIFF files.
NetCDF export
Produced netCDF files use the enhanced netCDF-4 data model, follow the CF-1.6 conventions, and include the bands of the data cube as variables. Dimensions are in the order time, y, x, or time, lat, lon, respectively.
Users can pass additional options to control properties of the resulting netCDF file:
- DEFLATE compression can be specified by integer values between 0 (no compression) and 9 (maximum compression).
- The inclusion of boundary variables in the output netCDF file can be enabled or disabled. These variables give the exact lower and upper limits of dimension values (e.g., the start and end time of a pixel).
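As a minimal sketch (reusing col and v from the examples above), a cube can be written to a netCDF file with write_ncdf(); the argument name compression_level is an assumption here and should be checked against the reference:
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
write_ncdf("cube.nc", compression_level = 5) # DEFLATE level between 0 and 9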
GeoTIFF export
Exporting a data cube as a collection of GeoTIFF files will produce one multiband GeoTIFF file per time slice of the data cube. Files will be named by a user-defined prefix and the date/time of the slice, such as “cube_2018-01.tif”.
Users can pass additional options to control properties of the resulting GeoTIFF files:
- Setting overviews = TRUE will generate GDAL overviews (image pyramids) for the resulting files. Overview levels reduce the pixel size by powers of 2, until width and height are lower than or equal to 256. The resampling algorithm for generating overviews can be selected from what GDAL offers (see here; default is nearest neighbor).
- Setting COG = TRUE will produce cloud-optimized GeoTIFF (COG) files and implies overviews = TRUE.
- Additional GDAL creation options can be passed, e.g., to define LZW or DEFLATE compression (see here).
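A minimal sketch (reusing col and v from above) that writes cloud-optimized GeoTIFFs with DEFLATE compression; the argument names dir, prefix, COG, and creation_options are assumptions and should be checked against the reference:
raster_cube(col, v) |>
select_bands(c("B02", "B03", "B04")) |>
write_tif(dir = "output", prefix = "cube_", COG = TRUE, creation_options = list("COMPRESS" = "DEFLATE"))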
Packing data values
Both GeoTIFF and netCDF export support packing the data, i.e., reducing the size of the output file(s) by converting double values to smaller integer types using an affine transformation of the original values with an offset and scale, where values are transformed according to the formula:
\[ \hat{x} = \frac{x - offset}{scale} \]
By conversion to integer, packing values reduces the precision of the data. Packing in gdalcubes requires users to pass
- the output data type (one of uint8, uint16, int16, uint32, int32, float32), and
- scale, offset, and nodata values (unless the type is float32).
Nodata values of data cubes will automatically be converted to the provided nodata values in the target data type. Scale, offset, and nodata values can be specified either once (applied to all bands) or individually per band. Packing to type float32 will ignore any offset, scale, and nodata values and simply cast 8-byte doubles to 4-byte float values.
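For illustration, with scale 0.0001 and offset 0, an NDVI value of 0.7342 would be stored as the integer 7342. A minimal sketch of packing during export could look as follows; the pack argument and its list fields (type, offset, scale, nodata) are assumptions about the interface and should be checked against the reference:
raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
write_ncdf("ndvi.nc", pack = list(type = "int16", offset = 0, scale = 0.0001, nodata = -32768))
# NDVI in [-1, 1] maps to integers in [-10000, 10000], which fits into int16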
Saving virtual data cubes as JSON files
Data cube and image collection objects cannot be saved and restored across different R sessions using save() and load(), because they simply point to a memory address that is destroyed when the R session ends (internally, they are external pointers). However, since the creation of these objects is relatively cheap, it is possible to serialize and store the process graph as a lightweight JSON file using as_json() and to recreate a cube from such a file with json_cube().
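A minimal sketch of this round trip (reusing col and v from above); whether as_json() accepts a file path as its second argument is an assumption here and should be checked against the reference:
cube = raster_cube(col, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI")
as_json(cube, "ndvi_cube.json") # store the process graph as JSON
# later, possibly in a new R session:
cube2 = json_cube("ndvi_cube.json") # recreate the virtual data cube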