Notes for development using spectral-cube
spectral-cube is flexible and can be used within other packages for development beyond the core package’s capabilities. Two significant strengths are its use of memory-mapping and its integration with dask (see Integration with dask) for efficiently handling larger-than-memory data.
This page provides suggestions for software development using spectral-cube in other packages.
Handling large data cubes
spectral-cube is specifically designed for handling larger-than-memory data and minimizes creating copies of the data. SpectralCube uses memory-mapping and provides options for executing operations on only subsets of the data at a time (for example, the how keyword in moment calculations). Masking operations can be performed “lazily”, where the computation is completed only when a view of the underlying boolean mask array is returned. See Masking for details on these implementations.
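These two ideas can be sketched with plain NumPy (this is an illustration of the concepts, not spectral-cube’s internals): memory-mapping lets slicing read only the requested subset from disk, and a mask can stay “lazy” until a boolean view is actually needed.

```python
import os
import tempfile

import numpy as np

# Write a small "cube" to disk, then reopen it memory-mapped so that
# slicing reads just the requested subset instead of the whole array.
path = os.path.join(tempfile.mkdtemp(), "cube.npy")
np.save(path, np.arange(4 * 5 * 5, dtype="float64").reshape(4, 5, 5))

mmap_cube = np.load(path, mmap_mode="r")  # no full read into memory
channel = mmap_cube[0]                    # touches only this spectral channel

# Lazy mask: keep the predicate, materialize the boolean array on demand.
mask_predicate = lambda data: data > 10.0
mask_view = mask_predicate(channel)       # evaluated only here, per slice

print(channel.shape, int(mask_view.sum()))  # → (5, 5) 14
```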
Further strategies for handling large data are given in Handling large datasets.
New methods can take advantage of these features by creating custom functions
to pass to
SpectralCube.apply_function_parallel_spectral(). These methods accept
functions that take a data array and a mask array as input, with optional
keyword arguments, and that return an output array of the same shape as the input.
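As a sketch of that function contract (the function name and the percentile keyword are illustrative choices, not spectral-cube API, and the function is applied here with plain NumPy rather than through spectral-cube):

```python
import numpy as np

def subtract_continuum(data, mask, percentile=25.0):
    # Takes a data array and a boolean mask; returns an array with the
    # same shape as the input, as described above.
    valid = np.where(mask, data, np.nan)
    # Per-spectrum continuum estimate along the spectral (first) axis.
    continuum = np.nanpercentile(valid, percentile, axis=0, keepdims=True)
    return data - continuum

rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.1, size=(16, 4, 4))
mask = np.ones(data.shape, dtype=bool)

out = subtract_continuum(data, mask)
print(out.shape)  # → (16, 4, 4)
```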
Unifying large-data handling and parallelization with dask
spectral-cube’s dask integration unifies many of the above features and adds further options that leverage the dask ecosystem. The Integration with dask page provides an overview of general usage and recommended practices, including:
- Using different dask schedulers (synchronous, threads, and distributed)
- Triggering dask executions and saving intermediate results to disk
- Efficiently rechunking large data for parallel operations
- Loading cubes in CASA image format
For an interactive demonstration of these features, see the Guide to Dask Optimization.
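The rechunking point can be sketched with plain dask.array (assuming dask is installed; the chunk shapes below are illustrative only, not a recommendation, and DaskSpectralCube provides its own higher-level interface):

```python
import dask.array as da

# A cube chunked one spectral channel at a time (common when data are
# read plane-by-plane from disk) is a poor layout for per-spectrum work.
cube = da.zeros((64, 32, 32), chunks=(1, 32, 32))

# Rechunk so that every chunk holds complete spectra over a small
# spatial tile, making per-spectrum operations chunk-local.
spectral_chunks = cube.rechunk((64, 8, 8))
print(spectral_chunks.chunks[0])  # → (64,)
```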
For further development, we highlight the ability to apply custom functions using dask.
DaskSpectralCube loads the data as a dask Array.
Similar to the non-dask
SpectralCube, custom functions can be used with
DaskSpectralCube.apply_function_parallel_spatial(). Effectively, these methods are
wrappers around dask.array.map_blocks
and accept its common keyword arguments.
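The relationship to dask.array.map_blocks can be sketched directly with plain dask (the doubling function is a hypothetical stand-in for a user-supplied operation):

```python
import dask.array as da

def double(block):
    # Any NumPy-in/NumPy-out function preserving the block shape works.
    return block * 2.0

cube = da.ones((8, 16, 16), chunks=(8, 4, 4))

# The apply_function_parallel_* methods behave, in effect, like
# map_blocks: the function runs independently on each chunk.
result = da.map_blocks(double, cube, dtype=cube.dtype)
print(float(result.sum().compute()))  # → 4096.0
```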
The dask array can be accessed with
DaskSpectralCube._data, but we discourage
this, as the built-in functions include checks, such as applying the mask to the data.
The Integration with dask page gives a basic example of using a custom function. A more
advanced example is shown in the parallel fitting with dask tutorial.
This tutorial demonstrates fitting a spectral model to every spectrum in a cube, applied
in parallel over chunks of the data. This fitting example is a guide for using:
- A change in array shape and dimensions in the output
- The block_info dictionary in a custom function to track the location of a chunk in the cube
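As a minimal sketch of the block_info mechanism (plain dask.array with a hypothetical fill function, not the tutorial’s fitting code):

```python
import dask.array as da
import numpy as np

def tag_with_location(block, block_info=None):
    if not block_info:
        # Guard for dask's meta-inference call, which may pass no info.
        return np.zeros(block.shape, dtype="float64")
    # block_info[None] describes the output block; "array-location"
    # gives the index ranges this chunk occupies in the full cube.
    (z0, _), _, _ = block_info[None]["array-location"]
    # Illustrative use: fill the block with its starting spectral channel.
    return np.full(block.shape, z0, dtype="float64")

cube = da.zeros((6, 4, 4), chunks=(3, 4, 4))
tagged = da.map_blocks(tag_with_location, cube, dtype="float64")
print(float(tagged[5, 0, 0].compute()))  # → 3.0
```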