Notes for development using spectral-cube
spectral-cube is flexible and can be used within other packages for development beyond the core package’s capabilities. Two significant strengths are its use of memory-mapping and its integration with dask (see Integration with dask) for efficiently handling larger-than-memory data.
This page provides suggestions for software development using spectral-cube in other packages.
Handling large data cubes
spectral-cube is specifically designed for handling larger-than-memory data and minimizes creating copies of the data. SpectralCube uses memory-mapping and provides options for executing operations on only subsets of the data at a time (for example, the how keyword in moment calculations). Masking operations can be performed “lazily”, where the computation is completed only when a view of the underlying boolean mask array is returned. See Masking for details on these implementations.
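These two ideas can be sketched with plain NumPy (this is an illustration of the concepts, not spectral-cube’s internals): memory-mapping lets slicing read only the requested subset from disk, and a mask can stay “lazy” until a boolean view is actually needed.

```python
import os
import tempfile

import numpy as np

# Write a small "cube" to disk, then reopen it memory-mapped so that
# slicing reads just the requested subset instead of the whole array.
path = os.path.join(tempfile.mkdtemp(), "cube.npy")
np.save(path, np.arange(4 * 5 * 5, dtype="float64").reshape(4, 5, 5))

mmap_cube = np.load(path, mmap_mode="r")  # no full read into memory
channel = mmap_cube[0]                    # touches only this spectral channel

# Lazy mask: keep the predicate, materialize the boolean array on demand.
mask_predicate = lambda data: data > 10.0
mask_view = mask_predicate(channel)       # evaluated only here, per slice

print(channel.shape, int(mask_view.sum()))  # → (5, 5) 14
```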
Further strategies for handling large data are given in Handling large datasets.
New methods can take advantage of these features by creating custom functions
to pass to
SpectralCube.apply_function_parallel_spectral(). These methods accept
functions that take a data array and a mask array as input, with optional
keyword arguments, and that return an output array of the same shape as the input.
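As a sketch of that function contract (the function name and the percentile keyword are illustrative choices, not spectral-cube API, and the function is applied here with plain NumPy rather than through spectral-cube):

```python
import numpy as np

def subtract_continuum(data, mask, percentile=25.0):
    # Takes a data array and a boolean mask; returns an array with the
    # same shape as the input, as described above.
    valid = np.where(mask, data, np.nan)
    # Per-spectrum continuum estimate along the spectral (first) axis.
    continuum = np.nanpercentile(valid, percentile, axis=0, keepdims=True)
    return data - continuum

rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.1, size=(16, 4, 4))
mask = np.ones(data.shape, dtype=bool)

out = subtract_continuum(data, mask)
print(out.shape)  # → (16, 4, 4)
```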
Unifying large-data handling and parallelization with dask
spectral-cube’s dask integration unifies many of the above features and adds further options that leverage the dask ecosystem. The Integration with dask page provides an overview of general usage and recommended practices, including:
- Using different dask schedulers (synchronous, threads, and distributed)
- Triggering dask executions and saving intermediate results to disk
- Efficiently rechunking large data for parallel operations
- Loading cubes in CASA image format
For an interactive demonstration of these features, see the Guide to Dask Optimization.
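The rechunking point can be sketched with plain dask.array (assuming dask is installed; the chunk shapes below are illustrative only, not a recommendation, and DaskSpectralCube provides its own higher-level interface):

```python
import dask.array as da

# A cube chunked one spectral channel at a time (common when data are
# read plane-by-plane from disk) is a poor layout for per-spectrum work.
cube = da.zeros((64, 32, 32), chunks=(1, 32, 32))

# Rechunk so that every chunk holds complete spectra over a small
# spatial tile, making per-spectrum operations chunk-local.
spectral_chunks = cube.rechunk((64, 8, 8))
print(spectral_chunks.chunks[0])  # → (64,)
```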
For further development, we highlight the ability to apply custom functions using dask.
DaskSpectralCube loads the data as a dask Array.
Similar to the non-dask
SpectralCube, custom functions can be used with
DaskSpectralCube.apply_function_parallel_spatial(). Effectively, these methods are
wrappers around dask.array.map_blocks
and accept its common keyword arguments.
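The relationship to dask.array.map_blocks can be sketched directly with plain dask (the doubling function is a hypothetical stand-in for a user-supplied operation):

```python
import dask.array as da

def double(block):
    # Any NumPy-in/NumPy-out function preserving the block shape works.
    return block * 2.0

cube = da.ones((8, 16, 16), chunks=(8, 4, 4))

# The apply_function_parallel_* methods behave, in effect, like
# map_blocks: the function runs independently on each chunk.
result = da.map_blocks(double, cube, dtype=cube.dtype)
print(float(result.sum().compute()))  # → 4096.0
```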
The dask array can be accessed with
DaskSpectralCube._data, but we discourage
this, as the built-in functions include checks, such as applying the mask to the data.
The Integration with dask page gives a basic example of using a custom function. A more
advanced example is shown in the parallel fitting with dask tutorial.
This tutorial demonstrates fitting a spectral model to every spectrum in a cube, applied
in parallel over chunks of the data. This fitting example is a guide for using:
- A change in array shape and dimensions in the output
- The block_info dictionary in a custom function to track the location of a chunk in the cube
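As a minimal sketch of the block_info mechanism (plain dask.array with a hypothetical fill function, not the tutorial’s fitting code):

```python
import dask.array as da
import numpy as np

def tag_with_location(block, block_info=None):
    if not block_info:
        # Guard for dask's meta-inference call, which may pass no info.
        return np.zeros(block.shape, dtype="float64")
    # block_info[None] describes the output block; "array-location"
    # gives the index ranges this chunk occupies in the full cube.
    (z0, _), _, _ = block_info[None]["array-location"]
    # Illustrative use: fill the block with its starting spectral channel.
    return np.full(block.shape, z0, dtype="float64")

cube = da.zeros((6, 4, 4), chunks=(3, 4, 4))
tagged = da.map_blocks(tag_with_location, cube, dtype="float64")
print(float(tagged[5, 0, 0].compute()))  # → 3.0
```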