.. _doc_developersnotes:
Notes for development using spectral-cube
=========================================
.. currentmodule:: spectral_cube
spectral-cube is flexible and can be used within other packages for
development beyond the core package's capabilities. Two significant strengths
are the use of memory-mapping and the integration with `dask `_
(:ref:`doc_dask`) to efficiently handle larger-than-memory data.
This page provides suggestions for software development using spectral-cube in other
packages.
The following two sections give information on standard usage of :class:`SpectralCube`.
The third discusses usage with dask integration in :class:`DaskSpectralCube`.
Handling large data cubes
-------------------------
spectral-cube is specifically designed for handling larger-than-memory data
and minimizes creating copies of the data. :class:`SpectralCube` uses memory-mapping
and provides options for executing operations with only subsets of the data
(for example, the `how` keyword in :meth:`SpectralCube.moment`).
Masking operations can be performed "lazily", where the computation is completed
only when a view of the underlying boolean mask array is returned.
See :ref:`doc_masking` for details on these implementations.
Further strategies for handling large data are given in :ref:`doc_handling_large_datasets`.
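To illustrate the idea of operating on only a subset of the data at a time,
the sketch below uses plain NumPy rather than spectral-cube itself. It integrates
a memory-mapped array one spectral channel at a time, similar in spirit to
``how='slice'``. The file name, array shape, and function name are hypothetical,
and this is not spectral-cube's own implementation.

.. code-block:: python

    import os
    import tempfile

    import numpy as np


    def integrate_by_channel(cube):
        """Sum a (n_chan, ny, nx) cube along axis 0, one channel at a time."""
        total = np.zeros(cube.shape[1:], dtype=np.float64)
        for channel in cube:  # each iteration touches a single 2D plane
            total += np.nan_to_num(channel)  # NaN (blanked) values count as zero
        return total


    # Build a small stand-in cube on disk; a real cube would be far larger.
    shape = (8, 4, 5)
    path = os.path.join(tempfile.mkdtemp(), "example_cube.dat")
    writer = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
    writer[:] = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)
    writer.flush()
    del writer

    # Re-open read-only: data are read from disk only when they are sliced.
    cube = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    moment0_like = integrate_by_channel(cube)

Only one plane needs to be held in memory at a time, at the cost of a Python-level loop over channels.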
Parallelizing operations
------------------------
Several operations implemented in :class:`SpectralCube` can be parallelized
using the `joblib `_ package. Built-in methods
in :class:`SpectralCube` that accept the `parallel` keyword use joblib when it is enabled.
New methods can take advantage of these features by creating custom functions
to pass to :meth:`SpectralCube.apply_function_parallel_spatial` and
:meth:`SpectralCube.apply_function_parallel_spectral`. These methods accept
functions that take a data and mask array input, with optional `**kwargs`,
and that return an output array of the same shape as the input.
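As a minimal sketch of such a custom function, the example below subtracts the
median of the valid values from a spectrum and returns an array of the same shape.
The function name is hypothetical. The ``mask`` argument is made optional as an
assumption, so that the function also works if only the data array is passed,
and ``True`` in the mask is assumed to mean "valid".

.. code-block:: python

    import numpy as np


    def subtract_median(data, mask=None, **kwargs):
        """Return ``data`` minus the median of its valid values (same shape)."""
        data = np.asarray(data, dtype=float)
        if mask is None:
            # Assumption: with no mask supplied, finite values are treated as valid.
            mask = np.isfinite(data)
        if not np.any(mask):
            # Nothing valid to measure a median from; return the data unchanged.
            return data.copy()
        return data - np.median(data[mask])


    # Quick check on a synthetic spectrum with one blanked channel.
    spectrum = np.array([1.0, 2.0, np.nan, 3.0, 10.0])
    result = subtract_median(spectrum)

A function like this could then be passed to
:meth:`SpectralCube.apply_function_parallel_spectral`, for example with
``parallel=True`` to enable joblib.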
Unifying large-data handling and parallelization with dask
----------------------------------------------------------
spectral-cube's dask integration unifies many of the above features and adds further
options that leverage the dask ecosystem. The :ref:`doc_dask` page provides an overview
of general usage and recommended practices, including:
* Using different dask schedulers (synchronous, threads, and distributed)
* Triggering dask executions and saving intermediate results to disk
* Efficiently rechunking large data for parallel operations
* Loading cubes in CASA image format
For an interactive demonstration of these features, see
the `Guide to Dask Optimization `_.
..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged
For further development, we highlight the ability to apply custom functions using dask.
A :class:`DaskSpectralCube` loads the data as a `dask Array `_.
Similar to the non-dask :class:`SpectralCube`, custom functions can be used with
:meth:`DaskSpectralCube.apply_function_parallel_spectral` and
:meth:`DaskSpectralCube.apply_function_parallel_spatial`. In effect, these are
wrappers around `dask.array.map_blocks `_
and accept its common keyword arguments.
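Because dask operates on chunks, it can help to write custom functions so that
they work on a whole block as well as on a single spectrum. The sketch below is
one way to do this, assuming the spectral axis is the first axis of the chunk.
The function name is hypothetical and the example is exercised with NumPy only.

.. code-block:: python

    import numpy as np


    def subtract_spectral_median(chunk, **kwargs):
        """Subtract each spectrum's median along axis 0.

        Works on a 1D spectrum or on a (n_chan, ny, nx) chunk.
        """
        chunk = np.asarray(chunk, dtype=float)
        # keepdims=True lets the per-spectrum median broadcast back over axis 0.
        return chunk - np.nanmedian(chunk, axis=0, keepdims=True)


    # Synthetic chunk: 4 channels, 2 x 3 pixels.
    chunk = np.arange(24, dtype=float).reshape(4, 2, 3)
    out = subtract_spectral_median(chunk)

The output has the same shape as the input chunk, so no extra shape information
is needed. If the ``accepts_chunks`` option of
:meth:`DaskSpectralCube.apply_function_parallel_spectral` is available in your
version (an assumption worth checking against the API reference), a function of
this form is a natural candidate for it.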
.. note::
    The dask array can be accessed with `DaskSpectralCube._data`, but we discourage
    this because the built-in functions include checks, such as applying the mask to the
    data.

    If you have a use case needing one of dask array's other `operation tools `_,
    please raise an `issue on GitHub `_
    so we can add this support!
The :ref:`doc_dask` page gives a basic example of using a custom function. A more
advanced example is shown in the `parallel fitting with dask tutorial `_.
This tutorial demonstrates fitting a spectral model to every spectrum in a cube, applied
in parallel over chunks of the data. This fitting example is a guide for using
:meth:`DaskSpectralCube.apply_function_parallel_spectral` with:
* A change in array shape and dimensions in the output (`drop_axis` and `chunks` in `dask.array.map_blocks `_)
* Using dask's `block_info` dictionary in a custom function to track the location of a chunk in the cube
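The two points above can be sketched with a function that reduces a chunk of
spectra to a 2D map (so the spectral axis is dropped) and uses ``block_info`` to
look up per-pixel values from a full-size array. This is not the tutorial's code:
the function name, ``guess_map``, and the hand-built ``block_info`` dictionary are
hypothetical stand-ins, used here so the example runs without dask. The
``array-location`` key follows the layout dask documents for ``block_info``.

.. code-block:: python

    import numpy as np


    def peak_offset(chunk, guess_map=None, block_info=None):
        """Peak channel of each spectrum minus a per-pixel guess.

        Takes a (n_chan, ny, nx) chunk and returns a (ny, nx) array,
        i.e. the spectral axis is dropped in the output.
        """
        peak = np.argmax(chunk, axis=0)
        if guess_map is None or block_info is None:
            return peak
        # block_info[0] describes the first input array; 'array-location'
        # holds the (start, stop) range of this chunk along each axis.
        (_, _), (y0, y1), (x0, x1) = block_info[0]["array-location"]
        return peak - guess_map[y0:y1, x0:x1]


    rng = np.random.default_rng(0)
    cube = rng.normal(size=(16, 4, 6))
    guess_map = np.full((4, 6), 8)

    # Emulate the chunk covering y in [0, 4) and x in [3, 6) of the cube.
    fake_block_info = {0: {"array-location": [(0, 16), (0, 4), (3, 6)]}}
    chunk_result = peak_offset(
        cube[:, 0:4, 3:6], guess_map=guess_map, block_info=fake_block_info
    )

When run through dask, ``drop_axis=0`` would tell ``map_blocks`` that the output
has lost the spectral axis; see the tutorial for the complete call.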
..
    TODO: UPDATE THE LINK TO THE TUTORIAL once merged