Masking

Getting started

In addition to supporting the representation of data and associated WCS, it is also possible to attach a boolean mask to the SpectralCube class. Masks can take various forms, but one of the more common ones is a cube with the same dimensions as the data, and that contains e.g. the boolean value True where data should be used, and the value False when the data should be ignored (though it is also possible to flip the convention around; see Inclusion and Exclusion). To create a boolean mask from a boolean array mask_array, you can for example use:

>>> from astropy import units as u
>>> from spectral_cube import BooleanArrayMask
>>> mask = BooleanArrayMask(mask=mask_array, wcs=cube.wcs)  

Note

Currently, the mask convention is opposite of what is defined for Numpy masked array and Astropy Table.

Using a pure boolean array may not always be the most efficient solution because it may require a large amount of memory.

You can also create a mask using simple conditions directly on the cube values themselves, for example:

>>> include_mask = cube > 1.3*u.K  

This is more efficient because the condition is actually evaluated on-the-fly as needed. Note that units equivalent to the cube’s units must be used.

Masks can be combined using standard boolean comparison operators:

>>> new_mask = (cube > 1.3*u.K) & (cube < 100.*u.K)  

The available operators are & (and), | (or), and ~ (not).

To apply a new mask to a SpectralCube class, use the with_mask() method, which can take a mask and combine it with any pre-existing mask:

>>> cube2 = cube.with_mask(new_mask)  

In the above example, cube2 contains a mask that is the & combination of new_mask with the existing mask on cube. The cube2 object contains a view to the same data as cube, so no data is copied during this operation.

Boolean arrays can also be used as input to with_mask(), assuming the shape of the mask and the data match:

>>> cube2 = cube.with_mask(boolean_array)  

Any boolean area that can be broadcast to the cube shape can be used as a boolean array mask.

Accessing masked data

As mentioned in Accessing data, the raw and unmasked data can be accessed with the spectral_cube.spectral_cube.BaseSpectralCube.unmasked_data attribute. You can access the masked data using filled_data. This array is a copy of the original data with any masked value replaced by a fill value (which is np.nan by default but can be changed using the fill_value option in the SpectralCube initializer). The ‘filled’ data is accessed with e.g.:

>>> slice_filled = cube.filled_data[0,:,:]  

Note that accessing the filled data should still be efficient because the data are loaded and filled only once you access the actual data values, so this should still be efficient for large datasets.

If you are only interested in getting a flat (i.e. 1-d) array of all the non-masked values, you can also make use of the flattened() method:

>>> flat_array = cube.flattened()  

Fill values

When accessing the data (see Accessing data), the mask may be applied to the data and the masked values replaced by a fill value. This fill value can be set using the fill_value initializer in SpectralCube, and is set to np.nan by default. To change the fill value on a cube, you can make use of the with_fill_value() method:

>>> cube2 = cube.with_fill_value(0.)  

This returns a new SpectralCube instance that contains a view to the same data in cube (so no data are copied).

Inclusion and Exclusion

The term “mask” is often used to refer both to the act of exluding and including pixels from analysis. To be explicit about how they behave, all mask objects have an include() method that returns a boolean array. True values in this array indicate that the pixel is included/valid, and not filtered/replaced in any way. Conversely, True values in the output from exclude() indicate the pixel is excluded/invalid, and will be filled/filtered. The inclusion/exclusion behavior of any mask can be inverted via:

>>> mask_inverse = ~mask  

Advanced masking

Masks based on simple functions that operate on the initial data can be defined using the LazyMask class. The motivation behind the LazyMask class is that it is essentially equivalent to a boolean array, but the boolean values are computed on-the-fly as needed, meaning that the whole boolean array does not ever necessarily need to be computed or stored in memory, making it ideal for very large datasets. The function passed to LazyMask should be a simple function taking one argument - the dataset itself:

>>> from spectral_cube import LazyMask
>>> cube = read(...)  
>>> LazyMask(np.isfinite, cube=cube)  

or for example:

>>> def threshold(data):
...     return data > 3.
>>> LazyMask(threshold, cube=cube)  

As shown in Getting Started, LazyMask instances can also be defined directly by specifying conditions on SpectralCube objects:

>>> cube > 5*u.K  
LazyComparisonMask(...)

Outputting masks

The attached mask to the given SpectralCube class can be converted into a CASA image using make_casa_mask():

>>> from spectral_cube.io.casa_masks import make_casa_mask
>>> make_casa_mask(cube, 'casa_mask.image', add_stokes=False)  

Optionally, a redundant Stokes axis can be added to match the original CASA image.

Note

Outputting to CASA masks requires that spectral_cube be run from a CASA python session.

Masking cubes with other cubes

A common use case is to mask a cube based on another cube in the same coordinates. For example, you want to create a mask of 13CO based on the brightness of 12CO. This can be done straightforwardly if they are on an identical grid:

>>> mask_12co = cube12co > 0.5*u.Jy  
>>> masked_cube13co = cube13co.with_mask(mask_12co)  

If you see errors such as WCS does not match mask WCS, but you’re confident that your two cube are on the same grid, you should have a look at the cube.wcs attribute and see if there are subtle differences in the world coordinate parameters. These frequently occur when converting from frequency to velocity as there is inadequate precision in the rest frequency.

For example, these two axes are nearly identical, but not perfectly so:

Number of WCS axes: 3
CTYPE : 'RA---SIN'  'DEC--SIN'  'VRAD'
CRVAL : 269.08866286689999  -21.956244813729999  -3000.000559989533
CRPIX : 161.0  161.0  1.0
PC1_1 PC1_2 PC1_3  : 1.0  0.0  0.0
PC2_1 PC2_2 PC2_3  : 0.0  1.0  0.0
PC3_1 PC3_2 PC3_3  : 0.0  0.0  1.0
CDELT : -1.3888888888889999e-05  1.3888888888889999e-05  299.99999994273281
NAXIS    : 0 0

Number of WCS axes: 3
CTYPE : 'RA---SIN'  'DEC--SIN'  'VRAD'
CRVAL : 269.08866286689999  -21.956244813729999  -3000.0000242346514
CRPIX : 161.0  161.0  1.0
PC1_1 PC1_2 PC1_3  : 1.0  0.0  0.0
PC2_1 PC2_2 PC2_3  : 0.0  1.0  0.0
PC3_1 PC3_2 PC3_3  : 0.0  0.0  1.0
CDELT : -1.3888888888889999e-05  1.3888888888889999e-05  300.00000001056611
NAXIS    : 0 0

In order to compose masks from these, we need to set the wcs_tolerance parameter:

>>> masked_cube13co = cube13co.with_mask(mask_12co, wcs_tolerance=1e-3)  

which in this case will check equality at the 1e-3 level, which truncates the 3rd CRVAL to the point of equality before comparing the values.