inpystem.tools.PCA module

This module implements tools to perform PCA transformation.

The main element is the PcaHandler class, which is a user interface. It performs direct and inverse PCA transformations on 3D data.

Dimension_Reduction is the background function that performs PCA, while the EigenEstimate function improves the estimation of the PCA eigenvalues.

The PcaHandler interface

class inpystem.tools.PCA.PcaHandler(Y, mask=None, PCA_transform=True, PCA_th='auto', verbose=True)

Interface to perform PCA.

The PCA is applied at class initialization based on the input data. This same operation can be applied afterward to other data using the direct and inverse methods.
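The "fit once, reuse later" behaviour described above can be sketched with plain NumPy. The class below is a hypothetical stand-in, not the inpystem implementation: it assumes a fixed number of components instead of the PCA_th estimation, ignores the sampling mask, and takes the subspace base from an SVD of the centered data.

```python
import numpy as np


class PcaSketch:
    """Hypothetical stand-in for PcaHandler: fit PCA once, reuse it on other data."""

    def __init__(self, Y, n_components):
        m, n, l = Y.shape
        flat = Y.reshape(m * n, l)
        # Mean spectrum of the data seen at initialization.
        self.mean = flat.mean(axis=0)
        # Principal directions are the right singular vectors of the centered data.
        _, _, Vt = np.linalg.svd(flat - self.mean, full_matrices=False)
        self.H = Vt[:n_components].T  # (l, n_components) subspace base
        self.Y_PCA = self.direct(Y)

    def direct(self, X):
        """Project (m, n, l) data onto the stored subspace."""
        m, n, l = X.shape
        coeffs = (X.reshape(m * n, l) - self.mean) @ self.H
        return coeffs.reshape(m, n, -1)

    def inverse(self, X_PCA):
        """Map (m, n, n_components) coefficients back to (m, n, l) data."""
        m, n, k = X_PCA.shape
        flat = X_PCA.reshape(m * n, k) @ self.H.T + self.mean
        return flat.reshape(m, n, -1)
```

With n_components equal to l (and at least l pixels), H is a square orthogonal matrix, so direct followed by inverse returns any input unchanged; with fewer components the round trip is a projection and therefore lossy.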

Variables
  • Y ((m, n, l) numpy array) – Multi-band data.

  • Y_PCA ((m, n, PCA_th) numpy array) – The data in PCA space.

  • mask (optional, (m, n) numpy array) – Spatial sampling mask. Default is full sampling.

  • PCA_transform (optional, bool) – Flag that sets whether PCA should really be applied. This is useful in some cases where PCA has already been applied. Default is True.

  • verbose (optional, bool) – If True, information is sent to output.

  • H ((l, PCA_th) numpy array) – The subspace base.

  • Ym ((m, n, l) numpy array) – Array whose spectra all equal the data mean spectrum.

  • PCA_th (int) – The estimated data dimension.

  • InfoOut (dict) – The dictionary containing additional information about the reduction. See Note.

Note

The InfoOut dictionary contains the following five keys:

  1. ‘H’ which is the base of the reduced subspace. Its shape is (l, PCA_th) where PCA_th is the estimated data dimension.

  2. ‘d’ which is the sequence of PCA eigenvalues after estimation.

  3. ‘PCA_th’ which is the estimated data dimension.

  4. ‘sigma’ which is the estimated Gaussian noise standard deviation.

  5. ‘Ym’ which is a (m, n, l) numpy array where the data mean spectrum is repeated at each spatial location.
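The shape of Ym can be illustrated with a short sketch. The helper below is hypothetical; it assumes, as the mask description of Dimension_Reduction suggests, that the mean spectrum is computed over the sampled pixels only and then tiled to every spatial location.

```python
import numpy as np


def mean_spectrum_cube(Y, mask=None):
    """Hypothetical helper: build an (m, n, l) array holding the mean spectrum everywhere."""
    m, n, l = Y.shape
    if mask is None:
        # Full sampling by default.
        mask = np.ones((m, n), dtype=bool)
    # Boolean indexing keeps the (K, l) spectra of the K sampled pixels.
    mean = Y[mask.astype(bool)].mean(axis=0)
    # Repeat the length-l mean spectrum at each of the m * n locations.
    return np.tile(mean, (m, n, 1))
```

Storing the mean as a full cube rather than a single vector lets it be subtracted from or added to (m, n, l) data directly.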

__init__(Y, mask=None, PCA_transform=True, PCA_th='auto', verbose=True)

PcaHandler constructor.

Parameters
  • Y ((m, n, l) numpy array) – Multi-band data.

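  • mask (optional, (m, n) numpy array) – Spatial sampling mask. Default is full sampling.

  • PCA_transform (optional, bool) – Flag that sets whether PCA should really be applied. This is useful in some cases where PCA has already been applied. Default is True.

  • PCA_th (optional, str, int) – The PCA threshold (‘auto’, ‘max’ or an integer), as in Dimension_Reduction. Default is ‘auto’.

  • verbose (optional, bool) – If True, information is sent to output.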

direct(X=None)

Performs direct PCA transformation.

The input X is either data to project into the PCA subspace or None. If X is None (the default), the output is simply self.Y_PCA.

Caution

The input data to transform should have the same shape as the initial data Y.

Parameters

X ((m, n, l) numpy array) – The data to transform into PCA space.

Returns

Multi-band data in reduced space.

Return type

(m, n, PCA_th) numpy array
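In matrix terms, the direct transformation removes the mean and projects each spectrum onto the base H. The function below is a hypothetical sketch of that computation, written from the shapes documented above (H is (l, PCA_th), Ym is (m, n, l)); it is not the library's code.

```python
import numpy as np


def pca_direct(X, H, Ym):
    """Hypothetical sketch: centre (m, n, l) data with Ym and project it onto H."""
    if X.shape != Ym.shape:
        # Mirrors the caution above: X must have the shape of the initial data.
        raise ValueError(f"expected shape {Ym.shape}, got {X.shape}")
    # matmul broadcasts H over the leading axis: (m, n, l) @ (l, k) -> (m, n, k).
    return (X - Ym) @ H
```

The shape check is what makes the caution concrete: Ym is stored with the spatial shape of the initial data, so data of any other spatial shape cannot be centered with it.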

inverse(X_PCA)

Performs inverse PCA transformation.

Caution

The input data to transform should have the same shape as the transformed data self.Y_PCA.

Parameters

X_PCA ((m, n, PCA_th) numpy array) – The data to transform into data space.

Returns

Multi-band data after inverse transformation.

Return type

(m, n, l) numpy array
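The inverse transformation maps the PCA coefficients back through the transpose of the base and adds the mean back. A minimal sketch, assuming H has orthonormal columns (as PCA bases do) and Ym is the (m, n, l) mean array; the function name is hypothetical.

```python
import numpy as np


def pca_inverse(X_PCA, H, Ym):
    """Hypothetical sketch: map (m, n, k) coefficients back to (m, n, l) data."""
    # (m, n, k) @ (k, l) -> (m, n, l), then restore the removed mean.
    return X_PCA @ H.T + Ym
```

Because the columns of H are orthonormal, data that already lies in the PCA subspace is recovered exactly by a subsequent direct transformation; other data loses its components outside the subspace.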

Background functions

inpystem.tools.PCA.Dimension_Reduction(Y, mask=None, PCA_th='auto', verbose=True)

Reduces the dimension of a multi-band image.

Parameters
  • Y ((m, n, l) numpy array) – The multi-band image where the last axis is the spectral one.

  • mask (optional, (m, n) numpy array) – The spatial sampling mask filled with True where pixels are sampled. It is used to correctly remove the data mean. Default is a matrix full of True.

  • PCA_th (optional, str, int) – The PCA threshold. ‘auto’ for automatic estimation, ‘max’ to keep all components, or an integer to set the threshold explicitly. If there are fewer samples (N) than the data dimension (l), this parameter is overridden to keep a threshold of N-1.

  • verbose (optional, bool) – Prints output if True. Default is True.
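The PCA_th rules above can be summarized as a small decision function. This is a hypothetical helper, not part of inpystem: auto_estimate stands in for the automatically estimated dimension, and applying the N-1 override before every other rule is one reading of the parameter description.

```python
import numbers


def resolve_pca_th(PCA_th, n_samples, n_bands, auto_estimate):
    """Hypothetical helper applying the PCA_th rules listed above."""
    if n_samples < n_bands:
        # Fewer samples (N) than bands (l): the threshold is forced to N - 1.
        return n_samples - 1
    if PCA_th == "auto":
        # Placeholder for the automatic estimate (assumed to be computed elsewhere).
        return int(auto_estimate)
    if PCA_th == "max":
        # Keep every component.
        return n_bands
    if isinstance(PCA_th, numbers.Integral):
        return int(PCA_th)
    raise ValueError(f"unsupported PCA_th value: {PCA_th!r}")
```

Checking numbers.Integral rather than int also accepts NumPy integer scalars.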

Returns

  • (m, n, PCA_th) numpy array – The data in the reduced subspace. Its shape is (m, n, PCA_th) where PCA_th is the estimated data dimension.

  • dict – The dictionary containing additional information about the reduction. See Note.

Note

The InfoOut dictionary contains the following five keys:

  1. ‘H’ which is the base of the reduced subspace. Its shape is (l, PCA_th) where PCA_th is the estimated data dimension.

  2. ‘d’ which is the sequence of PCA eigenvalues after estimation.

  3. ‘PCA_th’ which is the estimated data dimension.

  4. ‘sigma’ which is the estimated Gaussian noise standard deviation.

  5. ‘Ym’ which is a (m, n, l) numpy array where the data mean spectrum is repeated at each spatial location.
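A simplified version of the masked reduction can be sketched as follows. It is hypothetical and deliberately reduced: only sampled pixels enter the mean and covariance, but a fixed n_components replaces the ‘auto’ threshold, the raw sample eigenvalues are stored in ‘d’ without the EigenEstimate correction, and no ‘sigma’ is estimated. Every pixel, sampled or not, is projected.

```python
import numpy as np


def dimension_reduction_sketch(Y, mask=None, n_components=2):
    """Hypothetical, simplified masked PCA reduction of an (m, n, l) image."""
    m, n, l = Y.shape
    if mask is None:
        mask = np.ones((m, n), dtype=bool)
    samples = Y[mask.astype(bool)]  # (Ns, l) spectra of sampled pixels
    mean = samples.mean(axis=0)  # mean spectrum over sampled pixels
    centered = samples - mean
    cov = centered.T @ centered / samples.shape[0]  # (l, l) sample covariance
    d, V = np.linalg.eigh(cov)  # eigenvalues in ascending order
    d, V = d[::-1], V[:, ::-1]  # reorder from largest to smallest
    H = V[:, :n_components]  # (l, n_components) subspace base
    Ym = np.tile(mean, (m, n, 1))
    Y_PCA = (Y - Ym) @ H  # (m, n, n_components)
    info = {"H": H, "d": d, "PCA_th": n_components, "Ym": Ym}
    return Y_PCA, info
```

np.linalg.eigh is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the reversal.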

inpystem.tools.PCA.EigenEstimate(l, Ns)

Computes an estimate of the covariance eigenvalues given the sample covariance eigenvalues. The Stein estimator coupled with isotonic regression has been used here.

For more information, have a look at:

  • MESTRE, Xavier. Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Transactions on Information Theory, 2008, vol. 54, no 11, p. 5113-5129.

Parameters
  • l (numpy array) – Sample covariance eigenvalues.

  • Ns (int) – Number of observations

Returns

  • numpy array – Estimated covariance matrix eigenvalues.

  • float – Estimated Gaussian noise standard deviation.

  • int – Estimated dimension of the signal subspace.
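The Stein-plus-isotonic-regression idea can be sketched as below. This is a hypothetical illustration, not the EigenEstimate code: it applies Stein's formula d_i = Ns·l_i / (Ns - p + 1 + 2·l_i·Σ_{j≠i} 1/(l_i - l_j)) to p distinct sample eigenvalues, using Ns as-is for the degrees of freedom (an assumption), then restores the decreasing order with a plain unweighted pool-adjacent-violators pass. Stein's original isotonizing procedure differs in its details, and the noise level and subspace dimension estimates are not reproduced.

```python
import numpy as np


def pava_decreasing(y):
    """Unweighted pool-adjacent-violators fit of a non-increasing sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Pool while the previous block's mean is below the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    out = []
    for total, count in blocks:
        out.extend([total / count] * count)
    return np.array(out)


def stein_isotonic_eigenvalues(l, Ns):
    """Hypothetical sketch: Stein eigenvalue correction followed by isotonic regression."""
    l = np.sort(np.asarray(l, dtype=float))[::-1]  # decreasing order
    p = l.size
    diff = l[:, None] - l[None, :]  # diff[i, j] = l_i - l_j
    np.fill_diagonal(diff, np.inf)  # 1 / inf = 0 drops the j == i term
    denom = Ns - p + 1 + 2 * l * np.sum(1.0 / diff, axis=1)
    stein = Ns * l / denom
    # Restore the decreasing order, then guard against negative values.
    return np.maximum(pava_decreasing(stein), 0.0)
```

For many observations the correction is small: it slightly shrinks the largest eigenvalues and inflates the smallest ones. With few observations the raw Stein values can lose their ordering or turn negative, which is what the isotonic step repairs.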