pydrobert.speech.post

Classes for post-processing feature matrices

pydrobert.speech.post.CMVN

alias of Standardize

class pydrobert.speech.post.Deltas(num_deltas, target_axis=-1, concatenate=True, context_window=2, pad_mode='edge', **kwargs)[source]

Bases: PostProcessor

Calculate feature deltas (weighted rolling averages)

Deltas are calculated by correlating the feature tensor with a 1D delta filter by enumerating over all but one axis (the “time axis” equivalent).

Intermediate values are calculated with 64-bit floats, then cast back to the input data type.

Deltas will increase the size of the feature tensor when num_deltas is positive and passed features are non-empty.

If concatenate is True, target_axis specifies the axis along which new deltas are appended. For example,

>>> deltas = Deltas(num_deltas=2, concatenate=True, target_axis=1)
>>> features_shape = list(features.shape)
>>> features_shape[1] *= 3
>>> assert deltas.apply(features).shape == tuple(features_shape)

If concatenate is False, target_axis dictates the location of a new axis in the resulting feature tensor that will index the deltas (0 for the original features, 1 for deltas, 2 for double deltas, etc.). For example:

>>> deltas = Deltas(num_deltas=2, concatenate=False, target_axis=1)
>>> features_shape = list(features.shape)
>>> features_shape.insert(1, 3)
>>> assert deltas.apply(features).shape == tuple(features_shape)

Deltas act as simple low-pass filters. Flipping the direction of the (real) filter turns the delta operation into a simple convolution; the first-order delta filter is then defined as

\[\begin{split}f(t) = \begin{cases} \frac{-t}{Z} & -W \leq t \leq W \\ 0 & \mathrm{else} \end{cases}\end{split}\]

where

\[Z = \sum_t f(t)^2\]

for some \(W \geq 1\). Its Fourier Transform is

\[F(\omega) = \frac{-2i}{Z\omega^2}\left( W\omega \cos W\omega - \sin W\omega \right)\]

Note that it is completely imaginary. For \(W \geq 2\), \(F\) is bounded below by \(\frac{i}{\omega}\). Hence, \(F\) exhibits low-pass characteristics. Second-order deltas are generated by convolving \(f(-t)\) with itself, third-order by an additional convolution with \(f(-t)\), etc. By the convolution theorem, higher-order deltas have Fourier responses that concentrate more tightly around \(F(0)\) (i.e. are more low-pass).
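To make the filter concrete, here is a NumPy sketch of a first-order delta computed with the conventional HTK-style normalizer \(Z = \sum_t t^2\) (the exact normalization used by Deltas may differ from the equations above), with edge padding as in the default pad_mode. The helper names are illustrative, not part of the package:

```python
import numpy as np

def delta_filter(W: int) -> np.ndarray:
    """First-order delta filter f(t) = -t / Z over t = -W..W,
    using the HTK-style normalizer Z = sum_t t**2 (an assumption)."""
    t = np.arange(-W, W + 1, dtype=np.float64)
    return -t / np.sum(t ** 2)

def first_order_delta(trajectory: np.ndarray, W: int = 2) -> np.ndarray:
    """Delta of a single coefficient's trajectory, edge-padded in time.

    Convolving with f(t) = -t/Z equals correlating with the flipped
    filter, recovering the usual sum_m m * c[t + m] / Z.
    """
    padded = np.pad(np.asarray(trajectory, dtype=np.float64), W, mode="edge")
    return np.convolve(padded, delta_filter(W), mode="valid")
```

On a linear ramp the interior deltas recover the slope exactly, while the edge-padded endpoints are attenuated, which is the low-pass behaviour discussed above.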

Parameters:
  • num_deltas (int) – The number of delta orders to calculate (1 for deltas, 2 to add double deltas, etc.)

  • target_axis (int) – The axis along which deltas are concatenated (when concatenate is True), or the location of the new axis indexing the deltas (when concatenate is False)

  • concatenate (bool) – Whether to append deltas along target_axis (True) or index them along a new axis (False)

  • context_window (int) – The length of the filter to either side of the central frame. Must be positive.

  • pad_mode (Union[str, Callable]) – How to pad the input sequence when correlating

  • **kwargs – Additional keyword arguments to be passed to numpy.pad()

aliases = {'deltas'}
concatenate
num_deltas
class pydrobert.speech.post.PostProcessor[source]

Bases: AliasedFactory

A container for post-processing features with a transform

abstract apply(features, axis=-1, in_place=False)[source]

Applies the transformation to a feature tensor

Consult the class documentation for more details on what the transformation is.

Parameters:
  • features (ndarray) –

  • axis (int) – The axis of features to apply the transformation along

  • in_place (bool) – Whether it is okay to modify features (True) or whether a copy should be made (False)

Returns:

out (numpy.ndarray) – The transformed features

class pydrobert.speech.post.Stack(num_vectors, time_axis=0, pad_mode=None, **kwargs)[source]

Bases: PostProcessor

Stack contiguous feature vectors together

Parameters:
  • num_vectors (int) – The number of subsequent feature vectors in time to be stacked.

  • time_axis (int) – The axis along which subsequent feature vectors are drawn.

  • pad_mode (Union[str, Callable, None]) – Specifies how the time axis will be padded on the right so that its length is divisible by num_vectors. Additional keyword arguments will be passed to numpy.pad(). If unspecified, trailing frames will instead be discarded so that the length is divisible by num_vectors.

aliases = {'stack'}
num_vectors
time_axis
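A minimal NumPy sketch of the shape transformation Stack performs when pad_mode is unspecified (frames that do not fill a final group of num_vectors are discarded). This illustrates the documented behaviour for 2D (time × coefficient) input, not pydrobert's actual implementation; stack_frames is a hypothetical helper name:

```python
import numpy as np

def stack_frames(features: np.ndarray, num_vectors: int) -> np.ndarray:
    """Stack groups of num_vectors contiguous frames (time on axis 0).

    Trailing frames that do not fill a final group are discarded,
    mirroring Stack's behaviour when pad_mode is unspecified.
    """
    T, F = features.shape
    T_trunc = (T // num_vectors) * num_vectors
    return features[:T_trunc].reshape(T_trunc // num_vectors, num_vectors * F)
```

For example, 10 frames of 2 coefficients stacked 3 at a time yield 3 frames of 6 coefficients, with the tenth frame dropped.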
class pydrobert.speech.post.Standardize(rfilename=None, norm_var=True, **kwargs)[source]

Bases: PostProcessor

Standardize each feature coefficient

Though the exact behaviour of an instance depends on how it is configured (see below), the “goal” of this transformation is for every feature coefficient along the chosen axis to have mean 0 and, if norm_var is True, variance 1 over the remaining axes. Features are assumed to be real; the return data type after apply() is always a 64-bit float.

If rfilename is not specified or the associated file is empty, coefficients are standardized locally (within the target tensor). If rfilename is specified, feature coefficients are standardized according to the sufficient statistics collected in the file. The latter implementation is based on [povey2011]. The statistics will be loaded with pydrobert.speech.util.read_signal().

Parameters:
  • rfilename (Optional[str]) – An optional file containing previously accumulated sufficient statistics

  • norm_var (bool) – Whether to normalize the variance of each coefficient to 1 in addition to removing its mean

Notes

Additional keyword arguments can be passed to the initializer if rfilename is set. They will be passed on to pydrobert.speech.util.read_signal()

See also

pydrobert.speech.util.read_signal

Describes the strategy used for loading signals
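When rfilename is unspecified, local standardization amounts to removing the per-coefficient mean, and optionally dividing by the per-coefficient standard deviation, over all other axes. A sketch of that behaviour in plain NumPy (standardize_local is an illustrative helper, not part of the package, and does not guard against zero variance):

```python
import numpy as np

def standardize_local(
    features: np.ndarray, axis: int = -1, norm_var: bool = True
) -> np.ndarray:
    """Standardize each coefficient along `axis` over all other axes."""
    features = np.asarray(features, dtype=np.float64)
    # Reduce over every axis except the coefficient axis.
    other = tuple(i for i in range(features.ndim) if i != axis % features.ndim)
    out = features - features.mean(axis=other, keepdims=True)
    if norm_var:
        out /= features.std(axis=other, keepdims=True)
    return out
```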

accumulate(features, axis=-1)[source]

Accumulate statistics from a feature tensor

Parameters:
  • features (ndarray) – The feature tensor to collect statistics from

  • axis (int) – The axis of features containing the feature coefficients
Raises:

ValueError – If the length of axis does not match that of past feature vector lengths
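Accumulation of this kind typically boils down to pooling sufficient statistics: a frame count, per-coefficient sums, and per-coefficient sums of squares, from which the pooled mean and variance follow. A rough sketch of that bookkeeping for 2D (time × coefficient) input; RunningStats is an illustrative stand-in, not Standardize itself:

```python
import numpy as np

class RunningStats:
    """Pool count, sum, and sum-of-squares per coefficient across calls,
    then standardize with the pooled mean and variance."""

    def __init__(self):
        self.count = 0
        self.sum = None
        self.sumsq = None

    def accumulate(self, features: np.ndarray) -> None:
        features = np.asarray(features, dtype=np.float64)
        if self.sum is not None and features.shape[1] != self.sum.shape[0]:
            # Mirrors the documented ValueError on mismatched axis lengths
            raise ValueError("coefficient axis length changed")
        self.count += features.shape[0]
        s, sq = features.sum(0), (features ** 2).sum(0)
        self.sum = s if self.sum is None else self.sum + s
        self.sumsq = sq if self.sumsq is None else self.sumsq + sq

    def apply(self, features: np.ndarray) -> np.ndarray:
        mean = self.sum / self.count
        var = self.sumsq / self.count - mean ** 2
        return (features - mean) / np.sqrt(var)
```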

aliases = {'cmvn', 'normalize', 'standardize', 'unit'}
property have_stats

Whether at least one feature vector has been accumulated

Type:

bool

save(wfilename, key=None, compress=False, overwrite=True)[source]

Save accumulated statistics to file

If wfilename ends in '.npy', stats will be written using numpy.save().

If wfilename ends in '.npz', stats will be written to a numpy archive. If overwrite is False, existing key-value pairs will be loaded from the archive first (if possible) and re-saved alongside the new entry. If key is set, the data will be indexed by key in the archive; otherwise, the data will be stored at the first unused key matching the pattern 'arr_\d+'. If compress is True, numpy.savez_compressed() will be used instead of numpy.savez().

Otherwise, data will be written using numpy.ndarray.tofile()

Parameters:
  • wfilename (str) – The file to write statistics to

  • key (Optional[str]) – The key to index the data by when writing a numpy archive

  • compress (bool) – Whether to compress a numpy archive

  • overwrite (bool) – Whether to replace the contents of an existing archive (True) or merge with them (False)
Raises:

ValueError – If no stats have been accumulated
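The extension-based dispatch described above might be sketched as follows. save_stats is a hypothetical free function standing in for the method (which operates on the instance's accumulated statistics), and the 'arr_<n>' key search mirrors numpy's default archive naming:

```python
import os
import numpy as np

def save_stats(stats: np.ndarray, wfilename: str, key=None,
               compress: bool = False, overwrite: bool = True) -> None:
    """Dispatch on file extension as Standardize.save is documented to."""
    if wfilename.endswith(".npy"):
        np.save(wfilename, stats)
    elif wfilename.endswith(".npz"):
        entries = {}
        if not overwrite and os.path.exists(wfilename):
            # Merge with whatever is already in the archive.
            with np.load(wfilename) as archive:
                entries = {k: archive[k] for k in archive.files}
        if key is None:
            # First unused key of numpy's default 'arr_<n>' pattern.
            n = 0
            while f"arr_{n}" in entries:
                n += 1
            key = f"arr_{n}"
        entries[key] = stats
        (np.savez_compressed if compress else np.savez)(wfilename, **entries)
    else:
        stats.tofile(wfilename)
```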