pydrobert.speech.post
Classes for post-processing feature matrices
- pydrobert.speech.post.CMVN
alias of Standardize
- class pydrobert.speech.post.Deltas(num_deltas, target_axis=-1, concatenate=True, context_window=2, pad_mode='edge', **kwargs)[source]
Bases:
PostProcessor
Calculate feature deltas (weighted rolling averages)
Deltas are calculated by correlating the feature tensor with a 1D delta filter by enumerating over all but one axis (the “time axis” equivalent).
Intermediate values are calculated with 64-bit floats, then cast back to the input data type.
Deltas will increase the size of the feature tensor when num_deltas is positive and the passed features are non-empty. If concatenate is True, target_axis specifies the axis along which new deltas are appended. For example:

>>> deltas = Deltas(num_deltas=2, concatenate=True, target_axis=1)
>>> features_shape = list(features.shape)
>>> features_shape[1] *= 3
>>> assert deltas.apply(features).shape == tuple(features_shape)
If concatenate is False, target_axis dictates the location of a new axis in the resulting feature tensor that will index the deltas (0 for the original features, 1 for deltas, 2 for double deltas, etc.). For example:

>>> deltas = Deltas(num_deltas=2, concatenate=False, target_axis=1)
>>> features_shape = list(features.shape)
>>> features_shape.insert(1, 3)
>>> assert deltas.apply(features).shape == tuple(features_shape)
Deltas act as simple low-pass filters. Flipping the direction of the real filter to turn the delta operation into a simple convolution, the first order delta is defined as
\[\begin{split}f(t) = \begin{cases} \frac{-t}{Z} & -W \leq t \leq W \\ 0 & \mathrm{else} \end{cases}\end{split}\]

where

\[Z = \sum_t f(t)^2\]

for some \(W \geq 1\). Its Fourier transform is

\[F(\omega) = \frac{-2i}{Z\omega^2}\left( W\omega \cos W\omega - \sin W\omega \right)\]

Note that it is completely imaginary. For \(W \geq 2\), \(F\) is bounded below by \(\frac{i}{\omega}\). Hence, \(F\) exhibits low-pass characteristics. Second-order deltas are generated by convolving \(f(-t)\) with itself, third order by an additional \(f(-t)\), etc. By the convolution theorem, higher-order deltas have Fourier responses that become tighter around \(F(0)\) (more low-pass).
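The filter above can be sketched numerically. This is an illustrative numpy-only demo of computing a first-order delta by correlation with a window of W = 2 and edge padding; the normalizer here is the common choice \(Z = \sum_t t^2\), which may differ from the class's exact normalization, so treat it as a mechanics demo rather than a reference implementation.

```python
import numpy as np

# First-order delta as a correlation with the f(t) filter, for W = 2.
# Z = sum(t**2) is an assumed (common) normalizer, not necessarily the
# one the Deltas class uses internally.
W = 2
t = np.arange(-W, W + 1, dtype=np.float64)
filt = t / np.sum(t**2)  # correlating with t/Z realizes convolving with -t/Z

feats = np.random.default_rng(0).normal(size=(10, 4))  # (time, coefficients)
padded = np.pad(feats, ((W, W), (0, 0)), mode="edge")  # mirrors pad_mode='edge'
deltas = np.stack(
    [np.correlate(padded[:, i], filt, mode="valid") for i in range(feats.shape[1])],
    axis=1,
)
assert deltas.shape == feats.shape  # one delta frame per input frame
```

Because the padded signal has `2 * W` extra frames and the filter has `2 * W + 1` taps, `mode="valid"` returns exactly one delta frame per input frame.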
- aliases = {'deltas'}
- concatenate
- num_deltas
- class pydrobert.speech.post.PostProcessor[source]
Bases:
AliasedFactory
A container for post-processing features with a transform
- class pydrobert.speech.post.Stack(num_vectors, time_axis=0, pad_mode=None, **kwargs)[source]
Bases:
PostProcessor
Stack contiguous feature vectors together
- Parameters:
num_vectors (int) – The number of subsequent feature vectors in time to be stacked.
time_axis (int) – The axis along which subsequent feature vectors are drawn.
pad_mode (Union[str, Callable, None]) – Specifies how the axis in time will be padded on the right in order to be divisible by num_vectors. Additional keyword arguments will be passed to numpy.pad(). If unspecified, frames will instead be discarded in order to be divisible by num_vectors.
- aliases = {'stack'}
- num_vectors
- time_axis
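The shape arithmetic of stacking can be sketched with plain numpy (hypothetical shapes; this is not the class's implementation). It mirrors the documented default when pad_mode is unspecified: trailing frames that cannot fill a complete stack are discarded.

```python
import numpy as np

# Stack num_vectors contiguous frames along the coefficient axis.
# With 10 frames and num_vectors=3, only 3 full stacks fit, so the
# 10th frame is dropped (the documented behaviour when pad_mode=None).
num_vectors = 3
feats = np.random.default_rng(0).normal(size=(10, 5))  # (time, coefficients)
n_full = feats.shape[0] // num_vectors
stacked = feats[: n_full * num_vectors].reshape(n_full, num_vectors * feats.shape[1])
assert stacked.shape == (3, 15)
assert np.array_equal(stacked[0], feats[:3].ravel())  # first 3 frames, concatenated
```

The row-major reshape is what concatenates each run of num_vectors consecutive frames into a single longer feature vector.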
- class pydrobert.speech.post.Standardize(rfilename=None, norm_var=True, **kwargs)[source]
Bases:
PostProcessor
Standardize each feature coefficient
Though the exact behaviour of an instance varies as described below, the “goal” of this transformation is that every feature coefficient on the chosen axis has mean 0 and variance 1 (if norm_var is True) over the other axes. Features are assumed to be real; the return data type after apply() is always a 64-bit float.

If rfilename is not specified or the associated file is empty, coefficients are standardized locally (within the target tensor). If rfilename is specified, feature coefficients are standardized according to the sufficient statistics collected in the file. The latter implementation is based on [povey2011]. The statistics will be loaded with pydrobert.speech.util.read_signal().

Notes
Additional keyword arguments can be passed to the initializer if rfilename is set. They will be passed on to pydrobert.speech.util.read_signal().
See also
pydrobert.speech.util.read_signal
Describes the strategy used for loading signals
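The local (no rfilename) case reduces to per-coefficient mean/variance normalization, which can be sketched in plain numpy (this is an illustrative sketch, not the class's code):

```python
import numpy as np

# Local standardization: with no accumulated statistics, each coefficient
# (last axis) is standardized over the remaining (time) axis of the
# tensor itself.  The output is a 64-bit float, matching the class's
# documented return type.
feats = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(100, 13))
standardized = ((feats - feats.mean(0)) / feats.std(0)).astype(np.float64)
assert np.allclose(standardized.mean(0), 0.0)  # mean 0 per coefficient
assert np.allclose(standardized.std(0), 1.0)   # variance 1 (norm_var=True)
```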
- accumulate(features, axis=-1)[source]
Accumulate statistics from a feature tensor
- Raises:
ValueError – If the length of axis does not match that of past feature vector lengths
- aliases = {'cmvn', 'normalize', 'standardize', 'unit'}
- save(wfilename, key=None, compress=False, overwrite=True)[source]
Save accumulated statistics to file
If wfilename ends in '.npy', stats will be written using numpy.save().

If wfilename ends in '.npz', stats will be written to a numpy archive. If overwrite is False, other key-values will be loaded first if possible, then re-saved. If key is set, data will be indexed by key in the archive. Otherwise, the data will be stored at the first unused key matching the pattern 'arr_\d+'. If compress is True, numpy.savez_compressed() will be used over numpy.savez().

Otherwise, data will be written using numpy.ndarray.tofile().
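The 'arr_\d+' key pattern follows from how numpy archives name unkeyed arrays; a quick demonstration of that numpy behaviour (using an in-memory buffer rather than a real file):

```python
import io
import numpy as np

# numpy.savez stores positional (unnamed) arrays under keys 'arr_0',
# 'arr_1', ..., while keyword arguments keep their own names -- the
# same convention save() relies on when no key is given.
buf = io.BytesIO()
np.savez(buf, np.zeros(3), stats=np.ones(2))
buf.seek(0)
archive = np.load(buf)
assert set(archive.files) == {"arr_0", "stats"}
```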