pydrobert.speech.post
Classes for post-processing feature matrices
- pydrobert.speech.post.CMVN
alias of Standardize
- class pydrobert.speech.post.Deltas(num_deltas, target_axis=-1, concatenate=True, context_window=2, pad_mode='edge', **kwargs)[source]
Bases:
PostProcessor
Calculate feature deltas (weighted rolling averages)
Deltas are calculated by correlating the feature tensor with a 1D delta filter by enumerating over all but one axis (the “time axis” equivalent).
Intermediate values are calculated with 64-bit floats, then cast back to the input data type.
Deltas will increase the size of the feature tensor when num_deltas is positive and the passed features are non-empty. If concatenate is True, target_axis specifies the axis along which new deltas are appended. For example:

>>> deltas = Deltas(num_deltas=2, concatenate=True, target_axis=1)
>>> features_shape = list(features.shape)
>>> features_shape[1] *= 3
>>> assert deltas.apply(features).shape == tuple(features_shape)
If concatenate is False, target_axis dictates the location of a new axis in the resulting feature tensor that will index the deltas (0 for the original features, 1 for deltas, 2 for double deltas, etc.). For example:

>>> deltas = Deltas(num_deltas=2, concatenate=False, target_axis=1)
>>> features_shape = list(features.shape)
>>> features_shape.insert(1, 3)
>>> assert deltas.apply(features).shape == tuple(features_shape)
Deltas act as simple low-pass filters. Flipping the direction of the real filter to turn the delta operation into a simple convolution, the first order delta is defined as
\[\begin{split}f(t) = \begin{cases} \frac{-t}{Z} & -W \leq t \leq W \\ 0 & \mathrm{else} \end{cases}\end{split}\]

where

\[Z = \sum_t f(t)^2\]

for some \(W \geq 1\). Its Fourier transform is

\[F(\omega) = \frac{-2i}{Z\omega^2}\left( W\omega \cos W\omega - \sin W\omega \right)\]

Note that it is completely imaginary. For \(W \geq 2\), \(F\) is bounded below by \(\frac{i}{\omega}\). Hence, \(F\) exhibits low-pass characteristics. Second-order deltas are generated by convolving \(f(-t)\) with itself, third order by an additional \(f(-t)\), etc. By the convolution theorem, higher-order deltas have Fourier responses that become tighter around \(F(0)\) (more low-pass).
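The filter above can be sketched numerically. This is an illustrative numpy-only demo of computing a first-order delta by correlation with a window of W = 2 and edge padding; the normalizer here is the common choice \(Z = \sum_t t^2\), which may differ from the class's exact normalization, so treat it as a mechanics demo rather than a reference implementation.

```python
import numpy as np

# First-order delta as a correlation with the f(t) filter, for W = 2.
# Z = sum(t**2) is an assumed (common) normalizer, not necessarily the
# one the Deltas class uses internally.
W = 2
t = np.arange(-W, W + 1, dtype=np.float64)
filt = t / np.sum(t**2)  # correlating with t/Z realizes convolving with -t/Z

feats = np.random.default_rng(0).normal(size=(10, 4))  # (time, coefficients)
padded = np.pad(feats, ((W, W), (0, 0)), mode="edge")  # mirrors pad_mode='edge'
deltas = np.stack(
    [np.correlate(padded[:, i], filt, mode="valid") for i in range(feats.shape[1])],
    axis=1,
)
assert deltas.shape == feats.shape  # one delta frame per input frame
```

Because the padded signal has `2 * W` extra frames and the filter has `2 * W + 1` taps, `mode="valid"` returns exactly one delta frame per input frame.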
- aliases = {'deltas'}
- concatenate
- num_deltas
- class pydrobert.speech.post.PostProcessor[source]
Bases:
AliasedFactory
A container for post-processing features with a transform
- class pydrobert.speech.post.Stack(num_vectors, time_axis=0, pad_mode=None, **kwargs)[source]
Bases:
PostProcessor
Stack contiguous feature vectors together
- Parameters:
num_vectors (int) – The number of subsequent feature vectors in time to be stacked.
time_axis (int) – The axis along which subsequent feature vectors are drawn.
pad_mode (Union[str, Callable, None]) – Specifies how the axis in time will be padded on the right in order to be divisible by num_vectors. Additional keyword arguments will be passed to numpy.pad(). If unspecified, frames will instead be discarded in order to be divisible by num_vectors.
- aliases = {'stack'}
- num_vectors
- time_axis
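The shape arithmetic of stacking can be sketched with plain numpy (hypothetical shapes; this is not the class's implementation). It mirrors the documented default when pad_mode is unspecified: trailing frames that cannot fill a complete stack are discarded.

```python
import numpy as np

# Stack num_vectors contiguous frames along the coefficient axis.
# With 10 frames and num_vectors=3, only 3 full stacks fit, so the
# 10th frame is dropped (the documented behaviour when pad_mode=None).
num_vectors = 3
feats = np.random.default_rng(0).normal(size=(10, 5))  # (time, coefficients)
n_full = feats.shape[0] // num_vectors
stacked = feats[: n_full * num_vectors].reshape(n_full, num_vectors * feats.shape[1])
assert stacked.shape == (3, 15)
assert np.array_equal(stacked[0], feats[:3].ravel())  # first 3 frames, concatenated
```

The row-major reshape is what concatenates each run of num_vectors consecutive frames into a single longer feature vector.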
- class pydrobert.speech.post.Standardize(rfilename=None, norm_var=True, **kwargs)[source]
Bases:
PostProcessor
Standardize each feature coefficient
Though the exact behaviour of an instance varies as described below, the “goal” of this transformation is that every feature coefficient on the chosen axis has mean 0 and variance 1 (if norm_var is True) over the other axes. Features are assumed to be real; the return data type after apply() is always a 64-bit float.

If rfilename is not specified or the associated file is empty, coefficients are standardized locally (within the target tensor). If rfilename is specified, feature coefficients are standardized according to the sufficient statistics collected in the file. The latter implementation is based on [povey2011]. The statistics will be loaded with pydrobert.speech.util.read_signal().

Notes
Additional keyword arguments can be passed to the initializer if rfilename is set. They will be passed on to pydrobert.speech.util.read_signal().
See also
pydrobert.speech.util.read_signal
Describes the strategy used for loading signals
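The local (no rfilename) case reduces to per-coefficient mean/variance normalization, which can be sketched in plain numpy (this is an illustrative sketch, not the class's code):

```python
import numpy as np

# Local standardization: with no accumulated statistics, each coefficient
# (last axis) is standardized over the remaining (time) axis of the
# tensor itself.  The output is a 64-bit float, matching the class's
# documented return type.
feats = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(100, 13))
standardized = ((feats - feats.mean(0)) / feats.std(0)).astype(np.float64)
assert np.allclose(standardized.mean(0), 0.0)  # mean 0 per coefficient
assert np.allclose(standardized.std(0), 1.0)   # variance 1 (norm_var=True)
```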
- accumulate(features, axis=-1)[source]
Accumulate statistics from a feature tensor
- Raises:
ValueError – If the length of axis does not match that of past feature vector lengths
- aliases = {'cmvn', 'normalize', 'standardize', 'unit'}
- save(wfilename, key=None, compress=False, overwrite=True)[source]
Save accumulated statistics to file
If wfilename ends in '.npy', stats will be written using numpy.save().

If wfilename ends in '.npz', stats will be written to a numpy archive. If overwrite is False, other key-values will be loaded first if possible, then re-saved. If key is set, data will be indexed by key in the archive. Otherwise, the data will be stored at the first unused key matching the pattern 'arr_\d+'. If compress is True, numpy.savez_compressed() will be used over numpy.savez().

Otherwise, data will be written using numpy.ndarray.tofile().
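The 'arr_\d+' key pattern follows from how numpy archives name unkeyed arrays; a quick demonstration of that numpy behaviour (using an in-memory buffer rather than a real file):

```python
import io
import numpy as np

# numpy.savez stores positional (unnamed) arrays under keys 'arr_0',
# 'arr_1', ..., while keyword arguments keep their own names -- the
# same convention save() relies on when no key is given.
buf = io.BytesIO()
np.savez(buf, np.zeros(3), stats=np.ones(2))
buf.seek(0)
archive = np.load(buf)
assert set(archive.files) == {"arr_0", "stats"}
```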