pydrobert.speech.torch

PyTorch compatibility module

This submodule is intended to provide PyTorch implementations of the components critical to feature computation; it does not attempt to reproduce all functionality in PyTorch. Each PyTorch module here contains a class method which initializes the module from an analogous NumPy-based instance documented elsewhere. For example, assuming stft_frame_computer is an instance of a pydrobert.speech.STFTFrameComputer, one may instantiate a PyTorchSTFTFrameComputer via

>>> from pydrobert.speech.torch import PyTorchSTFTFrameComputer
>>> pytorch_stft_frame_computer = PyTorchSTFTFrameComputer.from_stft_frame_computer(
...     stft_frame_computer)

class pydrobert.speech.torch.PyTorchDither(coeff=1.0)

Bases: Module

PyTorch implementation of Dither

Add random, normally-distributed noise to a signal

Parameters:
  • coeff (float) – The standard deviation of the noise

  • dim – The dimension to apply noise to. If unspecified, applied to all coefficients
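
A minimal sketch of what dithering amounts to, assuming noise is drawn independently for every element (the dim option is not modelled). The function dither_sketch is hypothetical and not part of pydrobert.speech.

```python
import torch


def dither_sketch(sig: torch.Tensor, coeff: float = 1.0) -> torch.Tensor:
    """Hypothetical stand-in for pytorch_dither: add N(0, coeff**2) noise."""
    # randn_like draws standard-normal noise with sig's shape, dtype and device;
    # scaling by coeff sets the standard deviation of the noise
    return sig + coeff * torch.randn_like(sig)


torch.manual_seed(0)
sig = torch.zeros(16000)
noisy = dither_sketch(sig, coeff=1.0)
print(noisy.shape, float(noisy.std()))  # the std should be close to 1.0
```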

Notes

While it is usually the case in PyTorch that random noise is only added during training, dithering serves a

class pydrobert.speech.torch.PyTorchPostProcessorWrapper(postprocessor)

Bases: Module

A PyTorch wrapper around a PostProcessor

This module merely converts incoming tensors to numpy.ndarray objects, runs pydrobert.speech.post.PostProcessor.apply() on them, then converts the results back into tensors.

Most PostProcessor classes have been reimplemented in pydrobert.torch with bona fide PyTorch operations; those implementations should be preferred.
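
The tensor-to-array round trip can be sketched generically. Here NumpyFunctionWrapper is a hypothetical class, and np_func stands in for a PostProcessor.apply() call; a simple per-coefficient mean normalization is used in place of a real PostProcessor. Because this sketch detaches the input before leaving PyTorch, no gradient flows back through it, which is one reason a native implementation is preferable.

```python
import numpy as np
import torch


class NumpyFunctionWrapper(torch.nn.Module):
    """Hypothetical wrapper: tensor -> ndarray -> np_func -> tensor."""

    def __init__(self, np_func):
        super().__init__()
        self.np_func = np_func

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # detach() drops autograd history; the NumPy step is not differentiable
        arr = x.detach().cpu().numpy()
        out = self.np_func(arr)
        # restore the original dtype and device
        return torch.as_tensor(out).to(device=x.device, dtype=x.dtype)


def mean_normalize(feats: np.ndarray) -> np.ndarray:
    # stand-in for PostProcessor.apply: subtract each coefficient's mean over frames
    return feats - feats.mean(axis=0, keepdims=True)


wrapper = NumpyFunctionWrapper(mean_normalize)
feats = torch.arange(12, dtype=torch.float32).reshape(4, 3)
normed = wrapper(feats)
print(normed.mean(dim=0))
```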

class pydrobert.speech.torch.PyTorchPreemphasize(coeff=0.97)

Bases: Module

PyTorch implementation of Preemphasize

Parameters:
  • coeff (float) – Preemphasis coefficient
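
A sketch of the usual preemphasis filter, y[t] = x[t] - coeff * x[t - 1], applied along the last dimension. It assumes the first sample is passed through unchanged; how pydrobert.speech treats the first sample is not stated here. The function preemphasize_sketch is hypothetical.

```python
import torch


def preemphasize_sketch(sig: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """Hypothetical stand-in: first-order high-pass along the last dimension."""
    # keep sample 0 as-is, then subtract a scaled copy of the previous sample
    head = sig[..., :1]
    tail = sig[..., 1:] - coeff * sig[..., :-1]
    return torch.cat([head, tail], dim=-1)


x = torch.tensor([1.0, 2.0, 3.0])
print(preemphasize_sketch(x, coeff=0.5))  # tensor([1.0000, 1.5000, 2.0000])
```

Built only from slicing and arithmetic, the sketch is differentiable with respect to both sig and a tensor-valued coeff.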

pydrobert.speech.torch.PyTorchSIFrameComputer

alias of PyTorchShortIntegrationFrameComputer

pydrobert.speech.torch.PyTorchSTFTFrameComputer

alias of PyTorchShortTimeFourierTransformFrameComputer

class pydrobert.speech.torch.PyTorchShortIntegrationFrameComputer(si_frame_computer)

Bases: Module

PyTorch implementation of SIFrameComputer

This module is a port of pydrobert.speech.compute.ShortIntegrationFrameComputer to PyTorch. When called, the output should be nearly identical to a call to ShortIntegrationFrameComputer.compute_full(), except torch.Tensor inputs and outputs are expected.

Warning

This module is currently a mere wrapper around a ShortIntegrationFrameComputer instance. While we plan on reimplementing the computer with bona fide PyTorch operations at a later date, for now, relying on the factory function from_si_frame_computer() is the best way to ensure forward compatibility. In addition, to preserve forward compatibility, the module's state dict can neither be saved nor loaded.

class pydrobert.speech.torch.PyTorchShortTimeFourierTransformFrameComputer(offsets_and_truncated_filters, frame_length, frame_shift, frame_style='centered', window=None, dft_size=None, use_log=True, use_power=False, include_energy=False, kaldi_shift=False, is_real=False)

Bases: Module

PyTorch implementation of STFTFrameComputer

This module is a port of pydrobert.speech.compute.ShortTimeFourierTransformFrameComputer to PyTorch. When called, the output should be nearly identical to a call to ShortTimeFourierTransformFrameComputer.compute_full(), except torch.Tensor inputs and outputs are expected.

The easiest means of initializing this module is through the factory method from_stft_frame_computer(), which derives the parameters below from an already-initialized STFTFrameComputer.

The filters and window are learnable. If a fixed feature representation is desired, disable gradients, e.g. with torch.no_grad().
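
Two ways of keeping a learnable front end fixed, sketched with a stand-in module because the parameter attribute names of this class are not documented here. TinyLearnableFrontEnd is hypothetical. torch.no_grad() disables gradient tracking only for the calls it wraps, whereas requires_grad_(False) freezes the parameters themselves so they receive no gradients during later training.

```python
import torch


class TinyLearnableFrontEnd(torch.nn.Module):
    """Hypothetical module with a learnable window, standing in for a feature computer."""

    def __init__(self, frame_length: int = 4):
        super().__init__()
        self.window = torch.nn.Parameter(torch.hann_window(frame_length))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return frames * self.window


front_end = TinyLearnableFrontEnd()
frames = torch.randn(3, 4)

# Option 1: no gradient tracking, but only for the wrapped calls
with torch.no_grad():
    feats = front_end(frames)
print(feats.requires_grad)  # False

# Option 2: freeze the parameters persistently
front_end.requires_grad_(False)
feats = front_end(frames)
print(feats.requires_grad)  # False
```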

Parameters:
  • offsets_and_truncated_filters (Sequence[Tuple[int, Tensor]]) – A sequence of pairs (offset, truncated_filter). truncated_filter is a one-dimensional tensor of the non-zero frequency response of a single filter in the bank. offset is the index in the short-time spectrum at which the truncated_filter begins.

  • frame_length (int) – The number of audio samples constituting a frame.

  • frame_shift (int) – The number of audio samples between subsequent frames.

  • frame_style (Literal['centered', 'causal']) – If 'causal', the first frame begins at sample 0. Otherwise, the first frame is centered around sample 0 with the exact behaviour dictated by the kaldi_shift flag.

  • window (Optional[Tensor]) – If specified, a tensor of shape (frame_length,) containing the windowing function. If unspecified, implicit rectangular windowing will be performed (with no gradient).

  • dft_size (Optional[int]) – The size of the spectrum to compute for each frame. Must be greater than or equal to frame_length. If unspecified, the first power of two at or beyond the frame length will be chosen.

  • use_log (bool) – Whether to take the logarithm of the resulting representation

  • use_power (bool) – Take the power spectrum instead of the magnitude spectrum

  • include_energy (bool) – Whether to add a coefficient at index 0 corresponding to the frame-wise energy of the signal

  • kaldi_shift (bool) – Dictates how to center frames when frame_style is 'centered'. If True, the k-th frame is computed from signal[k * frame_shift - frame_length // 2 + frame_shift // 2 : k * frame_shift + (frame_length + 1) // 2 + frame_shift // 2]. These are the frame bounds used by Kaldi [povey2011]. Otherwise, the k-th frame is signal[k * frame_shift - (frame_length + 1) // 2 + 1 : k * frame_shift + frame_length // 2 + 1].

  • is_real (bool) – Whether the filters are real in the time domain. If True, coefficients will be doubled (pre-log) to account for Hermitian symmetry.
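
The framing conventions and the default dft_size can be checked with plain integer arithmetic. The helpers frame_bounds and default_dft_size are hypothetical; they transcribe the slice bounds given for kaldi_shift and the power-of-two rule given for dft_size. The 'causal' branch assumes frame k starts at k * frame_shift, which the description of frame_style implies but does not spell out.

```python
def frame_bounds(k, frame_length, frame_shift, frame_style="centered", kaldi_shift=False):
    """Return the [start, end) sample indices of frame k (hypothetical helper)."""
    if frame_style == "causal":
        # assumption: frame k starts k * frame_shift samples after sample 0
        start = k * frame_shift
        end = start + frame_length
    elif kaldi_shift:
        start = k * frame_shift - frame_length // 2 + frame_shift // 2
        end = k * frame_shift + (frame_length + 1) // 2 + frame_shift // 2
    else:
        start = k * frame_shift - (frame_length + 1) // 2 + 1
        end = k * frame_shift + frame_length // 2 + 1
    return start, end


def default_dft_size(frame_length):
    """First power of two at or beyond frame_length."""
    return 1 << (frame_length - 1).bit_length()


# 25 ms frames with a 10 ms shift at 16 kHz
print(frame_bounds(0, 400, 160))                    # (-199, 201)
print(frame_bounds(0, 400, 160, kaldi_shift=True))  # (-120, 280)
print(default_dft_size(400))                        # 512
```

Both centered conventions yield exactly frame_length samples; with kaldi_shift=True the frame is pushed forward by frame_shift // 2 samples.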

classmethod from_stft_frame_computer(computer, filter_type=torch.cfloat, window_type=torch.float)

Create an instance using an STFTFrameComputer

Parameters:
  • computer (ShortTimeFourierTransformFrameComputer) – The initialized instance to pull parameters from

  • filter_type (dtype) – The data type to store filter parameters as

  • window_type (dtype) – The data type to store the window parameter as
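
Each (offset, truncated_filter) pair is a sparse encoding of one filter's frequency response. A sketch of expanding the pairs back into a dense (num_filters, dft_size) matrix stored as torch.cfloat, the default filter_type. The helper densify_filters is hypothetical, and it assumes no truncated filter runs past the last frequency bin; whether the library wraps indices around is not stated here.

```python
import torch


def densify_filters(offsets_and_truncated_filters, dft_size, dtype=torch.cfloat):
    """Scatter each truncated filter into a zero row of length dft_size (hypothetical helper)."""
    dense = torch.zeros(len(offsets_and_truncated_filters), dft_size, dtype=dtype)
    for row, (offset, truncated) in enumerate(offsets_and_truncated_filters):
        end = offset + truncated.numel()
        if end > dft_size:
            # assumption: wrap-around past the last bin is not supported here
            raise ValueError("truncated filter extends past dft_size")
        dense[row, offset:end] = truncated.to(dtype)
    return dense


pairs = [
    (1, torch.tensor([0.5, 1.0, 0.5])),
    (4, torch.tensor([1.0, 1.0])),
]
dense = densify_filters(pairs, dft_size=8)
print(dense.real)
```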

pydrobert.speech.torch.pytorch_dither(sig, coeff=1.0)

Functional implementation of PyTorchDither

pydrobert.speech.torch.pytorch_preemphasize(sig, coeff=0.97)

Functional implementation of PyTorchPreemphasize

pydrobert.speech.torch.pytorch_stft_frame_computer(sig, filters, offsets, frame_length, frame_shift, centered=True, window=None, dft_size=None, use_log=True, use_power=False, include_energy=False, kaldi_shift=False, is_real=True, eps=1e-05)

Functional implementation of PyTorchShortTimeFourierTransformFrameComputer
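
To illustrate what the use_power, include_energy, use_log and eps flags toggle, here is a deliberately simplified log-spectrum sketch. It is not this function's computation: it omits the filter bank, centered framing and is_real handling, and the energy definition and the use of eps as a floor before the logarithm are assumptions. The function log_spectrum_sketch is hypothetical.

```python
import torch


def log_spectrum_sketch(sig, frame_length, frame_shift, window=None, dft_size=None,
                        use_log=True, use_power=False, include_energy=False, eps=1e-5):
    """Hypothetical, simplified STFT features: causal frames, no filter bank."""
    if dft_size is None:
        # first power of two at or beyond frame_length
        dft_size = 1 << (frame_length - 1).bit_length()
    # causal framing without padding: shape (num_frames, frame_length)
    frames = sig.unfold(0, frame_length, frame_shift)
    # window=None means implicit rectangular windowing
    windowed = frames if window is None else frames * window
    # one-sided magnitude spectrum, zero-padded to dft_size
    feats = torch.fft.rfft(windowed, n=dft_size).abs()
    if use_power:
        feats = feats ** 2
    if include_energy:
        # assumption: energy is the sum of squared samples of the unwindowed frame
        energy = (frames ** 2).sum(dim=-1, keepdim=True)
        feats = torch.cat([energy, feats], dim=-1)
    if use_log:
        # assumption: eps acts as a floor that keeps the logarithm finite
        feats = torch.log(feats.clamp_min(eps))
    return feats


torch.manual_seed(0)
sig = torch.randn(16000)
feats = log_spectrum_sketch(sig, 400, 160, window=torch.hann_window(400), include_energy=True)
print(feats.shape)  # torch.Size([98, 258])
```

With a 512-point DFT the one-sided spectrum has 257 bins, and the energy coefficient at index 0 brings the total to 258.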