pydrobert.speech.torch
PyTorch compatibility module
This submodule provides PyTorch implementations of the components critical to feature computation. It is not meant to comprehensively reproduce all functionality in PyTorch. Each PyTorch module here has a class method which initializes the module from the analogous NumPy instance discussed elsewhere. For example, assuming stft_frame_computer is an instance of a pydrobert.speech.STFTFrameComputer, one may instantiate a PyTorchSTFTFrameComputer via
>>> pytorch_stft_frame_computer = PyTorchSTFTFrameComputer.from_stft_frame_computer(
... stft_frame_computer)
- class pydrobert.speech.torch.PyTorchDither(coeff=1.0)
Bases: Module
PyTorch implementation of Dither
Add random, normally-distributed noise to a signal
- Parameters:
coeff (float) – The standard deviation of the noise
dim – The dimension to apply noise to. If unspecified, applied to all coefficients
Notes
While it is usually the case in PyTorch that random noise is only added during training, dithering serves a
- class pydrobert.speech.torch.PyTorchPostProcessorWrapper(postprocessor)
Bases: Module
A PyTorch wrapper around a PostProcessor
This module merely casts incoming tensors to a numpy.ndarray, runs pydrobert.speech.post.PostProcessor.apply() on the result, then casts it back into a tensor.
Most PostProcessor classes have been reimplemented in pydrobert.torch with a bona fide PyTorch implementation, which should be preferred.
- class pydrobert.speech.torch.PyTorchPreemphasize(coeff=0.97)
Bases: Module
PyTorch implementation of Preemphasize
- Parameters:
coeff (float) – Preemphasis coefficient
- pydrobert.speech.torch.PyTorchSIFrameComputer
alias of PyTorchShortIntegrationFrameComputer
- pydrobert.speech.torch.PyTorchSTFTFrameComputer
alias of PyTorchShortTimeFourierTransformFrameComputer
- class pydrobert.speech.torch.PyTorchShortIntegrationFrameComputer(si_frame_computer)
Bases: Module
PyTorch implementation of SIFrameComputer
This module is a port of pydrobert.speech.compute.ShortIntegrationFrameComputer to PyTorch. When called, the output should be nearly identical to a call to ShortIntegrationFrameComputer.compute_full(), except torch.Tensor inputs and outputs are expected.
Warning
This module is currently a mere wrapper around a ShortIntegrationFrameComputer instance. While we plan on reimplementing the computer with bona fide PyTorch operations at a later date, for now, relying on the factory function from_si_frame_computer() is the best way to ensure forward compatibility. In addition, to ensure forward compatibility, the module state dict can be neither saved nor loaded.
- class pydrobert.speech.torch.PyTorchShortTimeFourierTransformFrameComputer(offsets_and_truncated_filters, frame_length, frame_shift, frame_style='centered', window=None, dft_size=None, use_log=True, use_power=False, include_energy=False, kaldi_shift=False, is_real=False)
Bases: Module
PyTorch implementation of STFTFrameComputer
This module is a port of pydrobert.speech.compute.ShortTimeFourierTransformFrameComputer to PyTorch. When called, the output should be nearly identical to a call to ShortTimeFourierTransformFrameComputer.compute_full(), except torch.Tensor inputs and outputs are expected.
The easiest means of initializing this module is through the factory function
from_stft_frame_computer()
, which determines the below parameters from an STFTFrameComputer which has already been initialized.
The filters and window are learnable/adjustable. Be sure to disable gradients with torch.no_grad() if a fixed feature representation is desirable.
- Parameters:
offsets_and_truncated_filters (Sequence[Tuple[int, Tensor]]) – A sequence of pairs (offset, truncated_filter). truncated_filter is a one-dimensional tensor of the non-zero frequency response of a single filter in the bank. offset is the index in the short-time spectrum at which the truncated_filter begins.
frame_length (int) – The number of audio samples constituting a frame.
frame_shift (int) – The number of audio samples between subsequent frames.
frame_style (Literal['centered', 'causal']) – If 'causal', the first frame begins at sample 0. Otherwise, the first frame is centered around sample 0 with the exact behaviour dictated by the kaldi_shift flag.
window (Optional[Tensor]) – If specified, a tensor of shape (frame_length,) containing the windowing function. If unspecified, implicit rectangular windowing will be performed (with no gradient).
dft_size (Optional[int]) – The size of the spectrum to compute for each frame. Must be greater than or equal to frame_length. If unspecified, the first power of two at or beyond the frame length will be chosen.
use_log (bool) – Whether to take the logarithm of the resulting representation
use_power (bool) – Take the power spectrum instead of the magnitude spectrum
include_energy (bool) – Whether to add a coefficient at index 0 corresponding to the frame-wise energy of the signal
kaldi_shift (bool) – Dictates how to center frames when frame_style is 'centered'. If True, the k-th frame will be computed using the signal between signal[k * frame_shift - frame_length // 2 + frame_shift // 2:k * frame_shift + (frame_length + 1) // 2 + frame_shift // 2]. These are the frame bounds for Kaldi [povey2011]. Otherwise, the k-th frame is signal[k * frame_shift - (frame_length + 1) // 2 + 1:k * frame_shift + frame_length // 2 + 1].
is_real (bool) – Whether the filters are real in the time domain. If True, coefficients will be doubled (pre-log) to account for Hermitian symmetry.
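The frame bounds and the default dft_size described above are plain integer arithmetic, so they can be checked without PyTorch. The following is a minimal sketch in standard-library Python that transcribes the two slicing formulas and the "first power of two at or beyond the frame length" rule. The helper names centered_frame_bounds and default_dft_size are hypothetical and are not part of pydrobert.speech.torch. Negative start indices are returned as-is, since how samples before the start of the signal are padded is not specified here.

```python
def centered_frame_bounds(k, frame_length, frame_shift, kaldi_shift=False):
    """Return (start, end) sample indices of the k-th centered frame.

    Hypothetical helper: transcribes the slicing formulas from the
    kaldi_shift parameter description. end is exclusive.
    """
    if kaldi_shift:
        start = k * frame_shift - frame_length // 2 + frame_shift // 2
        end = k * frame_shift + (frame_length + 1) // 2 + frame_shift // 2
    else:
        start = k * frame_shift - (frame_length + 1) // 2 + 1
        end = k * frame_shift + frame_length // 2 + 1
    return start, end


def default_dft_size(frame_length):
    """Return the first power of two at or beyond frame_length (>= 1).

    Hypothetical helper mirroring the documented default for dft_size.
    """
    return 1 << (frame_length - 1).bit_length()


# 25 ms frames with a 10 ms shift at 16 kHz
plain_bounds = centered_frame_bounds(0, 400, 160)
kaldi_bounds = centered_frame_bounds(0, 400, 160, kaldi_shift=True)
dft_size = default_dft_size(400)
```

Under both conventions a frame spans exactly frame_length samples. Setting kaldi_shift shifts the frame later in the signal rather than changing its length.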
- classmethod from_stft_frame_computer(computer, filter_type=torch.cfloat, window_type=torch.float)
Create an instance using an STFTFrameComputer
- Parameters:
computer (ShortTimeFourierTransformFrameComputer) – The initialized instance to pull parameters from
filter_type (dtype) – The data type to store filter parameters as
window_type (dtype) – The data type to store the window parameter as
- pydrobert.speech.torch.pytorch_dither(sig, coeff=1.0)
Functional implementation of PyTorchDither
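As an illustration of what dithering with a given coeff means, the sketch below adds zero-mean Gaussian noise with standard deviation coeff to every sample. It is standard-library Python operating on a list, not the library's implementation, which operates on tensors. The dither name and the seed argument are hypothetical and exist only to keep the example reproducible.

```python
import random


def dither(sig, coeff=1.0, seed=None):
    """Return a copy of sig with Gaussian noise of std coeff added.

    Hypothetical list-based sketch; seed is an illustrative extra
    argument, not a parameter of pytorch_dither.
    """
    rng = random.Random(seed)
    # gauss(0, 1) draws a standard normal sample; scaling by coeff
    # gives noise whose standard deviation is coeff
    return [x + coeff * rng.gauss(0.0, 1.0) for x in sig]


signal = [0.0, 0.5, -0.25, 1.0]
noisy = dither(signal, coeff=1e-3, seed=0)
unchanged = dither(signal, coeff=0.0)
```

With coeff set to 0 the signal is returned unchanged, which is a convenient sanity check when comparing against a reference feature pipeline.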
- pydrobert.speech.torch.pytorch_preemphasize(sig, coeff=0.97)
Functional implementation of PyTorchPreemphasize
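Preemphasis is conventionally the first-order high-pass filter y[t] = x[t] - coeff * x[t - 1]. The sketch below applies that filter to a plain Python list. It assumes the first sample is passed through unchanged, which is one common convention and may differ from how this library treats the boundary. The preemphasize name is hypothetical.

```python
def preemphasize(sig, coeff=0.97):
    """Apply y[t] = x[t] - coeff * x[t - 1] to a list of samples.

    Hypothetical sketch. Assumption: the first sample has no
    predecessor and is copied through as-is.
    """
    if not sig:
        return []
    out = [sig[0]]
    # pair each sample with the one before it
    for prev, cur in zip(sig, sig[1:]):
        out.append(cur - coeff * prev)
    return out


constant = [1.0, 1.0, 1.0, 1.0]
flattened = preemphasize(constant)
```

A constant (zero-frequency) input is attenuated to 1 - coeff after the first sample, which shows the filter suppressing low frequencies relative to high ones.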
- pydrobert.speech.torch.pytorch_stft_frame_computer(sig, filters, offsets, frame_length, frame_shift, centered=True, window=None, dft_size=None, use_log=True, use_power=False, include_energy=False, kaldi_shift=False, is_real=True, eps=1e-05)
Functional implementation of PyTorchShortTimeFourierTransformFrameComputer
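The filter bank is passed as truncated filters plus offsets rather than as dense frequency responses. To make that representation concrete, the sketch below expands one (offset, truncated_filter) pair into a dense response of length dft_size using plain Python lists. The expand_filter name is hypothetical, real-valued weights are used for readability, and the sketch assumes the truncated filter fits entirely within the spectrum, so any wrap-around behaviour in the library is not modelled.

```python
def expand_filter(offset, truncated_filter, dft_size):
    """Place truncated_filter at index offset of a zero response.

    Hypothetical sketch. Assumption: the filter does not wrap
    around the end of the spectrum.
    """
    end = offset + len(truncated_filter)
    if offset < 0 or end > dft_size:
        raise ValueError("truncated filter must fit within dft_size bins")
    dense = [0.0] * dft_size
    # bins outside [offset, end) stay zero, which is why only the
    # non-zero part of the response needs to be stored
    dense[offset:end] = truncated_filter
    return dense


dense_response = expand_filter(2, [0.5, 1.0, 0.5], dft_size=8)
```

Storing only the non-zero span means each filter touches a handful of spectral bins per frame instead of all dft_size of them.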