pydrobert.speech.compute
Compute features from speech signals
- class pydrobert.speech.compute.FrameComputer[source]
Bases:
AliasedFactory
Construct features from fixed-length segments of a signal
A signal is treated as a (possibly overlapping) time series of frames. Each frame is transformed into a fixed-length vector of coefficients.
Features can be computed one at a time, for example:
>>> chunk_size = 2 ** 10
>>> while len(signal):
...     segment = signal[:chunk_size]
...     feats = computer.compute_chunk(segment)
...     # do something with feats
...     signal = signal[chunk_size:]
>>> feats = computer.finalize()
Or all at once (which can be much faster, depending on how the computer is optimized):
>>> feats = computer.compute_full(signal)
The k-th frame can be roughly localized to the signal offset signal[k * computer.frame_shift]. The signal’s exact region of influence is dictated by the frame_style property.
- abstract compute_chunk(chunk)[source]
Compute some coefficients, given a chunk of audio
- Parameters:
chunk (ndarray) – A 1D float array of the signal. Should be contiguous and non-overlapping with any previously processed segments in the audio stream
- Returns:
chunk (numpy.ndarray) – A 2D float array of shape (num_frames, num_coeffs). num_frames is nonnegative (possibly 0). Contains some number of feature vectors, ordered in time over axis 0.
- compute_full(signal)[source]
Compute a full signal’s worth of feature coefficients
- Parameters:
signal (ndarray) – A 1D float array of the entire signal
- Returns:
spec (numpy.ndarray) – A 2D float array of shape (num_frames, num_coeffs). num_frames is nonnegative (possibly 0). Contains some number of feature vectors, ordered in time over axis 0.
- Raises:
ValueError – If frame computation has already begun (started=True) and finalize() has not been called
- abstract finalize()[source]
Conclude processing a stream of audio, processing any stored buffer
- Returns:
chunk (numpy.ndarray) – A 2D float array of shape (num_frames, num_coeffs). num_frames is either 1 or 0.
- property frame_length_ms
Number of milliseconds of audio which dictate a feature vector
- Type:
- abstract property frame_shift
Number of samples absorbed between successive frame computations
- Type:
- abstract property frame_style
Dictates how the signal is split into frames
If 'causal', the k-th frame is computed over the indices signal[k * frame_shift:k * frame_shift + frame_length] (at most). If 'centered', the k-th frame is computed over the indices signal[k * frame_shift - (frame_length + 1) // 2 + 1:k * frame_shift + frame_length // 2 + 1]. Any range beyond the bounds of the signal is generated in an implementation-specific way.
- Type:
- abstract property started
Whether computations for a signal have started
Becomes True after the first call to compute_chunk(). Becomes False after a call to finalize().
- Type:
- class pydrobert.speech.compute.LinearFilterBankFrameComputer(bank, include_energy=False)[source]
Bases:
FrameComputer
Frame computers whose features are derived from linear filter banks
Computers based on linear filter banks have a predictable number of coefficients and organization. Like the banks, the features with lower indices correspond to filters with lower bandwidths. num_coeffs is simply bank.num_filts + int(include_energy).
- Parameters:
bank (Union[LinearFilterBank, Mapping, str]) – Each filter in the bank corresponds to a coefficient in a frame vector. Can be a LinearFilterBank or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
include_energy (bool) – Whether to include a coefficient based on the energy of the signal within the frame. If True, the energy coefficient will be inserted at index 0.
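The layout described above can be illustrated with a small sketch. It uses plain Python lists rather than the library's arrays, and `assemble_frame` is a hypothetical helper, not part of pydrobert.speech; it only shows where the energy coefficient lands and how num_coeffs follows from the flag.

```python
def assemble_frame(filter_coeffs, energy=None):
    """Lay out one feature vector: optional energy first, then one value per filter.

    Hypothetical helper for illustration only; not a pydrobert.speech function.
    """
    frame = list(filter_coeffs)
    if energy is not None:
        # Mirrors include_energy=True: the energy coefficient goes at index 0
        frame.insert(0, energy)
    return frame


num_filts = 4  # stands in for bank.num_filts
per_filter = [0.1, 0.2, 0.3, 0.4]

for include_energy in (False, True):
    num_coeffs = num_filts + int(include_energy)
    frame = assemble_frame(per_filter, energy=9.0 if include_energy else None)
    # The vector length always matches bank.num_filts + int(include_energy)
    assert len(frame) == num_coeffs
```

With the energy included, every per-filter coefficient shifts up by one index, which matters for any downstream code that indexes features by filter number.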
- property bank
The LinearFilterBank from which features are derived
- Type:
- pydrobert.speech.compute.SIFrameComputer
alias of
ShortIntegrationFrameComputer
- pydrobert.speech.compute.STFTFrameComputer
alias of
ShortTimeFourierTransformFrameComputer
- class pydrobert.speech.compute.ShortIntegrationFrameComputer(bank, frame_shift_ms=10, frame_style=None, include_energy=False, pad_to_nearest_power_of_two=True, window_function=None, use_power=False, use_log=True)[source]
Bases:
LinearFilterBankFrameComputer
Compute features by integrating over the filter modulus
Each filter in the bank is convolved with the signal. A pointwise nonlinearity pushes the frequency band towards zero. Most of the energy of the signal can be captured in a short time integration. Though best suited to processing whole utterances at once, short integration is compatible with the frame analogy if the frame is assumed to be the cone of influence of the maximum-length filter.
For computational purposes, each filter’s impulse response is clamped to zero outside the support of the largest filter in the bank, making it a finite impulse response filter. This effectively decreases the frequency resolution of the filters which aren’t already FIR. For better frequency resolution at the cost of computational time, increase pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD.
- Parameters:
bank (Union[LinearFilterBank, Mapping, str]) – Each filter in the bank corresponds to a coefficient in a frame vector. Can be a LinearFilterBank or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
frame_shift_ms (float) – The offset between successive frames, in milliseconds. Also the length of the integration
frame_style (Optional[Literal['causal', 'centered']]) – Defaults to 'centered' if bank.is_zero_phase, 'causal' otherwise. If 'centered', each filter of the bank is translated so that its support lies in the center of the frame
include_energy (bool) –
pad_to_nearest_power_of_two (bool) – Pad the DFTs used in computation to a power of two for efficient computation
window_function (pydrobert.speech.filters.WindowFunction, dict, or str) – The window used to weight the integration. Can be a WindowFunction or something compatible with pydrobert.speech.alias_factory_subclass_from_arg(). Defaults to pydrobert.speech.filters.GammaWindow when frame_style is 'causal', otherwise pydrobert.speech.filters.HannWindow.
use_power (bool) – Whether the pointwise nonlinearity is the signal’s power or magnitude
use_log (bool) – Whether to take the log of the integration
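As a rough illustration of the idea behind short integration (convolve with each filter, apply a pointwise modulus, then integrate over a window one frame shift long), here is a minimal pure-Python sketch. It is not the library's implementation: the function names, the causal alignment, the rule for counting frames, and the log floor of 1e-10 are all assumptions made for this example.

```python
import math


def convolve(signal, impulse_response):
    """Full linear convolution of two (possibly complex) sequences."""
    out = [0j] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def short_integration(signal, impulse_responses, frame_shift,
                      window=None, use_power=False, use_log=True):
    """Toy short-integration features: one row per frame, one column per filter."""
    if window is None:
        window = [1.0] * frame_shift  # rectangular weighting (assumption)
    num_frames = len(signal) // frame_shift  # toy convention: drop the remainder
    feats = [[] for _ in range(num_frames)]
    for response in impulse_responses:
        # Filter the signal, keeping a causal alignment with the input
        filtered = convolve(signal, response)[:len(signal)]
        # Pointwise nonlinearity: power or magnitude of the filter output
        modulus = [abs(v) ** 2 if use_power else abs(v) for v in filtered]
        for k in range(num_frames):
            segment = modulus[k * frame_shift:(k + 1) * frame_shift]
            # Windowed integration over one frame shift
            total = sum(w * m for w, m in zip(window, segment))
            if use_log:
                total = math.log(max(total, 1e-10))  # floor is an assumption
            feats[k].append(total)
    return feats


# An identity filter with power and no log reduces to per-frame signal energy
toy_feats = short_integration([1.0, 2.0, 3.0, 4.0], [[1.0]], frame_shift=2,
                              use_power=True, use_log=False)
```

The direct convolution here is quadratic in cost; the pad_to_nearest_power_of_two parameter suggests the library works with DFTs instead, which this sketch does not attempt to reproduce.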
- aliases = {'si'}
- class pydrobert.speech.compute.ShortTimeFourierTransformFrameComputer(bank, frame_length_ms=None, frame_shift_ms=10, frame_style=None, include_energy=False, pad_to_nearest_power_of_two=True, window_function=None, use_log=True, use_power=False, kaldi_shift=False)[source]
Bases:
LinearFilterBankFrameComputer
Compute features of a signal by integrating STFTs
Computations are per frame and as follows:
1. The current frame is multiplied with some window (rectangular, Hamming, Hanning, etc.)
2. A DFT is performed on the result
3. For each filter in the provided input bank:
   a. Multiply the result of 2. with the frequency response of the filter
   b. Sum either the pointwise square or absolute value of elements in the buffer from 3a.
   c. Optionally take the log of the sum
Warning
This behaviour differs from that of [povey2011] or [young] in three ways. First, the sum (3b) comes after the filtering (3a), which changes the result in the squared case. Second, the sum is over the full power spectrum, rather than just between 0 and the Nyquist frequency. This doubles the value at the end of 3c. if a real filter is used. Third, frame boundaries are calculated differently.
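The per-frame steps listed above can be sketched in pure Python. This is a minimal illustration and not the library's code: it uses a naive O(n²) DFT, takes filter frequency responses as plain lists with one value per DFT bin, and applies a hypothetical log floor of 1e-10. The names `dft` and `stft_frame_features` are invented for this example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform over all n bins."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def stft_frame_features(frame, window, filter_responses,
                        use_power=False, use_log=True):
    """Compute one feature vector from one frame, following steps 1 to 3c."""
    # Step 1: multiply the frame with the window
    windowed = [x * w for x, w in zip(frame, window)]
    # Step 2: DFT of the windowed frame
    spectrum = dft(windowed)
    feats = []
    for response in filter_responses:
        # Step 3a: multiply with the filter's frequency response
        filtered = [s * h for s, h in zip(spectrum, response)]
        # Step 3b: sum the pointwise square or absolute value over ALL bins
        if use_power:
            total = sum(abs(v) ** 2 for v in filtered)
        else:
            total = sum(abs(v) for v in filtered)
        # Step 3c: optionally take the log (floor value is an assumption)
        if use_log:
            total = math.log(max(total, 1e-10))
        feats.append(total)
    return feats


n = 8
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
low_pass = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # toy responses
high_pass = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
features = stft_frame_features(frame, hann, [low_pass, high_pass], use_log=False)
```

Because the sum in step 3b runs over every bin rather than only those up to the Nyquist frequency, a real-valued filter response contributes both its positive- and negative-frequency halves, as the warning notes.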
- Parameters:
bank (Union[LinearFilterBank, Mapping, str]) – Each filter in the bank corresponds to a coefficient in a frame vector. Can be a LinearFilterBank or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
frame_length_ms (Optional[float]) – The length of a frame, in milliseconds. Defaults to the length of the largest filter in the bank
frame_shift_ms (float, optional) – The offset between successive frames, in milliseconds
frame_style (Optional[Literal['causal', 'centered']]) – Defaults to 'centered' if bank.is_zero_phase, 'causal' otherwise.
include_energy (bool) –
pad_to_nearest_power_of_two (bool) – Whether the DFT should be padded to a power of two for computational efficiency
window_function (Union[WindowFunction, Mapping, str, None]) – The window used in step 1. Can be a WindowFunction or something compatible with pydrobert.speech.alias_factory_subclass_from_arg(). Defaults to pydrobert.speech.filters.GammaWindow when frame_style is 'causal', otherwise pydrobert.speech.filters.HannWindow.
use_log (bool) – Whether to take the log of the sum from 3b.
use_power (bool) – Whether to sum the power spectrum or the magnitude spectrum
kaldi_shift (bool) – Dictates how to center frames when frame_style is 'centered'. If True, the k-th frame will be computed using the signal between signal[k * frame_shift - frame_length // 2 + frame_shift // 2:k * frame_shift + (frame_length + 1) // 2 + frame_shift // 2]. These are the frame bounds for Kaldi [povey2011]. Otherwise, the k-th frame is signal[k * frame_shift - (frame_length + 1) // 2 + 1:k * frame_shift + frame_length // 2 + 1].
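To make the frame-boundary formulas concrete, the sketch below evaluates them for a given frame index. `frame_bounds` is a hypothetical helper written for this illustration, not a library function; it transcribes the slice expressions from the frame_style and kaldi_shift descriptions, with lengths given in samples. Negative or out-of-range indices stand for regions outside the signal, which the implementation fills in its own way.

```python
def frame_bounds(k, frame_shift, frame_length,
                 frame_style="centered", kaldi_shift=False):
    """Return (start, end) sample indices of the k-th frame as a half-open range.

    Hypothetical helper transcribing the documented slice expressions.
    """
    if frame_style == "causal":
        start = k * frame_shift
        end = k * frame_shift + frame_length
    elif kaldi_shift:
        # Kaldi-style centering
        start = k * frame_shift - frame_length // 2 + frame_shift // 2
        end = k * frame_shift + (frame_length + 1) // 2 + frame_shift // 2
    else:
        # Default centering
        start = k * frame_shift - (frame_length + 1) // 2 + 1
        end = k * frame_shift + frame_length // 2 + 1
    return start, end


# Example values: a 10 ms shift and 25 ms frame at a 16 kHz sampling rate
shift, length = 160, 400
for style, kaldi in (("causal", False), ("centered", False), ("centered", True)):
    first = frame_bounds(0, shift, length, style, kaldi)
    second = frame_bounds(1, shift, length, style, kaldi)
    print(style, "kaldi_shift=%s" % kaldi, first, second)
```

In all three conventions the range spans exactly frame_length samples; only the alignment of the frame relative to k * frame_shift changes.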
- aliases = {'stft'}
- pydrobert.speech.compute.frame_by_frame_calculation(computer, signal, chunk_size=1024)[source]
Compute feature representation of entire signal iteratively
This function constructs a feature matrix of a signal through successive calls to computer.compute_chunk. Its return value should be identical to that of calling computer.compute_full(signal), but is possibly much slower. computer.compute_full should be favoured.
- Parameters:
computer (FrameComputer) –
signal (ndarray) – A 1D float array of the entire signal
chunk_size (int) – The length of the signal buffer to process at a given time
- Returns:
spec (numpy.ndarray) – A 2D float array of shape (num_frames, num_coeffs). num_frames is nonnegative (possibly 0). Contains some number of feature vectors, ordered in time over axis 0.
- Raises:
ValueError – If frame computation has already begun (computer.started == True)
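The chunked loop and its equivalence to compute_full can be sketched with a stand-in computer. Everything below is illustrative: `ToyEnergyComputer` and `frame_by_frame` are hypothetical names, the toy uses non-overlapping causal frames with a single energy coefficient, and it returns nested lists instead of NumPy arrays. It follows the documented contract only in outline (buffering in compute_chunk, at most one frame from finalize, ValueError when already started).

```python
class ToyEnergyComputer:
    """Hypothetical stand-in for a FrameComputer; not part of pydrobert.speech."""

    def __init__(self, frame_shift=4):
        self.frame_shift = frame_shift  # toy assumption: frame length == shift
        self.started = False
        self._buffer = []

    def compute_chunk(self, chunk):
        self.started = True
        self._buffer.extend(chunk)
        frames = []
        # Emit a frame each time a full frame's worth of samples is buffered
        while len(self._buffer) >= self.frame_shift:
            frame = self._buffer[:self.frame_shift]
            del self._buffer[:self.frame_shift]
            frames.append([sum(x * x for x in frame)])
        return frames

    def finalize(self):
        # Flush any leftover samples as a final (short) frame: 0 or 1 frames
        frames = [[sum(x * x for x in self._buffer)]] if self._buffer else []
        self._buffer = []
        self.started = False
        return frames

    def compute_full(self, signal):
        if self.started:
            raise ValueError("Already started computing frames")
        return self.compute_chunk(signal) + self.finalize()


def frame_by_frame(computer, signal, chunk_size=1024):
    """Build the full feature matrix through successive compute_chunk calls."""
    if computer.started:
        raise ValueError("Already started computing frames")
    frames = []
    while len(signal):
        frames.extend(computer.compute_chunk(signal[:chunk_size]))
        signal = signal[chunk_size:]
    frames.extend(computer.finalize())
    return frames


toy = ToyEnergyComputer(frame_shift=4)
samples = list(range(10))
full = toy.compute_full(samples)
chunked = frame_by_frame(toy, samples, chunk_size=3)
```

Here the chunk size deliberately does not divide the frame shift, so the buffering across calls is what keeps the chunked result identical to the one-shot result.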