pydrobert.speech.filters
Filters and filter banks
- class pydrobert.speech.filters.BartlettWindow[source]
Bases: WindowFunction
A unit-normalized triangular window
- aliases = {'bartlett', 'tri', 'triangular'}
- class pydrobert.speech.filters.BlackmanWindow[source]
Bases: WindowFunction
A unit-normalized Blackman window
- aliases = {'black', 'blackman'}
- class pydrobert.speech.filters.ComplexGammatoneFilterBank(scaling_function, num_filts=40, high_hz=None, low_hz=20.0, sampling_rate=16000, order=4, max_centered=False, scale_l2_norm=False, erb=False)[source]
Bases: LinearFilterBank
Gammatone filters with complex carriers
A complex gammatone filter [flanagan1960] [aertsen1981] can be defined as
\[h(t) = c t^{n - 1} e^{- \alpha t + i\xi t} u(t)\]
in the time domain, where \(\alpha\) is the bandwidth parameter, \(\xi\) is the carrier frequency, \(n\) is the order of the function, \(u(t)\) is the step function, and \(c\) is a normalization constant. In the frequency domain, the filter is defined as
\[H(\omega) = \frac{c(n - 1)!}{\left( \alpha + i(\omega - \xi) \right)^n}\]
For large \(\xi\), the complex gammatone is approximately analytic.
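The two definitions above can be checked against each other numerically. The following is a minimal NumPy sketch, not the library's implementation: the helper names `gammatone_impulse` and `gammatone_frequency` and the parameter values are illustrative assumptions. It approximates the Fourier integral of \(h(t)\) with a Riemann sum and compares it to the closed form \(H(\omega)\).

```python
import math

import numpy as np


def gammatone_impulse(t, alpha, xi, order=4, c=1.0):
    """h(t) = c t^(n-1) exp(-alpha t + i xi t) u(t) (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    step = (t >= 0).astype(float)  # the step function u(t)
    return c * t ** (order - 1) * np.exp((-alpha + 1j * xi) * t) * step


def gammatone_frequency(omega, alpha, xi, order=4, c=1.0):
    """H(omega) = c (n-1)! / (alpha + i (omega - xi))^n (hypothetical helper)."""
    return c * math.factorial(order - 1) / (alpha + 1j * (omega - xi)) ** order


# Illustrative values in radians per unit time; not library defaults
alpha, xi, order = 2.0, 30.0, 4
dt = 1e-3
t = np.arange(0.0, 20.0, dt)
h = gammatone_impulse(t, alpha, xi, order)

# Riemann-sum approximation of the Fourier integral at a few frequencies
omegas = np.array([25.0, 30.0, 35.0])
numeric = np.array([np.sum(h * np.exp(-1j * w * t)) * dt for w in omegas])
closed_form = gammatone_frequency(omegas, alpha, xi, order)
max_err = np.max(np.abs(numeric - closed_form))
```

The response magnitude is largest at \(\omega = \xi\), where the denominator reduces to \(\alpha^n\).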
scaling_function is used to split up the frequencies between high_hz and low_hz into a series of filters. Every subsequent filter’s width is scaled such that, if the filters are all of the same height, the intersection with the preceding filter’s response matches the filter’s Equivalent Rectangular Bandwidth (erb == True) or its 3dB bandwidth (erb == False). The ERB is the width of a rectangular filter with the same height as the filter’s maximum frequency response that has the same \(L^2\) norm.
- Parameters:
scaling_function (Union[ScalingFunction, Mapping, str]) – Dictates the layout of filters in the Fourier domain. Can be a ScalingFunction or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
num_filts (int) – The number of filters in the bank
high_hz (Optional[float]) – The topmost edge of the filter frequencies. The default is the Nyquist for sampling_rate.
low_hz (float) – The bottommost edge of the filter frequencies.
sampling_rate (float, optional) – The sampling rate (cycles/sec) of the target recordings
order (int) – The \(n\) parameter in the Gammatone. Should be positive. Larger orders will make the gammatone more symmetrical.
max_centered (bool) – While normally causal, setting max_centered to true will shift all filters in the bank such that the maximum absolute value in time is centered at sample 0.
scale_l2_norm (bool) – Whether to scale the l2 norm of each filter to 1. Otherwise the frequency response of each filter will max out at an absolute value of 1.
erb (bool) –
See also
pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD
The absolute value below which counts as zero
- aliases = {'gammatone', 'tonebank'}
- class pydrobert.speech.filters.GaborFilterBank(scaling_function, num_filts=40, high_hz=None, low_hz=20.0, sampling_rate=16000, scale_l2_norm=False, erb=False)[source]
Bases: LinearFilterBank
Gabor filters with ERBs between points from a scale
Gabor filters are complex, mostly analytic filters that have a Gaussian envelope in both the time and frequency domains. They are defined as
\[f(t) = C \sigma^{-1/2} \pi^{-1/4} e^{\frac{-t^2}{2\sigma^2} + i\xi t}\]
in the time domain and
\[\widehat{f}(\omega) = C \sqrt{2\sigma} \pi^{1/4} e^{\frac{-\sigma^2(\xi - \omega)^2}{2}}\]
in the frequency domain. Though Gaussians never truly reach 0, in either domain, they are effectively compactly supported. Gabor filters are optimal with respect to their time-bandwidth product.
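The time- and frequency-domain definitions above can be compared numerically. This is a minimal NumPy sketch rather than the library's code: the helper names `gabor_time` and `gabor_freq` and the values of \(\sigma\) and \(\xi\) are assumptions chosen for illustration. It also checks that, with \(C = 1\), the time-domain filter has unit \(L^2\) norm.

```python
import numpy as np


def gabor_time(t, sigma, xi, C=1.0):
    """f(t) from the time-domain definition (hypothetical helper)."""
    envelope = np.exp(-t ** 2 / (2 * sigma ** 2) + 1j * xi * t)
    return C * sigma ** -0.5 * np.pi ** -0.25 * envelope


def gabor_freq(omega, sigma, xi, C=1.0):
    """f-hat(omega) from the frequency-domain definition (hypothetical helper)."""
    gauss = np.exp(-sigma ** 2 * (xi - omega) ** 2 / 2)
    return C * np.sqrt(2 * sigma) * np.pi ** 0.25 * gauss


# Illustrative values only
sigma, xi = 0.5, 20.0
dt = 1e-3
t = np.arange(-10.0, 10.0, dt)
f = gabor_time(t, sigma, xi)

# Riemann-sum approximation of the Fourier integral at a few frequencies
omegas = np.array([18.0, 20.0, 22.0])
numeric = np.array([np.sum(f * np.exp(-1j * w * t)) * dt for w in omegas])
closed_form = gabor_freq(omegas, sigma, xi)
max_err = np.max(np.abs(numeric - closed_form))

# With C = 1 the time-domain filter has unit L2 norm
l2_norm = np.sqrt(np.sum(np.abs(f) ** 2) * dt)
```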
scaling_function is used to split up the frequencies between high_hz and low_hz into a series of filters. Every subsequent filter’s width is scaled such that, if the filters are all of the same height, the intersection with the preceding filter’s response matches the filter’s Equivalent Rectangular Bandwidth (erb == True) or its 3dB bandwidth (erb == False). The ERB is the width of a rectangular filter with the same height as the filter’s maximum frequency response that has the same \(L^2\) norm.
- Parameters:
scaling_function (Union[ScalingFunction, Mapping, str]) – Dictates the layout of filters in the Fourier domain. Can be a ScalingFunction or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
num_filts (int) – The number of filters in the bank
high_hz (Optional[float]) – The topmost edge of the filter frequencies. The default is the Nyquist for sampling_rate.
low_hz (float) – The bottommost edge of the filter frequencies.
sampling_rate (float) – The sampling rate (cycles/sec) of the target recordings
scale_l2_norm (bool) – Whether to scale the l2 norm of each filter to 1. Otherwise the frequency response of each filter will max out at an absolute value of 1.
erb (bool) –
See also
pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD
The absolute value below which counts as zero
- aliases = {'gabor'}
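The two width measures selected by erb can be compared on a Gaussian frequency response of the form used by the Gabor filter, with its peak normalized to 1. The sketch below is illustrative only: the grid and the values of \(\sigma\) and \(\xi\) are assumptions, and nothing here calls the library. It computes the ERB as the width of the height-matched rectangle with the same \(L^2\) norm, and the 3dB bandwidth as the width of the region where the power is at least half its peak.

```python
import numpy as np

# Illustrative values only (radians per unit time)
sigma, xi = 0.05, 100.0
omega = np.linspace(xi - 400.0, xi + 400.0, 800001)
d_omega = omega[1] - omega[0]

# Gaussian frequency response with a maximum of 1 at omega == xi
response = np.exp(-sigma ** 2 * (xi - omega) ** 2 / 2)
power = np.abs(response) ** 2

# ERB: a rectangle of height h and width w has squared L2 norm h^2 * w,
# so w is the integral of |H|^2 divided by the peak power
erb = np.sum(power) * d_omega / np.max(power)
erb_closed_form = np.sqrt(np.pi) / sigma

# 3dB bandwidth: extent of the region at or above half the peak power
above_half = omega[power >= 0.5 * np.max(power)]
bw_3db = above_half[-1] - above_half[0]
bw_3db_closed_form = 2 * np.sqrt(np.log(2)) / sigma
```

For a Gaussian response the ERB is slightly wider than the 3dB bandwidth, so the choice of erb changes how much adjacent filters overlap.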
- class pydrobert.speech.filters.GammaWindow(order=4, peak=0.75)[source]
Bases: WindowFunction
A lowpass filter based on the Gamma function
A Gamma function is defined as:
\[p(t; \alpha, n) = t^{n - 1} e^{-\alpha t} u(t)\]
where \(n\) is the order of the function, \(\alpha\) controls the bandwidth of the filter, and \(u\) is the step function.
This function returns a window based on a reflected Gamma function. \(\alpha\) is chosen such that the maximum value of the window aligns with peak. The window is clipped to the width. For reasonable values of peak (i.e. in the last quarter of samples), the majority of the support should lie in this interval anyway.
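One plausible reading of the description above is sketched below. It is not the library's implementation: the function name `reflected_gamma_window`, the interpretation of peak as a fraction of the window length, and the normalization to a maximum of 1 are all assumptions. Since \(t^{n-1} e^{-\alpha t}\) attains its maximum at \(t = (n - 1) / \alpha\), \(\alpha\) can be solved for once the desired peak position is known.

```python
import numpy as np


def reflected_gamma_window(width, order=4, peak=0.75):
    """Hypothetical reflected Gamma window; not the library's code."""
    if width < 2 or order < 2 or not 0.0 < peak < 1.0:
        raise ValueError("need width >= 2, order >= 2, and 0 < peak < 1")
    # Reflect time: t is 0 at the last sample and grows toward the first
    t = np.arange(width - 1, -1, -1, dtype=float)
    # t^(n-1) e^(-alpha t) peaks at t = (n - 1) / alpha, so choose alpha to
    # put that peak a fraction `peak` of the way through the window
    t_peak = (1.0 - peak) * (width - 1)
    alpha = (order - 1) / t_peak
    window = t ** (order - 1) * np.exp(-alpha * t)
    # Assumed normalization: scale the maximum to 1
    return window / window.max()


window = reflected_gamma_window(101)
peak_index = int(np.argmax(window))
```

Evaluating the function only on the samples of the window corresponds to the clipping mentioned above; whatever part of the Gamma function's tail extends past the first sample is dropped.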
- Parameters:
- aliases = {'gamma'}
- order
- peak
- class pydrobert.speech.filters.HammingWindow[source]
Bases: WindowFunction
A unit-normalized Hamming window
- aliases = {'hamming'}
- class pydrobert.speech.filters.HannWindow[source]
Bases: WindowFunction
A unit-normalized Hann window
- aliases = {'hann', 'hanning'}
- class pydrobert.speech.filters.LinearFilterBank[source]
Bases: AliasedFactory
A collection of linear, time invariant filters
A LinearFilterBank instance is expected to provide factory methods for instantiating a fixed number of LTI filters in either the time or frequency domain. Filters should be organized lowest frequency first.
- abstract get_frequency_response(filt_idx, width, half=False)[source]
Construct filter frequency response in a fixed-width buffer
Construct the 2pi-periodized filter in the frequency domain. Zero-phase filters are returned as 8-byte float arrays. Otherwise, they will be 16-byte complex floats.
- Parameters:
- Returns:
fr (numpy.ndarray) – If half is False, returns a 1D float64 or complex128 numpy array of length width. If half is True and width is even, the returned array is of length width // 2 + 1. If width is odd, the returned array is of length (width + 1) // 2.
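The half-spectrum lengths given above coincide with the output length of numpy.fft.rfft for a real input of length width. A small sketch follows; the helper name `half_response_length` is hypothetical and is not part of the library.

```python
import numpy as np


def half_response_length(width):
    """Length of the half-spectrum for a buffer of `width` samples."""
    if width % 2 == 0:
        return width // 2 + 1
    return (width + 1) // 2


# Compare against the length of a real-input FFT for even and odd widths
widths = (8, 9, 512, 513)
lengths = [half_response_length(w) for w in widths]
rfft_lengths = [len(np.fft.rfft(np.zeros(w))) for w in widths]
```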
- abstract get_impulse_response(filt_idx, width)[source]
Construct filter impulse response in a fixed-width buffer
Construct the filter in the time domain.
- Parameters:
- Returns:
ir (numpy.ndarray) – 1D float64 or complex128 numpy array of length width
- abstract get_truncated_response(filt_idx, width)[source]
Get nonzero region of filter frequency response
Many filters will be compactly supported in frequency (or approximately so). This method generates a tuple (bin_idx, buf) of the nonzero region.
In the case of a complex filter, bin_idx + len(buf) may be greater than width; the filter wraps around in this case. The full frequency response can be calculated from the truncated response by:
>>> bin_idx, trnc = bank.get_truncated_response(filt_idx, width)
>>> full = numpy.zeros(width, dtype=trnc.dtype)
>>> wrap = min(bin_idx + len(trnc), width) - bin_idx
>>> full[bin_idx:bin_idx + wrap] = trnc[:wrap]
>>> full[:len(trnc) - wrap] = trnc[wrap:]
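The wrap-around reconstruction can be tried without a filter bank by supplying a made-up truncated response. In this sketch the function name `expand_truncated` and the sample values are illustrative assumptions; the indexing follows the steps given above.

```python
import numpy as np


def expand_truncated(bin_idx, trnc, width):
    """Rebuild a full complex frequency response from its nonzero region."""
    full = np.zeros(width, dtype=trnc.dtype)
    # Number of truncated samples that fit before the end of the buffer
    wrap = min(bin_idx + len(trnc), width) - bin_idx
    full[bin_idx:bin_idx + wrap] = trnc[:wrap]
    # Whatever is left over wraps around to the start of the buffer
    full[:len(trnc) - wrap] = trnc[wrap:]
    return full


# Made-up truncated response for illustration
trnc = np.array([1, 2, 3, 4], dtype=np.complex128)
wrapped = expand_truncated(6, trnc, 8)    # bins 6, 7, then 0, 1
unwrapped = expand_truncated(2, trnc, 8)  # bins 2 through 5, no wrapping
```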
In the case of a real filter, only the nonzero region between [0, pi] (half-spectrum) is returned. No wrapping can occur since it would inevitably interfere with itself due to conjugate symmetry. The half-spectrum can easily be recovered by:
>>> half_width = (width + width % 2) // 2 + 1 - width % 2
>>> half = numpy.zeros(half_width, dtype=trnc.dtype)
>>> half[bin_idx:bin_idx + len(trnc)] = trnc
And the full spectrum by:
>>> full[bin_idx:bin_idx + len(trnc)] = trnc
>>> full[width - bin_idx - len(trnc) + 1:width - bin_idx + 1] = \
...     trnc[:None if bin_idx else 0:-1].conj()
(the embedded if-statement is necessary when bin_idx is 0, as the full fft excludes its symmetric bin)
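The real-filter case can be exercised the same way. This sketch wraps the full-spectrum steps above in a hypothetical helper, `expand_real_truncated`, and uses made-up values. Because the mirrored half is the conjugate of the original, the rebuilt spectrum is Hermitian and its inverse FFT is real.

```python
import numpy as np


def expand_real_truncated(bin_idx, trnc, width):
    """Rebuild the full spectrum of a real filter from its half-spectrum region."""
    full = np.zeros(width, dtype=trnc.dtype)
    full[bin_idx:bin_idx + len(trnc)] = trnc
    # Mirror the region with conjugation; bin 0 has no symmetric partner,
    # so it is left out of the mirrored copy when bin_idx is 0
    full[width - bin_idx - len(trnc) + 1:width - bin_idx + 1] = \
        trnc[:None if bin_idx else 0:-1].conj()
    return full


# Made-up truncated responses for illustration
away_from_dc = expand_real_truncated(1, np.array([1 + 1j, 2, 3 - 1j]), 8)
including_dc = expand_real_truncated(0, np.array([1, 2 + 1j]), 8)

# A Hermitian spectrum corresponds to a real impulse response
max_imag = max(
    np.max(np.abs(np.fft.ifft(away_from_dc).imag)),
    np.max(np.abs(np.fft.ifft(including_dc).imag)),
)
```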
- abstract property is_analytic
Whether the filters are (approximately) analytic
An analytic signal has no negative frequency components. A real signal cannot be analytic.
- Type:
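The statement that a real signal cannot be analytic can be seen on a toy example. The sketch below is illustrative only, with made-up signal parameters and a hypothetical helper, `negative_frequency_energy`. It measures the energy in the negative-frequency DFT bins of a complex exponential and of a real cosine at the same frequency.

```python
import numpy as np


def negative_frequency_energy(signal):
    """Sum of squared magnitudes over the negative-frequency DFT bins."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal))
    return float(np.sum(np.abs(spectrum[freqs < 0]) ** 2))


width, k = 64, 5
n = np.arange(width)
complex_carrier = np.exp(2j * np.pi * k * n / width)  # one positive bin only
real_carrier = np.cos(2 * np.pi * k * n / width)      # bins +k and -k

analytic_energy = negative_frequency_energy(complex_carrier)
real_energy = negative_frequency_energy(real_carrier)
```

The complex exponential leaves the negative bins empty up to rounding error, while the cosine places half of its energy there.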
- abstract property is_zero_phase
Whether the filters are zero phase or not
Zero phase filters are even functions with no imaginary part in the Fourier domain. Their impulse responses center around 0.
- Type:
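As an illustration of the zero-phase property, the sketch below builds a made-up Gaussian-shaped impulse response that is even about sample 0 (with the negative side wrapped to the end of the buffer) and confirms its DFT has no imaginary part. Shifting the same response away from 0 introduces a nonzero phase. All names and values here are assumptions for the example.

```python
import numpy as np

width = 16
n = np.arange(width)
# Distance from sample 0 with wrap-around, so that ir[n] == ir[width - n]
dist = np.minimum(n, width - n)
centered = np.exp(-0.5 * (dist / 2.0) ** 2)

# Even about 0: the frequency response is purely real
centered_imag = np.max(np.abs(np.fft.fft(centered).imag))

# Shifting the peak to sample 3 breaks the symmetry and adds phase
shifted_imag = np.max(np.abs(np.fft.fft(np.roll(centered, 3)).imag))
```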
- abstract property supports
Boundaries of effective support of filter impulse resps, in samples
Returns a tuple of length num_filts containing pairs of integers of the first and last (effectively) nonzero samples.
The boundaries need not be tight, i.e. the region inside the boundaries could be zero. It is more important to guarantee that the region outside the boundaries is approximately zero.
If a filter is instantiated using a buffer that is unable to fully contain the supported region, samples will wrap around the boundaries of the buffer.
Noncausal filters will have start indices less than 0. These samples will wrap to the end of the filter buffer when the filter is instantiated.
- Type:
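The wrapping behaviour described for supports can be sketched with modular indexing. The helper `place_in_buffer` below is hypothetical, as is the choice to sum samples that land on the same buffer position when the support is longer than the buffer; the library may handle that case differently.

```python
import numpy as np


def place_in_buffer(samples, start, width):
    """Write `samples`, whose first element sits at index `start`, into a
    buffer of length `width`, wrapping indices modulo `width`."""
    buf = np.zeros(width, dtype=samples.dtype)
    idx = (start + np.arange(len(samples))) % width
    # Assumption: overlapping samples accumulate rather than overwrite
    np.add.at(buf, idx, samples)
    return buf


# Noncausal support (-2, 2): the two negative-index samples wrap to the end
noncausal = place_in_buffer(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), -2, 8)
```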
- abstract property supports_hz
Boundaries of effective support of filter freq responses, in Hz.
Returns a tuple of length num_filts containing pairs of floats of the low and high frequencies. Frequencies outside the span have a response of approximately (with magnitude up to pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD) zero.
The boundaries need not be tight, i.e. the region inside the boundaries could be zero. It is more important to guarantee that the region outside the boundaries is approximately zero.
The boundaries ignore the Hermitian symmetry of the filter if it is real. Bounds of (10, 20) for a real filter imply that the region (-20, -10) could also be nonzero.
The user is responsible for adjusting for the periodicity induced by sampling. For example, if the boundaries are (-5, 10) and the filter is sampled at 15Hz, then all bins of an associated DFT could be nonzero.
- Type:
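The periodicity adjustment mentioned for supports_hz can be sketched as follows. The helper `possibly_nonzero_bins` is hypothetical and only handles the span as given, so it does not add the mirrored region of a real filter. A DFT bin at frequency f may be nonzero if some shift of f by a multiple of the sampling rate falls inside the span.

```python
import numpy as np


def possibly_nonzero_bins(low_hz, high_hz, sampling_rate, width):
    """Boolean mask over DFT bins that could lie in [low_hz, high_hz]
    once the response is periodized at `sampling_rate`."""
    bin_hz = np.arange(width) * sampling_rate / width
    # f + m * sampling_rate lands in the span for some integer m exactly
    # when the offset from low_hz, taken modulo the rate, fits in the span
    return np.mod(bin_hz - low_hz, sampling_rate) <= (high_hz - low_hz)


# The span (-5, 10) at 15Hz covers a full period: every bin may be nonzero
all_bins = possibly_nonzero_bins(-5.0, 10.0, 15.0, 15)

# A narrow span only touches a few bins
few_bins = np.flatnonzero(possibly_nonzero_bins(10.0, 20.0, 100.0, 10))
```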
- class pydrobert.speech.filters.TriangularOverlappingFilterBank(scaling_function, num_filts=40, high_hz=None, low_hz=20.0, sampling_rate=16000, analytic=False)[source]
Bases: LinearFilterBank
Triangular frequency response whose vertices are along the scale
The vertices of the filters are sampled uniformly along the passed scale. If the scale is nonlinear, the triangles will be asymmetrical. This is closely related to, but not identical to, the filters described in [povey2011] and [young].
- Parameters:
scaling_function (Union[ScalingFunction, Mapping, str]) – Dictates the layout of filters in the Fourier domain. Can be a ScalingFunction or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
num_filts (int) – The number of filters in the bank
high_hz (Optional[float]) – The topmost edge of the filter frequencies. The default is the Nyquist for sampling_rate.
low_hz (float) – The bottommost edge of the filter frequencies.
sampling_rate (float) – The sampling rate (cycles/sec) of the target recordings
analytic (bool) – Whether to use an analytic form of the bank. The analytic form is easily derived from the real form in [povey2011] and [young]. Since the filter is compactly supported in frequency, the analytic form is simply the suppression of the [-pi, 0) frequencies
- Raises:
ValueError – If high_hz is above the Nyquist, or low_hz is below 0, or high_hz <= low_hz
- aliases = {'tri', 'triangular'}
- property centers_hz
The point of maximum gain in each filter’s frequency response, in Hz
This property gives the so-called “center frequencies” - the point of maximum gain - of each filter.
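The layout described for this class, with vertices sampled uniformly along a scale and asymmetrical triangles when the scale is nonlinear, can be illustrated with a stand-alone sketch. Everything below is an assumption for the example: the mel formula stands in for whatever scaling_function is passed, the helper names are hypothetical, and the library's exact vertex placement may differ.

```python
import numpy as np


def hz_to_mel(hz):
    """A common mel-scale formula, used here only as an example scale."""
    return 1127.0 * np.log1p(hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * np.expm1(mel / 1127.0)


def triangle_vertices_hz(num_filts, low_hz, high_hz):
    """num_filts + 2 vertices, uniformly spaced on the example scale."""
    mels = np.linspace(hz_to_mel(low_hz), hz_to_mel(high_hz), num_filts + 2)
    return mel_to_hz(mels)


def triangular_response(freqs_hz, left, center, right):
    """Piecewise-linear response: 0 at the edges, 1 at the center."""
    return np.interp(freqs_hz, [left, center, right], [0.0, 1.0, 0.0],
                     left=0.0, right=0.0)


vertices = triangle_vertices_hz(num_filts=40, low_hz=20.0, high_hz=8000.0)
# Filter i spans vertices i, i + 1, i + 2; its point of maximum gain is i + 1
centers = vertices[1:-1]
left_widths = centers - vertices[:-2]
right_widths = vertices[2:] - centers
gain_at_center = triangular_response(centers[0], vertices[0], centers[0], vertices[2])
```

On this scale each triangle's upper slope is wider than its lower slope, which is the asymmetry the class description refers to.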
- class pydrobert.speech.filters.WindowFunction[source]
Bases: AliasedFactory
A real linear filter, usually lowpass
- abstract get_impulse_response(width)[source]
Write the filter into a numpy array of fixed width
- Parameters:
width (int) – The length of the window in samples
- Returns:
ir (numpy.ndarray) – A 1D vector of length width