pydrobert.speech.filters

Filters and filter banks

class pydrobert.speech.filters.BartlettWindow

Bases: WindowFunction

A unit-normalized triangular window

See also

numpy.bartlett

aliases = {'bartlett', 'tri', 'triangular'}

class pydrobert.speech.filters.BlackmanWindow

Bases: WindowFunction

A unit-normalized Blackman window

See also

numpy.blackman

aliases = {'black', 'blackman'}

class pydrobert.speech.filters.ComplexGammatoneFilterBank(scaling_function, num_filts=40, high_hz=None, low_hz=20.0, sampling_rate=16000, order=4, max_centered=False, scale_l2_norm=False, erb=False)

Bases: LinearFilterBank

Gammatone filters with complex carriers

A complex gammatone filter [flanagan1960] [aertsen1981] can be defined as

\[h(t) = c t^{n - 1} e^{- \alpha t + i\xi t} u(t)\]

in the time domain, where \(\alpha\) is the bandwidth parameter, \(\xi\) is the carrier frequency, \(n\) is the order of the function, \(u(t)\) is the step function, and \(c\) is a normalization constant. In the frequency domain, the filter is defined as

\[H(\omega) = \frac{c(n - 1)!}{\left( \alpha + i(\omega - \xi) \right)^n}\]

For large \(\xi\), the complex gammatone is approximately analytic.
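As a rough illustration of the two formulas above, the following sketch evaluates them with NumPy. The helper names gammatone_impulse and gammatone_frequency and the parameter values are hypothetical, \(\alpha\) and \(\xi\) are assumed to be in radians per second, and none of this is the library's implementation.

```python
import math

import numpy as np


def gammatone_impulse(t, alpha, xi, n=4, c=1.0):
    """Evaluate h(t) = c t^(n-1) exp(-alpha t + i xi t) u(t)."""
    t = np.asarray(t, dtype=float)
    causal = t >= 0  # the step function u(t)
    tp = np.where(causal, t, 0.0)  # avoid overflow for negative times
    return c * tp ** (n - 1) * np.exp((-alpha + 1j * xi) * tp) * causal


def gammatone_frequency(omega, alpha, xi, n=4, c=1.0):
    """Evaluate H(omega) = c (n-1)! / (alpha + i (omega - xi))^n."""
    omega = np.asarray(omega, dtype=float)
    return c * math.factorial(n - 1) / (alpha + 1j * (omega - xi)) ** n


# Illustrative values only
alpha, xi, n = 100.0, 2000.0, 4
t = np.linspace(0.0, 0.2, 20001)
h = gammatone_impulse(t, alpha, xi, n)
# The envelope t^(n-1) exp(-alpha t) peaks at t = (n - 1) / alpha
peak_time = t[np.argmax(np.abs(h))]

omega = np.linspace(0.0, 4000.0, 4001)
H = gammatone_frequency(omega, alpha, xi, n)
# The gain is maximal at the carrier frequency xi
peak_omega = omega[np.argmax(np.abs(H))]
```

The envelope peak at (n - 1) / alpha is the quantity a max_centered shift would have to account for, although how the library computes that shift is not shown here.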

scaling_function is used to split up the frequencies between low_hz and high_hz into a series of filters. Every subsequent filter’s width is scaled such that, if the filters are all of the same height, the intersection with the preceding filter’s response matches the filter’s Equivalent Rectangular Bandwidth (erb == True) or its 3 dB bandwidth (erb == False). The ERB is the width of a rectangular filter with the same height as the filter’s maximum frequency response that has the same \(L^2\) norm.

Parameters:

scaling_function (Union[ScalingFunction, Mapping, str]) – Dictates the layout of filters in the Fourier domain. Can be a ScalingFunction or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
num_filts (int) – The number of filters in the bank
high_hz (Optional[float]) – The topmost edge of the filter frequencies. The default is the Nyquist for sampling_rate.
low_hz (float) – The bottommost edge of the filter frequencies.
sampling_rate (float, optional) – The sampling rate (cycles/sec) of the target recordings
order (int) – The \(n\) parameter in the Gammatone. Should be positive. Larger orders will make the gammatone more symmetrical.
max_centered (bool) – While normally causal, setting max_centered to true will shift all filters in the bank such that the maximum absolute value in time is centered at sample 0.
scale_l2_norm (bool) – Whether to scale the l2 norm of each filter to 1. Otherwise the frequency response of each filter will max out at an absolute value of 1.
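erb (bool) – Whether filter widths are matched to the Equivalent Rectangular Bandwidth (True) or to the 3 dB bandwidth (False), as described above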

See also

pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD: The absolute value below which counts as zero

aliases = {'gammatone', 'tonebank'}

property centers_hz

The point of maximum gain in each filter’s frequency response, in Hz

This property gives the so-called “center frequencies” - the point of maximum gain - of each filter.

Type:: tuple

class pydrobert.speech.filters.GaborFilterBank(scaling_function, num_filts=40, high_hz=None, low_hz=20.0, sampling_rate=16000, scale_l2_norm=False, erb=False)

Bases: LinearFilterBank

Gabor filters with ERBs between points from a scale

Gabor filters are complex, mostly analytic filters that have a Gaussian envelope in both the time and frequency domains. They are defined as

\[f(t) = C \sigma^{-1/2} \pi^{-1/4} e^{\frac{-t^2}{2\sigma^2} + i\xi t}\]

in the time domain and

\[\widehat{f}(\omega) = C \sqrt{2\sigma} \pi^{1/4} e^{\frac{-\sigma^2(\xi - \omega)^2}{2}}\]

in the frequency domain. Though Gaussians never truly reach 0, in either domain, they are effectively compactly supported. Gabor filters are optimal with respect to their time-bandwidth product.
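The two Gabor formulas above can be checked against each other numerically. The sketch below is only an illustration: the helper names gabor_impulse and gabor_frequency and the values of \(\sigma\) and \(\xi\) are hypothetical, and the Fourier transform is assumed to use the non-unitary convention \(\int f(t) e^{-i\omega t} dt\).

```python
import numpy as np


def gabor_impulse(t, sigma, xi, C=1.0):
    """Time-domain Gabor filter with a Gaussian envelope of width sigma."""
    t = np.asarray(t, dtype=float)
    envelope = np.exp(-(t ** 2) / (2 * sigma ** 2))
    return C * sigma ** -0.5 * np.pi ** -0.25 * envelope * np.exp(1j * xi * t)


def gabor_frequency(omega, sigma, xi, C=1.0):
    """Frequency-domain Gabor filter, a Gaussian centered at xi."""
    omega = np.asarray(omega, dtype=float)
    return (
        C * np.sqrt(2 * sigma) * np.pi ** 0.25
        * np.exp(-(sigma ** 2) * (xi - omega) ** 2 / 2)
    )


# Illustrative values only; the grid covers +/- 10 sigma, outside of which
# the envelope is negligible (the "effectively compact" support)
sigma, xi = 0.01, 2000.0
t = np.linspace(-0.1, 0.1, 20001)
dt = t[1] - t[0]
f = gabor_impulse(t, sigma, xi)

# With C = 1 the time-domain filter has unit L2 norm
energy = np.sum(np.abs(f) ** 2) * dt

# Riemann-sum Fourier transform at omega = xi versus the closed form
numeric_peak = np.sum(f * np.exp(-1j * xi * t)) * dt
closed_form_peak = gabor_frequency(xi, sigma, xi)
```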

scaling_function is used to split up the frequencies between low_hz and high_hz into a series of filters. Every subsequent filter’s width is scaled such that, if the filters are all of the same height, the intersection with the preceding filter’s response matches the filter’s Equivalent Rectangular Bandwidth (erb == True) or its 3 dB bandwidth (erb == False). The ERB is the width of a rectangular filter with the same height as the filter’s maximum frequency response that has the same \(L^2\) norm.
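Under one reading of the ERB definition above, the width \(W\) satisfies \(W \max|\widehat{f}|^2 = \int |\widehat{f}(\omega)|^2 d\omega\). The sketch below computes that quantity numerically for a Gaussian response, where the closed form is \(\sqrt{\pi}/\sigma\). The function name and parameter values are hypothetical, and the library may compute its bandwidths differently.

```python
import numpy as np


def equivalent_rectangular_bandwidth(response, d_omega):
    """Width of a rectangle with the response's peak height and L2 norm.

    `response` holds samples of a frequency response on a uniform grid
    with spacing `d_omega`.
    """
    power = np.abs(response) ** 2
    return np.sum(power) * d_omega / power.max()


# Illustrative Gaussian response in the shape of the Gabor formula above
sigma, xi = 0.01, 2000.0
omega = np.linspace(xi - 2000.0, xi + 2000.0, 400001)
response = np.exp(-(sigma ** 2) * (xi - omega) ** 2 / 2)

erb = equivalent_rectangular_bandwidth(response, omega[1] - omega[0])
closed_form = np.sqrt(np.pi) / sigma  # in radians per second
```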

Parameters:

scaling_function (Union[ScalingFunction, Mapping, str]) – Dictates the layout of filters in the Fourier domain. Can be a ScalingFunction or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
num_filts (int) – The number of filters in the bank
high_hz (Optional[float]) – The topmost edge of the filter frequencies. The default is the Nyquist for sampling_rate.
low_hz (float) – The bottommost edge of the filter frequencies.
sampling_rate (float) – The sampling rate (cycles/sec) of the target recordings
scale_l2_norm (bool) – Whether to scale the l2 norm of each filter to 1. Otherwise the frequency response of each filter will max out at an absolute value of 1.
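erb (bool) – Whether filter widths are matched to the Equivalent Rectangular Bandwidth (True) or to the 3 dB bandwidth (False), as described above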

See also

pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD: The absolute value below which counts as zero

aliases = {'gabor'}

property centers_hz

The point of maximum gain in each filter’s frequency response, in Hz

This property gives the so-called “center frequencies” - the point of maximum gain - of each filter.

Type:: tuple

class pydrobert.speech.filters.GammaWindow(order=4, peak=0.75)

Bases: WindowFunction

A lowpass filter based on the Gamma function

A Gamma function is defined as:

\[p(t; \alpha, n) = t^{n - 1} e^{-\alpha t} u(t)\]

Where \(n\) is the order of the function, \(\alpha\) controls the bandwidth of the filter, and \(u\) is the step function.

This function returns a window based on a reflected Gamma function. \(\alpha\) is chosen such that the maximum value of the window aligns with peak. The window is clipped to width samples. For reasonable values of peak (i.e. in the last quarter of samples), the majority of the support should lie in this interval anyway.
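One way to picture the construction described above is the following sketch. It is not the library's code: the helper reflected_gamma_window is hypothetical, it applies no normalization, and it assumes that "reflected" means sample k is evaluated at time width - k. Since \(t^{n-1} e^{-\alpha t}\) peaks at \(t = (n - 1)/\alpha\), placing the peak at sample peak * width then requires \(\alpha = (n - 1) / ((1 - peak) \cdot width)\).

```python
import numpy as np


def reflected_gamma_window(width, order=4, peak=0.75):
    """Hypothetical reflected Gamma window with its maximum near peak * width.

    Assumes order > 1 and 0 < peak < 1 so that alpha is positive.
    """
    alpha = (order - 1) / ((1.0 - peak) * width)
    # Reflected time axis: sample 0 is furthest from the Gamma function's
    # onset, sample width - 1 is closest. Times beyond `width` are clipped.
    t = width - np.arange(width, dtype=float)
    return t ** (order - 1) * np.exp(-alpha * t)


window = reflected_gamma_window(400, order=4, peak=0.75)
peak_index = int(np.argmax(window))
```

With these values the maximum lands at three quarters of the window length, matching the default peak.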

Parameters:

order (int) – The order \(n\) of the Gamma function
peak (float) – peak * width, where width is the length of the window in samples, is where the approximate maximal value of the window lies

aliases = {'gamma'}

order

peak

class pydrobert.speech.filters.HammingWindow

Bases: WindowFunction

A unit-normalized Hamming window

See also

numpy.hamming

aliases = {'hamming'}

class pydrobert.speech.filters.HannWindow

Bases: WindowFunction

A unit-normalized Hann window

See also

numpy.hanning

aliases = {'hann', 'hanning'}

class pydrobert.speech.filters.LinearFilterBank

Bases: AliasedFactory

A collection of linear, time invariant filters

A LinearFilterBank instance is expected to provide factory methods for instantiating a fixed number of LTI filters in either the time or frequency domain. Filters should be organized lowest frequency first.

abstract get_frequency_response(filt_idx, width, half=False)

Construct filter frequency response in a fixed-width buffer

Construct the 2pi-periodized filter in the frequency domain. Zero-phase filters are returned as 8-byte float arrays. Otherwise, they will be 16-byte complex floats.

Parameters:

filt_idx (int) – The index of the filter to generate. Less than num_filts
width (int) – The length of the DFT to output
half (bool) – Whether to return only the DFT bins between [0,pi]

Returns:

fr (numpy.ndarray) – If half is False, returns a 1D float64 or complex128 numpy array of length width. If half is True and width is even, the returned array is of length width // 2 + 1. If width is odd, the returned array is of length (width + 1) // 2.
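The two half-spectrum lengths quoted above can be sketched as follows. The helper half_spectrum_length is hypothetical and only restates the rule in code; the comparison against numpy.fft.rfft uses a random real signal rather than an actual filter from a bank.

```python
import numpy as np


def half_spectrum_length(width):
    """Number of DFT bins in [0, pi] for a DFT of length `width`."""
    if width % 2 == 0:
        return width // 2 + 1  # includes the Nyquist bin
    return (width + 1) // 2    # odd lengths have no Nyquist bin


rng = np.random.default_rng(0)
for width in (8, 9):
    signal = rng.standard_normal(width)
    half = np.fft.fft(signal)[:half_spectrum_length(width)]
    # For a real signal these are the same bins that numpy.fft.rfft returns
    print(width, len(half), np.allclose(half, np.fft.rfft(signal)))
```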

abstract get_impulse_response(filt_idx, width)

Construct filter impulse response in a fixed-width buffer

Construct the filter in the time domain.

Parameters:

filt_idx (int) – The index of the filter to generate. Less than num_filts
width (int) – The length of the buffer, in samples. If less than the support of the filter, the filter will alias.

Returns:

ir (numpy.ndarray) – 1D float64 or complex128 numpy array of length width

abstract get_truncated_response(filt_idx, width)

Get nonzero region of filter frequency response

Many filters will be compactly supported in frequency (or approximately so). This method generates a tuple (bin_idx, buf) of the nonzero region.

In the case of a complex filter, bin_idx + len(buf) may be greater than width; the filter wraps around in this case. The full frequency response can be calculated from the truncated response by:

>>> bin_idx, trnc = bank.get_truncated_response(filt_idx, width)
>>> full = numpy.zeros(width, dtype=trnc.dtype)
>>> wrap = min(bin_idx + len(trnc), width) - bin_idx
>>> full[bin_idx:bin_idx + wrap] = trnc[:wrap]
>>> full[:len(trnc) - wrap] = trnc[wrap:]

In the case of a real filter, only the nonzero region between [0, pi] (half-spectrum) is returned. No wrapping can occur since it would inevitably interfere with itself due to conjugate symmetry. The half-spectrum can easily be recovered by:

>>> half_width = (width + width % 2) // 2 + 1 - width % 2
>>> half = numpy.zeros(half_width, dtype=trnc.dtype)
>>> half[bin_idx:bin_idx + len(trnc)] = trnc

And the full spectrum by:

>>> full[bin_idx:bin_idx + len(trnc)] = trnc
>>> full[width - bin_idx - len(trnc) + 1:width - bin_idx + 1] = \
...     trnc[:None if bin_idx else 0:-1].conj()

(The embedded conditional is necessary when bin_idx is 0, because bin 0 has no mirrored counterpart in the full FFT.)
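The reconstruction recipes above can be wrapped into standalone functions and exercised on hand-made data. The function names and the example values of bin_idx and trnc below are hypothetical stand-ins for what get_truncated_response would return; no filter bank is instantiated.

```python
import numpy as np


def full_from_truncated_complex(bin_idx, trnc, width):
    """Expand a (possibly wrapping) truncated response of a complex filter."""
    full = np.zeros(width, dtype=trnc.dtype)
    wrap = min(bin_idx + len(trnc), width) - bin_idx
    full[bin_idx:bin_idx + wrap] = trnc[:wrap]
    full[:len(trnc) - wrap] = trnc[wrap:]  # remainder wraps to the start
    return full


def full_from_truncated_real(bin_idx, trnc, width):
    """Expand the half-spectrum region of a real filter to the full DFT."""
    full = np.zeros(width, dtype=trnc.dtype)
    full[bin_idx:bin_idx + len(trnc)] = trnc
    # Mirror with conjugation; bin 0 is skipped because it has no mirror
    full[width - bin_idx - len(trnc) + 1:width - bin_idx + 1] = \
        trnc[:None if bin_idx else 0:-1].conj()
    return full


# Complex filter whose nonzero region runs off the end of the buffer
wrapped = full_from_truncated_complex(
    6, np.array([1, 2, 3], dtype=complex), 8)

# Real filter: the result should be conjugate symmetric
mirrored = full_from_truncated_real(1, np.array([1 + 1j, 2 - 1j]), 8)
impulse = np.fft.ifft(mirrored)  # imaginary part should vanish
```

Conjugate symmetry of the second result is what makes its inverse DFT real, which is a convenient sanity check when adapting the recipe.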

Parameters:

filt_idx (int) – The index of the filter to generate. Less than num_filts
width (int) – The length of the DFT to output

Returns:

tfr (tuple of int, array)

abstract property is_analytic

Whether the filters are (approximately) analytic

An analytic signal has no negative frequency components. A real signal cannot be analytic.

Type:: bool
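To make the notion of an analytic signal used by is_analytic concrete, the sketch below compares the DFT of a complex exponential with that of its real part. The signal length and carrier bin are arbitrary illustrative choices, and nothing here calls the library.

```python
import numpy as np

width = 16
n = np.arange(width)
carrier_bin = 3

analytic = np.exp(2j * np.pi * carrier_bin * n / width)  # complex carrier
real = analytic.real                                     # cosine, same bin

analytic_spec = np.abs(np.fft.fft(analytic))
real_spec = np.abs(np.fft.fft(real))

# In NumPy's DFT layout the bins above width // 2 are negative frequencies
negative = slice(width // 2 + 1, width)
analytic_negative_energy = np.sum(analytic_spec[negative] ** 2)
real_negative_energy = np.sum(real_spec[negative] ** 2)
```

The complex carrier has essentially no energy in the negative-frequency bins, whereas the cosine splits its energy evenly between bin 3 and its mirror, bin 13.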

abstract property is_real

Whether the filters are real or complex

Type:: bool

abstract property is_zero_phase

Whether the filters are zero phase or not

Zero phase filters are even functions with no imaginary part in the Fourier domain. Their impulse responses center around 0.

Type:: bool
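The zero-phase property described for is_zero_phase can be illustrated with a small hand-made impulse response. The values are arbitrary; the point is only that a real response that is even around sample 0 (with negative times wrapped to the end of the buffer) has a purely real DFT, while a shifted copy does not.

```python
import numpy as np

# Even around sample 0: even_ir[k] == even_ir[-k % width]
even_ir = np.array([4.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0])
even_spectrum = np.fft.fft(even_ir)

# Same shape, but now centered on sample 2 (a causal, linear-phase filter)
shifted_ir = np.roll(even_ir, 2)
shifted_spectrum = np.fft.fft(shifted_ir)

max_imag_even = np.max(np.abs(even_spectrum.imag))
max_imag_shifted = np.max(np.abs(shifted_spectrum.imag))
```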

abstract property num_filts

Number of filters in the bank

Type:: int

abstract property sampling_rate

Number of samples in a second of a target recording

Type:: float

abstract property supports

Boundaries of effective support of filter impulse responses, in samples

Returns a tuple of length num_filts containing pairs of integers of the first and last (effectively) nonzero samples.

The boundaries need not be tight, i.e. the region inside the boundaries could be zero. It is more important to guarantee that the region outside the boundaries is approximately zero.

If a filter is instantiated using a buffer that is unable to fully contain the supported region, samples will wrap around the boundaries of the buffer.

Noncausal filters will have start indices less than 0. These samples will wrap to the end of the filter buffer when the filter is instantiated.

Type:: tuple
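The wrapping behaviour described for supports can be sketched with modular indexing. The helper place_in_buffer is hypothetical and not part of the library; it assumes that samples falling on the same buffer position are summed, which is the usual meaning of time-domain aliasing.

```python
import numpy as np


def place_in_buffer(samples, start, width):
    """Write samples whose first index is `start` into a circular buffer."""
    samples = np.asarray(samples)
    buf = np.zeros(width, dtype=samples.dtype)
    idx = (start + np.arange(len(samples))) % width
    np.add.at(buf, idx, samples)  # accumulate so overlong filters alias
    return buf


taps = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Noncausal support (-2, 2): the two negative-time samples wrap to the end
noncausal = place_in_buffer(taps, start=-2, width=8)

# Buffer shorter than the support: the last sample folds onto the first
aliased = place_in_buffer(taps, start=0, width=4)
```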

abstract property supports_hz

Boundaries of effective support of filter freq responses, in Hz.

Returns a tuple of length num_filts containing pairs of floats of the low and high frequencies. Frequencies outside the span have a response of approximately zero (with magnitude up to pydrobert.speech.config.EFFECTIVE_SUPPORT_THRESHOLD).

The boundaries need not be tight, i.e. the region inside the boundaries could be zero. It is more important to guarantee that the region outside the boundaries is approximately zero.

The boundaries ignore the Hermitian symmetry of the filter if it is real. Bounds of (10, 20) for a real filter imply that the region (-20, -10) could also be nonzero.

The user is responsible for adjusting for the periodicity induced by sampling. For example, if the boundaries are (-5, 10) and the filter is sampled at 15Hz, then all bins of an associated DFT could be nonzero.

Type:: tuple
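The periodicity adjustment mentioned for supports_hz can be sketched as a test of which DFT bins fall inside the support once it is repeated every sampling_rate Hz. The helper possibly_nonzero_bins is hypothetical; it treats the bounds as those of a single (complex) band, so for a real filter the mirrored band would need the same treatment.

```python
import numpy as np


def possibly_nonzero_bins(low_hz, high_hz, sampling_rate, width):
    """Boolean mask of DFT bins covered by the periodized support."""
    bin_hz = np.arange(width) * sampling_rate / width
    # Distance from the lower bound, modulo one period of the spectrum
    offset = (bin_hz - low_hz) % sampling_rate
    return offset <= (high_hz - low_hz)


# The example from the text: a 15 Hz wide support sampled at 15 Hz
all_bins = possibly_nonzero_bins(-5.0, 10.0, 15.0, 8)

# A narrow support sampled well above its bandwidth
some_bins = possibly_nonzero_bins(10.0, 20.0, 100.0, 10)
```

In the first case the support spans a full period, so every bin is flagged; in the second only the bins at 10 Hz and 20 Hz are.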

property supports_ms

Boundaries of effective support of filter impulse responses, in ms

Type:: tuple

class pydrobert.speech.filters.TriangularOverlappingFilterBank(scaling_function, num_filts=40, high_hz=None, low_hz=20.0, sampling_rate=16000, analytic=False)

Bases: LinearFilterBank

Triangular frequency response whose vertices are along the scale

The vertices of the filters are sampled uniformly along the passed scale. If the scale is nonlinear, the triangles will be asymmetrical. This is closely related to, but not identical to, the filters described in [povey2011] and [young].
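The layout described above can be sketched without the library. The code below assumes an HTK-style mel scale as the nonlinear scale and assumes that each filter spans three consecutive vertices (left edge, apex, right edge), so that neighbouring triangles overlap. All function names are hypothetical, and the library's own ScalingFunction classes are not used.

```python
import numpy as np


def mel(hz):
    """HTK-style mel scale (an assumption made for this example)."""
    return 1127.0 * np.log1p(np.asarray(hz, dtype=float) / 700.0)


def mel_inverse(m):
    return 700.0 * np.expm1(np.asarray(m, dtype=float) / 1127.0)


def triangle_vertices_hz(num_filts, low_hz, high_hz):
    """num_filts + 2 vertices, uniformly spaced on the mel scale."""
    points = np.linspace(mel(low_hz), mel(high_hz), num_filts + 2)
    return mel_inverse(points)


def triangle_response(freqs_hz, left, mid, right):
    """Triangle rising from `left` to 1 at `mid`, falling to 0 at `right`."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    rising = (freqs_hz - left) / (mid - left)
    falling = (right - freqs_hz) / (right - mid)
    return np.clip(np.minimum(rising, falling), 0.0, None)


vertices = triangle_vertices_hz(40, 20.0, 8000.0)
left, mid, right = vertices[:3]  # vertices of the lowest filter
# Uniform spacing in mel is nonuniform in Hz, so the triangle is asymmetric
asymmetry = (right - mid) - (mid - left)
```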

Parameters:

scaling_function (Union[ScalingFunction, Mapping, str]) – Dictates the layout of filters in the Fourier domain. Can be a ScalingFunction or something compatible with pydrobert.speech.alias.alias_factory_subclass_from_arg()
num_filts (int) – The number of filters in the bank
high_hz (Optional[float]) – The topmost edge of the filter frequencies. The default is the Nyquist for sampling_rate.
low_hz (float) – The bottommost edge of the filter frequencies.
sampling_rate (float) – The sampling rate (cycles/sec) of the target recordings
analytic (bool) – Whether to use an analytic form of the bank. The analytic form is easily derived from the real form in [povey2011] and [young]. Since the filter is compactly supported in frequency, the analytic form is simply the suppression of the [-pi, 0) frequencies.

Raises:

ValueError – If high_hz is above the Nyquist, or low_hz is below 0, or high_hz <= low_hz

aliases = {'tri', 'triangular'}

property centers_hz

The point of maximum gain in each filter’s frequency response, in Hz

This property gives the so-called “center frequencies” - the point of maximum gain - of each filter.

class pydrobert.speech.filters.WindowFunction

Bases: AliasedFactory

A real linear filter, usually lowpass

abstract get_impulse_response(width)

Write the filter into a numpy array of fixed width

Parameters:: width (int) – The length of the window in samples
Returns:: ir (numpy.ndarray) – A 1D vector of length width