pydrobert.speech.util

Miscellaneous utility functions

pydrobert.speech.util.angular_to_hertz(angle, samp_rate)

Convert radians/sec to cycles/sec
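Because the function takes a sampling rate, the angle is presumably a normalized angular frequency in radians per sample, not per second. Assuming that reading, a minimal sketch of the conversion follows. angular_to_hertz_sketch is a hypothetical stand-in, not the library function.

```python
import math


def angular_to_hertz_sketch(angle: float, samp_rate: float) -> float:
    """Hypothetical re-implementation: radians/sample -> Hz (assumed units)."""
    # One full cycle spans 2*pi radians, and samp_rate samples elapse per second
    return angle * samp_rate / (2 * math.pi)


# pi radians per sample corresponds to the Nyquist frequency
nyquist = angular_to_hertz_sketch(math.pi, 16000)
```

Under this assumption, π radians per sample maps to half the sampling rate.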

pydrobert.speech.util.circshift_fourier(filt, shift, start_idx=0, dft_size=None, copy=True)

Circularly shift a filter in the time domain, from the Fourier domain

This is a simple application of the shift theorem:

\[DFT(T_u x)[k] = DFT(x)[k] e^{-2i\pi k u}\]

where u = shift / dft_size.
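For an integer shift, the theorem can be checked numerically. Multiplying the DFT by the phase ramp and inverting should equal a circular shift in time. The check below uses numpy.fft and numpy.roll. It only verifies the identity and is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dft_size = 16
shift = 3
x = rng.standard_normal(dft_size)

# Phase ramp e^{-2 i pi k u} with u = shift / dft_size
k = np.arange(dft_size)
ramp = np.exp(-2j * np.pi * k * shift / dft_size)

# Apply the ramp in the Fourier domain, then return to the time domain
shifted = np.fft.ifft(np.fft.fft(x) * ramp).real

# np.roll moves each sample forward by `shift` positions, wrapping around the end
assert np.allclose(shifted, np.roll(x, shift))
```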

Parameters:
  • filt (ndarray) – The filter, in the Fourier domain

  • shift (float) – The number of samples to be translated by.

  • start_idx (int) – If filt is a truncated frequency response, this parameter indicates the index in the DFT at which the nonzero region starts

  • dft_size (int) – The DFT size of the filter. Defaults to len(filt) + start_idx

  • copy (bool) – If True, filt is copied before shifting; if False, filt may be modified in place and returned

Returns:

out (numpy.ndarray) – The 128-bit complex (complex128) filter frequency response, circularly shifted by shift samples
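When filt holds only a slice of the frequency response starting at bin start_idx, the stored coefficient at position j corresponds to DFT bin k = start_idx + j. The phase ramp must therefore be indexed by absolute bin. The sketch below follows that reading of the parameters. circshift_fourier_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def circshift_fourier_sketch(filt, shift, start_idx=0, dft_size=None, copy=True):
    """Hypothetical shift-theorem filter shift supporting truncated responses."""
    # Work in complex128; copy unless the caller allows in-place modification
    if copy:
        out = np.array(filt, dtype=np.complex128)
    else:
        out = np.asarray(filt, dtype=np.complex128)
    if dft_size is None:
        dft_size = len(out) + start_idx
    # Absolute DFT bin index of each stored coefficient
    k = np.arange(start_idx, start_idx + len(out))
    out *= np.exp(-2j * np.pi * k * shift / dft_size)
    return out


# Shifting the full response and then slicing matches shifting the slice directly
x = np.random.default_rng(1).standard_normal(16)
full = np.fft.fft(x)
part = circshift_fourier_sketch(full[4:10], 2.5, start_idx=4, dft_size=16)
```

The shift is a float, so fractional shifts are allowed. Only integer shifts correspond to an exact circular shift of the time samples.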

pydrobert.speech.util.gauss_quant(p, mu=0, std=1)

Gaussian quantile function

Given a probability p, determine the value q of a univariate Gaussian random variable such that the probability of drawing a value less than or equal to q is p. In other words, this is the inverse cumulative distribution function.

If SciPy can be imported, this function uses scipy.stats.norm.ppf() to calculate the result. Otherwise, it uses the approximation of Odeh & Evans (1974), via Brophy (1985).

Parameters:
  • p (float) – The probability

  • mu (float) – The Gaussian mean

  • std (float) – The Gaussian standard deviation

Returns:

q (float) – The random variable value
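The scipy-free fallback is described as the Odeh & Evans (1974) approximation. The sketch below uses that rational approximation with its commonly published coefficients (Applied Statistics algorithm AS 70). It is not taken from the library's source, and gauss_quant_sketch is a hypothetical name. As an independent reference, the code compares the result against the standard library's statistics.NormalDist.inv_cdf.

```python
import math
from statistics import NormalDist

# Coefficients of the Odeh & Evans (1974) rational approximation (AS 70)
_P = (-0.322232431088, -1.0, -0.342242088547, -0.0204231210245, -0.453642210148e-4)
_Q = (0.0993484626060, 0.588581570495, 0.531103462366, 0.103537752850, 0.38560700634e-2)


def gauss_quant_sketch(p: float, mu: float = 0.0, std: float = 1.0) -> float:
    """Hypothetical scipy-free Gaussian quantile (inverse CDF)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p == 0.5:
        return mu
    # The approximation works on the lower tail; symmetry covers p > 0.5
    tail = min(p, 1.0 - p)
    y = math.sqrt(-2.0 * math.log(tail))
    num = (((_P[4] * y + _P[3]) * y + _P[2]) * y + _P[1]) * y + _P[0]
    den = (((_Q[4] * y + _Q[3]) * y + _Q[2]) * y + _Q[1]) * y + _Q[0]
    z = y + num / den  # standard-normal quantile of 1 - tail
    if p < 0.5:
        z = -z
    return mu + std * z


# Independent reference from the standard library
reference = NormalDist(mu=0.0, sigma=1.0).inv_cdf(0.975)
approx = gauss_quant_sketch(0.975)
```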

pydrobert.speech.util.hertz_to_angular(hertz, samp_rate)

Convert cycles/sec to radians/sec
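This is the inverse of angular_to_hertz. The sketch below again assumes that the angular frequency is normalized to radians per sample. hertz_to_angular_sketch is hypothetical.

```python
import math


def hertz_to_angular_sketch(hertz: float, samp_rate: float) -> float:
    """Hypothetical re-implementation: Hz -> radians/sample (assumed units)."""
    # Convert cycles/sec to cycles/sample, then cycles to radians
    return hertz * 2 * math.pi / samp_rate


# 1 kHz at a 16 kHz sampling rate is pi/8 radians per sample
omega = hertz_to_angular_sketch(1000.0, 16000.0)
```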

pydrobert.speech.util.read_signal(rfilename, dtype=None, key=None, force_as=None, **kwargs)

Read a signal from a variety of possible sources

This function returns an array representing a signal of some sort. How it reads that array depends on rfilename, which is checked against the following rules in order:

  1. If rfilename starts with the regular expression r'^(ark|scp)(,\w+)*:', the file is treated as a Kaldi table and opened with the Kaldi data type dtype (defaults to BaseMatrix). The package pydrobert.kaldi will be imported to handle reading. If key is set, the value associated with that key is retrieved. Otherwise, the first listed value is returned.

  2. If rfilename ends with a file type listed in pydrobert.speech.config.SOUNDFILE_SUPPORTED_FILE_TYPES (requires soundfile), the file will be opened with that audio file type.

  3. If rfilename ends with '.wav', the file is assumed to be a wave file. The function will rely on the scipy package to load the file if scipy can be imported. Otherwise, it uses the standard wave package. The type of data encodings each package can handle varies, though neither can handle compressed data.

  4. If rfilename ends with '.hdf5', the file is assumed to be an HDF5 file. HDF5 and h5py must be installed on the host system to read this way. If key is set, the data will be assumed to be indexed by key in the archive. Otherwise, a depth-first search of the archive will be performed for the first data set. If set, the data will be cast as the numpy data type dtype.

  5. If rfilename ends with '.npy', the file is assumed to be a binary in Numpy format. If set, the result will be cast as the numpy data type dtype.

  6. If rfilename ends with '.npz', the file is assumed to be an archive in Numpy format. If key is set, the data indexed by key will be loaded. Otherwise, the data indexed by the key 'arr_0' will be loaded. If set, the result will be cast as the numpy data type dtype.

  7. If rfilename ends with '.pt', the file is assumed to be a binary in PyTorch format. If set, the result will be cast as the numpy data type dtype.

  8. If rfilename ends with '.sph', the file is assumed to be a NIST SPHERE file. If set, the result will be cast as the numpy data type dtype.

  9. If rfilename ends with '|', it will try to read an object of Kaldi data type dtype (defaults to BaseMatrix) from a basic Kaldi input stream.

  10. Otherwise, an IOError is raised.
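The selection order above can be summarized as a small dispatch function. The sketch below only reports which reader would be chosen. guess_read_type and its soundfile_types argument are hypothetical, and the sketch assumes the entries of SOUNDFILE_SUPPORTED_FILE_TYPES are bare extensions such as 'flac'. The returned strings match the force_as values listed under Parameters.

```python
import re
from typing import Iterable

# Kaldi rxspecifier prefix from step 1, e.g. "ark:feats.ark" or "scp,s:feats.scp"
_TABLE_PATTERN = re.compile(r"^(ark|scp)(,\w+)*:")


def guess_read_type(rfilename: str, soundfile_types: Iterable[str] = ()) -> str:
    """Hypothetical helper mirroring the documented selection order."""
    # Step 1: Kaldi table specifiers
    if _TABLE_PATTERN.match(rfilename):
        return "table"
    # Step 2: suffixes handled by soundfile (assumed to be bare extensions)
    for ext in soundfile_types:
        if rfilename.endswith("." + ext):
            return ext
    # Steps 3 to 8: fixed suffixes, checked in the documented order
    for suffix in ("wav", "hdf5", "npy", "npz", "pt", "sph"):
        if rfilename.endswith("." + suffix):
            return suffix
    # Step 9: a trailing pipe denotes a basic Kaldi input stream
    if rfilename.endswith("|"):
        return "kaldi"
    # Step 10: no catch-all reader
    raise IOError(f"cannot infer how to read {rfilename!r}")
```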

Additional keyword arguments are passed along to the associated open or read operation.

Parameters:
  • rfilename (Union[str, BinaryIO]) – Either a string or a binary file object. If a file object, force_as must be specified, and the Kaldi types are unsupported.

  • dtype (Optional[dtype]) – If set, the returned array is cast to this data type

  • key (Any) – The key used in 'hdf5', 'npz', or 'table' decoding.

  • force_as (Optional[str]) – If not None, forces rfilename to be interpreted as a specific file type, bypassing the above selection strategy. 'table': Kaldi table; 'wav': wave file; 'hdf5': HDF5 file; 'npy': Numpy binary; 'npz': Numpy archive; 'pt': PyTorch binary; 'sph': NIST SPHERE; 'kaldi': Kaldi object; 'file': read via numpy.fromfile(). The types in SOUNDFILE_SUPPORTED_FILE_TYPES are also valid values. 'soundfile' will use soundfile to read the file regardless of the suffix.

  • **kwargs – Additional keyword arguments passed along to the associated open or read operation.

Returns:

signal (numpy.ndarray)
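Steps 5 and 6, together with the key and dtype parameters, can be mimicked with plain NumPy. numpy.savez stores positional arrays under the names 'arr_0', 'arr_1', and so on, which makes 'arr_0' a natural default key. The helper read_numpy_sketch below is hypothetical and covers only these two formats.

```python
import os
import tempfile
from typing import Any, Optional

import numpy as np


def read_numpy_sketch(path: str, dtype: Optional[Any] = None, key: Optional[str] = None) -> np.ndarray:
    """Hypothetical reader mirroring steps 5 and 6 for '.npy' and '.npz' files."""
    if path.endswith(".npz"):
        # NpzFile supports the context-manager protocol, closing the archive on exit
        with np.load(path) as archive:
            signal = archive["arr_0" if key is None else key]
    elif path.endswith(".npy"):
        signal = np.load(path)
    else:
        raise IOError(f"not a Numpy file: {path!r}")
    # Cast only when a dtype was requested
    return signal if dtype is None else signal.astype(dtype)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "signals.npz")
    # Positional arrays become 'arr_0', 'arr_1'; keyword arrays keep their names
    np.savez(path, np.arange(4, dtype=np.int16), np.ones(2), noise=np.zeros(3))
    first = read_numpy_sketch(path, dtype=np.float32)
    noise = read_numpy_sketch(path, key="noise")
```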

Warning

After v0.2.0, the behaviour after step 8 changed. Previously, the function tried to read the file as Kaldi input and, failing that, via numpy.fromfile(). Now it reads Kaldi input only if the file name ends with '|' and raises an error otherwise. The catch-all behaviour was disabled because of its interaction with pydrobert.speech.config.SOUNDFILE_SUPPORTED_FILE_TYPES, whose value depends on whether soundfile is installed and on the version of the underlying libsndfile.

Notes

Python code for reading SPHERE files (not via soundfile) was based on sph2pipe v2.5. That code can only support the “shorten” audio format up to version 2.

pydrobert.speech.util.wds_read_signal(key, data)

Wrapper around read_signal for WebDataset

This function is intended for data decoding in a WebDataset. It uses read_signal() to read a file and returns it as a Numpy array.
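As we understand the WebDataset convention, a decoder is a plain callable that receives a sample field name and its raw bytes, and returns None for fields it does not handle. The sketch below illustrates that convention using only NumPy. npy_decoder_sketch is hypothetical and, unlike wds_read_signal, handles only '.npy' payloads.

```python
import io
from typing import Optional

import numpy as np


def npy_decoder_sketch(key: str, data: bytes) -> Optional[np.ndarray]:
    """Hypothetical WebDataset-style decoder for '.npy' fields only."""
    # Field names may be bare extensions ("npy") or dotted ("audio.npy")
    extension = key.rsplit(".", 1)[-1].lower()
    if extension != "npy":
        return None  # let another decoder handle this field
    # The raw bytes form a complete .npy file, so wrap them in a file-like object
    return np.load(io.BytesIO(data), allow_pickle=False)


# Round trip: serialize an array to .npy bytes, then decode it
buffer = io.BytesIO()
np.save(buffer, np.linspace(-1.0, 1.0, 5))
decoded = npy_decoder_sketch("npy", buffer.getvalue())
skipped = npy_decoder_sketch("json", b"{}")
```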

Examples

>>> import webdataset as wds
>>> from pydrobert.speech.util import wds_read_signal
>>> url = 'pipe:curl -L -s https://dl.fbaipublicfiles.com/librilight/data/small.tar'
>>> ds = (
...     wds.WebDataset(url)
...     .decode(wds_read_signal)
...     .to_tuple('json', 'flac', handler=wds.ignore_and_continue)
... )
>>> for info, signal in ds:
...     pass  # do something with info and signal

Warning

Kaldi types are currently unsupported.

This decoder clobbers the default WebDataset decoder for “npy” and “pt” files.