pydrobert.speech.util
Miscellaneous utility functions
- pydrobert.speech.util.circshift_fourier(filt, shift, start_idx=0, dft_size=None, copy=True)[source]
Circularly shift a filter in the time domain by operating on it in the Fourier domain
A simple application of the shift theorem:
\[DFT(T_u x)[k] = DFT(x)[k] e^{-2i\pi k u}\]
where we set u = shift / dft_size.
- Parameters:
filt (ndarray) – The filter, in the Fourier domain.
shift (float) – The number of samples to be translated by.
start_idx (int) – If filt is a truncated frequency response, this parameter indicates at what index in the DFT the nonzero region starts.
dft_size (int) – The DFT size of the filter. Defaults to len(filt) + start_idx.
copy (bool) – Whether it is okay to modify and return filt.
- Returns:
out (numpy.ndarray) – The 128-bit complex filter frequency response, shifted by u.
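The shift theorem above can be checked numerically. The following is a minimal sketch using only the Python standard library; it is not the implementation of circshift_fourier. The helper names (dft, idft, circshift_in_fourier) are hypothetical, the DFT is the naive quadratic-time sum, and the filter is assumed to be untruncated (start_idx = 0, dft_size = len(filt)).

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence (illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circshift_in_fourier(spectrum, shift):
    """Multiply bin k by exp(-2i*pi*k*u), with u = shift / len(spectrum)."""
    n = len(spectrum)
    return [
        spectrum[k] * cmath.exp(-2j * cmath.pi * k * shift / n) for k in range(n)
    ]


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
shift = 3

# Shift in the Fourier domain, then return to the time domain.
shifted = [v.real for v in idft(circshift_in_fourier(dft(x), shift))]

# Reference: rotate the list directly, so that rolled[n] == x[n - shift].
rolled = x[-shift:] + x[:-shift]
```

Here shifted and rolled agree to within floating-point error, which is the time-domain statement of the formula.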
- pydrobert.speech.util.gauss_quant(p, mu=0, std=1)[source]
Gaussian quantile function
Given a probability p under a univariate Gaussian, determine the value x of the random variable such that the probability of drawing a value less than or equal to x is equal to p. In other words, the so-called inverse cumulative distribution function.
If scipy can be imported, this function uses scipy.stats.norm.ppf() to calculate the result. Otherwise, it uses the approximation from Odeh & Evans 1974 (via Brophy 1985).
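To make the definition of the quantile function concrete, here is a small sketch built on statistics.NormalDist from the Python standard library. It is a stand-in for illustration, not the code path gauss_quant takes (which, per the description above, is scipy or the Odeh & Evans approximation). The name gauss_quantile is hypothetical.

```python
from statistics import NormalDist


def gauss_quantile(p, mu=0.0, std=1.0):
    """Return x such that P(X <= x) = p for X ~ N(mu, std**2)."""
    return NormalDist(mu, std).inv_cdf(p)


# Half of the mass of a standard normal lies at or below its mean.
median = gauss_quantile(0.5)

# The familiar two-sided 95% critical value of the standard normal.
upper = gauss_quantile(0.975)

# Round trip: applying the CDF to the quantile recovers the probability.
p_back = NormalDist(2.0, 3.0).cdf(gauss_quantile(0.3, mu=2.0, std=3.0))
```

The round trip through the cumulative distribution function is what "inverse" means here: cdf(gauss_quantile(p)) returns p up to numerical precision.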
- pydrobert.speech.util.read_signal(rfilename, dtype=None, key=None, force_as=None, **kwargs)[source]
Read a signal from a variety of possible sources
Though the goal of this function is to return an array representing a signal of some sort, the way it goes about doing so depends on the setting of rfilename, processed in the following order:
1. If the start of rfilename matches the regular expression r'^(ark|scp)(,\w+)*:', the file is treated as a Kaldi table and opened with the Kaldi data type dtype (defaults to BaseMatrix). The package pydrobert.kaldi will be imported to handle reading. If key is set, the value associated with that key is retrieved. Otherwise the first listed value is returned.
2. If rfilename ends with a file type listed in pydrobert.speech.config.SOUNDFILE_SUPPORTED_FILE_TYPES (requires soundfile), the file will be opened with that audio file type.
3. If rfilename ends with '.wav', the file is assumed to be a wave file. The function will rely on the scipy package to load the file if scipy can be imported. Otherwise, it uses the standard wave package. The types of data encoding each package can handle vary, though neither can handle compressed data.
4. If rfilename ends with '.hdf5', the file is assumed to be an HDF5 file. HDF5 and h5py must be installed on the host system to read this way. If key is set, the data will be assumed to be indexed by key in the archive. Otherwise, a depth-first search of the archive will be performed for the first data set. If set, the data will be cast as the numpy data type dtype.
5. If rfilename ends with '.npy', the file is assumed to be a binary in Numpy format. If set, the result will be cast as the numpy data type dtype.
6. If rfilename ends with '.npz', the file is assumed to be an archive in Numpy format. If key is set, the data indexed by key will be loaded. Otherwise the data indexed by the key 'arr_0' will be loaded. If set, the result will be cast as the numpy data type dtype.
7. If rfilename ends with '.pt', the file is assumed to be a binary in PyTorch format. If set, the result will be cast as the numpy data type dtype.
8. If rfilename ends with '.sph', the file is assumed to be a NIST SPHERE file. If set, the result will be cast as the numpy data type dtype.
9. If rfilename ends with '|', it will try to read an object of Kaldi data type dtype (defaults to BaseMatrix) from a basic Kaldi input stream.
10. Otherwise, an IOError is raised.
Additional keyword arguments are passed along to the associated open or read operation.
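The selection order described above can be sketched as a small dispatcher. This is only an illustration of the documented precedence, not the code of read_signal: guess_reader and SOUNDFILE_TYPES are hypothetical names, the set of soundfile types is a made-up stand-in for SOUNDFILE_SUPPORTED_FILE_TYPES, and the Kaldi pattern is assumed to use \w for the comma-separated options.

```python
import re

# Assumption: a stand-in for SOUNDFILE_SUPPORTED_FILE_TYPES. The real value
# depends on whether soundfile is installed and on the libsndfile version.
SOUNDFILE_TYPES = ("flac", "ogg")

# Assumption: Kaldi table prefixes such as "ark:", "scp:" or "ark,t:".
KALDI_TABLE = re.compile(r"^(ark|scp)(,\w+)*:")

# Suffix-based branches, in the documented order.
SUFFIX_TYPES = ("wav", "hdf5", "npy", "npz", "pt", "sph")


def guess_reader(rfilename, soundfile_types=SOUNDFILE_TYPES):
    """Return a label naming the branch a string rfilename would fall into."""
    if KALDI_TABLE.match(rfilename):
        return "table"
    if any(rfilename.endswith("." + t) for t in soundfile_types):
        return "soundfile"
    for suffix in SUFFIX_TYPES:
        if rfilename.endswith("." + suffix):
            return suffix
    if rfilename.endswith("|"):
        return "kaldi"
    # No branch matched: mirror the documented error.
    raise IOError("cannot infer file type of {!r}".format(rfilename))


examples = {
    name: guess_reader(name)
    for name in ("ark,t:feats.ark", "utt.flac", "utt.wav", "feats.npz", "cat feats |")
}
```

Because the checks run in order, a Kaldi table specifier wins even if the string also ends in a known suffix or in '|', and a suffix listed among the soundfile types is tried before the '.wav' branch.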
- Parameters:
rfilename (Union[str, BinaryIO]) – Either a string or a binary file object. If a file object, force_as must be specified, and the Kaldi types are unsupported.
dtype (Optional[dtype]) – If set, the returned array is cast to this type.
key (Any) – The key used in 'hdf5' or 'table' decoding.
force_as (Optional[str]) – If not None, forces rfilename to be interpreted as a specific file type, bypassing the above selection strategy. 'table': Kaldi table; 'wav': wave file; 'hdf5': HDF5 file; 'npy': Numpy binary; 'npz': Numpy archive; 'pt': PyTorch binary; 'sph': NIST SPHERE; 'kaldi': Kaldi object; 'file': read via numpy.fromfile(). The types in SOUNDFILE_SUPPORTED_FILE_TYPES are also valid values. 'soundfile' will use soundfile to read the file regardless of the suffix.
**kwargs – Passed along to the associated open or read operation.
- Returns:
signal (numpy.ndarray)
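For the '.wav' case, the description above says the standard wave package is used when scipy cannot be imported. The sketch below shows what reading samples with that module can look like. It is an assumption-laden illustration rather than the library's code: read_wav_int16 is a hypothetical helper, it only handles 16-bit samples, and it returns plain lists (one per channel) instead of a numpy.ndarray so that it runs with the standard library alone.

```python
import array
import io
import wave


def read_wav_int16(source):
    """Read 16-bit PCM from a path or binary file object, one list per channel."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise IOError("this sketch only handles 16-bit samples")
        n_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    # Frames interleave the channels, so stride by the channel count.
    return [list(samples[c::n_channels]) for c in range(n_channels)]


# Build a tiny mono file in memory so the example needs nothing on disk.
original = [0, 1000, -1000, 32767]
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(array.array("h", original).tobytes())
buffer.seek(0)

channels = read_wav_int16(buffer)
```

The wave module only understands uncompressed PCM, which is consistent with the remark above that compressed data cannot be handled on this path.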
Warning
Post v 0.2.0, the behaviour after step 8 changed. Instead of trying to read first as Kaldi input and, failing that, via numpy.fromfile(), we try to read as Kaldi input if the file name ends with '|' and raise an error otherwise. The catch-all behaviour was disabled due to its interaction with pydrobert.speech.config.SOUNDFILE_SUPPORTED_FILE_TYPES, whose value depends on the existence of soundfile and the underlying version of libsndfile.
Notes
Python code for reading SPHERE files (not via soundfile) was based off of sph2pipe v 2.5. That code can only support the “shorten” audio format up to version 2.
- pydrobert.speech.util.wds_read_signal(key, data)[source]
Wrapper around read_signal for webdataset
This method is intended for Data Decoding in a WebDataset. It uses read_signal() to read a file and returns it as a Numpy array.
Examples
>>> import webdataset as wds
>>> from pydrobert.speech.util import wds_read_signal
>>> url = 'pipe:curl -L -s https://dl.fbaipublicfiles.com/librilight/data/small.tar'
>>> ds = (
...     wds.WebDataset(url)
...     .decode(wds_read_signal)
...     .to_tuple('json', 'flac', handler=wds.ignore_and_continue)
... )
>>> for info, signal in ds:
...     pass  # do something
Warning
Kaldi types are currently unsupported.
This decoder clobbers the default WebDataset decoder for “npy” and “pt” files.