usage: signals-to-torch-feat-dir [-h] [--channel CHANNEL] [--preprocess PREPROCESS] [--postprocess POSTPROCESS] [--force-as {aiff,npz,hdf5,table,kaldi,soundfile,pt,ogg,npy,file,flac,sph,wav}] [--seed SEED] [--file-prefix FILE_PREFIX] [--file-suffix FILE_SUFFIX] [--num-workers NUM_WORKERS]
[--manifest MANIFEST]
map [computer_config] dir
Convert a map of signals to a torch SpectDataSet
This command processes audio signals and converts them into a format that can
be consumed by "SpectDataSet" in "pydrobert-pytorch"
(https://github.com/sdrobert/pydrobert-pytorch). It reads in a text file of
format
<utt_id_1> <path_to_signal_1>
<utt_id_2> <path_to_signal_2>
...
computes features according to passed-in settings, and stores them in the
target directory as
dir/
<file_prefix><utt_id_1><file_suffix>
<file_prefix><utt_id_2><file_suffix>
...
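For illustration, the mapping from map entries to output file names can be sketched in plain Python. Utterance IDs and paths here are hypothetical, and the empty prefix and ".pt" suffix are assumed defaults:

```python
# Sketch: parse "<utt_id> <path>" lines and derive the file names the
# command would write under dir/, given a prefix and suffix.
map_text = """utt1 /data/audio/utt1.wav
utt2 /data/audio/utt2.wav
"""

file_prefix = ""     # --file-prefix (assumed default)
file_suffix = ".pt"  # --file-suffix (assumed default)

entries = []
for line in map_text.splitlines():
    # first whitespace-delimited token is the id; the rest is the path
    utt_id, path = line.split(maxsplit=1)
    entries.append((utt_id, path))

out_names = [f"{file_prefix}{utt_id}{file_suffix}" for utt_id, _ in entries]
print(out_names)  # ['utt1.pt', 'utt2.pt']
```

Splitting with `maxsplit=1` keeps paths containing spaces intact after the first token.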
Each signal is read using the utility "pydrobert.speech.util.read_signal()", which
is a bit slow, but very robust to different file types (such as wave files, HDF5
archives, NumPy binaries, or PyTorch binaries). A signal is expected to have shape
(C, S), where C is some number of channels and S is some number of samples. The
signal can have shape (S,) if the flag "--channel" is set to "-1".
Features are output as "torch.FloatTensor" of shape "(T, F)", where "T" is some
number of frames and "F" is some number of filters.
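As a rough illustration of how "T" relates to the number of samples, a minimal sketch assuming simple non-centered framing with a hypothetical frame length and shift (actual frame computers may pad or center frames differently):

```python
# Sketch: number of frames T produced from S samples, assuming each frame
# covers frame_length samples and consecutive frames start frame_shift
# samples apart, with no padding.
def num_frames(S, frame_length, frame_shift):
    if S < frame_length:
        return 0
    return 1 + (S - frame_length) // frame_shift

# 1 second of 16 kHz audio, 25 ms frames, 10 ms shift (hypothetical settings)
print(num_frames(16000, 400, 160))  # 98
```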
No checks are performed to ensure that read signals match the feature computer's
sampling rate (this info may not even exist for some sources).
positional arguments:
map Path to the file containing (<utterance>, <path>) pairs
computer_config JSON file or string to configure a pydrobert.speech.compute.FrameComputer object to calculate features with. If unspecified, the audio (with channels removed) will be stored directly with shape (S, 1), where S is the number of samples
dir Directory to output features to. If the directory does not exist, it will be created
options:
-h, --help show this help message and exit
--channel CHANNEL Channel to draw audio from. Default is to assume mono
--preprocess PREPROCESS
JSON list of configurations for 'pydrobert.speech.pre.PreProcessor' objects. Audio will be preprocessed in the same order as the list
--postprocess POSTPROCESS
JSON list of configurations for 'pydrobert.speech.post.PostProcessor' objects. Features will be postprocessed in the same order as the list
--force-as {aiff,npz,hdf5,table,kaldi,soundfile,pt,ogg,npy,file,flac,sph,wav}
Force the paths in 'map' to be interpreted as a specific type of data. table: kaldi table (key is utterance id); wav: wave file; hdf5: HDF5 archive (key is utterance id); npy: Numpy binary; npz: numpy archive (key is utterance id); pt: PyTorch binary; sph: NIST
SPHERE file; kaldi: kaldi object; file: numpy.fromfile binary; soundfile: force soundfile processing
--seed SEED A random seed used for determinism. This affects operations like dithering. If unset, a seed will be generated at run time
--file-prefix FILE_PREFIX
The file prefix indicating a torch data file
--file-suffix FILE_SUFFIX
The file suffix indicating a torch data file
--num-workers NUM_WORKERS
The number of workers simultaneously computing features. Should not affect determinism when used in tandem with --seed. '0' means all work is done on the main thread
--manifest MANIFEST If specified, a list of utterances which have already been computed will be stored in this file. Utterances already listed in the file will not be computed. Useful for resuming computations after an unexpected termination
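The resume behaviour of "--manifest" can be sketched as a simple filter. This is an illustrative sketch only, not the command's actual implementation; file names are hypothetical:

```python
import os
import tempfile

def filter_and_update(manifest_path, utt_ids):
    """Skip utterances already listed in the manifest; record the rest."""
    done = set()
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            done = {line.strip() for line in f if line.strip()}
    todo = [u for u in utt_ids if u not in done]
    with open(manifest_path, "a") as f:
        for u in todo:
            f.write(u + "\n")  # mark as computed
    return todo

with tempfile.TemporaryDirectory() as d:
    manifest = os.path.join(d, "manifest.txt")
    first = filter_and_update(manifest, ["utt1", "utt2"])
    second = filter_and_update(manifest, ["utt1", "utt2", "utt3"])
    print(first, second)  # ['utt1', 'utt2'] ['utt3']
```

On the second call only "utt3" is computed, which is what allows an interrupted run to be restarted without redoing finished work.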
New in version 0.4.0: if ruamel.yaml is installed
(https://yaml.readthedocs.io/en/latest/), JSON arguments will be parsed as YAML 1.2
by default. As JSON is valid YAML 1.2, you can continue to use JSON for configurations.
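A hypothetical end-to-end invocation, using only flags described above (paths and settings are examples; the run is guarded in case the pydrobert-speech CLI is not on PATH):

```shell
# Build a map of (utterance id, signal path) pairs
cat > wav.map <<'EOF'
utt1 /data/audio/utt1.wav
utt2 /data/audio/utt2.wav
EOF

# Store raw audio as (S, 1) tensors in feats/ (no computer_config given)
if command -v signals-to-torch-feat-dir >/dev/null 2>&1; then
    signals-to-torch-feat-dir \
        --channel -1 \
        --seed 0 \
        --num-workers 4 \
        --manifest done.txt \
        wav.map feats/
else
    echo "signals-to-torch-feat-dir not installed; skipping"
fi
```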