Command-Line Interface
compute-feats-from-kaldi-tables
compute-feats-from-kaldi-tables -h
usage: compute-feats-from-kaldi-tables [-h] [-v VERBOSE] [--config CONFIG]
                                       [--print-args PRINT_ARGS]
                                       [--min-duration MIN_DURATION]
                                       [--channel CHANNEL]
                                       [--preprocess PREPROCESS]
                                       [--postprocess POSTPROCESS]
                                       [--seed SEED]
                                       wav_rspecifier feats_wspecifier
                                       computer_config
Compute features from a Kaldi wave table and store them in a Kaldi feature table
This command is intended to replace Kaldi's (https://kaldi-asr.org/) series of
"compute-<something>-feats" scripts in a Kaldi pipeline.
positional arguments:
wav_rspecifier Input wave table rspecifier
feats_wspecifier Output feature table wspecifier
computer_config JSON file or string to configure a
'pydrobert.speech.compute.FrameComputer' object to
calculate features with
optional arguments:
-h, --help show this help message and exit
-v VERBOSE, --verbose VERBOSE
Verbose level (higher->more logging)
--config CONFIG
--print-args PRINT_ARGS
--min-duration MIN_DURATION
Min duration of segments to process (in seconds)
--channel CHANNEL Channel to draw audio from. Default is to assume mono
--preprocess PREPROCESS
JSON list of configurations for
'pydrobert.speech.pre.PreProcessor' objects. Audio
will be preprocessed in the same order as the list
--postprocess POSTPROCESS
JSON list of configurations for
'pydrobert.speech.post.PostProcessor' objects.
Features will be postprocessed in the same order as
the list
--seed SEED A random seed used for determinism. This affects
operations like dithering. If unset, a seed is
generated at run time
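
As a rough illustration of how the arguments above fit together, the following Python sketch assembles an argument list for the command. Only the flag names come from the usage text. The helper name `build_compute_feats_command` and the values `scp:wav.scp`, `ark:feats.ark`, and `computer.json` are hypothetical placeholders, and the contents of the computer configuration are deliberately not shown.

```python
import shlex


def build_compute_feats_command(
    wav_rspecifier,
    feats_wspecifier,
    computer_config,
    min_duration=None,
    channel=None,
    seed=None,
):
    """Assemble an argv list for compute-feats-from-kaldi-tables.

    Hypothetical helper: it only arranges the flags listed in the usage
    text and does not validate their values.
    """
    cmd = ["compute-feats-from-kaldi-tables"]
    if min_duration is not None:
        cmd += ["--min-duration", str(min_duration)]
    if channel is not None:
        cmd += ["--channel", str(channel)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    # Positional arguments go last, in the order given by the usage text.
    cmd += [wav_rspecifier, feats_wspecifier, computer_config]
    return cmd


# Placeholder table specifiers and config path (assumptions, not real files).
cmd = build_compute_feats_command(
    "scp:wav.scp", "ark:feats.ark", "computer.json", min_duration=0.5, seed=1234
)
print(shlex.join(cmd))
```

If the command is installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting issues when the configuration is passed as an inline JSON string.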
signals-to-torch-feat-dir
usage: signals-to-torch-feat-dir [-h] [--channel CHANNEL]
                                 [--preprocess PREPROCESS]
                                 [--postprocess POSTPROCESS]
                                 [--force-as {npz,wav,table,hdf5,pt,soundfile,aiff,ogg,sph,flac,file,npy,kaldi}]
                                 [--seed SEED] [--file-prefix FILE_PREFIX]
                                 [--file-suffix FILE_SUFFIX]
                                 [--num-workers NUM_WORKERS]
                                 map [computer_config] dir
Convert a map of signals to a torch SpectDataSet
This command serves to process audio signals and convert them into a format that can
be leveraged by "SpectDataSet" in "pydrobert-pytorch"
(https://github.com/sdrobert/pydrobert-pytorch). It reads a text file in the
format
<utt_id_1> <path_to_signal_1>
<utt_id_2> <path_to_signal_2>
...
computes features according to passed-in settings, and stores them in the
target directory as
dir/
<file_prefix><utt_id_1><file_suffix>
<file_prefix><utt_id_2><file_suffix>
...
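
The map file and output naming scheme described above can be sketched in Python as follows. This is only an illustration of the two layouts, not code from the package: the helper names `write_map` and `expected_outputs`, the example paths, and the `.pt` suffix are assumptions chosen for the example.

```python
import tempfile
from pathlib import Path


def write_map(entries, map_path):
    """Write one '<utt_id> <path_to_signal>' pair per line."""
    with open(map_path, "w", encoding="utf-8") as f:
        for utt_id, signal_path in entries:
            f.write(f"{utt_id} {signal_path}\n")


def expected_outputs(entries, out_dir, file_prefix, file_suffix):
    """List the paths dir/<file_prefix><utt_id><file_suffix> for each entry."""
    return [
        Path(out_dir) / f"{file_prefix}{utt_id}{file_suffix}"
        for utt_id, _ in entries
    ]


# Hypothetical utterance ids and signal paths.
entries = [("utt1", "/data/utt1.wav"), ("utt2", "/data/utt2.wav")]

with tempfile.TemporaryDirectory() as tmp:
    map_path = Path(tmp) / "map.txt"
    write_map(entries, map_path)
    map_text = map_path.read_text(encoding="utf-8")

# The prefix and suffix values here are assumptions for illustration.
outputs = expected_outputs(entries, "feats", file_prefix="", file_suffix=".pt")
print(map_text, end="")
print([p.name for p in outputs])
```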
Each signal is read using the utility "pydrobert.speech.util.read_signal()", which
is a bit slow, but very robust to different file types (such as wave files, HDF5,
NumPy binaries, or PyTorch binaries). A signal is expected to have shape (C, S),
where C is some number of channels and S is some number of samples. The
signal can have shape (S,) if the flag "--channel" is set to "-1".
Features are output as "torch.FloatTensor" of shape "(T, F)", where "T" is some
number of frames and "F" is some number of filters.
No checks are performed to ensure that read signals match the feature computer's
sampling rate (this info may not even exist for some sources).
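
Because no sampling-rate check is performed, it may be worth verifying rates before running the command. The sketch below does this for wave files only, using Python's standard `wave` module. The helper names `read_map` and `find_rate_mismatches` and the `expected_rate` parameter are hypothetical, and other signal types (which may carry no rate information at all) are not handled.

```python
import wave


def read_map(map_path):
    """Parse '<utt_id> <path_to_signal>' lines into a list of pairs."""
    pairs = []
    with open(map_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first run of whitespace only, so that the
            # remainder of the line is kept as the path.
            utt_id, signal_path = line.split(maxsplit=1)
            pairs.append((utt_id, signal_path))
    return pairs


def find_rate_mismatches(pairs, expected_rate):
    """Return {utt_id: actual_rate} for wave files whose rate differs."""
    mismatches = {}
    for utt_id, signal_path in pairs:
        with wave.open(signal_path, "rb") as w:
            rate = w.getframerate()
        if rate != expected_rate:
            mismatches[utt_id] = rate
    return mismatches
```

For example, `find_rate_mismatches(read_map("map.txt"), 16000)` would report every utterance in a hypothetical `map.txt` whose wave file is not sampled at 16 kHz.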
positional arguments:
map Path to the file containing (<utterance>, <path>)
pairs
computer_config JSON file or string to configure a
pydrobert.speech.compute.FrameComputer object to
calculate features with. If unspecified, the audio
(with channels removed) will be stored directly with
shape (S, 1), where S is the number of samples
dir Directory to output features to. If the directory does
not exist, it will be created
optional arguments:
-h, --help show this help message and exit
--channel CHANNEL Channel to draw audio from. Default is to assume mono
--preprocess PREPROCESS
JSON list of configurations for
'pydrobert.speech.pre.PreProcessor' objects. Audio
will be preprocessed in the same order as the list
--postprocess POSTPROCESS
JSON list of configurations for
'pydrobert.speech.post.PostProcessor' objects.
Features will be postprocessed in the same order as
the list
--force-as {npz,wav,table,hdf5,pt,soundfile,aiff,ogg,sph,flac,file,npy,kaldi}
Force the paths in 'map' to be interpreted as a
specific type of data. table: kaldi table (key is
utterance id); wav: wave file; hdf5: HDF5 archive (key
is utterance id); npy: Numpy binary; npz: numpy
archive (key is utterance id); pt: PyTorch binary;
sph: NIST SPHERE file; kaldi: kaldi object; file:
numpy.fromfile binary; soundfile: force soundfile
processing.
--seed SEED A random seed used for determinism. This affects
operations like dithering. If unset, a seed is
generated at run time
--file-prefix FILE_PREFIX
The file prefix indicating a torch data file
--file-suffix FILE_SUFFIX
The file suffix indicating a torch data file
--num-workers NUM_WORKERS
The number of workers simultaneously computing
features. Should not affect determinism when used in
tandem with --seed. '0' means all work is done on the
main thread