<! $Id: nbest-scripts.1,v 1.52 2019/09/09 22:35:37 stolcke Exp $>
|
|
<HTML>
|
|
<HEAD>
<TITLE>nbest-scripts</TITLE>
</HEAD>
|
|
<BODY>
|
|
<H1>nbest-scripts</H1>
|
|
<H2> NAME </H2>
|
|
nbest-scripts, combine-rover-controls, compare-sclite, compute-best-rover-mix, compute-sclite-nbest, compute-sclite, fix-ctm, merge-nbest, nbest-error, nbest-posteriors, nbest-rover, nbest-vocab, nbest-words, nbest2-to-nbest1, rescore-acoustic, rescore-decipher, rescore-reweight, rover-control-tying, rover-control-weights, search-rover-combo, sentid-to-sclite - rescore and evaluate N-best lists
|
|
<H2> SYNOPSIS </H2>
|
|
<PRE>
|
|
<B>rescore-decipher</B> [ <B>-bytelog</B> ] [ <B>-nodecipherlm</B> ] [ <B>-multiwords</B> ] \
|
|
[ <B>-multi-char</B> <I>C</I> ] [ <B>-pretty</B> <I>mapfile</I> ] \
|
|
[ <B>-ngram-tool</B> <I>program</I> ] [ <B>-filter</B> <I>command</I> ] \
|
|
[ <B>-norescore</B> ] [ <B>-lm-only</B> ] [ <B>-count-oovs</B> ] [ <B>-limit-vocab</B> ] \
|
|
[ <B>-vocab-aliases</B> <I>mapfile</I> ] [ <B>-fast</B> ] \
|
|
<I>nbest-file-list</I> <I>score-dir</I> <B>-lm</B> ... <I>lm-options</I> ...
|
|
<B>rescore-acoustic</B> <I>old-nbest-dir</I>|<I>old-file-list</I> <I>old-ac-weight</I> \
|
|
<I>new-score-dir1</I> <I>new-ac-weight1</I> ... <I>new-nbest-dir</I> [ <I>max-nbest</I> ]
|
|
<B>rescore-reweight</B> [ <B>-multiwords</B> ] [ <B>-multi-char</B> <I>C</I> ] <I>score-dir</I>|<I>file-list</I> \
|
|
[ <I>lmw</I> [ <I>wtw</I> [ <I>score-dir1 score-weight1</I> ... ] [ <I>max-nbest</I> ]]]
|
|
<B>rescore-minimize-wer</B> <I>score-dir</I> [ <I>lmw</I> [ <I>wtw</I> [ <I>max-nbest</I> ]]]
|
|
<B>nbest2-to-nbest1</B> [ <I>nbest-file</I> ]
|
|
<B>nbest-rover</B> [ <I>sentid-list</I> | <B>-</B> ] <I>control-file</I> \
|
|
[ <I>posterior-file</I> [ <I>nbest-lattice-options</I> ] ]
|
|
<B>combine-rover-controls</B> [ <B>lambda=</B><I>weights</I> ] [ <B>postscale=</B><I>S</I> ] \
|
|
[ <B>keeppaths=1</B> ] <I>rover-control</I> [ ... ]
|
|
<B>rover-control-weights</B> [ <B>weights=</B>"<I>w1 ... wn</I>" ] <I>control-file</I>
|
|
<B>rover-control-tying</B> <I>control-file</I>
|
|
<B>nbest-optimize-args-from-rover-control</B> [ <B>print_weights=1</B> ] [ <B>print_dirs=1</B> ] <I>control-file</I>
|
|
<B>compute-best-rover-mix</B> [ <B>lambda=</B><I>weights</I> ] [ <B>addone=</B><I>c</I> ] [ <B>precision=</B><I>p</I> ] \
|
|
[ <B>tying=</B>"<I>b1 ... bn</I>" ] [ <B>write_weights=</B><I>file</I> ] <I>reference-posteriors-output</I>
|
|
<B>search-rover-combo</B> [ <B>-scorer</B> <I>script</I> ] [ <B>-datadir</B> <I>dir</I> ] \
|
|
[ <B>-weights</B> <I>weights</I> ] [ <B>-sentids</B> <I>list</I> ] \
|
|
[ <B>-refs</B> <I>refs</I> ] [ <B>-smooth-weight</B> <I>S</I> ] \
|
|
[ <B>-J</B> <I>n</I> ] <I>list-of-control-files</I>
|
|
<B>nbest-posteriors</B> [ <B>weight=</B><I>W</I> ] [ <B>lmw=</B><I>lmw</I> ] [ <B>wtw=</B><I>wtw</I> ] [ <B>postscale=</B><I>S</I> ] \
|
|
[ <B>max_nbest=</B><I>M</I> ] <I>nbest-file</I>
|
|
<B>merge-nbest</B> [ <B>multiwords=1</B> ] [ <B>multichar=</B><I>C</I> ] [ <B>nopauses=1</B> ] \
|
|
[ <B>max_nbest=</B><I>M</I> ] <I>nbest-file</I> ...
|
|
<B>nbest-vocab</B> [ <I>nbest-list</I> ... ]
|
|
<B>nbest-words</B> [ <I>nbest-list</I> ... ]
|
|
<B>nbest-oov-counts</B> <B>vocab=</B><I>vocabfile</I> [ <B>vocab_aliases=</B><I>aliasfile</I> ] <I>nbest-list</I>
|
|
<B>nbest-error</B> <I>score-dir</I>|<I>file-list</I> <I>refs</I> [ <I>nbest-lattice-option</I> ... ]
|
|
<B>sentid-to-sclite</B> <I>hyps</I>
|
|
<B>sentid-to-ctm</B> <I>hyps</I>
|
|
<B>fix-ctm</B> <I>ctmfile</I>
|
|
<B>compute-sclite</B> <B>-r</B> <I>refs</I> <B>-h</B> <I>hyps</I> [ <B>-h</B> <I>hyps</I> ... ] [ <B>-S</B> <I>subset</I> ... ] \
|
|
[ <B>-multiwords</B>|<B>-M</B> ] [ <B>-noperiods</B> ] [ <B>-R</B> ] [ <B>-g</B> <I>glmfile</I> ] [ <B>-H</B> ] \
|
|
[ <B>-v</B> ] [ <I>sclite-options</I> ...]
|
|
<B>compute-sclite-nbest</B> <I>file-list</I> <I>output-dir</I> -r <I>refs</I> [ <B>-filter</B> <I>script</I> ] [ <I>sclite-options</I> ...]
|
|
<B>compare-sclite</B> <B>-r</B> <I>refs</I> <B>-h1</B> <I>hyps1</I> <B>-h2</B> <I>hyps2</I> [ <B>-S</B> <I>subset</I> ] \
|
|
[ <I>compute-sclite-options</I> ... ]
|
|
</PRE>
|
|
<H2> DESCRIPTION </H2>
|
|
These scripts perform common tasks on N-best hypotheses in
|
|
<A HREF="nbest-format.5.html">nbest-format(5)</A>,
|
|
especially those needed for rescoring and extracting and evaluating
|
|
1-best hypotheses.
|
|
<P>
|
|
<B> rescore-decipher </B>
|
|
applies a language model implemented by
|
|
<A HREF="ngram.1.html">ngram(1)</A>
|
|
to the N-best lists listed in
|
|
<I>nbest-file-list</I>.
|
|
The N-best files may be in compressed format.
|
|
The rescored N-best lists are stored in directory
|
|
<I>score-dir</I>.
|
|
All following arguments are passed to
|
|
<A HREF="ngram.1.html">ngram(1)</A>
|
|
and are used to control the language model.
|
|
The following options are handled by
|
|
<B> rescore-decipher </B>
|
|
itself:
|
|
<DL>
|
|
<DT><B> -bytelog </B>
|
|
<DD>
|
|
causes scores to be output on the bytelog scale
|
|
(see
|
|
<A HREF="nbest-format.5.html">nbest-format(5)</A>).
|
|
<DT><B> -nodecipherlm </B>
|
|
<DD>
|
|
indicates that the recognizer language model is not being provided
|
|
(with
|
|
<B>-decipher-lm</B>).
|
|
(This is only possible if the N-best lists are not in ``NBestList1.0'' format.)
|
|
<DT><B> -multiwords </B>
|
|
<DD>
|
|
specifies that N-best lists contain words joined by underscores, which are
|
|
to be split into their components prior to rescoring.
|
|
<DT><B>-multi-char</B><I> C</I>
|
|
<DD>
|
|
defines a multiword separator character.
|
|
The default is underscore ``_''.
|
|
<DT><B>-pretty</B><I> mapfile</I>
|
|
<DD>
|
|
specifies a word mapping file that allows individual words to be globally
|
|
replaced by strings of zero or more other words, e.g., to remove vocabulary
|
|
mismatches between the input N-best lists and the rescoring LM.
|
|
The
|
|
<I> mapfile </I>
|
|
contains one mapping per line, the first field specifying the word to be
|
|
replaced and subsequent fields forming the replacement string.
|
|
<DT><B>-ngram-tool</B><I> program</I>
|
|
<DD>
|
|
specifies a non-standard
|
|
<I> program </I>
|
|
to perform the actual LM evaluation
|
|
(by default,
|
|
<A HREF="ngram.1.html">ngram(1)</A>
|
|
is used).
|
|
Such a program must understand
|
|
<B>ngram</B>'s
|
|
command-line options related to N-best rescoring.
|
|
<DT><B>-filter</B><I> command</I>
|
|
<DD>
|
|
specifies a
|
|
<I> command </I>
|
|
that is used to filter the N-best hypotheses prior to
|
|
evaluating the language model.
|
|
This may be used for more general textual rewriting so that non-standard
|
|
LMs can be applied.
|
|
The output N-best lists will contain the filtered hypotheses.
|
|
<DT><B> -norescore </B>
|
|
<DD>
|
|
causes N-best lists to be simply reformatted from one of the Decipher formats
|
|
into the SRILM N-best format, separating acoustic and LM scores, without
|
|
replacing the existing LM scores.
|
|
In this case only the
|
|
<A HREF="ngram.1.html">ngram(1)</A>
|
|
options
|
|
<B>-decipher-lmw</B>
|
|
and
|
|
<B>-decipher-wtw</B>
|
|
are relevant, and others are ignored.
|
|
<B> -norescore </B>
|
|
and
|
|
<B> -filter </B>
|
|
may be used together to perform textual rewriting of N-best lists.
|
|
<DT><B> -lm-only </B>
|
|
<DD>
|
|
dumps out LM scores only, instead of complete N-best lists.
|
|
<DT><B>-count-oovs</B>
|
|
<DD>
|
|
writes the count of out-of-vocabulary and zero-probability words to
|
|
the output score files (instead of rescored N-best lists).
|
|
<DT><B> -limit-vocab </B>
|
|
<DD>
|
|
saves memory by arranging for
|
|
<A HREF="ngram.1.html">ngram(1)</A>
|
|
to load only those N-gram parameters that are relevant to the vocabulary
|
|
of the N-best lists to be rescored.
|
|
After determining the N-best vocabulary the
|
|
<B> -limit-vocab </B>
|
|
option is passed to
|
|
<A HREF="ngram.1.html">ngram(1)</A>.
|
|
<DT><B>-vocab-aliases</B><I> map</I>
|
|
<DD>
|
|
declares that certain words are to be treated as alternative spellings
|
|
of the same word for LM evaluation; see the same option for
|
|
<A HREF="ngram.1.html">ngram(1)</A>.
|
|
The
|
|
<I> map </I>
|
|
is filtered of unused words when used in conjunction with
|
|
<B>-limit-vocab</B>,
|
|
and then passed on to
|
|
<A HREF="ngram.1.html">ngram(1)</A>.
|
|
<DT><B> -fast </B>
|
|
<DD>
|
|
performs rescoring using only functions built into
|
|
<A HREF="ngram.1.html">ngram(1)</A>.
|
|
This avoids some computational and I/O overhead and therefore runs faster,
|
|
but the options
|
|
<B>-filter</B>,
|
|
<B>-pretty</B>,
|
|
and
|
|
<B> -lm-only </B>
|
|
are not supported, and
|
|
<B> -nodecipherlm </B>
|
|
is obligatory.
|
|
</DD>
|
|
</DL>
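<P>
The effect of
<B> -multiwords </B>
and
<B> -pretty </B>
on a single hypothesis can be pictured with the following Python sketch.
It is an illustration only, not the script's implementation: the function
names and the order in which the two steps are applied are assumptions, and
the example mapping lines are made up.
<PRE>
```python
def load_pretty_map(lines):
    """Parse mapfile lines: first field is the word, the rest its replacement."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if fields:
            mapping[fields[0]] = fields[1:]  # may be empty: the word is deleted
    return mapping


def split_multiwords(words, sep="_"):
    """Split multiwords such as 'new_york' into their component words."""
    out = []
    for w in words:
        out.extend(p for p in w.split(sep) if p)
    return out


def apply_pretty_map(words, mapping):
    """Replace each mapped word by its (possibly empty) replacement string."""
    out = []
    for w in words:
        out.extend(mapping.get(w, [w]))
    return out


# Hypothetical mapfile: delete "uh", rewrite "gonna" as two words.
mapping = load_pretty_map(["uh", "gonna going to"])
hyp = "i'm gonna uh visit new_york".split()
print(apply_pretty_map(split_multiwords(hyp), mapping))
```
</PRE>
A mapping line with only one field deletes the word, which matches the
``zero or more other words'' wording above.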
|
|
<P>
|
|
<B> rescore-acoustic </B>
|
|
replaces the acoustic scores in a set of N-best lists by a weighted
|
|
combination of new scores.
|
|
The old N-best lists are given by either a directory
|
|
<I> old-nbest-dir </I>
|
|
or a filelist
|
|
<I>old-file-list</I>;
|
|
<I> old-ac-weight </I>
|
|
is the weight given to the old scores.
|
|
Directories containing the new scores are listed alternating with the
|
|
corresponding weights; each score directory must contain one
|
|
file per waveform segment, each having the same file basenames as
|
|
the original N-best lists.
|
|
The new scores should appear in a single column per file, one per line.
|
|
The N-best lists containing the new combined acoustic scores are written to
|
|
<I>new-nbest-dir</I>.
|
|
The optional
|
|
<I> max-nbest </I>
|
|
argument can be used to limit the length of the N-best lists output.
|
|
Also, when a new score file is encountered containing fewer than
|
|
<I> max-nbest </I>
|
|
lines, the missing scores are set to the lowest score encountered so far.
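<P>
The weighted combination can be sketched in Python as follows.
This is a minimal illustration under stated assumptions: the helper names are
hypothetical, short score columns are padded to the length of the N-best list,
``lowest score encountered so far'' is taken to mean the lowest score within the
same file, and an entirely empty file is padded with zeros.
<PRE>
```python
def pad_scores(scores, n):
    """Pad a short score column to n entries with its lowest score (assumed)."""
    padded = list(scores[:n])
    lowest = min(padded) if padded else 0.0
    padded.extend([lowest] * (n - len(padded)))
    return padded


def combine_acoustic(old_scores, old_weight, new_columns):
    """new_columns: list of (scores, weight) pairs, one per new score directory."""
    n = len(old_scores)
    combined = [old_weight * s for s in old_scores]
    for scores, weight in new_columns:
        for i, s in enumerate(pad_scores(scores, n)):
            combined[i] += weight * s
    return combined


old = [-100.0, -110.0, -120.0]
new = [([-90.0, -95.0], 0.5)]  # this score file is one hypothesis short
print(combine_acoustic(old, 1.0, new))
```
</PRE>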
|
|
<P>
|
|
<B> rescore-reweight </B>
|
|
combines the scores in N-best lists with a set of weights and outputs
|
|
the 1-best hypotheses.
|
|
The N-best files are found in directory
|
|
<I> score-dir </I>
|
|
or listed in
|
|
<I>file-list</I>.
|
|
Optional arguments set the language model weight
|
|
<I> lmw </I>
|
|
(default 8),
|
|
the word transition weight
|
|
<I> wtw </I>
|
|
(default 0),
|
|
and the maximum number
|
|
<I> max-nbest </I>
|
|
of hypotheses to consider (default all).
|
|
Optionally, any number of additional score directories and associated
|
|
weights
|
|
<I> score-dir1 score-weight1 score-dir2 score-weight2 </I>
|
|
... can be specified, following the
|
|
<I> wtw </I>
|
|
parameter.
|
|
These additional scores are combined with those contained in the
|
|
N-best lists themselves as in
|
|
<B> rescore-acoustic </B>
|
|
(using unit weight for the original acoustic scores).
|
|
The
|
|
<B> -multiwords </B>
|
|
and
|
|
<B> -multi-char </B>
|
|
options have the same function as for
|
|
<B>rescore-decipher</B>.
|
|
The output format for 1-best hypotheses is
|
|
<PRE>
|
|
<I>sentid</I> <I>w1</I> <I>w2</I> ...
|
|
</PRE>
|
|
where
|
|
<I> sentid </I>
|
|
is the sentence ID derived from the N-best filename, followed by
|
|
the words.
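<P>
A rough Python sketch of the 1-best selection is given below.
It assumes the usual log-linear combination (acoustic score plus
<I>lmw</I> times the LM score plus <I>wtw</I> times the number of words),
which this page does not spell out; the function name, the tuple layout, and
the sentence ID are hypothetical.
<PRE>
```python
def best_hypothesis(nbest, lmw=8.0, wtw=0.0, max_nbest=None):
    """nbest: list of (acoustic, lm, words) tuples holding log scores."""
    if max_nbest is not None:
        nbest = nbest[:max_nbest]

    def total(hyp):
        ac, lm, words = hyp
        # Assumed combination: ac + lmw * lm + wtw * number of words.
        return ac + lmw * lm + wtw * len(words)

    return max(nbest, key=total)


nbest = [
    (-1000.0, -20.0, "the cat sat".split()),
    (-1010.0, -15.0, "the cat sat down".split()),
]
ac, lm, words = best_hypothesis(nbest)
print("utt001", " ".join(words))  # same layout as the 1-best output format
```
</PRE>
With the default <I>lmw</I> of 8 the second hypothesis wins in this toy list;
setting <I>lmw</I> to 0 leaves only the acoustic scores and selects the first.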
|
|
<P>
|
|
<B> rescore-minimize-wer </B>
|
|
is similar to
|
|
<B> rescore-reweight </B>
|
|
but picks hypotheses using the word error minimization algorithm
|
|
of
|
|
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>.
|
|
<P>
|
|
<B> nbest2-to-nbest1 </B>
|
|
converts an N-best list in ``NBestList2.0'' format to ``NBestList1.0'',
|
|
for the benefit of programs that have not yet been updated to deal with
|
|
the new format.
|
|
<P>
|
|
<B> nbest-rover </B>
|
|
combines hypotheses from multiple N-best lists at the word level,
|
|
by performing the same kind of word error minimization as
|
|
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>,
|
|
in a generalization of the ROVER algorithm.
|
|
<I> sentid-list </I>
|
|
is a file listing sentence IDs.
|
|
These must match the filenames in a set of N-best directories,
|
|
which are specified in a
|
|
<I>control-file</I>.
|
|
The format for the latter is
|
|
<PRE>
|
|
<I>dir1</I> <I>lmw1</I> <I>wtw1</I> <I>w1</I> [<I>n1</I> [<I>s1</I>]]
|
|
<I>dir2</I> <I>lmw2</I> <I>wtw2</I> <I>w2</I> [<I>n2</I> [<I>s2</I>]]
|
|
...
|
|
</PRE>
|
|
Each line specifies an N-best directory, the language model and word transition
|
|
weights to be used in score combination, and a weight to be applied to the
|
|
posterior probabilities.
|
|
A weight of "=" denotes a value equal to the previous system and is used to encode
|
|
weight tying.
|
|
An optional next-to-last parameter for each N-best list allows the lists to be
|
|
truncated to the top <I>n1</I>, <I>n2</I>, etc., hypotheses.
|
|
The final optional parameter sets the posterior distribution scaling factor,
|
|
which defaults to the language model weight.
|
|
Optionally,
|
|
<I> control-file </I>
|
|
can also contain lines of the form
|
|
<PRE>
|
|
<I>dir</I> <I>w</I> <B>+</B>
|
|
</PRE>
|
|
These indicate that additional score files can be found in directory
|
|
<I> dir </I>
|
|
and that the scores found therein should be added to the following
|
|
N-best list set with weight
|
|
<I>w</I>.
|
|
Several lines of this form may occur preceding a regular N-best
|
|
directory specification; the corresponding additive combination of multiple
|
|
scores is performed.
|
|
<BR>
|
|
If ``-'' is specified for
|
|
<I>sentid-list</I>,
|
|
the sentence IDs are inferred from
|
|
the contents of the first directory <I>dir1</I> specified in
|
|
<I>control-file</I>.
|
|
If
|
|
<I> posterior-file </I>
|
|
is specified on the command line, posterior word probability estimates are
|
|
written to that file.
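<P>
To make the control file layout concrete, here is a small Python parser sketch.
It is not part of the distribution. It assumes that ``='' may appear only in
the system weight field, that fields are whitespace-separated, and that the
file contains no comment lines; the directory names in the example are made up.
<PRE>
```python
def parse_rover_control(lines):
    """Return one dict per system; '+' lines attach to the following system."""
    systems, pending, prev_weight = [], [], None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 3 and fields[2] == "+":
            pending.append((fields[0], float(fields[1])))  # extra score dir
            continue
        d, lmw, wtw, w = fields[0], float(fields[1]), float(fields[2]), fields[3]
        tied = (w == "=")  # "=" reuses the previous system's weight
        weight = prev_weight if tied else float(w)
        systems.append({
            "dir": d, "lmw": lmw, "wtw": wtw, "weight": weight, "tied": tied,
            "max_nbest": int(fields[4]) if len(fields) > 4 else None,
            # The posterior scale defaults to the language model weight.
            "postscale": float(fields[5]) if len(fields) > 5 else lmw,
            "extra_scores": pending,
        })
        pending, prev_weight = [], weight
    return systems


control = """\
scores-mfcc 0.1 +
nbest-sysA 8 0 1 100
nbest-sysB 10 0 = 50 12
"""
for s in parse_rover_control(control.splitlines()):
    print(s["dir"], s["weight"], s["postscale"], s["extra_scores"])
```
</PRE>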
|
|
<P>
|
|
Additional arguments are treated as options, in particular
|
|
<DL>
|
|
<DT><B> -missing-nbest </B>
|
|
<DD>
|
|
indicates that empty hypotheses are to be used for N-best lists that are missing
|
|
from the directory specified in the control file.
|
|
</DD>
|
|
</DL>
|
|
<P>
|
|
Any other additional arguments are passed to the underlying
|
|
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>
|
|
invocation.
|
|
<BR>
|
|
<B> nbest-rover </B>
|
|
can process N-best lists in any of the formats described in
|
|
<A HREF="nbest-format.5.html">nbest-format(5)</A>,
|
|
<I>as long as all N-best lists for a given utterance are in the same format</I>.
|
|
When Decipher formats are used, only their acoustic scores are used.
|
|
<P>
|
|
<B> combine-rover-controls </B>
|
|
takes one or more
|
|
<B> nbest-rover </B>
|
|
control files as arguments and outputs a new control file that specifies
|
|
the combination of the input files.
|
|
Directory names in the input files are adjusted to reflect the relative
|
|
location of the input files,
|
|
unless the
|
|
<B> keeppaths=1 </B>
|
|
option is used.
|
|
Each input system is given equal weight,
|
|
unless the optional
|
|
<B>lambda=</B><I>weights</I>
|
|
argument is used to specify a space-separated list of system weights
|
|
(spaces in the weight vector need to be quoted on the command line).
|
|
The
|
|
<B>postscale=</B><I>S</I>
|
|
argument overrides the posterior scaling factor in all input systems with the value
|
|
<I>S</I>.
|
|
<P>
|
|
<B> rover-control-weights </B>
|
|
either retrieves or changes the weights in an nbest-rover control file.
|
|
If the
|
|
<B> weights= </B>
|
|
argument is specified, the weights in the input control file are altered to the
|
|
values in the argument and a new control file is written to stdout.
|
|
Otherwise, the list of current weights is output as a single line.
|
|
<P>
|
|
<B> compute-best-rover-mix </B>
|
|
estimates the best weighting of a set of nbest system outputs for
|
|
combination with
|
|
<B>nbest-rover</B>.
|
|
The required input file
|
|
<I> reference-posteriors-output </I>
|
|
is produced by running
|
|
<B> nbest-rover </B>
|
|
to record the posteriors of the reference word strings on a tuning set:
|
|
<BR>
|
|
<B>nbest-rover -</B> <I>control-file</I> /dev/null \
|
|
<BR>
|
|
<B>-refs</B> <I>references</I> \
|
|
<BR>
|
|
<B>-write-ref-posteriors</B> <I>reference-posteriors-output</I>
|
|
<BR>
|
|
Initial weights are specified with
|
|
<B>lambda=</B><I>weights</I>.
|
|
<BR>
|
|
An additive constant for Laplace smoothing can be specified with
|
|
<B>addone=</B><I>c</I>.
|
|
<BR>
|
|
The
|
|
<B> tying= </B>
|
|
argument allows the system weights to be tied.
|
|
It should specify a string of positive integers (the bin numbers) with one value
|
|
for each system weight.
|
|
For example,
|
|
<B> tying='1 1 2 3 3' </B>
|
|
means that the first two and the last two of five weights are to be tied (put in the same bin).
|
|
<BR>
|
|
The estimated weight vector can optionally be written to a file using
|
|
<B>write_weights=</B><I>file</I>.
|
|
The weights can then be inserted into the original
|
|
<I>control-file</I>,
|
|
e.g., using
|
|
<B>rover-control-weights</B>.<B></B><B></B><B></B>
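<P>
The meaning of a
<B> tying= </B>
string can be illustrated with a few lines of Python.
The sketch only shows how bin numbers group the systems and how one weight per
bin expands to one weight per system; it says nothing about how the weights
themselves are estimated, and the function names and numeric weights are
hypothetical.
<PRE>
```python
def tying_groups(tying):
    """Map each bin number to the (0-based) system indices sharing it."""
    groups = {}
    for i, b in enumerate(int(t) for t in tying.split()):
        groups.setdefault(b, []).append(i)
    return groups


def expand_tied(tying, bin_weights):
    """Expand one weight per bin into one weight per system."""
    return [bin_weights[int(t)] for t in tying.split()]


print(tying_groups("1 1 2 3 3"))
print(expand_tied("1 1 2 3 3", {1: 0.3, 2: 0.2, 3: 0.1}))
```
</PRE>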
|
|
<P>
|
|
<B> rover-control-tying </B>
|
|
extracts the value for the
|
|
<B> compute-best-rover-mix </B>
|
|
<B> tying= </B>
|
|
argument from an existing nbest-rover control file.
|
|
<P>
|
|
<B> nbest-optimize-args-from-rover-control </B>
|
|
extracts information from existing nbest-rover control files that can be passed as
|
|
arguments to
|
|
<A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>
|
|
for initializing the search.
|
|
Options allow printing only the score weights,
|
|
or only the list of additional scores directories.
|
|
<P>
|
|
<B> search-rover-combo </B>
|
|
searches for a good subset of systems to combine via
|
|
<B>nbest-rover</B>.
|
|
It performs a greedy search starting with the system that gives the lowest individual error,
|
|
and then adds one system at a time until no further error reduction is possible.
|
|
The required argument <I>list-of-control-files</I> is a file listing the nbest-rover control files
|
|
representing the individual systems to be combined.
|
|
An nbest-rover control file is written to stdout representing the combined system.
|
|
Options are:
|
|
<DL>
|
|
<DT><B>-scorer</B><I> script</I>
|
|
<DD>
|
|
Specifies a script that evaluates a hypothesis file.
|
|
The script must take a single argument that is a hypothesis file in sentid format and output a single number
|
|
(the error rate) to stdout.
|
|
For example, the script could be based on parsing
|
|
<B> compute-sclite </B>
|
|
output, but must know where to find the reference file etc.
|
|
<DT><B>-weights</B><I> list-of-weights</I>
|
|
<DD>
|
|
Specifies the list of system weights to try when adding a system.
|
|
By default this is just 1, but can be a space-separated list of weights, such as "1 0.5 0.2 0.1".
|
|
<DT><B>-sentids</B><I> list</I>
|
|
<DD>
|
|
A list of sentence IDs to perform the evaluation on (as in the first argument to
|
|
<B>nbest-rover</B>).
|
|
<DT><B>-datadir</B><I> dir</I>
|
|
<DD>
|
|
Where to place auxiliary data files.
|
|
By default this is
|
|
<B> SEARCH-DATA </B>
|
|
in the current directory.
|
|
<DT><B>-refs</B><I> refs</I>
|
|
<DD>
|
|
Triggers system weight optimization using
|
|
<B> compute-best-rover-mix </B>
|
|
for each system combination before evaluating
|
|
its error rate.
|
|
The file
|
|
<I> refs </I>
|
|
should point to a reference file in sentid format.
|
|
Note that these references are not used to evaluate the error rate of a system
|
|
(which is done within the scorer script, see above) but only to be
|
|
passed to
|
|
<B>compute-best-rover-mix</B>.
|
|
This option overrides the
|
|
<B> -weights </B>
|
|
option since system weights are estimated.
|
|
<DT><B>-smooth-weight</B><I> S</I>
|
|
<DD>
|
|
Enables hierarchical weight smoothing.
|
|
Each weight estimate is interpolated with the previous estimate (with one fewer system);
|
|
the previous weight vector gets weight
|
|
<I>S</I>.
|
|
<DT><B>-J</B><I> n</I>
|
|
<DD>
|
|
Parallelizes the evaluation of system combinations with up to
|
|
<I> n </I>
|
|
parallel jobs.
|
|
This uses the included parallelization script
|
|
<B>rexport.gnumake</B>,
|
|
but the environment variable
|
|
<B> REXPORT </B>
|
|
may be set to a command that takes a list of command lines as argument and executes them in an appropriate manner.
|
|
</DD>
|
|
</DL>
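<P>
The greedy search performed by
<B> search-rover-combo </B>
can be sketched in Python as below.
This is a simplified illustration: it ignores the
<B> -weights </B>
and
<B> -refs </B>
handling, the function names are hypothetical, and the error table stands in
for a real scorer script with invented numbers.
<PRE>
```python
def greedy_combo(systems, score):
    """score(list_of_systems) returns an error rate; lower is better."""
    remaining = list(systems)
    best = min(remaining, key=lambda s: score([s]))  # best single system
    chosen, best_err = [best], score([best])
    remaining.remove(best)
    while remaining:
        cand = min(remaining, key=lambda s: score(chosen + [s]))
        err = score(chosen + [cand])
        if err >= best_err:
            break  # no further error reduction is possible
        chosen.append(cand)
        best_err = err
        remaining.remove(cand)
    return chosen, best_err


# Made-up error rates standing in for the output of a scorer script.
toy_errors = {
    ("A",): 20.0, ("B",): 22.0, ("C",): 25.0,
    ("A", "B"): 18.5, ("A", "C"): 19.0,
    ("A", "B", "C"): 18.7,
}


def toy_score(combo):
    return toy_errors[tuple(combo)]


print(greedy_combo(["A", "B", "C"], toy_score))
```
</PRE>
In the toy table the search keeps systems A and B and stops, because adding C
raises the error again.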
|
|
<P>
|
|
<B> nbest-posteriors </B>
|
|
rescales the scores in an N-best list to reflect (weighted) posterior
|
|
probabilities.
|
|
The output is the same N-best list with acoustic scores set to
|
|
the log (base 10) of the posterior hypothesis probabilities and LM scores set to zero.
|
|
<B>postscale=</B><I>S</I>
|
|
attenuates the posterior distribution by dividing combined log
|
|
scores by
|
|
<I> S </I>
|
|
(the default is
|
|
<I>S</I>=<I>lmw</I>).
|
|
If
|
|
<B>weight=</B><I>W</I>
|
|
is specified the posteriors are multiplied by
|
|
<I>W</I>.
|
|
<B>max_nbest=</B><I>M</I>
|
|
limits the number of hypotheses used to the top
|
|
<I>M</I>.
|
|
This script is used mostly as a helper in
|
|
<B>nbest-rover</B>.<B></B><B></B><B></B>
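<P>
The rescaling can be written out as a short Python sketch.
It assumes base-10 log scores, the combination acoustic + <I>lmw</I> * LM +
<I>wtw</I> * number of words, and a default <I>lmw</I> of 8 (borrowed from
<B> rescore-reweight </B>);
none of these is confirmed by this page for
<B>nbest-posteriors</B>,
and the function name is hypothetical.
<PRE>
```python
import math


def nbest_posteriors(nbest, lmw=8.0, wtw=0.0, postscale=None, weight=1.0,
                     max_nbest=None):
    """nbest: list of (acoustic, lm, words) with log10 scores (assumed)."""
    if max_nbest is not None:
        nbest = nbest[:max_nbest]
    scale = lmw if postscale is None else postscale  # default S = lmw
    logs = [(ac + lmw * lm + wtw * len(words)) / scale
            for ac, lm, words in nbest]
    top = max(logs)  # subtract the maximum for numerical stability
    norm = top + math.log10(sum(10.0 ** (x - top) for x in logs))
    # New acoustic score = log10(weight * posterior); LM score becomes zero.
    return [(x - norm + math.log10(weight), 0.0, words)
            for x, (_, _, words) in zip(logs, nbest)]


nbest = [(-1000.0, -20.0, ["a", "b"]), (-1008.0, -20.0, ["a", "c"])]
for ac, lm, words in nbest_posteriors(nbest):
    print(round(10.0 ** ac, 3), " ".join(words))
```
</PRE>
A larger <I>S</I> flattens the distribution, which is why the text above
speaks of attenuating the posteriors.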
|
|
<P>
|
|
<B> merge-nbest </B>
|
|
merges hypotheses from one or more N-best lists into a single list,
|
|
collapsing hypotheses that occur in more than one input list.
|
|
If all input lists use the same
|
|
<A HREF="nbest-format.5.html">nbest-format(5)</A>
|
|
then the output will also be in that format and contain the information
|
|
from the first list in which a hypothesis was encountered.
|
|
Otherwise, the output will be in SRI Decipher(TM) NBestList1.0 format
|
|
and contain acoustic scores and word strings only.
|
|
The
|
|
<B>max_nbest=</B><I>M</I>
|
|
option limits input to the first
|
|
<I> M </I>
|
|
hypotheses from each input list.
|
|
<B> multiwords=1 </B>
|
|
merges hypotheses that are identical after resolving multiwords, with
|
|
<B>multichar=</B><I>C</I>
|
|
defining a non-default multiword separator character.
|
|
<B> nopauses=1 </B>
|
|
merges hypotheses that are identical after removal of pause words.
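<P>
The merging logic can be pictured with the following Python sketch.
It treats the per-hypothesis score information as an opaque value and keeps
the entry from the first list in which a word string was seen.
The function name and the pause token ``-pau-'' are assumptions made for the
example, not taken from this page.
<PRE>
```python
def merge_nbest(lists, max_nbest=None, multiwords=False, multichar="_",
                nopauses=False, pause="-pau-"):
    """lists: N-best lists, each a list of (score_info, words) pairs."""

    def key(words):
        # Normalize a hypothesis before comparing it with earlier ones.
        if multiwords:
            words = [p for w in words for p in w.split(multichar) if p]
        if nopauses:
            words = [w for w in words if w != pause]
        return tuple(words)

    merged = {}
    for nbest in lists:
        top = nbest if max_nbest is None else nbest[:max_nbest]
        for info, words in top:
            merged.setdefault(key(words), (info, words))  # keep first seen
    return list(merged.values())


a = [(-10.0, ["new_york", "city"]), (-12.0, ["new", "-pau-", "york"])]
b = [(-11.0, ["new", "york", "city"]), (-13.0, ["newark"])]
for info, words in merge_nbest([a, b], multiwords=True, nopauses=True):
    print(info, " ".join(words))
```
</PRE>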
|
|
<P>
|
|
<B> nbest-vocab </B>
|
|
outputs the vocabulary used in a set of N-best lists.
|
|
(The N-best files cannot be compressed, but may be concatenated and
|
|
supplied via stdin.)
|
|
<P>
|
|
<B> nbest-words </B>
|
|
strips any score and alignment information from N-best lists and outputs
|
|
only the words, one hypothesis per line.
|
|
<P>
|
|
<B> nbest-oov-counts </B>
|
|
computes the number of out-of-vocabulary words for each hypothesis in an N-best list,
|
|
relative to a vocabulary listed in
|
|
<I>vocabfile</I>.
|
|
Optionally a vocabulary mapping from
|
|
<I> aliasfile </I>
|
|
is applied (as with the
|
|
<A HREF="ngram.1.html">ngram(1)</A>
|
|
<B> -vocab-aliases </B>
|
|
option).
|
|
The OOV counts are output on stdout and can be used as a score file for
|
|
N-best rescoring.
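<P>
As a rough illustration, the per-hypothesis OOV count amounts to the following
Python sketch.
The function name is hypothetical, and the alias mapping is assumed to map an
alias spelling to its canonical word; special tokens such as pauses are not
treated specially here.
<PRE>
```python
def oov_counts(hyps, vocab, aliases=None):
    """Count out-of-vocabulary words in each hypothesis (a list of words)."""
    aliases = aliases or {}
    return [sum(1 for w in words if aliases.get(w, w) not in vocab)
            for words in hyps]


vocab = {"the", "cat", "sat", "okay"}
aliases = {"ok": "okay"}  # alias -> canonical word (assumed layout)
hyps = [["the", "cat", "sat"], ["ok", "the", "kat", "zat"]]
print(oov_counts(hyps, vocab, aliases))
```
</PRE>
Writing one count per line yields a column that has the shape of the score
files described for
<B>rescore-acoustic</B>.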
|
|
<P>
|
|
<B> nbest-error </B>
|
|
computes the overall oracle word error rate of a set of N-best lists
|
|
in directory
|
|
<I> score-dir </I>
|
|
or listed in
|
|
<I>file-list</I>.
|
|
The reference answers are given in
|
|
<I> refs </I>
|
|
in the format output by
|
|
<B> rescore-reweight </B>
|
|
(see above).
|
|
Additional arguments are passed to the underlying invocation of
|
|
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>,
|
|
and can be used to limit the depth of the N-best list,
|
|
compute lattice error rather than N-best error, etc.
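<P>
The oracle word error rate is the error rate obtained when the hypothesis
with the fewest errors is picked from every N-best list.
The Python sketch below shows the idea using a plain word-level edit distance;
it is an illustration with hypothetical function names and data, not the
computation performed by
<B>nbest-lattice</B>.
<PRE>
```python
def word_errors(ref, hyp):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def oracle_error_rate(nbest_by_id, refs, max_nbest=None):
    """Sum, over utterances, the fewest errors of any hypothesis in the list."""
    errors = words = 0
    for sentid, ref in refs.items():
        hyps = nbest_by_id[sentid]
        if max_nbest is not None:
            hyps = hyps[:max_nbest]
        errors += min(word_errors(ref, h) for h in hyps)
        words += len(ref)
    return errors / words


refs = {"utt1": "the cat sat".split(), "utt2": "hello world".split()}
nbest = {
    "utt1": ["the cat sad".split(), "the cat sat".split()],
    "utt2": ["hello word".split(), "yellow world now".split()],
}
print(oracle_error_rate(nbest, refs))
print(oracle_error_rate(nbest, refs, max_nbest=1))
```
</PRE>
Limiting the depth with <I>max_nbest</I> can only keep the oracle error the
same or raise it, since fewer hypotheses are available to choose from.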
|
|
<P>
|
|
<B> sentid-to-sclite </B>
|
|
converts 1-best hypotheses and references in the format used here to
|
|
the ``trn'' format expected by the NIST
|
|
<A HREF="sclite.1.html">sclite(1)</A>
|
|
scoring software.
|
|
<P>
|
|
<B> sentid-to-ctm </B>
|
|
converts 1-best hypotheses and references in the format used here to NIST
|
|
<A HREF="ctm.5.html">ctm(5)</A>
|
|
format.
|
|
The script relies on an encoding of conversation IDs, channel, and utterance
|
|
time marks in the sentence IDs and may need adjustment to local conventions.
|
|
<P>
|
|
<B> fix-ctm </B>
|
|
converts output produced by the
|
|
<B> -output-ctm </B>
|
|
option of
|
|
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>
|
|
and
|
|
<A HREF="lattice-tool.1.html">lattice-tool(1)</A>
|
|
to a format suitable for scoring with NIST
|
|
<A HREF="sclite.1.html">sclite(1)</A>.
|
|
It, too, relies on information encoded in the sentence IDs and may need
|
|
adjustments.
|
|
<P>
|
|
<B> compute-sclite </B>
|
|
is a wrapper around
|
|
the NIST
|
|
<A HREF="sclite.1.html">sclite(1)</A>
|
|
scoring tool.
|
|
<I> refs </I>
|
|
and
|
|
<I> hyps </I>
|
|
are the reference and hypothesized transcripts, respectively.
|
|
The
|
|
<I> refs </I>
|
|
file can be either in "sentid" format or in
|
|
<A HREF="stm.5.html">stm(5)</A>
|
|
format. In the latter case,
|
|
<I> hyps </I>
|
|
will be converted to
|
|
<A HREF="ctm.5.html">ctm(5)</A>
|
|
format using the
|
|
<B> sentid-to-ctm </B>
|
|
helper script.
|
|
The
|
|
<I> hyps </I>
|
|
file can be either in "sentid" format or in
|
|
<A HREF="ctm.5.html">ctm(5)</A>
|
|
format.
|
|
More than one
|
|
<B> -h </B>
|
|
option can be given to combine the contents of multiple hypotheses files.
|
|
Optionally,
|
|
<B> -S </B>
|
|
specifies a
|
|
sorted list of sentence IDs
|
|
<I> subset </I>
|
|
to score.
|
|
Multiple
|
|
<B> -S </B>
|
|
options may be given, to form the intersection of several subsets.
|
|
<B> -multiwords </B>
|
|
or
|
|
<B> -M </B>
|
|
splits ``multiwords'' joined by underscores into their component words
|
|
prior to scoring.
|
|
<B> -noperiods </B>
|
|
deletes periods from the hypotheses prior to scoring
|
|
(typically used to bridge different conventions for spelled letters).
|
|
<B> -R </B>
|
|
preserves reject words in the hypotheses for scoring (as appropriate if
|
|
references also contain rejects).
|
|
<B> -g </B>
|
|
<I> glmfile </I>
|
|
enables filtering of references and hypotheses by the NIST
|
|
<B> csrfilt.sh </B>
|
|
script, controlled by the filter file
|
|
<I> glmfile </I>
|
|
(this is only possible with an stm reference file).
|
|
In that case, the
|
|
<B> -H </B>
|
|
option causes hesitations (as defined by the filter)
|
|
to be deleted from the output for scoring purposes.
|
|
<B> -v </B>
|
|
displays the complete command used to invoke
|
|
<B>sclite</B>.
|
|
Any additional options are passed to
|
|
<B>sclite</B>,
|
|
e.g., to control its output actions or alignment mode.
|
|
<P>
|
|
<B> compute-sclite-nbest </B>
|
|
runs
|
|
<B> compute-sclite </B>
|
|
on a set of N-best lists specified by
|
|
<I> file-list </I>
|
|
and deposits the error counts in a directory
|
|
<I>output-dir</I>.
|
|
These error counts may be used with the
|
|
<A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>
|
|
<B> -errors </B>
|
|
option to specify the hypothesis-level errors explicitly.
|
|
The references must be given in a file
|
|
<I> refs </I>
|
|
one per line, with the first word in each line matching
|
|
the file basename of the corresponding N-best list.
|
|
Additional options to be passed to
|
|
<B> compute-sclite </B>
|
|
(and ultimately to
|
|
<A HREF="sclite.1.html">sclite(1)</A>)
|
|
may be specified at the end of the command line.
|
|
The
|
|
<B> -filter </B>
|
|
option specifies a filtering
|
|
<I> script </I>
|
|
that edits the hypotheses before error computation.
|
|
<P>
|
|
<B> compare-sclite </B>
|
|
scores two sets of hypotheses
|
|
<I> hyps1 </I>
|
|
and
|
|
<I> hyps2 </I>
|
|
for the same test set and computes in
|
|
how many cases the first or second set had lower word error.
|
|
The remaining options are as for
|
|
<B>compute-sclite</B>.
|
|
The script ignores hypotheses for sentences that do not appear in both
|
|
hypothesis files, to ensure comparable scoring results.
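<P>
The comparison itself reduces to a per-sentence tally, as in the Python sketch
below.
It assumes that per-sentence word error counts have already been obtained for
both hypothesis sets; the function name, dictionary layout, and counts are
hypothetical.
<PRE>
```python
def compare_errors(errs1, errs2):
    """errs1, errs2: dicts mapping sentence ID to word error count."""
    # Only sentences present in both hypothesis sets are compared.
    common = sorted(set(errs1).intersection(errs2))
    first = sum(1 for s in common if errs1[s] < errs2[s])
    second = sum(1 for s in common if errs2[s] < errs1[s])
    return {"first_better": first, "second_better": second,
            "ties": len(common) - first - second}


errs1 = {"u1": 2, "u2": 0, "u3": 1, "u4": 5}
errs2 = {"u1": 1, "u2": 0, "u3": 3, "u5": 0}
print(compare_errors(errs1, errs2))
```
</PRE>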
|
|
<H2> SEE ALSO </H2>
|
|
<A HREF="nbest-format.5.html">nbest-format(5)</A>, <A HREF="ngram.1.html">ngram(1)</A>, <A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>, <A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>, <A HREF="sclite.1.html">sclite(1)</A>,
|
|
<A HREF="stm.5.html">stm(5)</A>, <A HREF="ctm.5.html">ctm(5)</A>.
|
|
<BR>
|
|
J.G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates:
Recognizer Output Voting Error Reduction (ROVER)",
|
|
<I>Proc. IEEE Automatic Speech Recognition and Understanding Workshop</I>,
|
|
Santa Barbara, CA, 347-352, 1997.
|
|
<BR>
|
|
A. Stolcke et al., "The SRI March 2000 Hub-5 Conversational Speech
|
|
Transcription System",
|
|
<I>Proc. NIST Speech Transcription Workshop</I>, College Park, MD, 2000.
|
|
<H2> BUGS </H2>
|
|
<B> sentid-to-sclite </B>
|
|
has some assumptions about the structure of sentence IDs built-in and
|
|
may need to be modified for
|
|
<B> compute-sclite </B>
|
|
and
|
|
<B> compare-sclite </B>
|
|
to work.
|
|
<P>
|
|
<B> rescore-decipher </B>
|
|
<B> -pretty </B>
|
|
may not work correctly with the
|
|
<B> -limit-vocab </B>
|
|
option if the word mapping adds to the vocabulary subset used in the N-best
|
|
lists.
|
|
<H2> AUTHOR </H2>
|
|
Andreas Stolcke &lt;stolcke@icsi.berkeley.edu&gt;
|
|
<BR>
|
|
Copyright (c) 1995-2006 SRI International
|
|
<BR>
|
|
Copyright (c) 2011-2019 Andreas Stolcke
|
|
<BR>
|
|
Copyright (c) 2011-2018 Microsoft Corp.
|
|
</BODY>
|
|
</HTML>
|