competition update
language_model/srilm-1.7.3/man/cat5/nbest-format.5
@@ -0,0 +1,64 @@
nbest-format(5)                                                nbest-format(5)

NAME
       nbest-format - File formats for N-best hypotheses lists

DESCRIPTION
       SRILM currently understands three different formats for lists of N-best
       hypotheses for rescoring or 1-best hypothesis extraction.  The first
       two formats originated in the SRI Decipher(TM) recognition system; the
       third format is particular to SRILM.

       The first format consists of the header
               NBestList1.0
       followed by one or more lines of the form
               (score) w1 w2 w3 ...
       where score is a composite acoustic/language model score from the
       recognizer, on the bytelog scale.  (A bytelog is a logarithm to base
       1.0001, divided by 1024 and rounded to an integer.)  This format is
       output by the SRI Decipher(TM) recognizer, by ngram -nbest, and by
       nbest-lattice -write-nbest -decipher-nbest.
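
       The bytelog definition and the first format can be illustrated with a
       short Python sketch.  It follows only the description above and is not
       taken from the SRILM sources; the function names are hypothetical, and
       the sign convention (negative bytelogs for probabilities below 1) is an
       assumption.

```python
import math
import re

LN_BASE = math.log(1.0001)  # natural log of the bytelog base


def prob_to_bytelog(p: float) -> int:
    """Log base 1.0001 of p, divided by 1024, rounded to an integer."""
    return round(math.log(p) / LN_BASE / 1024)


def bytelog_to_log10(b: float) -> float:
    """Approximate inverse: turn a bytelog into a base-10 logarithm."""
    return b * 1024 * LN_BASE / math.log(10)


# "(score)" followed by the words of the hypothesis.
_HYP_LINE = re.compile(r"^\(([^)\s]+)\)\s*(.*)$")


def parse_nbest1(lines):
    """Parse an NBestList1.0 list given as an iterable of text lines.

    Returns a list of (score, words) pairs; score is left on the
    bytelog scale exactly as read from the file.
    """
    it = iter(lines)
    header = next(it, "").strip()
    if header != "NBestList1.0":
        raise ValueError("missing NBestList1.0 header: %r" % header)
    hyps = []
    for line in it:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines (an assumption of this sketch)
        m = _HYP_LINE.match(line)
        if m is None:
            raise ValueError("malformed hypothesis line: %r" % line)
        hyps.append((float(m.group(1)), m.group(2).split()))
    return hyps
```

       Because of the rounding step, converting a probability to a bytelog and
       back only recovers the original value approximately.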

       The second Decipher(TM) format is an extension of the first format that
       encodes word-level scores and time alignments.  It is marked by a
       header of the form
               NBestList2.0
       The hypotheses are in the format
               (score) w1 ( st: st1 et: et1 g: g1 a: a1 ) w2 ...
       where each word is followed by its start and end times and its language
       model and acoustic scores (bytelog-scaled), respectively.  This format
       may also contain scores and time marks for sub-word units (phones and
       HMM states), in the same format as above, but with the w's denoting
       phone and state names.  Sub-word units have time marks that are
       contained in the duration of the preceding word unit, and can thus be
       easily identified.
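
       A minimal Python sketch of reading one NBestList2.0 hypothesis line
       follows.  The names Unit, parse_nbest2_line and group_units are
       hypothetical, and the grouping rule is only one reading of the remark
       about sub-word time marks: a unit is attached to the most recent word
       when its time span lies inside that word's span.

```python
import re
from typing import List, NamedTuple, Tuple


class Unit(NamedTuple):
    label: str       # word, phone, or state name
    start: float     # st: value
    end: float       # et: value
    lm: float        # g: language model score (bytelog-scaled)
    acoustic: float  # a: acoustic score (bytelog-scaled)


_SCORE = re.compile(r"^\(([^)\s]+)\)\s*")
_UNIT = re.compile(
    r"(\S+)\s+\(\s*st:\s*(\S+)\s+et:\s*(\S+)\s+g:\s*(\S+)\s+a:\s*(\S+)\s*\)"
)


def parse_nbest2_line(line: str) -> Tuple[float, List[Unit]]:
    """Return the total score and the annotated units of one hypothesis."""
    m = _SCORE.match(line)
    if m is None:
        raise ValueError("missing (score) prefix: %r" % line)
    units = [
        Unit(u.group(1), float(u.group(2)), float(u.group(3)),
             float(u.group(4)), float(u.group(5)))
        for u in _UNIT.finditer(line, m.end())
    ]
    return float(m.group(1)), units


def group_units(units: List[Unit]) -> List[Tuple[Unit, List[Unit]]]:
    """Attach units contained in the preceding word's span to that word."""
    groups: List[Tuple[Unit, List[Unit]]] = []
    for u in units:
        if groups:
            word = groups[-1][0]
            if word.start <= u.start and u.end <= word.end:
                groups[-1][1].append(u)  # treated as a sub-word unit
                continue
        groups.append((u, []))  # treated as a new word
    return groups
```

       The containment test is a heuristic and would misclassify a
       zero-duration word that follows another word, so real data may need a
       stricter rule.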

       The third format understood by SRILM lists hypotheses in the format
               ascore lscore nwords w1 w2 w3 ...
       where the first three columns contain the acoustic model log
       probability, the language model log probability, and the number of
       words in the hypothesis string, respectively.  All scores are
       logarithms base 10.  (This format must not be preceded by an
       ``NBestList'' header.)  This format is output by ngram -rescore and by
       nbest-lattice -write-nbest without the -decipher-nbest option.
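
       As an illustration, the Python sketch below tells the three formats
       apart by their first line, parses lines of the third format, and picks
       a 1-best hypothesis.  All names are hypothetical, and the scoring rule
       (a weighted sum with an arbitrary language model weight and word
       penalty) is an assumption made for the example, not a description of
       what the SRILM tools compute.

```python
from typing import Iterable, List, NamedTuple


class Hyp(NamedTuple):
    ascore: float     # acoustic model log10 probability
    lscore: float     # language model log10 probability
    nwords: int       # word count as recorded in the file
    words: List[str]  # hypothesis words


def detect_format(first_line: str) -> int:
    """Return 1 or 2 for the Decipher formats, 3 for the SRILM format."""
    head = first_line.strip()
    if head == "NBestList1.0":
        return 1
    if head == "NBestList2.0":
        return 2
    return 3  # the third format carries no header


def parse_srilm_line(line: str) -> Hyp:
    """Parse one 'ascore lscore nwords w1 w2 ...' line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least three columns: %r" % line)
    # nwords is kept as read; this sketch does not check it against
    # len(words).
    return Hyp(float(fields[0]), float(fields[1]), int(fields[2]), fields[3:])


def best_hypothesis(hyps: Iterable[Hyp],
                    lm_weight: float = 8.0,
                    word_penalty: float = 0.0) -> Hyp:
    """Pick the hypothesis with the highest weighted score.

    The default weights are illustrative values only.
    """
    return max(
        hyps,
        key=lambda h: h.ascore + lm_weight * h.lscore + word_penalty * h.nwords,
    )
```

       Keeping the acoustic and language model scores in separate columns is
       what allows a different weighting to be applied at rescoring time.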

SEE ALSO
       ngram(1), nbest-lattice(1), segment-nbest(1), nbest-scripts(1),
       pfsg-scripts(1).

BUGS
       All these formats are somewhat ad hoc and could use a more rational
       design.  The ``NBestList1.0'' format is particularly cumbersome because
       it conflates acoustic and language model scores.
       A generalization to an arbitrary number of separate scores would be
       nice.

AUTHOR
       Manual page written by Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1999-2001 SRI International



SRILM File Formats       $Date: 2007/12/19 22:08:05 $          nbest-format(5)