| .\" $Id: wlat-format.5,v 1.10 2019/02/06 09:53:12 stolcke Exp $
 | |
| .TH wlat-format 5 "$Date: 2019/02/06 09:53:12 $" "SRILM File Formats"
 | |
| .SH NAME
 | |
| wlat-format \- File format for SRILM word posterior lattices
 | |
| .SH SYNOPSIS
 | |
| Word lattices:
 | |
| .nf
 | |
| \fBversion 2\fP
 | |
| \fBname\fP \fIs\fP
 | |
| \fBinitial\fP \fIi\fP
 | |
| \fBfinal\fP \fIf\fP
 | |
| \fBnode\fP \fIn\fP \fIw\fP \fIa\fP \fIp\fP \fIn1\fP \fIp1\fP \fIn2\fP \fIp2\fP ...
 | |
| \&...
 | |
| .fi
 | |
| .PP
 | |
| Word meshes (confusion networks):
 | |
| .nf
 | |
| \fBname\fP \fIs\fP
 | |
| \fBnumaligns\fP \fIN\fP
 | |
| \fBposterior\fP \fIP\fP
 | |
| \fBalign\fP \fIa\fP \fIw1\fP \fIp1\fP \fIw2\fP \fIp2\fP ...
 | |
| \fBreference\fP \fIa\fP \fIw\fP
 | |
| \fBhyps\fP \fIa\fP \fIw\fP \fIh1\fP \fIh2\fP ...
 | |
| \fBinfo\fP \fIa\fP \fIw\fP \fIstart\fP \fIdur\fP \fIascore\fP \fIgscore\fP \fIphones\fP \fIphonedurs\fP
 | |
| \fBtime\fP \fIa\fP \fIt\fP
 | |
| \&...
 | |
| .fi
 | |
| .SH DESCRIPTION
 | |
| Word posterior lattices and meshes are lattices generated by aligning 
 | |
| N-best hypotheses with
 | |
| .BR nbest-lattice (1),
 | |
| or by aligning PFSG or HTK lattices with
 | |
| .BR lattice-tool (1).
 | |
| They compactly encode possible word hypothesis sequences and their
 | |
| posterior probabilities.
 | |
| (Word meshes have become generally known as ``confusion networks'' or
 | |
| ``sausages.'')
 | |
| .PP
 | |
| A word lattice is a partially ordered directed graph with nodes representing
 | |
| word hypotheses.
 | |
| Nodes are identified by non-negative integers.
 | |
| The file format specifies the initial node
 | |
| .IR i ,
 | |
| the final node
 | |
| .IR f ,
 | |
| and any number of additional nodes 
 | |
| .IR n .
 | |
| For each node
 | |
| .I n
 | |
| the following associated information is given on the same line:
 | |
| the word identity 
 | |
| .I w
 | |
| (the string ``NULL'' is used with initial and final nodes),
 | |
| the alignment position 
 | |
| .I a 
 | |
| (identical values in this field identify hypotheses that occur at the
 | |
| same position),
 | |
| and the word posterior probability
 | |
| .IR p .
 | |
| Following these values, zero or more transitions to successor nodes
 | |
| are specified, each given by the node index
 | |
| .I ni
 | |
| and the transition posterior probability
 | |
| .IR pi .
 | |
| In a properly normalized word lattice the transition posteriors
 | |
| .I pi
 | |
| sum up to the node posterior
 | |
| .IR p .
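.PP
As a rough illustration of the node syntax described above, the following
Python sketch reads a word lattice into memory and checks the normalization
property. It is not part of SRILM: the names parse_wlat, Node and
is_normalized, the sample data, and the choice to read alignment positions
as integers are all assumptions made for this example.
.nf
```python
from dataclasses import dataclass, field

# Hypothetical sample data, made up for this sketch (not SRILM output).
SAMPLE_WLAT = """version 2
name utt1
initial 0
final 3
node 0 NULL 0 1 1 0.6 2 0.4
node 1 hello 1 0.6 3 0.6
node 2 hallo 1 0.4 3 0.4
node 3 NULL 2 1
"""


@dataclass
class Node:
    word: str
    align: int          # assumption: alignment positions are integers
    posterior: float
    # (successor node index, transition posterior) pairs
    transitions: list = field(default_factory=list)


def parse_wlat(text):
    """Parse word-lattice text into (header dict, {node index: Node})."""
    header, nodes = {}, {}
    for line in text.splitlines():
        tok = line.split()
        if not tok:
            continue
        key, args = tok[0], tok[1:]
        if key == "node":
            if len(args) < 4:
                raise ValueError("incomplete node line: " + line)
            rest = args[4:]
            if len(rest) % 2:
                raise ValueError("unpaired transition: " + line)
            trans = [(int(rest[i]), float(rest[i + 1]))
                     for i in range(0, len(rest), 2)]
            nodes[int(args[0])] = Node(args[1], int(args[2]),
                                       float(args[3]), trans)
        elif key in ("initial", "final"):
            header[key] = int(args[0])
        else:
            # "version", "name" and anything unrecognized stay as strings
            header[key] = " ".join(args)
    return header, nodes


def is_normalized(node, tol=1e-6):
    """True if the transition posteriors sum to the node posterior.

    Callers should skip nodes without successors (such as the final node).
    """
    return abs(sum(p for _, p in node.transitions) - node.posterior) <= tol
```
.fi
Calling parse_wlat(SAMPLE_WLAT) returns the header fields and a dictionary of
nodes keyed by node index; is_normalized can then be applied to each node
that has at least one outgoing transition.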
 | |
| .PP
 | |
| Word meshes represent a more constrained lattice format in which
 | |
| word hypotheses are in a total order.
 | |
| A mesh contains a number of alignment positions, and a set of 
 | |
| mutually exclusive word hypotheses in each position (the ``confusion sets'').
 | |
| The word mesh represents all sentence hypotheses that can be 
 | |
| generated by freely combining word hypotheses at each position.
 | |
| The file format specifies the number of alignment positions
 | |
| .I N
 | |
| and the total posterior probability mass 
 | |
| .I P
 | |
| contained in the lattice,
 | |
| followed by one or more confusion set specifications.
 | |
| For each alignment position 
 | |
| .IR a ,
 | |
| the hypothesized words
 | |
| .I wi
 | |
| and their posterior probabilities
 | |
| .I pi
 | |
| are listed in alternation.
 | |
| The pseudo-word string
 | |
| .B *DELETE*
 | |
| represents an empty hypothesis.
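.PP
The confusion set layout can be sketched in Python as follows. The code
collects the word/posterior pairs of each
.B align
line and then picks the highest-posterior word at every position, dropping
empty hypotheses. The function names, the sample data, and the choice to
ignore all other keywords are assumptions made for this example, not SRILM
behavior.
.nf
```python
# Hypothetical sample data, made up for this sketch (not SRILM output).
SAMPLE_MESH = """name utt1
numaligns 3
posterior 1
align 0 the 0.9 a 0.1
align 1 *DELETE* 0.7 big 0.3
align 2 cat 0.6 hat 0.4
"""


def parse_mesh_aligns(text):
    """Return {position: {word: posterior}} from the align lines of a mesh."""
    sets = {}
    for line in text.splitlines():
        tok = line.split()
        if len(tok) < 2 or tok[0] != "align":
            continue  # other keywords are ignored in this sketch
        pairs = tok[2:]
        if len(pairs) % 2:
            raise ValueError("unpaired word/posterior: " + line)
        sets[int(tok[1])] = {pairs[i]: float(pairs[i + 1])
                             for i in range(0, len(pairs), 2)}
    return sets


def best_path(sets, delete="*DELETE*"):
    """Pick the top-posterior word at each position, skipping empty ones."""
    words = []
    for pos in sorted(sets):
        word = max(sets[pos], key=sets[pos].get)
        if word != delete:
            words.append(word)
    return words
```
.fi
For the sample above, position 1 is won by the empty hypothesis, so the
selected word sequence contains only the winners of positions 0 and 2.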
 | |
| .PP
 | |
| Optionally, the word mesh format encodes additional information about
 | |
| the hypothesis alignment from which it resulted.
 | |
| The keyword
 | |
| .B reference 
 | |
| specifies the correct word
 | |
| .I w
 | |
| that was aligned at position
 | |
| .IR a .
 | |
| The keyword
 | |
| .B hyps
 | |
| is used to list the sentence hypotheses of which a certain word 
 | |
| hypothesis was a part.
 | |
| The word hypothesis is identified by an alignment position 
 | |
| .I a
 | |
| and the word string
 | |
| .IR w ,
 | |
| and is followed by the integer IDs 
 | |
| .I hi
 | |
| (typically, the N-best ranks)
 | |
| of the associated sentence hypotheses.
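.PP
A minimal Python sketch of reading these optional lines is given below.
The function name parse_alignment_info and the sample data are hypothetical,
and the code assumes that alignment positions and hypothesis IDs are plain
integers.
.nf
```python
# Hypothetical sample data, made up for this sketch (not SRILM output).
SAMPLE_ALIGN_INFO = """reference 0 the
hyps 0 the 1 2 4
hyps 0 a 3
"""


def parse_alignment_info(text):
    """Collect reference words and per-word hypothesis ID lists.

    Returns (refs, hyps): refs maps position to the reference word,
    hyps maps (position, word) to the list of sentence hypothesis IDs.
    """
    refs, hyps = {}, {}
    for line in text.splitlines():
        tok = line.split()
        if len(tok) < 3:
            continue
        if tok[0] == "reference":
            refs[int(tok[1])] = tok[2]
        elif tok[0] == "hyps":
            hyps[(int(tok[1]), tok[2])] = [int(h) for h in tok[3:]]
    return refs, hyps
```
.fi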
 | |
| .PP
 | |
| As another optional element, the word mesh can contain word-level acoustic and
 | |
| temporal information,
 | |
| following the keyword 
 | |
| .BR info ,
 | |
| the alignment position
 | |
| .IR a ,
 | |
| and the word identity
 | |
| .IR w .
 | |
| This information is derived by 
 | |
| .BR nbest-lattice (1)
 | |
| from word- and phone-level backtraces of N-best 
 | |
| hypotheses (as represented in Decipher NBestList2.0 format).
 | |
| The details of this information are defined in the SRILM class 
 | |
| .B NBestWordInfo
 | |
| and subject to change, but currently include the following.
 | |
| .IR start :
 | |
| word start time (in seconds from the beginning of the waveform);
 | |
| .IR dur :
 | |
| word duration (in seconds);
 | |
| .IR ascore :
 | |
| acoustic model likelihood (log base 10);
 | |
| .IR gscore :
 | |
| grammar (LM and pronunciation) score (log base 10);
 | |
| .IR phones :
 | |
| sequence of phones in word (separated by colons);
 | |
| .IR phonedurs :
 | |
| sequence of phone durations (in numbers of frames, separated by colons).
 | |
| When word meshes are derived from HTK format lattices, the pronunciation field
 | |
| will consist of the HTK phone alignment information, which encodes both
 | |
| phone sequence and durations; the phone duration field in turn is used
 | |
| to encode the duration model scores, if present.
 | |
| .B Note:
 | |
| The encoded information pertains to the word hypothesis with the highest
 | |
| posterior probability among all hypotheses of the same word aligned
 | |
| to a given word mesh position.
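.PP
The field layout listed above can be illustrated with a short Python sketch.
It assumes the N-best-derived case, in which
.I phonedurs
holds integer frame counts; meshes derived from HTK lattices encode these
fields differently and would need other handling. The function name
parse_info and the sample line are hypothetical.
.nf
```python
# Hypothetical sample line, made up for this sketch (not SRILM output).
SAMPLE_INFO = "info 2 cat 0.53 0.21 -1520.5 -3.2 k:ae:t 5:9:7"


def parse_info(line):
    """Split one info line into named fields (N-best-derived layout)."""
    tok = line.split()
    if len(tok) != 9 or tok[0] != "info":
        raise ValueError("not a complete info line: " + line)
    return {
        "align": int(tok[1]),
        "word": tok[2],
        "start": float(tok[3]),    # seconds from start of waveform
        "dur": float(tok[4]),      # seconds
        "ascore": float(tok[5]),   # log base 10
        "gscore": float(tok[6]),   # log base 10
        "phones": tok[7].split(":"),
        # assumption: frame counts are integers
        "phonedurs": [int(d) for d in tok[8].split(":")],
    }
```
.fi
Because the contents of this record are subject to change, a reader of real
files would likely want to be more tolerant than this sketch, which rejects
any line that does not have exactly the six fields listed above.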
 | |
| .PP
 | |
| The
 | |
| .B time
 | |
| keyword is used for debugging purposes and encodes the estimated timestamp
 | |
| .I t
 | |
| of an alignment position
 | |
| .I a
 | |
| when the input contains backtrace information.
 | |
| It is ignored when reading in word meshes.
 | |
| .PP
 | |
| Both formats optionally encode the associated utterance IDs in the
 | |
| .B name
 | |
| field.
 | |
| Word lattices and meshes can be converted to PFSG format using
 | |
| the script
 | |
| .BR wlat-to-pfsg .
 | |
| .SH "SEE ALSO"
 | |
| nbest-lattice(1), lattice-tool(1),
 | |
| pfsg-scripts(1), pfsg-format(5), nbest-format(5).
 | |
| .br
 | |
| L. Mangu, E. Brill, & A. Stolcke, ``Finding consensus in speech recognition:
 | |
| word error minimization and other applications of confusion networks,''
 | |
| \fIComputer Speech and Language\fP 14(4), 373-400, 2000.
 | |
| .SH BUGS
 | |
| Detailed alignment and acoustic information is so far only implemented
 | |
| for word meshes, although conceptually it would apply equally to word lattices.
 | |
| .SH AUTHOR
 | |
| Andreas Stolcke <andreas.stolcke@microsoft.com>
 | |
| .br
 | |
| Copyright 2001-2011 SRI International
 | |
| .br
 | |
| Copyright 2011-2019 Microsoft Corp.
 | 
