competition update

This commit is contained in:
nckcard
2025-07-02 12:18:09 -07:00
parent 9e17716a4a
commit 77dbcf868f
2615 changed files with 1648116 additions and 125 deletions

@@ -0,0 +1,76 @@
File(3) Library Functions Manual File(3)
NAME
File - Wrapper for stdio streams
SYNOPSIS
#include <File.h>
DESCRIPTION
The File class provides a simple wrapper around stdio streams for use
with C++. It provides two kinds of convenience: Firstly, constructors
and destructors manage opening and closing of the stream. The stream
is checked for errors on closing, and the default behavior is to exit()
with an error message if a problem was found. Secondly, the getline()
method can be used for line-oriented input. It strips comments and
keeps track of input line numbers for error reporting.
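The stream ownership and close-time error checking described above can be illustrated with a standalone C++ sketch. It does not use SRILM's File.h. The class name StdioFile and its members are hypothetical. The sketch also assumes that an error reported by ferror() or fclose() is what counts as "a problem" when the stream is closed.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for File (not SRILM code): owns a stdio stream and
// checks for errors when the stream is closed.
class StdioFile {
public:
    // Open by name, as if by fopen(3).
    StdioFile(const char *name, const char *mode, bool exitOnError = true)
        : fp_(nullptr), name_(name), exitOnError_(exitOnError) {
        assert(name != nullptr && mode != nullptr);
        fp_ = std::fopen(name, mode);
        if (fp_ == nullptr && exitOnError_) {
            std::perror(name_);
            std::exit(1);
        }
    }

    // Wrap an already open stream; it is also closed on destruction.
    explicit StdioFile(FILE *fp, bool exitOnError = true)
        : fp_(fp), name_("(stream)"), exitOnError_(exitOnError) {}

    ~StdioFile() { close(); }

    StdioFile(const StdioFile &) = delete;
    StdioFile &operator=(const StdioFile &) = delete;

    // Closes the stream; returns non-zero if an error was detected.
    int close() {
        if (fp_ == nullptr) return 0;
        int status = std::ferror(fp_) ? 1 : 0;
        if (std::fclose(fp_) != 0) status = 1;
        fp_ = nullptr;
        if (status != 0 && exitOnError_) {
            std::fprintf(stderr, "%s: I/O error on close\n", name_);
            std::exit(1);
        }
        return status;
    }

    // Mirrors "operator FILE *()": access to the underlying stream.
    operator FILE *() const { return fp_; }

private:
    FILE *fp_;
    const char *name_;
    bool exitOnError_;
};
```

Deleting the copy operations prevents two objects from closing the same stream.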
CLASS MEMBERS
File(const char *name, const char *mode, int exitOnError = 1)
File(FILE *fp = 0, int exitOnError = 1)
A File object can be initialized with either a filename or an
existing stdio stream. In the first case, the file is opened
according to mode (as if by fopen(3)). The exitOnError flag
determines whether I/O errors should be treated as fatal.
~File()
Destroying a File object implies closing the associated stream.
char *getline()
Returns the next line from the input, stored in a static buffer
of up to maxLineLength characters. Empty lines and lines starting
with # are skipped.
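The comment skipping and line counting of getline() can be sketched over a std::istream, which keeps the example self-contained. CommentSkippingReader is a hypothetical name. Unlike the documented method, it returns the line in a std::string rather than a static buffer. It also treats only lines whose first character is # as comments.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch of getline()-style input: return the next line that is
// neither empty nor a '#' comment, counting every physical line consumed.
class CommentSkippingReader {
public:
    explicit CommentSkippingReader(std::istream &in) : in_(in) {}

    // Returns false at end of input; otherwise stores the line in 'out'.
    bool next(std::string &out) {
        std::string line;
        while (std::getline(in_, line)) {
            ++lineno;
            if (line.empty() || line[0] == '#') continue;  // skip blank and comment lines
            out = line;
            return true;
        }
        return false;
    }

    unsigned lineno = 0;  // like File::lineno: physical lines read so far

private:
    std::istream &in_;
};
```

Because lineno counts skipped lines too, it still points at the right physical line for error messages.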
int close()
Closes the stream without destroying the File object. Returns
non-zero if an error condition occurs.
int error()
Returns a non-zero value if an error condition occurred on the
stream.
operator FILE *()
A File object can be cast to FILE * to access the underlying
stdio stream.
ostream &position(ostream &stream = cerr)
Outputs the current line number on stream. The stream is
returned so it can be used as the left operand of the << operator.
const char *name
The filename used in creating the File object.
const unsigned lineno
The current line number as maintained by getline().
int exitOnError
When set to true this causes errors on the stream to be handled
by program termination (after printing an error message).
SEE ALSO
stdio(3)
BUGS
Many other potentially useful functions are not provided (yet).
AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>
Copyright (c) 1995-1996 SRI International
SRILM $Date: 2019/09/09 22:35:37 $ File(3)

@@ -0,0 +1,147 @@
LM(3) Library Functions Manual LM(3)
NAME
LM - Generic language model
SYNOPSIS
#include <LM.h>
DESCRIPTION
The LM class specifies a minimal language model interface and provides
some generic utilities.
LM inherits from Debug, and the debugging level of an LM object
determines whether and how much verbose information is printed by
various functions.
CLASS MEMBERS
LM(Vocab &vocab)
Initializing an LM object requires specifying the vocabulary
over which the LM is defined. The vocab object can be shared
among different LM instances. The LM object can modify vocab as
a side effect, e.g., as a result of reading an LM from a file.
LogP wordProb(VocabIndex word, const VocabIndex *context)
LogP wordProb(VocabString word, const VocabString *context)
Returns the conditional log probability of word given a history.
The history is given in reversed order (most recent word first)
in context, and terminated by Vocab_None (or by 0 in the
VocabString version). Word or history can be specified either by
strings or indices. All functional LM subclasses have to implement
at least the first version.
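A reversed, Vocab_None-terminated history for a word inside a sentence can be built as follows. reversedContext is a hypothetical helper and is not part of the LM interface. VocabIndex and kVocabNone are local stand-ins; their definitions here are assumptions. The sketch passes the entire preceding history, and an LM is free to use only part of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-ins (assumed, illustrative): SRILM defines its own
// VocabIndex type and Vocab_None value.
using VocabIndex = unsigned;
const VocabIndex kVocabNone = static_cast<VocabIndex>(-1);

// Hypothetical helper: history of sentence[pos] with the most recent word
// first, terminated by kVocabNone.
std::vector<VocabIndex> reversedContext(const std::vector<VocabIndex> &sentence,
                                        std::size_t pos) {
    assert(pos < sentence.size());
    std::vector<VocabIndex> context;
    context.reserve(pos + 1);
    for (std::size_t k = pos; k > 0; --k) {
        context.push_back(sentence[k - 1]);  // sentence[pos-1], sentence[pos-2], ...
    }
    context.push_back(kVocabNone);
    return context;
}
```

The vector's data() pointer can then be passed wherever a const VocabIndex *context is expected.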
LogP wordProbRecompute(VocabIndex word, const VocabIndex *context)
Returns the same conditional log probability as wordProb(), but
on the promise that context is identical to the one passed in the
last call to wordProb(). This often allows implementations to
speed up repeated lookups in the same context.
LogP sentenceProb(const VocabIndex *sentence, TextStats &stats)
LogP sentenceProb(const VocabString *sentence, TextStats &stats)
Returns the total log probability of a string of words (a
sentence). The data in the stats object is incremented to reflect
the statistics of the sentence.
unsigned pplFile(File &file, TextStats &stats, const char *escapeString = 0)
Reads sentences from file, computing their probabilities and
aggregate perplexity, and updating the stats. The debugging
state of the LM object determines how much information is
printed to stderr. debuglevel 0: total statistics only;
debuglevel 1: per-sentence statistics; debuglevel 2: word
probabilities; debuglevel 3 and greater: LM-specific information.
Lines in file that start with escapeString are copied to the
output. This allows extra information in the input file to be
passed through unchanged.
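The aggregate perplexity that pplFile() reports is conventionally 10 raised to the negated average log10 probability. The sketch assumes it is given a total log10 probability and a count of scored tokens. Which tokens count (for example, whether sentence ends are included and OOVs excluded) is an assumption here and should be checked against TextStats. PplCounts and perplexity are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical aggregate counts (SRILM accumulates similar totals in
// TextStats; these field names are illustrative).
struct PplCounts {
    double totalLogProb;    // sum of log10 probabilities of the scored tokens
    unsigned scoredTokens;  // number of tokens that contributed to the sum
};

// Perplexity = 10^(-(average log10 probability per scored token)).
double perplexity(const PplCounts &c) {
    assert(c.scoredTokens > 0);
    return std::pow(10.0, -c.totalLogProb / c.scoredTokens);
}
```

For example, two tokens with a total LogP of -2.0 average to -1.0 per token, giving a perplexity of 10.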
unsigned rescoreFile(File &file, double lmScale, double wtScale,
LM &oldLM, double oldLmScale, double oldWtScale,
const char *escapeString = 0)
Reads N-best hypotheses and scores from file, replaces the LM
scores with new ones computed from the current model, and prints
the new scores (including hypotheses) to stdout. lmScale and
wtScale are the LM and word transition weights, respectively.
oldLM is the LM whose scores are included in the aggregate
scores read from the input (provided so that they can be
subtracted out), and oldLmScale and oldWtScale are the old LM and
word transition weights, respectively.
Lines in file that start with escapeString are copied to the
output.
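The per-hypothesis score update can be sketched as plain arithmetic. The sketch assumes the aggregate score is a weighted sum of the form other + lmScale * lmScore + wtScale * numWords. Under that assumption, the old LM and word-weight terms can be subtracted and the new ones added. rescoreHypothesis is a hypothetical helper, and N-best file parsing is omitted.

```cpp
#include <cassert>

// Hypothetical helper, assuming
//   aggregate = other + oldLmScale * oldLmScore + oldWtScale * numWords.
double rescoreHypothesis(double aggregate, unsigned numWords,
                         double oldLmScore, double oldLmScale, double oldWtScale,
                         double newLmScore, double lmScale, double wtScale) {
    // Remove the old LM and word transition contributions...
    double base = aggregate - oldLmScale * oldLmScore - oldWtScale * numWords;
    // ...then add the new LM score and word transition weight.
    return base + lmScale * newLmScore + wtScale * numWords;
}
```

Keeping the old model's scale factors as parameters is what allows the old contribution to be removed exactly rather than re-estimated.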
void setState(const char *state)
This is a generic interface to change the internal ``state'' of
an LM. The default implementation of this function does nothing,
but certain LM subclass implementations may interpret the state
string to assume different internal configurations.
Prob wordProbSum(const VocabIndex *context)
Returns the sum of the probabilities of all words given context.
Useful for checking the well-definedness of a model.
VocabIndex generateWord(const VocabIndex *context)
Returns a word index from the vocabulary, randomly generated
according to the conditional probabilities given context.
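One standard way to draw a word according to conditional probabilities is inverse-CDF sampling. This is a generic sketch and not necessarily how generateWord() is implemented. It assumes the caller supplies the distribution as a vector indexed by word, together with a uniform random number u in [0, 1). sampleIndex is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inverse-CDF sampling: 'probs' holds P(w | context) for each index w,
// 'u' is a uniform random number in [0, 1).
std::size_t sampleIndex(const std::vector<double> &probs, double u) {
    assert(!probs.empty() && u >= 0.0 && u < 1.0);
    double cumulative = 0.0;
    for (std::size_t w = 0; w < probs.size(); ++w) {
        cumulative += probs[w];
        if (u < cumulative) return w;
    }
    return probs.size() - 1;  // guard against rounding when the sum is slightly below 1
}
```

In practice, u might come from std::uniform_real_distribution<double>(0.0, 1.0) driven by a std::mt19937 engine.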
VocabIndex *generateSentence(unsigned maxWords = maxWordsPerLine,
VocabIndex *sentence = 0)
VocabString *generateSentence(unsigned maxWords = maxWordsPerLine,
VocabString *sentence = 0)
Generates a random sentence of length up to maxWords. The
result is placed in sentence if specified, or in a static buffer
otherwise.
void *contextID(const VocabIndex *context)
Returns an implementation-dependent value that identifies the
word context used to compute a conditional probability. (The
context actually used may be shorter than what is specified in
context.)
Boolean isNonWord(VocabIndex word)
Returns true if word is not a regular word in the LM, i.e., it is
a non-event tag such as sentence-start that the LM does not compute
probabilities for.
Boolean read(File &file, Boolean limitVocab = false)
Reads an LM from file. Returns true if the file contents were
formatted correctly and an internal LM representation could be
successfully constructed from it. The optional second argument
controls whether words not already in the vocabulary are to be
added automatically.
void write(File &file)
Writes the LM to file in a format that can be read back by
read().
Vocab &vocab
The vocabulary object associated with the LM (set at
initialization).
VocabIndex noiseIndex
The index of the noise tag, i.e., a word that is skipped when
computing probabilities.
const char *stateTag
A string introducing ``state'' information that should be passed
to the LM. Input lines starting with this tag are handed to
setState() by pplFile() and rescoreFile().
Boolean reverseWords
If set to true, the LM reverses word order before computing
sentence probabilities. This means wordProb() is expected to
compute conditional probabilities based on right contexts.
SEE ALSO
Vocab(3).
BUGS
AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>.
Copyright (c) 1995-1996 SRI International
SRILM $Date: 2019/09/09 22:35:37 $ LM(3)

@@ -0,0 +1,79 @@
Prob(3) Library Functions Manual Prob(3)
NAME
Prob - Probabilities for SRILM
SYNOPSIS
#include <Prob.h>
DESCRIPTION
Prob is a collection of types, constants and utility functions for
handling probabilities in the SRILM library.
TYPES
Prob A floating point number representing a probability.
LogP Logarithm to base 10 of a probability.
CONSTANTS
LogP_Zero
Log of probability 0.
LogP_Inf
Log of probability infinity (not a legal probability, of
course).
LogP_One
Log of probability 1.
LogP_Precision
The number of significant digits in a LogP.
Prob_Epsilon
A positive value close to 0; probability sums less than this
should be considered effectively zero.
FUNCTIONS
Boolean parseLogP(const char *string, LogP &prob)
Converts a floating point string representation into a LogP.
Returns true iff the number was parsed correctly. This function
should be much faster than generic C library functions for
floating point parsing. Also, it parses singular LogP's
(plus/minus infinity) correctly.
Prob LogPtoPPL(LogP prob)
Converts a LogP into a perplexity (PPL).
LogP ProbToLogP(Prob prob)
Converts a probability into a LogP.
LogP MixLogP(LogP prob1, LogP prob2, double lambda)
Computes the LogP resulting from interpolating two LogP's. If
p1 and p2 are probabilities corresponding to prob1 and prob2,
respectively, then the result is the LogP corresponding to
lambda * p1 + (1 - lambda) * p2.
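ProbToLogP() and MixLogP() can be re-implemented directly from their descriptions. The sketch namespace and these definitions are illustrative only; they are not SRILM's Prob.h. LogP_Zero is assumed to be negative infinity.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative re-implementations of two documented helpers (not Prob.h).
namespace sketch {
using Prob = double;
using LogP = double;  // base-10 logarithm of a probability

// Assumed representation of LogP_Zero: log10(0) = -infinity.
const LogP LogP_Zero = -std::numeric_limits<double>::infinity();

LogP ProbToLogP(Prob prob) { return std::log10(prob); }

// LogP of lambda * p1 + (1 - lambda) * p2, where p1 = 10^prob1, p2 = 10^prob2.
LogP MixLogP(LogP prob1, LogP prob2, double lambda) {
    assert(lambda >= 0.0 && lambda <= 1.0);
    return std::log10(lambda * std::pow(10.0, prob1) +
                      (1.0 - lambda) * std::pow(10.0, prob2));
}
}  // namespace sketch
```

This direct form leaves log space, so it can underflow to LogP_Zero for very small probabilities. A more robust version would factor out the larger of the two terms (log-sum-exp) before exponentiating.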
The following functions deal with bytelogs. Bytelogs are logarithms
scaled to represent probabilities and likelihoods as short integers in
SRI's DECIPHER(TM) recognizer (bytelog(p) = log(p) * 10000.5 / 1024).
double ProbToBytelog(Prob prob)
Converts a probability to a bytelog.
double LogPtoBytelog(LogP prob)
Converts a LogP to a bytelog.
LogP BytelogToLogP(double bytelog)
Converts a bytelog to a LogP.
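The bytelog conversions follow from the formula above. This sketch assumes that log in that formula is the natural logarithm, since 10000.5 is approximately 1 / ln(1.0001). A base-10 LogP is therefore first multiplied by ln(10). The functions live in a hypothetical sketch namespace and return unrounded doubles; any rounding to short integers is left out.

```cpp
#include <cassert>
#include <cmath>

// Illustrative bytelog conversions (not SRILM code), assuming
// bytelog(p) = ln(p) * 10000.5 / 1024.
namespace sketch {
const double kBytelogScale = 10000.5 / 1024.0;
const double kLn10 = std::log(10.0);  // converts log10 units to natural-log units

double ProbToBytelog(double prob) { return std::log(prob) * kBytelogScale; }
double LogPtoBytelog(double logp) { return logp * kLn10 * kBytelogScale; }
double BytelogToLogP(double bytelog) { return bytelog / (kLn10 * kBytelogScale); }
}  // namespace sketch
```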
SEE ALSO
BUGS
AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>
Copyright (c) 1995-1996 SRI International
SRILM $Date: 2019/09/09 22:35:37 $ Prob(3)

@@ -0,0 +1,241 @@
Vocab(3) Library Functions Manual Vocab(3)
NAME
Vocab - Vocabulary indexing for SRILM
SYNOPSIS
#include <Vocab.h>
DESCRIPTION
The Vocab class represents sets of string tokens as typically used for
vocabularies, word class names, etc. Additionally, Vocab provides a
mapping from such string tokens (type VocabString) to integers (type
VocabIndex). VocabIndex values are typically used to index words in
language models to conserve space and speed up comparisons etc. Thus,
Vocab essentially implements a symbol table into which strings can be
``interned.''
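The interning idea can be sketched with a hash map plus a vector. SymbolTable and its kNone sentinel are hypothetical. Indices simply start at 0, so the start/end range of the Vocab constructor is not modeled, and removal is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal symbol table in the spirit of Vocab (not SRILM code).
class SymbolTable {
public:
    using Index = unsigned;
    static constexpr Index kNone = static_cast<Index>(-1);

    // Like addWord(): returns the existing index or assigns the next one.
    Index addWord(const std::string &word) {
        auto it = index_.find(word);
        if (it != index_.end()) return it->second;
        Index id = static_cast<Index>(words_.size());
        index_.emplace(word, id);
        words_.push_back(word);
        return id;
    }

    // Like getIndex(): lookup only; never extends the table.
    Index getIndex(const std::string &word) const {
        auto it = index_.find(word);
        if (it == index_.end()) return kNone;
        return it->second;
    }

    // Like getWord(): returns nullptr for an undefined index.
    const std::string *getWord(Index id) const {
        return id < words_.size() ? &words_[id] : nullptr;
    }

    std::size_t numWords() const { return words_.size(); }

private:
    std::unordered_map<std::string, Index> index_;  // string -> index
    std::vector<std::string> words_;                 // index -> string
};
```

Keeping both directions makes string-to-index and index-to-string lookups cheap, which is the point of interning.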
TYPES
VocabIndex
A non-negative integer for representing a string internally.
VocabString
A character array representing a vocabulary item (e.g., a word).
CONSTANTS
maxWordLength
Maximum number of characters in a VocabString.
Vocab_None
A special VocabIndex used to denote no vocabulary item and to
terminate VocabIndex arrays.
Vocab_Unknown
Vocab_SentStart
Vocab_SentEnd
Vocab_Pause
Default VocabString values for some common, predefined vocabulary
items: unknown word, sentence begin, sentence end, and
pause, respectively.
CLASS MEMBERS
Vocab(VocabIndex start = 0, VocabIndex end = 0x7fffffff)
When initializing a Vocab object, start and end optionally set
the minimum and maximum VocabIndex values assigned by the
vocabulary. Indices are allocated in increasing order starting at
start.
VocabIndex addWord(VocabString name)
Looks up the index of a word string name, adding the word if not
already part of the vocabulary.
VocabString getWord(VocabIndex index)
Returns the VocabString for index, or 0 if the index isn't
defined.
VocabIndex getIndex(VocabString name)
Returns the VocabIndex for word name, or Vocab_None if the word
isn't defined. (Unlike addWord(), this will not extend the
vocabulary if the word is undefined.)
void remove(VocabString name)
void remove(VocabIndex index)
Deletes a vocabulary item, either by name or by index.
unsigned int numWords()
Returns the number of current vocabulary entries.
VocabIndex highIndex()
Returns the highest VocabIndex value assigned so far. The next
word added will receive an index that is one greater. When
allocating various meaningful vocabulary subsets into contiguous
ranges, this function can be used to determine the corresponding
boundaries in VocabIndex space; these values can then be used to
test subset membership etc.
VocabIndex unkIndex
The index of the unknown word (by default assigned to
Vocab_Unknown).
VocabIndex ssIndex
The index of the sentence-start tag (by default assigned to
Vocab_SentStart).
VocabIndex seIndex
The index of the sentence-end tag (by default assigned to
Vocab_SentEnd).
VocabIndex pauseIndex
The index of the pause tag (by default assigned to Vocab_Pause).
Boolean unkIsWord
When true, the unknown word is considered a regular word
(default false).
Boolean toLower
When true, all word strings are mapped to lowercase. This is
convenient for combining vocabularies, language models, etc., whose
vocabularies differ only in the case convention (default false).
Boolean isNonEvent(VocabString word)
Boolean isNonEvent(VocabIndex word)
Tests a word string or index for being a ``non-event'', i.e., a
token that is not assigned probability in a language model. By
default, sentence-start, pauses, and unknown words are
non-events.
unsigned read(File &file)
Reads word strings from a file and adds them to the vocabulary.
For convenience, only the first word on each line is significant
(so extra information could be contained in such a file).
Returns the number of words read.
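The read() convention of keeping only the first word on each line can be sketched over a std::istream. readVocabWords is a hypothetical function. Skipping blank lines is an assumption. The caller can take size() of the result as the number of words read.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: only the first whitespace-delimited token on each
// line is taken as a vocabulary word; the rest of the line is ignored.
std::vector<std::string> readVocabWords(std::istream &in) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first;
        if (fields >> first) words.push_back(first);  // blank lines contribute nothing
    }
    return words;
}
```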
void write(File &file, Boolean sorted = true)
Writes the vocabulary strings to a file in a format compatible
with read(). The sorted argument controls whether the output is
lexicographically sorted.
Often one wants to manipulate not single vocabulary items but
strings of them, e.g., to represent sentences. Word strings are
represented as self-delimiting arrays of type VocabString * or
VocabIndex *. The last element in a string is 0 or Vocab_None,
respectively.
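Because word strings carry their own terminator, length() and reverse() need no separate size argument. The following sketch uses local stand-ins for VocabIndex and Vocab_None; their values are assumptions. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Local stand-ins (assumed, illustrative): SRILM defines its own.
using VocabIndex = unsigned;
const VocabIndex kVocabNone = static_cast<VocabIndex>(-1);

// Number of items before the terminating kVocabNone (cf. length()).
std::size_t wordsLength(const VocabIndex *words) {
    std::size_t n = 0;
    while (words[n] != kVocabNone) ++n;
    return n;
}

// Reverses the items in place, leaving the terminator at the end, and
// returns the array (cf. reverse()).
VocabIndex *wordsReverse(VocabIndex *words) {
    std::reverse(words, words + wordsLength(words));
    return words;
}
```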
unsigned getWords(const VocabIndex *wids, VocabString *words, unsigned max)
Extends getWord() to strings of words. The result is placed in
words, which must have room for at least max words. Returns the
actual number of indices in wids.
unsigned addWords(const VocabString *words, VocabIndex *wids, unsigned max)
Extends addWord() to strings of indices. The result is placed
in wids, which must have room for at least max indices. Returns
the actual number of words in words.
unsigned getIndices(const VocabString *words, VocabIndex *wids, unsigned max)
Extends getIndex() to strings of indices. The result is placed
in wids, which must have room for at least max indices. Returns
the actual number of words in words.
FUNCTIONS
The following static member functions are utilities to manipulate
strings of vocabulary items, independent of a particular vocabulary.
unsigned parseWords(char *line, VocabString *words, unsigned max)
Parses a character string line into whitespace-delimited words.
On return, words contains pointers to null-terminated substrings
of line (whose contents are modified in the process). words must
have room for at least max pointers. Returns the actual number
of words parsed.
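The in-place splitting that parseWords() performs can be sketched as follows. parseWordsSketch is a hypothetical name, and VocabString is assumed to be const char *. Following the self-delimiting convention for VocabString arrays, the sketch also stores a null pointer after the last word when space remains.

```cpp
#include <cassert>
#include <cctype>

// Assumed stand-in for SRILM's VocabString.
using VocabString = const char *;

// Splits 'line' in place at whitespace, storing pointers to the
// null-terminated words in 'words' (at most 'max'); returns the count.
unsigned parseWordsSketch(char *line, VocabString *words, unsigned max) {
    assert(line != nullptr && words != nullptr);
    auto isSpace = [](char ch) { return std::isspace(static_cast<unsigned char>(ch)) != 0; };
    unsigned count = 0;
    char *p = line;
    while (count < max) {
        while (*p != '\0' && isSpace(*p)) ++p;   // skip whitespace before a word
        if (*p == '\0') break;
        words[count++] = p;                      // word starts here
        while (*p != '\0' && !isSpace(*p)) ++p;  // find its end
        if (*p != '\0') *p++ = '\0';             // terminate it in place
    }
    if (count < max) words[count] = nullptr;     // self-delimiting, if room remains
    return count;
}
```

Because the words point into line, they remain valid only as long as that buffer does.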
unsigned length(const VocabIndex *words)
unsigned length(const VocabString *words)
Returns the number of items in a word string.
Boolean contains(const VocabIndex *words, VocabIndex word)
Returns true if word occurs among words.
VocabIndex *reverse(VocabIndex *words)
VocabString *reverse(VocabString *words)
Reverses a string of words in place (and returns it as a
result).
void write(File &file, const VocabString *words)
Writes a string of space-delimited words to a file.
int compare(VocabIndex word1, VocabIndex word2)
int compare(VocabString word1, VocabString word2)
Compares two vocabulary items lexicographically. Returns -1, 0,
+1 for less than, equal, or greater than, respectively.
int compare(const VocabIndex *words1, const VocabIndex *words2)
int compare(const VocabString *words1, const VocabString *words2)
Extends the order of compare() to strings of words.
For compatibility with the C library calling conventions, compare()
cannot be a member function of a Vocab object. For index-based
comparisons the associated vocabulary needs to be set globally. This
is achieved by calling the compareIndex() member function of a Vocab
object.
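Extending item comparison to word strings amounts to a lexicographic comparison of the arrays. This sketch compares VocabString-style arrays, assumed to be const char * terminated by a null pointer, using strcmp. Ordering a proper prefix before the longer string is an assumption. The index-based variant, which needs the globally set vocabulary, is not shown, and compareWordStrings is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>

// Compares two null-pointer-terminated word strings item by item,
// returning -1, 0 or +1; a proper prefix sorts first (assumption).
int compareWordStrings(const char *const *words1, const char *const *words2) {
    for (; *words1 != nullptr && *words2 != nullptr; ++words1, ++words2) {
        int c = std::strcmp(*words1, *words2);
        if (c != 0) return c < 0 ? -1 : 1;
    }
    if (*words1 == nullptr && *words2 == nullptr) return 0;
    return *words1 == nullptr ? -1 : 1;
}
```

A free function with this shape is what qsort-style C library callbacks expect, which matches the reason given above for compare() not being a member function.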
ostream &operator<< (ostream &, const VocabString *words)
ostream &operator<< (ostream &, const VocabIndex *words)
These operators output strings of words to a stream. For the
second variant, the Vocab object used for interpreting indices
needs to be identified globally by calling the use() member
function on the object.
ITERATORS
The VocabIter class provides iteration over vocabularies. An iteration
returns the elements of a Vocab in some unspecified, but deterministic
order.
When copied or used in initialization of other objects, VocabIter
objects retain the current ``position'' in an iteration. This allows
nested iterations that enumerate all pairs of distinct elements, etc.
NOTE: While an iteration over a Vocab object is ongoing, no
modifications are allowed to the object, except removal of the
``current'' vocabulary item.
VocabIter(Vocab &vocab, Boolean sorted = false)
Creates an iteration over vocab. If sorted is set to true the
vocabulary items will be enumerated in lexicographic order.
void init()
Reinitializes the iteration to its beginning.
VocabString next()
VocabString next(VocabIndex &index)
Steps the iteration and returns the next word string.
Optionally, the associated word index is returned in index.
Returns 0 if the vocabulary is exhausted.
SEE ALSO
LM(3), File(3)
BUGS
There is no good way to synchronize VocabIndex values across multiple
Vocab objects.
AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>
Copyright (c) 1995-1996 SRI International
SRILM $Date: 2019/09/09 22:35:37 $ Vocab(3)