competition update
76
language_model/srilm-1.7.3/man/cat3/File.3
Normal file
@@ -0,0 +1,76 @@
File(3) Library Functions Manual File(3)

NAME
File - Wrapper for stdio streams

SYNOPSIS
#include <File.h>

DESCRIPTION
The File class provides a simple wrapper around stdio streams for use
with C++. It provides two kinds of convenience: Firstly, constructors
and destructors manage opening and closing of the stream. The stream
is checked for errors on closing, and the default behavior is to exit()
with an error message if a problem was found. Secondly, the getline()
method can be used for line-oriented input. It strips comments and
keeps track of input line numbers for error reporting.

CLASS MEMBERS
File(const char *name, const char *mode, int exitOnError = 1)

File(FILE *fp = 0, int exitOnError = 1)
A File object can be initialized with either a filename or an
existing stdio stream. In the first case, the file is opened
according to mode (as if by fopen(3)). The exitOnError flag
determines whether I/O errors should be treated as fatal.

~File()
Destroying a File object implies closing the associated stream.

char *getline()
Returns the next line from the input, stored in a static buffer
of up to maxLineLength characters. Empty lines and lines starting
with # are skipped.

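The line-oriented behavior described for getline() can be sketched in standalone C++ as follows. This is not SRILM's implementation: the LineReader class, its member names, and the use of std::istream in place of a stdio stream are assumptions made for illustration. Only the skipping of empty lines and of lines starting with #, and the line counting, are taken from the description above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the getline()/lineno pair; not the SRILM class.
class LineReader {
public:
    explicit LineReader(std::istream &in) : in_(in), lineno_(0) {}

    // Stores the next significant line in `line` and returns true, or
    // returns false at end of input. Empty lines and lines whose first
    // character is '#' are skipped, but they are still counted.
    bool next(std::string &line) {
        while (std::getline(in_, line)) {
            ++lineno_;
            if (line.empty() || line[0] == '#') continue;
            return true;
        }
        return false;
    }

    // Number of the line most recently read (1-based), for error messages.
    unsigned lineno() const { return lineno_; }

private:
    std::istream &in_;
    unsigned lineno_;
};

// Convenience helper for the example: counts the significant lines in `text`.
unsigned countSignificantLines(const std::string &text) {
    std::istringstream in(text);
    LineReader reader(in);
    std::string line;
    unsigned n = 0;
    while (reader.next(line)) ++n;
    return n;
}
```

Because skipped lines still advance the counter, lineno() reports the position in the physical file, which is what an error message should normally point at.
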
int close()
Closes the stream without destroying the File object. Returns
non-zero if an error condition occurs.

int error()
Returns a non-zero value if an error condition occurred on the
stream.

operator FILE *()
A File object can be cast to FILE * to access the underlying
stdio stream.

ostream &position(ostream &stream = cerr)
Outputs the current line number on stream. The stream is
returned so it can be used as the left operand of the << operator.

const char *name
The filename used in creating the File object.

const unsigned lineno
The current line number as maintained by getline().

int exitOnError
When set to true this causes errors on the stream to be handled
by program termination (after printing an error message).

SEE ALSO
stdio(3)

BUGS
Many other potentially useful functions are not provided (yet).

AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>
Copyright (c) 1995-1996 SRI International

SRILM $Date: 2019/09/09 22:35:37 $ File(3)

147
language_model/srilm-1.7.3/man/cat3/LM.3
Normal file
@@ -0,0 +1,147 @@
LM(3) Library Functions Manual LM(3)

NAME
LM - Generic language model

SYNOPSIS
#include <LM.h>

DESCRIPTION
The LM class specifies a minimal language model interface and provides
some generic utilities.

LM inherits from Debug, and the debugging level of an LM object
determines if and how much verbose information is printed by various
functions.

CLASS MEMBERS
LM(Vocab &vocab)
Initializing an LM object requires specifying the vocabulary
over which the LM is defined. The vocab object can be shared
among different LM instances. The LM object can modify vocab as
a side effect, e.g., as a result of reading an LM from a file.

LogP wordProb(VocabIndex word, const VocabIndex *context)

LogP wordProb(VocabString word, const VocabString *context)
Returns the conditional log probability of word given a history.
The history is given in reversed order (most recent word first)
in context, and terminated by Vocab_None. Word or history can
be specified either by strings or indices. All functional LM
subclasses have to implement at least the first version.

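The context layout expected by wordProb() (most recent word first, terminated by a sentinel) can be illustrated with a small standalone sketch. The Index typedef, the kNone value, and the reversedContext() helper are hypothetical stand-ins for VocabIndex, Vocab_None, and whatever code a caller would write; they are not part of SRILM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned Index;                      // stand-in for VocabIndex
const Index kNone = static_cast<Index>(-1);  // stand-in for Vocab_None (value assumed)

// Builds the history of sentence[pos] in the layout described above:
// most recent word first, terminated by the sentinel. At most
// `maxHistory` preceding words are kept.
std::vector<Index> reversedContext(const std::vector<Index> &sentence,
                                   std::size_t pos,
                                   std::size_t maxHistory) {
    assert(pos < sentence.size());
    std::vector<Index> context;
    for (std::size_t k = 0; k < maxHistory && k < pos; ++k) {
        context.push_back(sentence[pos - 1 - k]);
    }
    context.push_back(kNone);
    return context;
}
```

For the sentence (10, 11, 12, 13) and the word at position 3, a two-word history comes out as (12, 11, kNone). A pointer to the first element of such an array has the shape that the context parameter describes.
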
LogP wordProbRecompute(VocabIndex word, const VocabIndex *context)
Returns the same conditional log probability as wordProb(), but
on the promise that context is identical to that of the last call
to wordProb(). This often allows an efficient implementation that
speeds up repeated lookups in the same context.

LogP sentenceProb(const VocabIndex *sentence, TextStats &stats)

LogP sentenceProb(const VocabString *sentence, TextStats &stats)
Returns the total log probability of a string of words (a
sentence). The data in the stats object is incremented to reflect
the statistics of the sentence.

unsigned pplFile(File &file, TextStats &stats, const char *escapeString
= 0)
Reads sentences from file, computing their probabilities and
aggregate perplexity, and updating the stats. The debugging
state of the LM object determines how much information is
printed to stderr. debuglevel 0: total statistics only;
debuglevel 1: per-sentence statistics; debuglevel 2: word
probabilities; debuglevel 3 and greater: LM specific information.
Lines in file that start with escapeString are copied to the
output. This allows extra information in the input file to be
passed through unchanged.

unsigned rescoreFile(File &file, double lmScale, double wtScale, LM
&oldLM, double oldLmScale, double oldWtScale, const char *escapeString
= 0)
Reads N-best hypotheses and scores from file, replaces the LM
scores with new ones computed from the current model, and prints
the new scores (including hypotheses) to stdout. lmScale and
wtScale are the LM and word transition weights, respectively.
oldLM is the LM whose scores are included in the aggregate
scores read from the input (provided so that they can be
subtracted out), and oldLmScale and oldWtScale are the old LM and
word transition weights, respectively.
Lines in file that start with escapeString are copied to the
output.

void setState(const char *state)
This is a generic interface to change the internal ``state'' of
an LM. The default implementation of this function does nothing,
but certain LM subclass implementations may interpret the state
string to assume different internal configurations.

Prob wordProbSum(const VocabIndex *context)
Returns the sum of all word probabilities in context. Useful
for checking the well-definedness of a model.

VocabIndex generateWord(const VocabIndex *context)
Returns a word index from the vocabulary, randomly generated
according to the conditional probabilities in context.

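Drawing a word according to a conditional distribution, as generateWord() is described to do, is commonly done by inverse-CDF sampling. The sketch below shows that technique in standalone C++ under stated assumptions: the probabilities are passed in as a plain vector, sampleIndex() is a hypothetical name, and std::mt19937 stands in for whatever random source the library uses. It is not SRILM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns an index i with probability probs[i], given probabilities that
// sum to (approximately) 1. Hypothetical helper, not part of SRILM.
std::size_t sampleIndex(const std::vector<double> &probs, std::mt19937 &rng) {
    assert(!probs.empty());
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double r = uniform(rng);
    double cumulative = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i];
        if (r < cumulative) return i;
    }
    // Rounding can leave the running sum slightly below 1; fall back to
    // the last entry with non-zero probability.
    for (std::size_t i = probs.size(); i-- > 0; ) {
        if (probs[i] > 0.0) return i;
    }
    return probs.size() - 1;
}
```

Entries with probability zero can never be selected by the main loop, because the cumulative sum does not grow when they are added.
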
VocabIndex *generateSentence(unsigned maxWords = maxWordsPerLine,
VocabIndex *sentence = 0)

VocabString *generateSentence(unsigned maxWords = maxWordsPerLine,
VocabString *sentence = 0)
Generates a random sentence of length up to maxWords. The
result is placed in sentence if specified, or in a static buffer
otherwise.

void *contextID(const VocabIndex *context)
Returns an implementation-dependent value that identifies the
word context used to compute a conditional probability. (The
context actually used may be shorter than what is specified in
context.)

Boolean isNonWord(VocabIndex word)
Returns true if word is not a regular word in the LM. Regular
words are those the LM computes probabilities for, as opposed to
non-event tags such as sentence-start.

Boolean read(File &file, Boolean limitVocab = false)
Reads an LM from file. Returns true if the file contents were
formatted correctly and an internal LM representation could be
successfully constructed from it. The optional second argument
controls whether words not already in the vocabulary are to be
added automatically.

void write(File &file)
Writes the LM to file in a format that can be read back by
read().

Vocab &vocab
The vocabulary object associated with the LM (set at
initialization).

VocabIndex noiseIndex
The index of the noise tag, i.e., a word that is skipped when
computing probabilities.

const char *stateTag
A string introducing ``state'' information that should be passed
to the LM. Input lines starting with this tag are handed to
setState() by pplFile() and rescoreFile().

Boolean reverseWords
If set to true, the LM reverses word order before computing
sentence probabilities. This means wordProb() is expected to
compute conditional probabilities based on right contexts.

SEE ALSO
Vocab(3).

BUGS

AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>.
Copyright (c) 1995-1996 SRI International

SRILM $Date: 2019/09/09 22:35:37 $ LM(3)

79
language_model/srilm-1.7.3/man/cat3/Prob.3
Normal file
@@ -0,0 +1,79 @@
Prob(3) Library Functions Manual Prob(3)

NAME
Prob - Probabilities for SRILM

SYNOPSIS
#include <Prob.h>

DESCRIPTION
Prob is a collection of types, constants and utility functions for
handling probabilities in the SRILM library.

TYPES
Prob    A floating point number representing a probability.

LogP    Logarithm to base 10 of a probability.

CONSTANTS
LogP_Zero
Log of probability 0.

LogP_Inf
Log of probability infinity (not a legal probability, of
course).

LogP_One
Log of probability 1.

LogP_Precision
The number of significant digits in a LogP.

Prob_Epsilon
A positive value close to 0; probability sums less than this
should be considered effectively zero.

FUNCTIONS
Boolean parseLogP(const char *string, LogP &prob)
Converts a floating point string representation into a LogP.
Returns true iff the number was parsed correctly. This function
should be much faster than generic C library functions for
floating point parsing. Also, it parses singular LogP values
(plus/minus infinity) correctly.

Prob LogPtoPPL(LogP prob)
Converts a LogP into a perplexity (PPL).

LogP ProbToLogP(Prob prob)
Converts a probability into a LogP.

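Since a LogP is a base-10 logarithm, the conversions above reduce to a few lines of arithmetic. The following standalone sketch uses hypothetical function names and plain double in place of the Prob and LogP types. The perplexity helper assumes the usual definition 10^(-totalLogP / N) over N scored tokens; which tokens the library counts in N is not stated here and is left to the caller.

```cpp
#include <cassert>
#include <cmath>

// Base-10 log probability of `prob`; prob == 0 yields -infinity.
double probToLogP(double prob) {
    assert(prob >= 0.0);
    return std::log10(prob);
}

// Inverse conversion: 10^logp.
double logPToProb(double logp) {
    return std::pow(10.0, logp);
}

// Perplexity of a text whose total base-10 log probability is
// `totalLogP`, averaged over `numTokens` scored tokens (assumed
// definition: 10^(-totalLogP / numTokens)).
double perplexity(double totalLogP, unsigned numTokens) {
    assert(numTokens > 0);
    return std::pow(10.0, -totalLogP / numTokens);
}
```

For example, a total log probability of -6 over three tokens corresponds to an average of -2 per token and therefore a perplexity of 100.
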
LogP MixLogP(LogP prob1, LogP prob2, double lambda)
Computes the LogP resulting from interpolating two LogP's. If
p1 and p2 are probabilities corresponding to prob1 and prob2,
respectively, then the result is the LogP corresponding to
lambda * p1 + (1 - lambda) * p2.

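The interpolation formula can be written out directly. This is a minimal standalone sketch with a hypothetical name and plain double arguments, not the library's MixLogP(); it favors readability over numerical refinements.

```cpp
#include <cassert>
#include <cmath>

// Interpolates two base-10 log probabilities with weight `lambda` on the
// first: log10(lambda * 10^logp1 + (1 - lambda) * 10^logp2).
double mixLogP(double logp1, double logp2, double lambda) {
    assert(lambda >= 0.0 && lambda <= 1.0);
    const double p1 = std::pow(10.0, logp1);  // back to the probability domain
    const double p2 = std::pow(10.0, logp2);
    return std::log10(lambda * p1 + (1.0 - lambda) * p2);
}
```

Mixing probabilities 0.2 and 0.4 with lambda = 0.5 gives the log of 0.3, and mixing a value with itself returns that value for any lambda.
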
The following functions deal with bytelogs. Bytelogs are logarithms
scaled to represent probabilities and likelihoods as a short integer in
SRI's DECIPHER(TM) recognizer (bytelog(p) = log(p) * 10000.5 / 1024).

double ProbToBytelog(Prob prob)
Converts a probability to a bytelog.

double LogPtoBytelog(LogP prob)
Converts a LogP to a bytelog.

LogP BytelogToLogP(double bytelog)
Converts a bytelog to a LogP.

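The bytelog scaling can be sketched as below. One assumption is made loudly: log(p) in the formula is taken to be the natural logarithm, which the text does not state explicitly. The function names and the use of plain double are likewise illustrative rather than the library's own definitions.

```cpp
#include <cassert>
#include <cmath>

// Scale factor from the formula bytelog(p) = log(p) * 10000.5 / 1024.
const double kBytelogScale = 10000.5 / 1024.0;

// Probability -> bytelog, assuming log() above is the natural logarithm.
double probToBytelog(double prob) {
    assert(prob > 0.0);
    return std::log(prob) * kBytelogScale;
}

// Base-10 log probability -> bytelog: convert to natural log, then scale.
double logPToBytelog(double logp) {
    return logp * std::log(10.0) * kBytelogScale;
}

// Bytelog -> base-10 log probability (inverse of logPToBytelog).
double bytelogToLogP(double bytelog) {
    return bytelog / (std::log(10.0) * kBytelogScale);
}
```

Under that assumption the two forward conversions agree: converting 0.5 directly, or converting log10(0.5), yields the same bytelog.
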
SEE ALSO

BUGS

AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>
Copyright (c) 1995-1996 SRI International

SRILM $Date: 2019/09/09 22:35:37 $ Prob(3)

241
language_model/srilm-1.7.3/man/cat3/Vocab.3
Normal file
@@ -0,0 +1,241 @@
Vocab(3) Library Functions Manual Vocab(3)

NAME
Vocab - Vocabulary indexing for SRILM

SYNOPSIS
#include <Vocab.h>

DESCRIPTION
The Vocab class represents sets of string tokens as typically used for
vocabularies, word class names, etc. Additionally, Vocab provides a
mapping from such string tokens (type VocabString) to integers (type
VocabIndex). VocabIndex values are typically used to index words in
language models to conserve space and speed up comparisons etc. Thus,
Vocab essentially implements a symbol table into which strings can be
``interned.''

TYPES
VocabIndex
A non-negative integer for representing a string internally.

VocabString
A character array representing a vocabulary item (e.g., a word).

CONSTANTS
maxWordLength
Maximum number of characters in a VocabString.

Vocab_None
A special VocabIndex used to denote no vocabulary item and to
terminate VocabIndex arrays.

Vocab_Unknown

Vocab_SentStart

Vocab_SentEnd

Vocab_Pause
Default VocabString values for some common, predefined
vocabulary items: unknown word, sentence begin, sentence end, and
pause, respectively.

CLASS MEMBERS
Vocab(VocabIndex start = 0, VocabIndex end = 0x7fffffff)
When initializing a Vocab object, start and end optionally set
the minimum and maximum VocabIndex values assigned by the
vocabulary. Indices are allocated in increasing order starting at
start.

VocabIndex addWord(VocabString name)
Looks up the index of a word string name, adding the word if not
already part of the vocabulary.

VocabString getWord(VocabIndex index)
Returns the VocabString for index, or 0 if the index isn't
defined.

VocabIndex getIndex(VocabString name)
Returns the VocabIndex for word name, or Vocab_None if the word
isn't defined. (Unlike addWord(), this will not extend the
vocabulary if the word is undefined.)

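The difference between addWord() and getIndex() is the core of the symbol-table idea: one interns a string, the other only looks it up. A minimal standalone sketch of that contract follows. SymbolTable, Index, and kNone are hypothetical stand-ins (for Vocab, VocabIndex, and Vocab_None), and the container choices are assumptions for illustration, not a description of how SRILM stores its vocabulary.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>

typedef unsigned Index;                      // stand-in for VocabIndex
const Index kNone = static_cast<Index>(-1);  // stand-in for Vocab_None (value assumed)

// Minimal symbol table; not the SRILM Vocab class.
class SymbolTable {
public:
    // Returns the index of `word`, adding it if it is not yet known.
    // Indices are handed out in increasing order starting at 0.
    Index addWord(const std::string &word) {
        auto it = index_.find(word);
        if (it != index_.end()) return it->second;
        const Index id = static_cast<Index>(words_.size());
        words_.push_back(word);
        index_[word] = id;
        return id;
    }

    // Returns the index of `word`, or kNone; never extends the table.
    Index getIndex(const std::string &word) const {
        auto it = index_.find(word);
        return it == index_.end() ? kNone : it->second;
    }

    // Returns the string for `id`, or a null pointer if `id` is undefined.
    // A deque keeps existing strings in place when new words are appended.
    const char *getWord(Index id) const {
        return id < words_.size() ? words_[id].c_str() : nullptr;
    }

    unsigned numWords() const { return static_cast<unsigned>(words_.size()); }

private:
    std::deque<std::string> words_;
    std::unordered_map<std::string, Index> index_;
};
```

Adding the same word twice returns the same index and leaves the size unchanged, while looking up an unknown word returns the sentinel without growing the table.
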
void remove(VocabString name)

void remove(VocabIndex index)
Deletes a vocabulary item, either by name or by index.

unsigned int numWords()
Returns the number of current vocabulary entries.

VocabIndex highIndex()
Returns the highest VocabIndex value assigned so far. The next
word added will receive an index that is one greater. When
allocating various meaningful vocabulary subsets into contiguous
ranges, this function can be used to determine the corresponding
boundaries in VocabIndex space, and then use these values to
test subset membership etc.

VocabIndex unkIndex
The index of the unknown word (by default assigned to
Vocab_Unknown).

VocabIndex ssIndex
The index of the sentence-start tag (by default assigned to
Vocab_SentStart).

VocabIndex seIndex
The index of the sentence-end tag (by default assigned to
Vocab_SentEnd).

VocabIndex pauseIndex
The index of the pause tag (by default assigned to Vocab_Pause).

Boolean unkIsWord
When true, the unknown word is considered a regular word
(default false).

Boolean toLower
When true, all word strings are mapped to lowercase. This is
convenient to combine vocabularies, language models, etc., whose
vocabularies differ only in the case convention (default false).

Boolean isNonEvent(VocabString word)

Boolean isNonEvent(VocabIndex word)
Tests a word string or index for being a ``non-event'', i.e., a
token that is not assigned probability in a language model. By
default, sentence-start, pauses, and unknown words are
non-events.

unsigned read(File &file)
Reads word strings from a file and adds them to the vocabulary.
For convenience, only the first word on each line is significant
(so extra information could be contained in such a file).
Returns the number of words read.

void write(File &file, Boolean sorted = true)
Writes the vocabulary strings to a file in a format compatible
with read(). The sorted argument controls whether the output is
lexicographically sorted.

Often one wants to manipulate not single vocabulary items, but
strings of them, e.g., to represent sentences. Word strings are
represented as self-delimiting arrays of type VocabString * or
VocabIndex *. The last element in a string is 0 or Vocab_None,
respectively.

unsigned getWords(const VocabIndex *wids, VocabString *words, unsigned
max)
Extends getWord() to strings of words. The result is placed in
words, which must have room for at least max words. Returns the
actual number of indices in wids.

unsigned addWords(const VocabString *words, VocabIndex *wids, unsigned
max)
Extends addWord() to strings of indices. The result is placed
in wids, which must have room for at least max indices. Returns
the actual number of words in words.

unsigned getIndices(const VocabString *words, VocabIndex *wids,
unsigned max)
Extends getIndex() to strings of indices. The result is placed
in wids, which must have room for at least max indices. Returns
the actual number of words in words.

FUNCTIONS
The following static member functions are utilities to manipulate
strings of vocabulary items, independent of a particular vocabulary.

unsigned parseWords(char *line, VocabString *words, unsigned max)
Parses a character string line into whitespace-delimited words.
On return, words contains pointers to null-terminated substrings
of line (whose contents are modified in the process). words must
have room for at least max pointers. Returns the actual number
of words parsed.

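In-place tokenization of the kind parseWords() describes works by overwriting each delimiter with a null character and recording where every word begins. The standalone sketch below shows the technique; splitWords() is a hypothetical name, and its behavior when the line holds more than max words (it simply stops) is an assumption, since the text above does not specify that case.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Splits `line` in place into whitespace-delimited words. Pointers to the
// words are stored in `words`, followed by a terminating null pointer when
// there is room. Returns the number of words stored (at most `max`).
// Hypothetical re-implementation, not SRILM's parseWords().
unsigned splitWords(char *line, const char **words, unsigned max) {
    unsigned n = 0;
    char *p = line;
    while (n < max) {
        // Skip leading whitespace.
        while (*p != '\0' && std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (*p == '\0') break;
        words[n++] = p;
        // Advance to the end of the word.
        while (*p != '\0' && !std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (*p == '\0') break;
        *p++ = '\0';  // overwrite the delimiter to terminate the word
    }
    if (n < max) words[n] = 0;
    return n;
}
```

Because the words point into the original buffer, no memory is allocated, but the caller must keep the line alive and must not expect it to be unchanged afterwards.
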
unsigned length(const VocabIndex *words)

unsigned length(const VocabString *words)
Returns the number of items in a word string.

Boolean contains(const VocabIndex *words, VocabIndex word)
Returns true if the word occurs among words.

VocabIndex *reverse(VocabIndex *words)

VocabString *reverse(VocabString *words)
Reverses a string of words in place (and returns it as a
result).

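Operations on self-delimiting word strings only need to scan for the terminator. The sketch below shows length and in-place reversal for index arrays in standalone C++. Index and kNone are hypothetical stand-ins for VocabIndex and Vocab_None, and the function names are chosen to avoid clashing with the standard library; none of this is SRILM's own code.

```cpp
#include <algorithm>
#include <cassert>

typedef unsigned Index;                      // stand-in for VocabIndex
const Index kNone = static_cast<Index>(-1);  // stand-in for Vocab_None (value assumed)

// Number of items before the terminating sentinel.
unsigned wordsLength(const Index *words) {
    unsigned n = 0;
    while (words[n] != kNone) ++n;
    return n;
}

// Reverses the items in place, leaving the sentinel at the end, and
// returns the same pointer so calls can be chained.
Index *reverseWords(Index *words) {
    std::reverse(words, words + wordsLength(words));
    return words;
}
```

Reversing (3, 1, 4, kNone) yields (4, 1, 3, kNone). The sentinel stays last, so the result is again a valid word string.
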
void write(File &file, const VocabString *words)
Writes a string of space-delimited words to a file.

int compare(VocabIndex word1, VocabIndex word2)

int compare(VocabString word1, VocabString word2)
Compares two vocabulary items lexicographically. Returns -1, 0,
+1 for less than, equal, or greater than, respectively.

int compare(const VocabIndex *words1, const VocabIndex *words2)

int compare(const VocabString *words1, const VocabString *words2)
Extends the order of compare() to strings of words.

For compatibility with the C library calling conventions, compare()
cannot be a member function of a Vocab object. For index-based
comparisons the associated vocabulary needs to be set globally. This is
achieved by calling the compareIndex() member function of a Vocab
object.

ostream &operator<< (ostream &, const VocabString *words)

ostream &operator<< (ostream &, const VocabIndex *words)
These operators output strings of words to a stream. For the
second variant, the Vocab object used for interpreting indices
needs to be identified globally by calling the use() member
function on the object.

ITERATORS
The VocabIter class provides iteration over vocabularies. An iteration
returns the elements of a Vocab in some unspecified, but deterministic
order.

When copied or used in initialization of other objects, VocabIter
objects retain the current ``position'' in an iteration. This allows
nested iterations that enumerate all pairs of distinct elements, etc.

NOTE: While an iteration over a Vocab object is ongoing, no
modifications are allowed to the object, except removal of the ``current''
vocabulary item.

VocabIter(Vocab &vocab, Boolean sorted = false)
Creates an iteration over vocab. If sorted is set to true the
vocabulary items will be enumerated in lexicographic order.

void init()
Reinitializes the iteration to its beginning.

VocabString next()

VocabString next(VocabIndex &index)
Steps the iteration and returns the next word string.
Optionally, the associated word index is returned in index. Returns 0
if the vocabulary is exhausted.

SEE ALSO
LM(3), File(3)

BUGS
There is no good way to synchronize VocabIndex values across multiple
Vocab objects.

AUTHOR
Andreas Stolcke <stolcke@icsi.berkeley.edu>
Copyright (c) 1995-1996 SRI International

SRILM $Date: 2019/09/09 22:35:37 $ Vocab(3)