competition update

This commit is contained in:
nckcard
2025-07-02 12:18:09 -07:00
parent 9e17716a4a
commit 77dbcf868f
2615 changed files with 1648116 additions and 125 deletions


@@ -0,0 +1,12 @@
This file has been superseded by the "srilm-faq" manual page.
View with
man -M$SRILM/man srilm-faq
or online at
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.html
Please send corrections and additions to the author.


@@ -0,0 +1,21 @@
Conversion between Decipher PFSG (probabilistic finite state grammars) and
AT&T FSA (finite-state acceptor) formats
1. Convert PFSG to FSA format
pfsg-to-fsm symbolfile=foo.syms foo.pfsg > foo.fsm
2. Compile FSA
fsmcompile foo.fsm > foo.fsmc
3. Operate on FSA, e.g., determinize and minimize
fsmdeterminize foo.fsmc | fsmminimize > foo.min.fsmc
4. Print and convert back to PFSG
fsmprint -i foo.syms foo.min.fsmc | \
fsm-to-pfsg > foo.min.pfsg


@@ -0,0 +1,42 @@
The default Linux Makefile (common/Makefile.machine.i686) uses /usr/bin/gawk
as the gawk interpreter. To ensure correct operation in UTF-8 locales, run
/usr/bin/gawk --version and make sure it is at least version 3.1.5.
If not, it is recommended to install the latest version of gawk in
/usr/local/bin and update the GAWK variable accordingly.
-----------------------------------------------------------------------------
From: Alex Franz <alex@google.com>
Date: Fri, 06 Oct 2000 16:14:03 PDT
And, here are the details of the malloc problem that I had with the SRI
LM toolkit:
I compiled it with gcc under Redhat Linux V. 6.2 (or thereabouts).
The malloc routine has problems allocating large numbers of
small pieces of memory. For me, it usually refuses to allocate
any more memory once it has allocated about 600 MBytes of memory,
even though the machine has 2 GBytes of real memory.
This causes a problem when you are trying to build language models
with large vocabularies. Even though I used -DUSE_SARRAY_TRIE -DUSE_SARRAY
to use arrays instead of hash tables, it ran out of memory when I was
trying to use very large vocabulary sizes.
The solution that worked for me was to use Wolfram Gloger's ptmalloc package
for memory management instead. You can download it from
http://malloc.de/en/index.html
(The page suggests that it is part of the Gnu C library, but I had to
compile it myself and explicitly link it with the executables.)
One more thing you can do is call the function
mallopt(M_MMAP_MAX, n);
with a sufficiently large n; this tells malloc to allow you to
obtain a large amount of memory.
------------------------------------------------------------------------------
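As a rough sketch of the mallopt() suggestion above (glibc's mallopt() is declared in <malloc.h>; the value used below is only an illustration, not a recommendation from the message):

    #include <malloc.h>     /* mallopt(), M_MMAP_MAX (glibc) */
    #include <stdio.h>

    int main(void)
    {
        /* Raise the limit on mmap-based allocations before building
           the language model; the value is an arbitrary example. */
        if (mallopt(M_MMAP_MAX, 1000000) == 0) {
            fprintf(stderr, "mallopt(M_MMAP_MAX) failed\n");
        }
        /* ... train or load the model here ... */
        return 0;
    }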


@@ -0,0 +1,50 @@
SRILM version 1.3 and higher has been successfully built and tested using
the CYGWIN environment (http://www.cygwin.com) as of Feb 11, 2002.
The test used CYGWIN DLL 1.3.9 and gcc 2.95.3 (but note the warning
in doc/README.x86 regarding a compiler bug), and ran successfully
on a Windows 98 and a Windows 2000 Professional system.
The following special measures were taken to ensure a successful build:
- Make sure the make, gcc, binutils, libiconv, gzip, tcltk, and gawk packages
are installed with CYGWIN. To run the tests you will also need the
diffutils and time packages.
After installation, set your bash environment as follows
export SRILM=/cygdrive/c/srilm13 # or similar
# do NOT use backslash in path names SRILM=C:\...
export MACHINE_TYPE=cygwin
export PATH=$PATH:$SRILM/bin:$SRILM/bin/cygwin # mentioned in INSTALL
export MANPATH=$MANPATH:$SRILM/man # mentioned in INSTALL
or the equivalent for other shells.
As of version 1.4.5, SRILM can also be built in the MinGW environment
(http://www.mingw.org). For this the default (cygwin) has to be overridden
using
make MACHINE_TYPE=win32
For 64bit binaries use
make MACHINE_TYPE=win64
Of course the corresponding versions of the MinGW development environment
(C, C++, binutils) have to be installed in Cygwin. Make sure the Cygwin
installation is generally up-to-date.
It may be necessary to include the following directories in the PATH
environment variable for the runtime dynamic libraries to be found:
/usr/i686-pc-mingw32/sys-root/mingw/bin (win32)
/usr/x86_64-w64-mingw32/sys-root/mingw/bin (win64)
Some functionality is not supported under MinGW:
- compressed file I/O
- nbest-optimize and lattice-tool -max-time option
A. Stolcke
$Id: README.windows-cygwin,v 1.10 2013/01/31 18:03:33 stolcke Exp $


@@ -0,0 +1,63 @@
Recommendations for compiling with Microsoft Visual C++
The build procedure has been tested with the freely available
Visual C++ 8 that can be downloaded from www.microsoft.com as
"Visual C++ 2005 Express Edition".
0) Install the cygwin environment, as described in README.windows-cygwin .
Cygwin tools are needed to run the build process and generate program
dependencies.
1) The SRILM variable can be set to the cygwin path of the SRILM root directory
(e.g., /home/username/srilm)
2) Make sure environment variables are set to locate MSVC tools and files:
PATH should include MSVC_INSTALL_DIR/bin and
MSVC_INSTALL_DIR/Common7/IDE (for dll search)
MSVCDIR should be set to MSVC_INSTALL_DIR
INCLUDE should be set to MSVC_INSTALL_DIR/include
LIB should be set to MSVC_INSTALL_DIR/lib
Note: PATH needs to use cygwin pathname conventions, but MSVCDIR,
INCLUDE and LIB must use Windows paths. For example:
PATH="/cygdrive/c/Program Files/Microsoft Visual Studio 8/VC/bin:/cygdrive/c/Program Files/Microsoft Visual Studio 8/Common7/IDE:$PATH"
MSVCDIR="c:\\Program Files\\Microsoft Visual Studio 8\\VC"
INCLUDE="$MSVCDIR\\include"
LIB="$MSVCDIR\\lib"
export PATH MSVCDIR INCLUDE LIB
could be used in bash given the default installation location of Visual
C++ 2005 Express Edition under c:\Program Files\Microsoft Visual Studio 8.
Alternatively, you could use the vcvars32.bat script that comes with
MSVC to set these environment variables.
3) Build in a cygwin shell with
make MACHINE_TYPE=msvc
or
make MACHINE_TYPE=msvc64
to generate 64bit binaries.
As with MinGW, some functionality is not supported:
- compressed file I/O other than gzip files
- nbest-optimize and lattice-tool -max-time option
Also note that make will try to determine if certain libraries
are installed on your system and enable the /openmp option if so.
This means that binaries built with the full Visual Studio compiler
might not run on systems that have only Visual Studio Express.
To avoid this, disable /openmp by commenting out the corresponding
line containing "/openmp" in common/Makefile.machine.msvc.
4) Run test suite with
cd test
make MACHINE_TYPE=msvc try


@@ -0,0 +1,71 @@
Recommendations for compiling with Microsoft Visual Studio 2005.
Microsoft Visual Studio projects are available for Microsoft Visual
Studio 2005.
The advantages of building directly in Visual Studio are:
- Build is faster.
- Integrated debugging.
- Library projects can be included as dependencies of your own projects.
The disadvantages of building directly in Visual Studio are:
- Not all parts of the SRILM build process can be completed inside Visual Studio.
- More cumbersome to build from the command line.
The build procedure has been tested with Visual Studio 2005
Professional.
0) Open SRILM/visual_studio/vs2005/srilm.sln in Visual Studio 2005.
1) Select either the "Release" or "Debug" targets.
2) Build the solution. The libraries will be built into
SRILM/lib/Release or SRILM/lib/Debug, and the executables will be built
into SRILM/bin/Release or SRILM/bin/Debug.
This procedure will NOT produce the central include directory, release
the SRILM scripts, or build the test environment.
To produce the central include directory or release the SRILM scripts,
you will have to install the cygwin environment and use the included
Makefiles as described in (3):
(3) Follow steps (0) and (1) from README.windows-msvc:
(3a) Install the cygwin environment, as described in README.windows-cygwin.
Cygwin tools are needed to run the build process and generate program
dependencies.
(3b) The SRILM variable can be set to the cygwin path of the SRILM root directory
(e.g., /home/username/srilm, /cygdrive/c/SRILM)
(3c) To produce the central include directory and release the SRILM
scripts, do either:
% make MACHINE_TYPE=msvc dirs
% make MACHINE_TYPE=msvc init
% make MACHINE_TYPE=msvc release-headers
% make MACHINE_TYPE=msvc release-scripts
% cd utils/src; make MACHINE_TYPE=msvc release
or
% make MACHINE_TYPE=msvc msvc
If you want to run the SRILM tests, you will have to follow the full
instructions in README.windows-msvc.
If you want to compile using a different version of MSVC, I suggest
creating a new subdirectory of SRILM/visual_studio, e.g.,
"SRILM/visual_studio/vs2010", and copying the contents of the "vs2005"
directory into your new directory. Then use MSVC to update your
projects. If you need to downgrade to MSVC 2003, it is possible to do
this by manually editing the project files.
I would like to thank Keith Vertanen (http://www.keithv.com) for
contributing the original versions of these Microsoft Visual Studio
projects to SRILM.
Victor Abrash
victor@speech.sri.com


@@ -0,0 +1,21 @@
Note for Intel x86 platforms:
The function KneserNey::lowerOrderWeight() seems to trigger a compiler
bug in gcc 2.95.3 with optimization, on the i686-pc-linux-gnu,
i386-pc-solaris2, and i686-pc-cygwin platforms (and therefore probably
on all Intel targets). The problem manifests itself by the
"ngram-count-kn-int" test in the test/ directory not terminating.
To work around this problem, compile lm/src/Discount.cc without global
optimization:
cd $SRILM/lm/src
rm ../obj/$MACHINE_TYPE/Discount.o
gnumake OPTIMIZE_FLAGS=-O1 ngram-count
gnumake release
As of Feb 2002, Cygwin ships with gcc 2.95.3 and therefore suffers from
this bug. gcc 2.95.2 or lower and gcc 3.x versions of the compiler
don't seem to be affected, though.


@@ -0,0 +1,9 @@
compute local entropies, stddev in perplexity output
implement TaggedNgram backoff-weight computation
fix some cases of back-to-back DFs
FP-REP2
general mechanism to skip words in ngram contexts, handle non-event tags.


@@ -0,0 +1,47 @@
C++ porting notes
-----------------
I originally wrote this code using gcc/g++ as the compiler.
Below is a list of changes I had to make to accommodate SGI, Sun and Intel C++
compilers.
o Explicitly instantiate templates only in g++ (#ifdef __GNUG__).
o Avoid global static data referenced by template code. These symbols
become undefined in the instantiated template functions.
I use local static data, or global extern data instead (the latter
#ifndef __GNUG__).
o Avoid non-static inline functions in templates (.cc files). They
don't get properly instantiated by Sun CC and result in undefined
symbols.
o Use Sun CC -xar option to build libraries. This includes needed
template instances in the library (see the new $(ARCHIVE) macro to
refer to the appropriate archive command).
o To work around an internal compiler error in gcc 2.7.2, I had to
add empty destructors in a few classes. However, I no longer use gcc 2.7.2
in testing, so current versions are no longer guaranteed to work with it.
o Intel C++ doesn't support arrays of non-POD objects on the stack. I had
to replace those with matching calls to new/delete.
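A made-up minimal example of the last point (the class and function names are invented for illustration and do not appear in the toolkit):

    class Token {                    // non-POD: user-defined constructor/destructor
    public:
        Token() : count(0) { }
        ~Token() { }
        unsigned count;
    };

    void processTokens(unsigned n)
    {
        // gcc accepts this as an extension, but Intel C++ rejects it
        // (variable-length array of non-POD objects on the stack):
        //     Token toks[n];

        // portable replacement, with matching new/delete:
        Token *toks = new Token[n];
        // ... use toks[0] .. toks[n-1] ...
        delete [] toks;
    }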
Compilers that work
-------------------
(as of last checking, meaning current versions may not work out of the box)
+ gcc/g++ 2.8.1, 2.95.3, 3.2 and 3.3
+ Sun SC4.0
+ SGI NCC
+ Intel C++ 7.1
Compilers that don't work on this code
--------------------------------------
- Sun SC3.0.1
- CenterLine (Sun version, as of 6/96)


@@ -0,0 +1,184 @@
SRI Object-oriented Language Modeling Libraries and Tools
BUILD AND INSTALL
See the INSTALL file in the top-level directory.
FILES
All mixed-case .cc and .h files contain C++ code for the liboolm
library.
All lower-case .cc files in this directory correspond to executable commands.
Each program is option driven. The -help option prints a list of
available options and their meaning.
N-GRAM MODELS
Here we only cover arbitrary-order n-grams (no classes yet, sorry).
ngram-count manipulates n-gram counts and
estimates backoff models
ngram-merge merges count files (only needed for large
corpora/vocabularies)
ngram computes probabilities given a backoff model
(also does mixing of backoff models)
Below are some typical command lines.
NGRAM COUNT MANIPULATION
ngram-count -order 4 -text corpus -write corpus.ngrams
Counts all n-grams up to length 4 in corpus and writes them to
corpus.ngrams. The default n-gram order (if not specified) is 3.
ngram-count -vocab 30k.vocab -order 4 -text corpus -write corpus.ngrams
Same, but restricts the vocabulary to what is listed in 30k.vocab
(one word per line). Other words are replaced with the <unk> token.
ngram-count -read corpus.ngrams -vocab 20k.vocab -write-order 3 \
-write corpus.20k.3grams
Reads the counts from corpus.ngrams, maps them to the indicated
vocabulary, and writes only the trigrams back to a file.
ngram-count -text corpus -write1 corpus.1grams -write2 corpus.2grams \
-write3 corpus.3grams
Writes unigrams, bigrams, and trigrams to separate files, in one
pass. Usually there is no reason to keep these separate, which is
why by default ngrams of all lengths are written out together.
The -recompute flag will regenerate the lower-order counts from the
highest order one by summation by prefixes.
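For example, the bigram count c(a b) is regenerated as the sum of the
trigram counts c(a b w) over all words w that follow the bigram.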
The -read and -text options are additive, and can be used to merge
new counts with old. Furthermore, repeated n-grams read with -read
are also additive. Thus,
cat counts1 counts2 ... | ngram-count -read -
will merge the counts from counts1, counts2, ...
All file reading and writing uses the zio routines, so argument "-"
stands for stdin/stdout, and .Z and .gz files are handled correctly.
For very large count files (due to large corpus/vocabulary) this
method of merging counts in-memory is not suitable. Alternatively,
counts can be sorted and then merged.
First, generate counts for portions of the training corpus and
save them with -sort:
ngram-count -text part1.text -sort -write part1.ngrams.Z
ngram-count -text part2.text -sort -write part2.ngrams.Z
Then combine these with
ngram-merge part?.ngrams.Z > all.ngrams.Z
(Although ngram-merge can deal with any number of files >= 2,
it is most efficient to combine two parts at a time, then combine
the resulting count files, again two at a time, following a
binary tree merging scheme.)
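For example, with four count files one would merge part1 with part2 and
part3 with part4 in a first pass, then merge the two resulting files in
a second pass.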
BACKOFF MODEL ESTIMATION
ngram-count -order 2 -read corpus.counts -lm corpus.bo
generates a bigram backoff model in ARPA (aka Doug Paul) format
from corpus.counts and writes it to corpus.bo.
If counts fit into memory (and hence there is no reason for the
merging schemes described above) it is more convenient and faster
to go directly from training text to model:
ngram-count -text corpus -lm corpus.bo
The built-in discounting method used in building backoff models
is Good Turing. The lower exclusion cutoffs can be set with
options (-gt1min ... -gt6min), the upper discounting cutoffs are
selected with -gt1max ... -gt6max. Reasonable defaults are
provided that can be displayed as part of the -help output.
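For reference (the formula below is the standard one and is not spelled
out in this file), Good-Turing replaces a count r below the upper cutoff
by the discounted value

    r* = (r+1) * n(r+1) / n(r)

where n(r) is the number of distinct n-grams occurring exactly r times.
The estimate thus depends directly on the count-of-count statistics n(r),
which is what the vocabulary issue discussed next can distort.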
When using limited vocabularies it is recommended to compute the
discount coefficients on the unlimited vocabulary (at least for
the unigrams) and then apply them to the limited vocabulary
(otherwise the vocabulary truncation would produce badly skewed
count frequencies at the low end that would break the GT algorithm).
For this reason, discounting parameters can be saved to files and
read back in.
For example
ngram-count -text corpus -gt1 gt1.params -gt2 gt2.params \
-gt3 gt3.params
saves the discounting parameters for unigrams, bigrams and trigrams
in the files as indicated. These are short files that can be
edited, e.g., to adjust the lower and upper discounting cutoffs.
Then the limited vocabulary backoff model is estimated using these
saved parameters
ngram-count -text corpus -vocab 20k.vocab \
-gt1 gt1.params -gt2 gt2.params -gt3 gt3.params -lm corpus.20k.bo
MODEL EVALUATION
The ngram program uses a backoff model to compute probabilities and
perplexity on test data.
ngram -lm some.bo -ppl test.corpus
computes the perplexity on test.corpus according to model some.bo.
The flag -debug controls the amount of information output:
-debug 0 only overall statistics
-debug 1 statistics for each test sentence
-debug 2 probabilities for each word
-debug 3 verify that word probabilities over the
entire vocabulary sum to 1 for each context
ngram also understands the -order flag to set the maximum ngram
order effectively used by the model. The default is 3.
It has to be explicitly reset to use ngrams of higher order, even
if the file specified with -lm contains higher order ngrams.
The flag -skipoovs establishes compatibility with broken behavior
in some old software. It should only be used with bo model files
produced with the old tools. It will
- let OOVs be counted as such even when the model has a probability
for <unk>
- skip not just the OOV but the entire n-gram context in which any
OOVs occur (instead of backing off on OOV contexts).
OTHER MODEL OPERATIONS
ngram performs a few other operations on backoff models.
ngram -lm bo1 -mix-lm bo2 -lambda 0.2 -write-lm bo3
produces a new model in bo3 that is the interpolation of bo1 and bo2
with a weight of 0.2 (for bo1).
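In other words (roughly, since the result is folded back into a single
backoff model with recomputed backoff weights), for n-grams explicitly
listed in the models

    p_bo3(w | context) = 0.2 * p_bo1(w | context) + 0.8 * p_bo2(w | context)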
ngram -lm bo -renorm -write-lm bo.new
recomputes the backoff weights in the model bo (thus normalizing
probabilities to 1) and leaves the result in bo.new.
API FOR LANGUAGE MODELS
These programs are just examples of how to use the object-oriented
language model library currently under construction. To use the API
one would have to read the various .h files and see how the interfaces
are used in the example programs. No comprehensive documentation is
available as yet. Sorry.
AVAILABILITY
This code is Copyright SRI International, but is available free of
charge for non-profit use. See the License file in the top-level
directory for the terms of use.
Andreas Stolcke
$Date: 1999/07/31 18:48:33 $


@@ -0,0 +1,12 @@
size of ngram reading Switchboard trigram

    SunOS5.5.1   libc malloc                 12M
                 -lmalloc                    12M
    IRIX5.1      libc malloc                 12M
                 -lmalloc   MX_FAST = 24     11M
                            MX_FAST = 10     12M
                            MX_FAST = 32     11M


@@ -0,0 +1,115 @@
NEW SRI-LM LIBRARY AND TOOLS -- OVERVIEW
Design Goals
- coverage of state-of-the-art LM methods
- extensible vehicle for LM research
- code reusability
- tool versatility
- speed
Implementation language: C++ (GNU compiler)
LM CLASS HIERARCHY
LM
Ngram -- arbitrary-order N-gram backoff models
DFNgram -- N-grams including disfluency model
VarNgram -- variable-order N-grams
TaggedNgram -- word/tag N-grams
CacheLM -- unigram from recent history
DynamicLM -- changes as a function of external info
BayesMix -- mixture of LMs with contextual adaptation
OTHER CLASSES
Vocab -- word string/index mapping
TaggedVocab -- same for word/tag pairs
LMStats -- statistics for LM estimation
NgramStats -- N-gram counts
TaggedNgramStats -- word/tag N-gram counts
Discount -- backoff probability discounting
GoodTuring -- standard
ConstDiscount -- Ney's method
NaturalDiscount -- Ristad's method
HELPER LIBRARIES
libdstruct -- template data structures
Array -- self-extending arrays
Map
SArray -- sorted arrays
LHash -- linear hash tables
Trie -- index trees (based on a Map type)
MemStats -- memory usage tracking
libmisc -- convenience functions:
option parsing,
compressed file i/o,
object debugging
TOOLS
ngram-count -- N-gram counting and model estimation
ngram-merge -- N-gram count merging
ngram -- N-gram model scoring, perplexity,
sentence generation, mixing and
interpolation
LM INTERFACE
LogP wordProb(VocabIndex word, const VocabIndex *context)
LogP wordProb(VocabString word, const VocabString *context)
LogP sentenceProb(const VocabIndex *sentence, TextStats &stats)
LogP sentenceProb(const VocabString *sentence, TextStats &stats)
unsigned pplFile(File &file, TextStats &stats, const char *escapeString = 0)
setState(const char *state);
wordProbSum(const VocabIndex *context)
VocabIndex generateWord(const VocabIndex *context)
VocabIndex *generateSentence(unsigned maxWords, VocabIndex *sentence)
VocabString *generateSentence(unsigned maxWords, VocabString *sentence)
Boolean isNonWord(VocabIndex word)
Boolean read(File &file);
void write(File &file);
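A minimal usage sketch of this interface (the Ngram(vocab, order) and
File(name, mode) constructors and the header names are inferred from the
class names above rather than documented here):

    #include "File.h"
    #include "Vocab.h"
    #include "TextStats.h"
    #include "Ngram.h"

    int main()
    {
        Vocab vocab;
        Ngram lm(vocab, 3);                 // assumed: vocabulary + maximum order

        File lmFile("corpus.bo", "r");      // assumed: file name + stdio-style mode
        if (!lm.read(lmFile)) {
            return 1;                       // model could not be read
        }

        File testFile("test.corpus", "r");
        TextStats stats;
        lm.pplFile(testFile, stats);        // accumulate totals, as in "ngram -ppl"
        // stats now holds the statistics from which perplexity is computed
        return 0;
    }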
EXTENSIBILITY/REUSABILITY
THINGS TO DO
- Node array interface
- General interpolated LMs
- LM "shell" for interactive model manipulation and use (Tcl based)


@@ -0,0 +1,22 @@
Standard, speed-optimized libraries (using hash tables)
bin/i386-solaris/ngram -debug 1 -memuse -lm $EVAL2000/data/lms/devel2000/swbd+ch_en+h4eval2000-m-pruned.3bo.gz -write-vocab /dev/null
reading 34610 1-grams
reading 4826134 2-grams
reading 9733194 3-grams
total memory 276788424 (263.966M), used 170254952 (162.368M), wasted 106533472 (101.598M)
Time elapsed 67.06 user 70.26 system 5.90 utilization 113.57%
"compact", space-optimized libraries (using sorted arrays)
bin/i386-solaris_c/ngram -debug 1 -memuse -lm $EVAL2000/data/lms/devel2000/swbd+ch_en+h4eval2000-m-pruned.3bo.gz -write-vocab /dev/null
reading 34610 1-grams
reading 4826134 2-grams
reading 9733194 3-grams
total memory 175131744 (167.019M), used 170254952 (162.368M), wasted 4876792 (4.65087M)
Time elapsed 75.17 user 81.84 system 4.47 utilization 114.83%
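In other words, the compact build wastes about 2.8% of its total allocation
versus about 38% for the standard build, at the cost of roughly 12% longer
elapsed time for loading the model (75.17 vs. 67.06 seconds).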