diff --git a/README.md b/README.md
index 8186f5b..87dd546 100644
--- a/README.md
+++ b/README.md
@@ -74,10 +74,14 @@ To create a conda environment with the necessary dependencies, run the following
 ./setup.sh
 ```
 
+Verify it worked by activating the conda environment with the command `conda activate b2txt25`.
+
 ## Python environment setup for ngram language model and OPT rescoring
 We use an ngram language model plus rescoring via the [Facebook OPT 6.7b](https://huggingface.co/facebook/opt-6.7b) LLM. A pretrained 1gram language model is included in this repository at [`language_model/pretrained_language_models/openwebtext_1gram_lm_sil`](language_model/pretrained_language_models/openwebtext_1gram_lm_sil). Pretrained 3gram and 5gram language models are available for download [here](https://datadryad.org/dataset/doi:10.5061/dryad.x69p8czpq) (`languageModel.tar.gz` and `languageModel_5gram.tar.gz`). Note that the 3gram model requires ~60GB of RAM, and the 5gram model requires ~300GB of RAM. Furthermore, OPT 6.7b requires a GPU with at least ~12.4 GB of VRAM to load for inference.
 
 Our Kaldi-based ngram implementation requires a different version of torch than our model training pipeline, so running the ngram language models requires an additional seperate python conda environment. To create this conda environment, run the following command from the root directory of this repository. For more detailed instructions, see the README.md in the [`language_model`](language_model) subdirectory.
 ```bash
 ./setup_lm.sh
-```
\ No newline at end of file
+```
+
+Verify it worked by activating the conda environment with the command `conda activate b2txt25_lm`.
diff --git a/setup.sh b/setup.sh
index d01e6ea..c8b84b6 100755
--- a/setup.sh
+++ b/setup.sh
@@ -34,4 +34,8 @@ pip install \
     transformers==4.53.0 \
     tokenizers==0.21.2 \
     accelerate==1.8.1 \
-    bitsandbytes==0.46.0
\ No newline at end of file
+    bitsandbytes==0.46.0
+
+echo
+echo "Setup complete! Verify it worked by activating the conda environment with the command 'conda activate b2txt25'."
+echo
diff --git a/setup_lm.sh b/setup_lm.sh
index 0bbdb10..17f109c 100755
--- a/setup_lm.sh
+++ b/setup_lm.sh
@@ -53,4 +53,8 @@ cd language_model/runtime/server/x86
 python setup.py install
 
 # cd back to the root directory
-cd ../../../..
\ No newline at end of file
+cd ../../../..
+
+echo
+echo "Setup complete! Verify it worked by activating the conda environment with the command 'conda activate b2txt25_lm'."
+echo