Merge pull request #2 from Neuroprosthetics-Lab/tscizzlebg-patch-1

Update README.md
Nick Card
2025-07-02 21:46:23 -07:00
committed by GitHub
3 changed files with 15 additions and 3 deletions


@@ -74,10 +74,14 @@ To create a conda environment with the necessary dependencies, run the following
./setup.sh
```
Verify it worked by activating the conda environment with the command `conda activate b2txt25`.
## Python environment setup for ngram language model and OPT rescoring
We use an ngram language model plus rescoring via the [Facebook OPT 6.7b](https://huggingface.co/facebook/opt-6.7b) LLM. A pretrained 1gram language model is included in this repository at [`language_model/pretrained_language_models/openwebtext_1gram_lm_sil`](language_model/pretrained_language_models/openwebtext_1gram_lm_sil). Pretrained 3gram and 5gram language models are available for download [here](https://datadryad.org/dataset/doi:10.5061/dryad.x69p8czpq) (`languageModel.tar.gz` and `languageModel_5gram.tar.gz`). Note that the 3gram model requires ~60 GB of RAM, and the 5gram model requires ~300 GB of RAM. In addition, loading OPT 6.7b for inference requires a GPU with ~12.4 GB of VRAM or more.
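As a rough pre-flight check, the RAM figures above can be compared against the machine's total memory before downloading a model. The following is a minimal sketch, not part of this repository: the function names `required_ram_gb`, `total_ram_gb`, and `has_enough_ram` are hypothetical, the thresholds are the approximate values quoted above, and reading `/proc/meminfo` assumes a Linux host.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and not part of the repository.
# Thresholds are the approximate RAM figures quoted in the README text.

# Print the approximate RAM (in GB) needed by the 3gram or 5gram model.
required_ram_gb() {
    case "$1" in
        3) echo 60 ;;
        5) echo 300 ;;
        *) echo "unknown ngram order: $1" >&2; return 1 ;;
    esac
}

# Print total system RAM in whole GB (assumes Linux; MemTotal is reported in kB).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Return 0 if a machine with $2 GB of RAM meets the requirement for ngram order $1.
has_enough_ram() {
    local need
    need=$(required_ram_gb "$1") || return 2
    [ "$2" -ge "$need" ]
}

order=5
if has_enough_ram "$order" "$(total_ram_gb)"; then
    echo "Total RAM looks sufficient for the ${order}gram model."
else
    echo "Total RAM is below ~$(required_ram_gb "$order") GB; consider a smaller model."
fi
```

Set `order=3` to check against the 3gram requirement instead. The check uses total rather than free memory, so it is only a coarse guide.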
Our Kaldi-based ngram implementation requires a different version of torch than our model training pipeline, so running the ngram language models requires a separate conda environment. To create this conda environment, run the following command from the root directory of this repository. For more detailed instructions, see the README.md in the [`language_model`](language_model) subdirectory.
```bash
./setup_lm.sh
```
Verify it worked by activating the conda environment with the command `conda activate b2txt25_lm`.
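Both environments can also be checked without activating them by looking for their names in the output of `conda env list`. This is a minimal sketch under the assumption that `conda` is on the `PATH`; the helper `env_in_list` is a hypothetical name and is not provided by the setup scripts.

```bash
#!/usr/bin/env bash
# Sketch only: env_in_list is a hypothetical helper, not part of the repository.

# Return 0 if an environment named exactly $1 appears in `conda env list`
# output read from stdin (the environment name is the first column).
env_in_list() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

for env in b2txt25 b2txt25_lm; do
    if conda env list 2>/dev/null | env_in_list "$env"; then
        echo "Found conda environment: $env"
    else
        echo "Missing conda environment: $env"
    fi
done
```

Matching on the first column exactly avoids treating `b2txt25_lm` as a hit for `b2txt25`, which a plain substring search would do.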


@@ -34,4 +34,8 @@ pip install \
transformers==4.53.0 \
tokenizers==0.21.2 \
accelerate==1.8.1 \
bitsandbytes==0.46.0
echo
echo "Setup complete! Verify it worked by activating the conda environment with the command 'conda activate b2txt25'."
echo


@@ -53,4 +53,8 @@ cd language_model/runtime/server/x86
python setup.py install
# cd back to the root directory
cd ../../../..
echo
echo "Setup complete! Verify it worked by activating the conda environment with the command 'conda activate b2txt25_lm'."
echo