better data download & unzip instructions

This commit is contained in:
nckcard
2025-07-03 13:54:46 -07:00
parent 84289cb3d3
commit 6fee75015b
4 changed files with 26 additions and 8 deletions

View File

@@ -8,7 +8,7 @@ All model training and evaluation code was tested on a computer running Ubuntu 2
## Setup
1. Install the required `b2txt25` conda environment by following the instructions in the root `README.md` file. This will set up the necessary dependencies for running the model training and evaluation code.
-2. Download the dataset from Dryad: [Dryad Dataset](https://datadryad.org/dataset/doi:10.5061/dryad.dncjsxm85). Place the downloaded data in the `data` directory. Be sure to unzip `t15_copyTask_neuralData.zip` and `t15_pretrained_rnn_baseline.zip`.
+2. Download the dataset from Dryad: [Dryad Dataset](https://datadryad.org/dataset/doi:10.5061/dryad.dncjsxm85). Place the downloaded data in the `data` directory. See the main [README.md](../README.md) file for more details on the included datasets and the proper `data` directory structure.
## Training
### Baseline RNN Model
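
As a hedged illustration of the updated setup step (not part of this commit), the archives downloaded from the Dryad page above might be extracted into `data/` as follows; the `hdf5_data_final` folder name is an inference from the path updates in this commit:

```bash
# Sketch only: assumes the two zips were downloaded manually from the Dryad
# page linked above and saved into the repository's data/ directory.
cd data
unzip t15_copyTask_neuralData.zip       # assumed to extract to hdf5_data_final/
unzip t15_pretrained_rnn_baseline.zip   # assumed to extract to t15_pretrained_rnn_baseline/
cd ..
```
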
@@ -41,7 +41,7 @@ python language_model/language-model-standalone.py --lm_path language_model/pret
Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .txt file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.txt`.
```bash
conda activate b2txt25
-python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/t15_copyTask_neuralData --eval_type test --gpu_number 1
+python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type test --gpu_number 1
```
### Shutdown redis
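
For comparison, a sketch (not from this commit) of the same evaluation pointed at the validation split; the `val` option is confirmed by the `--eval_type` choices shown in `evaluate_model.py` below:

```bash
conda activate b2txt25
python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type val --gpu_number 1
```
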

View File

@@ -16,7 +16,7 @@ from evaluate_model_helpers import *
parser = argparse.ArgumentParser(description='Evaluate a pretrained RNN model on the copy task dataset.')
parser.add_argument('--model_path', type=str, default='../data/t15_pretrained_rnn_baseline',
help='Path to the pretrained model directory (relative to the current working directory).')
-parser.add_argument('--data_dir', type=str, default='../data/t15_copyTask_neuralData',
+parser.add_argument('--data_dir', type=str, default='../data/hdf5_data_final',
help='Path to the dataset directory (relative to the current working directory).')
parser.add_argument('--eval_type', type=str, default='test', choices=['val', 'test'],
help='Evaluation type: "val" for validation set, "test" for test set. '
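
With the updated default, `--data_dir` can be omitted and the script falls back to `../data/hdf5_data_final`; a minimal invocation sketch, assuming the remaining arguments (such as GPU selection) also have workable defaults in the script:

```bash
conda activate b2txt25
python evaluate_model.py --eval_type val
```
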

View File

@@ -81,7 +81,7 @@ dataset:
test_percentage: 0.1 # percentage of data to use for testing
feature_subset: null # specific features to include in the dataset
-dataset_dir: ../data/t15_copyTask_neuralData # directory containing the dataset
+dataset_dir: ../data/hdf5_data_final # directory containing the dataset
bad_trials_dict: null # dictionary of bad trials to exclude from the dataset
sessions: # list of sessions to include in the dataset
- t15.2023.08.11
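
A quick sanity check (not part of this commit) that the config's new `dataset_dir` actually points at the unzipped Dryad data; this assumes the archive was extracted into `data/` as described in the README:

```bash
# List the dataset directory referenced by the updated training config.
ls ../data/hdf5_data_final
```
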