better data download & unzip instructions

This commit is contained in:
nckcard
2025-07-03 13:54:46 -07:00
parent 84289cb3d3
commit 6fee75015b
4 changed files with 26 additions and 8 deletions

View File

@@ -8,7 +8,7 @@ All model training and evaluation code was tested on a computer running Ubuntu 2
## Setup
1. Install the required `b2txt25` conda environment by following the instructions in the root `README.md` file. This will set up the necessary dependencies for running the model training and evaluation code.
-2. Download the dataset from Dryad: [Dryad Dataset](https://datadryad.org/dataset/doi:10.5061/dryad.dncjsxm85). Place the downloaded data in the `data` directory. Be sure to unzip `t15_copyTask_neuralData.zip` and `t15_pretrained_rnn_baseline.zip`.
+2. Download the dataset from Dryad: [Dryad Dataset](https://datadryad.org/dataset/doi:10.5061/dryad.dncjsxm85). Place the downloaded data in the `data` directory. See the main [README.md](../README.md) file for more details on the included datasets and the proper `data` directory structure.
## Training
### Baseline RNN Model
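
As a hedged illustration of the updated setup step (not part of this commit), the archives downloaded from the Dryad page above might be extracted into `data/` as follows; the `hdf5_data_final` folder name is an inference from the path updates in this commit:

```bash
# Sketch only: assumes the two zips were downloaded manually from the Dryad
# page linked above and saved into the repository's data/ directory.
cd data
unzip t15_copyTask_neuralData.zip       # assumed to extract to hdf5_data_final/
unzip t15_pretrained_rnn_baseline.zip   # assumed to extract to t15_pretrained_rnn_baseline/
cd ..
```
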
@@ -41,7 +41,7 @@ python language_model/language-model-standalone.py --lm_path language_model/pret
Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .txt file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.txt`.
```bash
conda activate b2txt25
-python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/t15_copyTask_neuralData --eval_type test --gpu_number 1
+python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type test --gpu_number 1
```
### Shutdown redis
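
For comparison, a sketch (not from this commit) of the same evaluation pointed at the validation split; the `val` option is confirmed by the `--eval_type` choices shown in `evaluate_model.py` below:

```bash
conda activate b2txt25
python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type val --gpu_number 1
```
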

View File

@@ -16,7 +16,7 @@ from evaluate_model_helpers import *
parser = argparse.ArgumentParser(description='Evaluate a pretrained RNN model on the copy task dataset.')
parser.add_argument('--model_path', type=str, default='../data/t15_pretrained_rnn_baseline',
help='Path to the pretrained model directory (relative to the current working directory).')
-parser.add_argument('--data_dir', type=str, default='../data/t15_copyTask_neuralData',
+parser.add_argument('--data_dir', type=str, default='../data/hdf5_data_final',
help='Path to the dataset directory (relative to the current working directory).')
parser.add_argument('--eval_type', type=str, default='test', choices=['val', 'test'],
help='Evaluation type: "val" for validation set, "test" for test set. '
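
With the updated default, `--data_dir` can be omitted and the script falls back to `../data/hdf5_data_final`; a minimal invocation sketch, assuming the remaining arguments (such as GPU selection) also have workable defaults in the script:

```bash
conda activate b2txt25
python evaluate_model.py --eval_type val
```
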

View File

@@ -81,7 +81,7 @@ dataset:
test_percentage: 0.1 # percentage of data to use for testing
feature_subset: null # specific features to include in the dataset
-dataset_dir: ../data/t15_copyTask_neuralData # directory containing the dataset
+dataset_dir: ../data/hdf5_data_final # directory containing the dataset
bad_trials_dict: null # dictionary of bad trials to exclude from the dataset
sessions: # list of sessions to include in the dataset
- t15.2023.08.11
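
A quick sanity check (not part of this commit) that the config's new `dataset_dir` actually points at the unzipped Dryad data; this assumes the archive was extracted into `data/` as described in the README:

```bash
# List the dataset directory referenced by the updated training config.
ls ../data/hdf5_data_final
```
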