From 2e7389030bb542439428590e7077ce0a6475d174 Mon Sep 17 00:00:00 2001
From: nckcard
Date: Wed, 2 Jul 2025 15:24:41 -0700
Subject: [PATCH] additional small text changes

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4ee305d..8a357f0 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ The data used in this repository consists of various datasets for recreating fig
 - `t15_copyTask_neuralData.zip`: This dataset contains the neural data for the Copy Task.
   - There are more than 11,300 sentences from 45 sessions spanning 20 months. Each trial of data includes:
     - The session date, block number, and trial number
-    - 512 neural features (2 features [-4.5 RMS threshold crossings and spike band power] per electrode, 256 electrodes), binned at 20 ms resolution. The data were recorded from the speech motor cortex via four high-density microelectrode arrays (64 electrodes each). The 512 features are ordered as follows:
+    - 512 neural features (2 features [-4.5 RMS threshold crossings and spike band power] per electrode, 256 electrodes), binned at 20 ms resolution. The data were recorded from the speech motor cortex via four high-density microelectrode arrays (64 electrodes each). The 512 features are ordered as follows in all data files:
       - 0-64: ventral 6v threshold crossings
       - 65-128: area 4 threshold crossings
       - 129-192: 55b threshold crossings
@@ -45,6 +45,7 @@ The data used in this repository consists of various datasets for recreating fig
     - The ground truth phoneme sequence label
   - The data is split into training, validation, and test sets. The test set does not include ground truth sentence or phoneme labels.
   - Data for each session/split is stored in `.hdf5` files. An example of how to load this data using the Python `h5py` library is provided in the `model_training/evaluate_model_helpers.py` file in the `load_h5py_file()` function.
+  - Each block of data contains sentences drawn from a range of corpora (Switchboard, OpenWebText2, a 50-word corpus, a custom frequent-word corpus, and a corpus of random word sequences). Most of the data was collected during attempted vocalized speech, and some during attempted silent speech.
 - `t15_pretrained_rnn_baseline.zip`: This dataset contains the pretrained RNN baseline model checkpoint and args. An example of how to load this model and use it for inference is provided in the `model_training/evaluate_model.py` file.
 
 Please download these datasets from [Dryad](https://datadryad.org/stash/dataset/doi:10.5061/dryad.dncjsxm85) and place them in the `data` directory. Be sure to unzip both datasets before running the code.
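The README text above points readers to `load_h5py_file()` in `model_training/evaluate_model_helpers.py` for reading the per-session `.hdf5` files. As a hedged illustration of what such a loader looks like with `h5py` (the group and dataset names `trial_0000` and `input_features` below are assumptions for the sketch, not the dataset's actual schema — consult `load_h5py_file()` for the real field names), one might write:

```python
import os
import tempfile

import h5py
import numpy as np

def write_synthetic_session(path, n_trials=2, n_bins=100, n_features=512):
    """Create a tiny stand-in file so the loader below is runnable.

    Mimics one session's file: one group per trial, each holding a
    (time_bins, 512) array of binned neural features.
    """
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        for i in range(n_trials):
            grp = f.create_group(f"trial_{i:04d}")  # hypothetical group name
            grp.create_dataset(
                "input_features",  # hypothetical dataset name
                data=rng.standard_normal((n_bins, n_features)).astype(np.float32),
            )

def load_session(path):
    """Return each trial's (time_bins, 512) feature array as a list."""
    with h5py.File(path, "r") as f:
        return [f[name]["input_features"][:] for name in sorted(f.keys())]

path = os.path.join(tempfile.mkdtemp(), "t15_synthetic.hdf5")
write_synthetic_session(path)
trials = load_session(path)
print(len(trials), trials[0].shape)  # 2 (100, 512)
```

For the real data, the same pattern applies per session/split file, with the feature axis holding the 512 features in the fixed ordering the README describes (threshold crossings followed by spike band power, grouped by array).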