switch from .txt to .csv

2025-07-06 21:52:59 -07:00
parent 4ee016b032
commit d83bcc0976
5 changed files with 1436 additions and 1439 deletions
--- a/model_training/README.md
+++ b/model_training/README.md
@@ -39,12 +39,12 @@ python language_model/language-model-standalone.py --lm_path language_model/pret
 If the language model successfully starts and connects to Redis, you should see a message saying "Successfully connected to the redis server" in the Terminal.

 ### Evaluate
-Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .txt file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.txt`.
+Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .csv file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.csv`.
 ```bash
 conda activate b2txt25
 python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type test --gpu_number 1
 ```
-If the script runs successfully, it will save the predicted sentences to a text file named `baseline_rnn_{eval_type}_predicted_sentences_YYYYMMDD_HHMMSS.txt` in the pretrained model's directory (`/data/t15_pretrained_rnn_baseline`). The `eval_type` can be set to either `val` or `test`, depending on which dataset you want to evaluate.
+If the script runs successfully, it will save the predicted sentences to a CSV file named `baseline_rnn_{eval_type}_predicted_sentences_YYYYMMDD_HHMMSS.csv` in the pretrained model's directory (`/data/t15_pretrained_rnn_baseline`). The `eval_type` can be set to either `val` or `test`, depending on which dataset you want to evaluate.

 ### Shutdown redis
 When you're done, you can shutdown the redis server from any terminal using `redis-cli shutdown`.
--- a/model_training/evaluate_model.py
+++ b/model_training/evaluate_model.py
@@ -1,7 +1,7 @@
 import os
-import sys
 import torch
 import numpy as np
+import pandas as pd
 import redis
 from omegaconf import OmegaConf
 import time
@@ -262,13 +262,8 @@ if eval_type == 'val':
    print(f'Aggregate Word Error Rate (WER): {100 * total_edit_distance / total_true_length:.2f}%')


-# write predicted sentences to a text file. put a timestamp in the filename (YYYYMMDD_HHMMSS)
-output_file = os.path.join(model_path, f'baseline_rnn_{eval_type}_predicted_sentences_{time.strftime("%Y%m%d_%H%M%S")}.txt')
-with open(output_file, 'w') as f:
-    for i in range(len(lm_results['pred_sentence'])):
-        if i < len(lm_results['pred_sentence']) - 1:
-            # write sentence + newline
-            f.write(f"{remove_punctuation(lm_results['pred_sentence'][i])}\n")
-        else:
-            # don't add a newline at the end of the last sentence
-            f.write(f"{remove_punctuation(lm_results['pred_sentence'][i])}")
+# write predicted sentences to a csv file. put a timestamp in the filename (YYYYMMDD_HHMMSS)
+output_file = os.path.join(model_path, f'baseline_rnn_{eval_type}_predicted_sentences_{time.strftime("%Y%m%d_%H%M%S")}.csv')
+ids = list(range(len(lm_results['pred_sentence'])))
+df_out = pd.DataFrame({'id': ids, 'text': lm_results['pred_sentence']})
+df_out.to_csv(output_file, index=False)
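
The hunk above writes one row per predicted sentence under an `id,text` header, with 0-based ids. As a rough illustration of that layout, the sketch below writes and re-reads such a file using the standard-library `csv` module rather than pandas, so it runs without extra dependencies. The helper names `write_submission` and `read_submission` and the example sentences are hypothetical and do not appear in the commit; the header and id scheme are assumed from the `DataFrame` built in the diff.

```python
import csv
import io


def write_submission(sentences, fh):
    """Write (id, text) rows; ids are 0-based row indices, as in the diff."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["id", "text"])
    for i, sentence in enumerate(sentences):
        writer.writerow([i, sentence])


def read_submission(fh):
    """Read the rows back, checking the header and that ids run 0..n-1."""
    reader = csv.DictReader(fh)
    if reader.fieldnames != ["id", "text"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = list(reader)
    for expected_id, row in enumerate(rows):
        if int(row["id"]) != expected_id:
            raise ValueError(f"row {expected_id} has id {row['id']}")
    return [row["text"] for row in rows]


# Hypothetical sentences standing in for lm_results['pred_sentence'].
example_sentences = ["hello world", "yes, please"]

buffer = io.StringIO()
write_submission(example_sentences, buffer)
buffer.seek(0)
round_tripped = read_submission(buffer)
```

A sentence containing a comma is quoted on write and recovered intact on read, which is one thing a CSV writer handles that the previous newline-joined `.txt` output did not need to.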
--- a/model_training/rnn_baseline_submission_file_valsplit.csv
+++ b/model_training/rnn_baseline_submission_file_valsplit.csv
--- a/model_training/rnn_baseline_submission_file_valsplit.txt
+++ b/model_training/rnn_baseline_submission_file_valsplit.txt
--- a/setup.sh
+++ b/setup.sh
@@ -21,6 +21,7 @@ pip install \
    redis==5.2.1 \
    jupyter==1.1.1 \
    numpy==2.1.2 \
+    pandas==2.3.0 \
    matplotlib==3.10.1 \
    scipy==1.15.2 \
    scikit-learn==1.6.1 \