switch from .txt to .csv

nckcard
2025-07-06 21:52:59 -07:00
parent 4ee016b032
commit d83bcc0976
5 changed files with 1436 additions and 1439 deletions


@@ -39,12 +39,12 @@ python language_model/language-model-standalone.py --lm_path language_model/pret
If the language model successfully starts and connects to Redis, you should see a message saying "Successfully connected to the redis server" in the Terminal.
### Evaluate
Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .txt file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.txt`.
Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script. It loads the pretrained baseline RNN, runs inference on the held-out val or test set to get phoneme logits, passes them through the language model via Redis to get word predictions, and saves the predicted sentences to a .csv file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.csv`.
```bash
conda activate b2txt25
python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type test --gpu_number 1
```
If the script runs successfully, it will save the predicted sentences to a text file named `baseline_rnn_{eval_type}_predicted_sentences_YYYYMMDD_HHMMSS.txt` in the pretrained model's directory (`/data/t15_pretrained_rnn_baseline`). The `eval_type` can be set to either `val` or `test`, depending on which dataset you want to evaluate.
If the script runs successfully, it will save the predicted sentences to a CSV file named `baseline_rnn_{eval_type}_predicted_sentences_YYYYMMDD_HHMMSS.csv` in the pretrained model's directory (`/data/t15_pretrained_rnn_baseline`). The `eval_type` can be set to either `val` or `test`, depending on which dataset you want to evaluate.
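As a quick sanity check on the generated file, a minimal sketch along the following lines reads it back and verifies the layout. It assumes the two-column `id`/`text` layout with ids counting up from 0, which is what `evaluate_model.py` writes in this commit. The helper name `check_submission` is hypothetical, and the competition's own validation rules may differ.

```python
import csv


def check_submission(path):
    """Return the predicted sentences from a submission CSV after basic checks.

    Hypothetical helper: assumes a header row of `id,text` and ids 0..N-1.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["id", "text"]:
            raise ValueError(f"unexpected columns: {reader.fieldnames}")
        rows = list(reader)

    # ids are expected to match the row order, starting at 0
    for expected_id, row in enumerate(rows):
        if int(row["id"]) != expected_id:
            raise ValueError(f"row {expected_id} has id {row['id']}")
    return [row["text"] for row in rows]
```

Calling `check_submission` on the path of a generated file returns the list of sentences, or raises `ValueError` if the header or ids do not match the assumed layout.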
### Shutdown redis
When you're done, you can shut down the redis server from any terminal using `redis-cli shutdown`.


@@ -1,7 +1,7 @@
import os
import sys
import torch
import numpy as np
import pandas as pd
import redis
from omegaconf import OmegaConf
import time
@@ -262,13 +262,8 @@ if eval_type == 'val':
print(f'Aggregate Word Error Rate (WER): {100 * total_edit_distance / total_true_length:.2f}%')
# write predicted sentences to a text file. put a timestamp in the filename (YYYYMMDD_HHMMSS)
output_file = os.path.join(model_path, f'baseline_rnn_{eval_type}_predicted_sentences_{time.strftime("%Y%m%d_%H%M%S")}.txt')
with open(output_file, 'w') as f:
    for i in range(len(lm_results['pred_sentence'])):
        if i < len(lm_results['pred_sentence']) - 1:
            # write sentence + newline
            f.write(f"{remove_punctuation(lm_results['pred_sentence'][i])}\n")
        else:
            # don't add a newline at the end of the last sentence
            f.write(f"{remove_punctuation(lm_results['pred_sentence'][i])}")
# write predicted sentences to a csv file. put a timestamp in the filename (YYYYMMDD_HHMMSS)
output_file = os.path.join(model_path, f'baseline_rnn_{eval_type}_predicted_sentences_{time.strftime("%Y%m%d_%H%M%S")}.csv')
ids = list(range(len(lm_results['pred_sentence'])))
df_out = pd.DataFrame({'id': ids, 'text': lm_results['pred_sentence']})
df_out.to_csv(output_file, index=False)
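
For reference, the resulting file layout can be sketched with the standard library alone. This is an illustration only, not the code in this commit (which uses pandas); `write_predictions` and the sample sentences are hypothetical.

```python
import csv
import io


def write_predictions(sentences, f):
    """Write sentences as `id,text` rows, with ids counting up from 0."""
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["id", "text"])
    for i, sentence in enumerate(sentences):
        writer.writerow([i, sentence])


# Write to an in-memory buffer so the layout can be inspected directly.
buf = io.StringIO()
write_predictions(["hello world", "yes, i am"], buf)
print(buf.getvalue())
# id,text
# 0,hello world
# 1,"yes, i am"
```

Note that a sentence containing a comma is wrapped in double quotes under the default minimal quoting, so any consumer of the file should parse it with a CSV reader rather than splitting lines on commas.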

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -21,6 +21,7 @@ pip install \
redis==5.2.1 \
jupyter==1.1.1 \
numpy==2.1.2 \
pandas==2.3.0 \
matplotlib==3.10.1 \
scipy==1.15.2 \
scikit-learn==1.6.1 \