switch from .txt to .csv
@@ -39,12 +39,12 @@ python language_model/language-model-standalone.py --lm_path language_model/pret
If the language model successfully starts and connects to Redis, you should see a message saying "Successfully connected to the redis server" in the Terminal.

### Evaluate
-Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .txt file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.txt`.
+Finally, use the `b2txt25` conda environment to run the `evaluate_model.py` script to load the pretrained baseline RNN, use it for inference on the heldout val or test sets to get phoneme logits, pass them through the language model via redis to get word predictions, and then save the predicted sentences to a .csv file in the format required for competition submission. An example output file for the val split can be found at `rnn_baseline_submission_file_valsplit.csv`.
```bash
conda activate b2txt25
python evaluate_model.py --model_path ../data/t15_pretrained_rnn_baseline --data_dir ../data/hdf5_data_final --eval_type test --gpu_number 1
```
-If the script runs successfully, it will save the predicted sentences to a text file named `baseline_rnn_{eval_type}_predicted_sentences_YYYYMMDD_HHMMSS.txt` in the pretrained model's directory (`/data/t15_pretrained_rnn_baseline`). The `eval_type` can be set to either `val` or `test`, depending on which dataset you want to evaluate.
+If the script runs successfully, it will save the predicted sentences to a text file named `baseline_rnn_{eval_type}_predicted_sentences_YYYYMMDD_HHMMSS.csv` in the pretrained model's directory (`/data/t15_pretrained_rnn_baseline`). The `eval_type` can be set to either `val` or `test`, depending on which dataset you want to evaluate.

### Shutdown redis
When you're done, you can shutdown the redis server from any terminal using `redis-cli shutdown`.
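The timestamped output name described in the Evaluate section can be reproduced outside the script. The following is a minimal sketch, not the repository's code: `prediction_filename` and `PATTERN` are hypothetical names introduced here, and the fixed timestamp exists only to make the example deterministic. It builds the `.csv` name with `time.strftime` and `os.path.join`, the same calls the diff uses, and adds a regular expression that recognizes such files.

```python
import os
import re
import time


def prediction_filename(model_path, eval_type, now=None):
    """Build the timestamped .csv path (hypothetical helper, not part of the repo)."""
    # Fall back to the current local time when no struct_time is supplied.
    stamp = time.strftime("%Y%m%d_%H%M%S", now if now is not None else time.localtime())
    return os.path.join(
        model_path, f"baseline_rnn_{eval_type}_predicted_sentences_{stamp}.csv"
    )


# Matches file names of the form described in the README text (assumed pattern).
PATTERN = re.compile(r"baseline_rnn_(val|test)_predicted_sentences_\d{8}_\d{6}\.csv$")

# A fixed timestamp, used here only for illustration.
fixed_time = time.strptime("2025-01-02 03:04:05", "%Y-%m-%d %H:%M:%S")
example = prediction_filename("../data/t15_pretrained_rnn_baseline", "val", fixed_time)
print(example)
```

Because the timestamp is zero-padded and ordered from year to second, sorting matching file names alphabetically also sorts them chronologically, which is convenient when picking the most recent run.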
@@ -1,7 +1,7 @@
import os
import sys
import torch
import numpy as np
import pandas as pd
import redis
from omegaconf import OmegaConf
import time
@@ -262,13 +262,8 @@ if eval_type == 'val':
    print(f'Aggregate Word Error Rate (WER): {100 * total_edit_distance / total_true_length:.2f}%')


-# write predicted sentences to a text file. put a timestamp in the filename (YYYYMMDD_HHMMSS)
-output_file = os.path.join(model_path, f'baseline_rnn_{eval_type}_predicted_sentences_{time.strftime("%Y%m%d_%H%M%S")}.txt')
-with open(output_file, 'w') as f:
-    for i in range(len(lm_results['pred_sentence'])):
-        if i < len(lm_results['pred_sentence']) - 1:
-            # write sentence + newline
-            f.write(f"{remove_punctuation(lm_results['pred_sentence'][i])}\n")
-        else:
-            # don't add a newline at the end of the last sentence
-            f.write(f"{remove_punctuation(lm_results['pred_sentence'][i])}")
+# write predicted sentences to a csv file. put a timestamp in the filename (YYYYMMDD_HHMMSS)
+output_file = os.path.join(model_path, f'baseline_rnn_{eval_type}_predicted_sentences_{time.strftime("%Y%m%d_%H%M%S")}.csv')
+ids = [i for i in range(len(lm_results['pred_sentence']))]
+df_out = pd.DataFrame({'id': ids, 'text': lm_results['pred_sentence']})
+df_out.to_csv(output_file, index=False)
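The added lines write a two-column table with an `id` column (0-based row index) and a `text` column (the predicted sentence). The sketch below illustrates that layout under stated assumptions: it uses Python's standard-library `csv` module instead of pandas so that it runs without extra dependencies, `write_submission` is a hypothetical helper, and the sentences are made-up stand-ins for `lm_results['pred_sentence']`. Note that the removed loop passed each sentence through `remove_punctuation`, whereas the added lines write the sentences directly; the sketch follows the added lines and does not strip punctuation.

```python
import csv
import io


def write_submission(sentences, handle):
    """Write a header plus one (id, text) row per sentence (hypothetical helper)."""
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "text"])
    for i, sentence in enumerate(sentences):
        writer.writerow([i, sentence])


# Made-up predictions standing in for lm_results['pred_sentence'].
sentences = ["hello world", "how are you", "yes, it works"]

buffer = io.StringIO()
write_submission(sentences, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# Read the rows back to confirm the ids and text survive a round trip.
rows = list(csv.DictReader(io.StringIO(csv_text)))
```

A sentence that contains a comma is wrapped in double quotes on output, so a reader that follows standard CSV quoting recovers the original text unchanged.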
1427
model_training/rnn_baseline_submission_file_valsplit.csv
Normal file
File diff suppressed because it is too large