RE: Get the good type to pass to a transcriber model - slain - Apr-05-2024

I tested with this:
predicted_text = asr_model.transcribe([Path(])
That is not the right type either:
TypeError: Object of type PosixPath is not JSON serializable
File "/home/ild/.local/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/", line 542, in _run_script
    exec(code, module.__dict__)
File "/home/ild/", line 28, in <module>
    predicted_text = asr_model.transcribe([Path(])
File "/home/ild/.local/lib/python3.12/site-packages/torch/utils/", line 115, in decorate_context
    return func(*args, **kwargs)
File "/home/ild/miniconda3/lib/python3.12/site-packages/nemo/collections/asr/models/", line 187, in transcribe
    fp.write(json.dumps(entry) + '\n')
File "/home/ild/miniconda3/lib/python3.12/json/", line 231, in dumps
    return _default_encoder.encode(obj)
File "/home/ild/miniconda3/lib/python3.12/json/", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
File "/home/ild/miniconda3/lib/python3.12/json/", line 258, in iterencode
    return _iterencode(o, 0)
File "/home/ild/miniconda3/lib/python3.12/json/", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
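
For what it's worth, the traceback shows the failure happening inside json.dumps, which cannot encode pathlib objects. Below is a minimal sketch that reproduces this with the standard library only. The file name and the "audio_filepath" key are placeholders chosen for illustration, and whether transcribe() then accepts the string form is an assumption, not something verified here.

```python
import json
from pathlib import Path

# Hypothetical file name, for illustration only.
audio_path = Path("recording.wav")
entry = {"audio_filepath": audio_path}

# json.dumps has no encoder for Path objects, so this raises TypeError.
try:
    json.dumps(entry)
    path_is_serializable = True
except TypeError as err:
    path_is_serializable = False
    print(err)

# Converting the Path to str first makes the same entry serializable.
entry_as_str = {"audio_filepath": str(audio_path)}
line = json.dumps(entry_as_str)
print(line)
```

So wrapping the path in str() before putting it in the list is probably the thing to try next.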

RE: Get the good type to pass to a transcriber model - slain - Apr-05-2024

We can close the topic: I won't be able to go further due to hardware constraints.

Here is my code. CUDA runs out of memory before it can transcribe the file:
## Imports ##
import torch
import streamlit as st
from pathlib import Path
from tempfile import NamedTemporaryFile
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM
import nemo.collections.asr as nemo_asr
import torchaudio

## Initialisation ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("nvidia/stt_fr_conformer_ctc_large")

## Display ##
st.title("Facilitateur de compte-rendus")
col1, col2 = st.columns(2)
audio_source = st.sidebar.file_uploader(label="Choisir votre fichier", type=["wav", "m4a", "mp3", "wma"])

## Variables ##
suffix = ""
predicted_sentence = ""

## Processing ##
#col1.subheader("Modèle utilisé : nvidia/stt_fr_conformer_ctc_large")
if audio_source is not None:
    suffix = Path(
    col1.write("Démarrage de la transcription")
#    predicted_text = asr_model.transcribe([Path(])
    with NamedTemporaryFile(suffix=suffix) as temp_file:
        predicted_text = asr_model.transcribe([])
    col1.write("Fichier transcrit :point_right:")
    st.sidebar.download_button(label="Télécharger la transcription", data=predicted_text, file_name="transcript.txt", mime="text/plain")
If anyone has a GPU with 6+ GB of memory, or a good CPU with enough RAM and plenty of time to spare, feel free to test it.
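
For anyone picking this up: the temporary-file step above is cut off in the post, so here is a rough sketch of one way it could work, not necessarily what the original code did. The helper name save_upload_to_temp is made up for this example. It writes the uploaded bytes to a named temporary file, keeps the original suffix, and returns the path as a plain str so that it stays JSON serializable.

```python
import os
from pathlib import Path
from tempfile import NamedTemporaryFile


def save_upload_to_temp(data: bytes, original_name: str) -> str:
    """Write uploaded bytes to a temporary file and return its path as a str.

    The suffix of the original name is kept so that audio loaders can
    recognise the format. The caller is responsible for deleting the file.
    """
    suffix = Path(original_name).suffix
    # delete=False keeps the file on disk after the handle is closed,
    # so another library can reopen it by path.
    with NamedTemporaryFile(suffix=suffix, delete=False) as temp_file:
        temp_file.write(data)
    return temp_file.name


# Assumed usage inside the Streamlit script (not run here, untested with NeMo):
#   temp_path = save_upload_to_temp(audio_source.getvalue(), audio_source.name)
#   predicted_text = asr_model.transcribe([temp_path])
#   os.remove(temp_path)
```

Returning a str rather than a Path avoids the TypeError from the earlier post, and closing the file before handing the path over avoids problems on systems that do not allow a file to be opened twice. This does nothing for the CUDA memory issue, which still needs better hardware or a lighter setup.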