• Spiralvortexisalie@lemmy.world · 24 upvotes · 4 months ago

    Not sure if you have tried or heard of Whisper. It automatically transcribes audio; I use it for meetings and lectures that don’t come with closed captioning. It supports audio and video files and a few languages. I had tried a few other solutions with mixed results (e.g. Google is slow, and many services limit length/size). IBM is supposed to be the best free/low-cost cloud model, but they would never approve my accounts. In the end, running Whisper locally in an Anaconda/Python environment was the best cheap option for me.
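
    For anyone wanting to try the local route, here is a minimal sketch of what that can look like in Python. It assumes the `openai-whisper` package and ffmpeg are installed; the function name `transcribe_file` and the default model size `"base"` are my own choices for illustration, not something from the setup described above.

```python
from pathlib import Path


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio/video file locally and return the plain text.

    `transcribe_file` is a hypothetical helper name; `model_name` is
    passed straight to Whisper (e.g. "tiny", "base", "small").
    """
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such media file: {path}")

    # Imported lazily so this module still loads where Whisper is missing.
    # Assumption: the openai-whisper package, which also needs ffmpeg.
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(str(path))    # dict with "text" and "segments"
    return result["text"].strip()


# Example call (the file name is a placeholder):
# print(transcribe_file("meeting.mp4"))
```

    Larger models are slower and need more memory, so starting with a small one and moving up only if the transcript is poor is probably the cheaper way to experiment.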

    • weariedfae@lemmy.world · 3 upvotes · 4 months ago (edited)

      Not OP but I’ve been looking for one to help me with meetings and disorganized notes. How well would you say it works? Does it only transcribe or will it help organize notes (create categories, cluster analysis, tags, action items, whatever)?

      • Spiralvortexisalie@lemmy.world · 6 upvotes · 4 months ago

        Only transcription. It outputs to a few formats that amount to plain text with or without timecodes, including SRT subtitles. It transcribes really well. One thing of note: with more technical discussions, I sometimes get better results using the smaller models. My best theory is that the smaller models are less likely to treat technical words as an accent/variation.
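
        To show what "plain text with time coding" amounts to, here is a rough sketch that turns timed segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` field, which is the shape Whisper's Python API returns under `result["segments"]`; the helper names `format_timestamp` and `segments_to_srt` are made up for this example, and Whisper's own writers may differ in detail.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with "start", "end" and "text" keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):  # SRT cues count from 1
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hand-written sample segments, not real transcription output.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the meeting."},
    {"start": 2.5, "end": 6.0, "text": " First item on the agenda."},
]
print(segments_to_srt(sample))
```

        Dropping the index and timestamp lines from each cue leaves the plain-text transcript, which is why the formats are so close to each other.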

      • kakes@sh.itjust.works · 3 upvotes · 4 months ago

        I haven’t used it much, but I ran a podcast through it once to test it, and I was honestly impressed by the accuracy.